ibuf can be reused for multiple iterations of the loop. Specifically:
deflate() overwrites s.avail_in to show how much of the input buffer
has not been processed yet - and sometimes leaves 'avail_in > 0', in
which case ibuf will be processed again during the loop's subsequent
iteration.
But if we declare ibuf within the loop, then (in theory) we get a new
(and uninitialised) buffer for every iteration. In practice, my compiler
seems to reuse the same buffer - meaning that this code does work - but
it doesn't seem safe to rely on this behaviour. MSAN correctly catches
this issue - as soon as we hit the 's.avail_in > 0' condition, we end up
reading from what seems to be uninitialised memory.
Therefore, we move ibuf out of the loop, making this reuse safe.
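
The reuse pattern can be sketched without zlib. This is only an
illustration: mock_stream and mock_consume() are hypothetical stand-ins
for z_stream and deflate(), and the buffer size and 3-byte limit are
arbitrary. What it shows is that ibuf is declared outside the loop, so
bytes left over when avail_in > 0 are still valid on the next iteration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for z_stream: tracks the unread input. */
struct mock_stream {
	const unsigned char *next_in;
	size_t avail_in;
};

/* Hypothetical stand-in for deflate(): consumes at most `limit` bytes. */
static void mock_consume(struct mock_stream *s, size_t limit,
			 unsigned long *sum)
{
	size_t n = s->avail_in < limit ? s->avail_in : limit;
	size_t i;

	for (i = 0; i < n; i++)
		*sum += s->next_in[i];
	s->next_in += n;
	s->avail_in -= n;
}

/* Sum all bytes of `src`, refilling ibuf only once it has been drained. */
unsigned long sum_via_stream(const unsigned char *src, size_t len)
{
	unsigned char ibuf[8];	/* outside the loop: lives across iterations */
	struct mock_stream s = { NULL, 0 };
	unsigned long sum = 0;

	while (len || s.avail_in) {
		if (!s.avail_in) {
			size_t n = len < sizeof(ibuf) ? len : sizeof(ibuf);

			memcpy(ibuf, src, n);
			src += n;
			len -= n;
			s.next_in = ibuf;
			s.avail_in = n;
		}
		/* May leave avail_in > 0; the next iteration re-reads ibuf. */
		mock_consume(&s, 3, &sum);
	}
	return sum;
}
```

Had ibuf been declared inside the while body, the leftover bytes read on
the following iteration would belong to an object whose lifetime had
already ended.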
See MSAN output from t1050-large below - the interesting part is the
ibuf creation at the end, although there's a lot of indirection before
we reach the read from uninitialised memory:
==11294==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7f75db58fb1c in crc32_little crc32.c:283:9
#1 0x7f75db58d5b3 in crc32_z crc32.c:220:20
#2 0x7f75db59668c in crc32 crc32.c:242:12
#3 0x8c94f8 in hashwrite csum-file.c:101:15
#4 0x825faf in stream_to_pack bulk-checkin.c:154:5
#5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#7 0xa7cff2 in index_stream object-file.c:2234:9
#8 0xa7bff7 in index_fd object-file.c:2256:9
#9 0xa7d22d in index_path object-file.c:2274:7
#10 0xb3c8c9 in add_to_index read-cache.c:802:7
#11 0xb3e039 in add_file_to_index read-cache.c:835:9
#12 0x4a99c3 in add_files add.c:458:7
#13 0x4a7276 in cmd_add add.c:670:18
#14 0x4a1e76 in run_builtin git.c:461:11
#15 0x49e1e7 in handle_builtin git.c:714:3
#16 0x4a0c08 in run_argv git.c:781:4
#17 0x49d5a8 in cmd_main git.c:912:19
#18 0x7974da in main common-main.c:52:11
#19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)
#20 0x421bd9 in _start start.S:120
Uninitialized value was stored to memory at
#0 0x7f75db58fa6b in crc32_little crc32.c:283:9
#1 0x7f75db58d5b3 in crc32_z crc32.c:220:20
#2 0x7f75db59668c in crc32 crc32.c:242:12
#3 0x8c94f8 in hashwrite csum-file.c:101:15
#4 0x825faf in stream_to_pack bulk-checkin.c:154:5
#5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#7 0xa7cff2 in index_stream object-file.c:2234:9
#8 0xa7bff7 in index_fd object-file.c:2256:9
#9 0xa7d22d in index_path object-file.c:2274:7
#10 0xb3c8c9 in add_to_index read-cache.c:802:7
#11 0xb3e039 in add_file_to_index read-cache.c:835:9
#12 0x4a99c3 in add_files add.c:458:7
#13 0x4a7276 in cmd_add add.c:670:18
#14 0x4a1e76 in run_builtin git.c:461:11
#15 0x49e1e7 in handle_builtin git.c:714:3
#16 0x4a0c08 in run_argv git.c:781:4
#17 0x49d5a8 in cmd_main git.c:912:19
#18 0x7974da in main common-main.c:52:11
#19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Uninitialized value was stored to memory at
#0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
#1 0x7f75db5c2011 in flush_pending deflate.c:746:5
#2 0x7f75db5cafa0 in deflate_stored deflate.c:1815:9
#3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
#4 0xd80b7f in git_deflate zlib.c:244:12
#5 0x825dff in stream_to_pack bulk-checkin.c:140:12
#6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#8 0xa7cff2 in index_stream object-file.c:2234:9
#9 0xa7bff7 in index_fd object-file.c:2256:9
#10 0xa7d22d in index_path object-file.c:2274:7
#11 0xb3c8c9 in add_to_index read-cache.c:802:7
#12 0xb3e039 in add_file_to_index read-cache.c:835:9
#13 0x4a99c3 in add_files add.c:458:7
#14 0x4a7276 in cmd_add add.c:670:18
#15 0x4a1e76 in run_builtin git.c:461:11
#16 0x49e1e7 in handle_builtin git.c:714:3
#17 0x4a0c08 in run_argv git.c:781:4
#18 0x49d5a8 in cmd_main git.c:912:19
#19 0x7974da in main common-main.c:52:11
Uninitialized value was stored to memory at
#0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
#1 0x7f75db644241 in _tr_stored_block trees.c:873:5
#2 0x7f75db5cad7c in deflate_stored deflate.c:1813:9
#3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
#4 0xd80b7f in git_deflate zlib.c:244:12
#5 0x825dff in stream_to_pack bulk-checkin.c:140:12
#6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#8 0xa7cff2 in index_stream object-file.c:2234:9
#9 0xa7bff7 in index_fd object-file.c:2256:9
#10 0xa7d22d in index_path object-file.c:2274:7
#11 0xb3c8c9 in add_to_index read-cache.c:802:7
#12 0xb3e039 in add_file_to_index read-cache.c:835:9
#13 0x4a99c3 in add_files add.c:458:7
#14 0x4a7276 in cmd_add add.c:670:18
#15 0x4a1e76 in run_builtin git.c:461:11
#16 0x49e1e7 in handle_builtin git.c:714:3
#17 0x4a0c08 in run_argv git.c:781:4
#18 0x49d5a8 in cmd_main git.c:912:19
#19 0x7974da in main common-main.c:52:11
Uninitialized value was stored to memory at
#0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
#1 0x7f75db5c8fcf in deflate_stored deflate.c:1783:9
#2 0x7f75db5bb7d2 in deflate deflate.c:1005:34
#3 0xd80b7f in git_deflate zlib.c:244:12
#4 0x825dff in stream_to_pack bulk-checkin.c:140:12
#5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#7 0xa7cff2 in index_stream object-file.c:2234:9
#8 0xa7bff7 in index_fd object-file.c:2256:9
#9 0xa7d22d in index_path object-file.c:2274:7
#10 0xb3c8c9 in add_to_index read-cache.c:802:7
#11 0xb3e039 in add_file_to_index read-cache.c:835:9
#12 0x4a99c3 in add_files add.c:458:7
#13 0x4a7276 in cmd_add add.c:670:18
#14 0x4a1e76 in run_builtin git.c:461:11
#15 0x49e1e7 in handle_builtin git.c:714:3
#16 0x4a0c08 in run_argv git.c:781:4
#17 0x49d5a8 in cmd_main git.c:912:19
#18 0x7974da in main common-main.c:52:11
#19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)
Uninitialized value was stored to memory at
#0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
#1 0x7f75db5ea545 in read_buf deflate.c:1181:5
#2 0x7f75db5c97f7 in deflate_stored deflate.c:1791:9
#3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
#4 0xd80b7f in git_deflate zlib.c:244:12
#5 0x825dff in stream_to_pack bulk-checkin.c:140:12
#6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
#7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
#8 0xa7cff2 in index_stream object-file.c:2234:9
#9 0xa7bff7 in index_fd object-file.c:2256:9
#10 0xa7d22d in index_path object-file.c:2274:7
#11 0xb3c8c9 in add_to_index read-cache.c:802:7
#12 0xb3e039 in add_file_to_index read-cache.c:835:9
#13 0x4a99c3 in add_files add.c:458:7
#14 0x4a7276 in cmd_add add.c:670:18
#15 0x4a1e76 in run_builtin git.c:461:11
#16 0x49e1e7 in handle_builtin git.c:714:3
#17 0x4a0c08 in run_argv git.c:781:4
#18 0x49d5a8 in cmd_main git.c:912:19
#19 0x7974da in main common-main.c:52:11
Uninitialized value was created by an allocation of 'ibuf' in the stack frame of function 'stream_to_pack'
#0 0x825710 in stream_to_pack bulk-checkin.c:101
SUMMARY: MemorySanitizer: use-of-uninitialized-value crc32.c:283:9 in crc32_little
Exiting
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're hashing a value which is going to be an object ID, we want to
zero-pad that value if necessary. To do so, use the final_oid_fn
instead of the final_fn anytime we're going to create an object ID to
ensure we perform this operation.
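
A rough sketch of what zero-padding means here, assuming the object ID
buffer is sized for the largest supported hash while the digest may be
shorter. All names and sizes below (sketch_object_id, sketch_final_oid,
SKETCH_MAX_RAWSZ) are hypothetical and are not the actual final_oid_fn
implementation:

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_RAWSZ 32	/* assumed size of the largest hash */

struct sketch_object_id {
	unsigned char hash[SKETCH_MAX_RAWSZ];
};

/*
 * Copy a finished digest of `digest_len` bytes into the object ID and
 * clear the unused tail, so that whole-buffer comparisons are stable.
 * The caller must ensure digest_len <= SKETCH_MAX_RAWSZ.
 */
void sketch_final_oid(struct sketch_object_id *oid,
		      const unsigned char *digest, size_t digest_len)
{
	memcpy(oid->hash, digest, digest_len);
	memset(oid->hash + digest_len, 0, SKETCH_MAX_RAWSZ - digest_len);
}
```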
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
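
To illustrate the kind of rewrite involved, here is a simplified
stand-in. SKETCH_CALLOC_ARRAY is a hypothetical name; it uses plain
calloc() and, unlike a die-on-failure allocator, may return NULL:

```c
#include <stdlib.h>

/* Simplified stand-in: element size is inferred from the pointer type. */
#define SKETCH_CALLOC_ARRAY(x, n) ((x) = calloc((n), sizeof(*(x))))

struct point {
	int x, y;
};

struct point *make_points(size_t n)
{
	struct point *p;

	/* Open-coded form: p = calloc(n, sizeof(struct point)); */
	SKETCH_CALLOC_ARRAY(p, n);
	return p;
}
```

Because the size comes from sizeof(*(x)), changing the type of p later
cannot leave a stale sizeof() behind.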
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We declare a "struct hashfile_checkpoint" but only sometimes actually
call hashfile_checkpoint() on it. That makes it not immediately obvious
that it's valid when we later access its members.
In fact, the code is fine: we fill it in unconditionally in the while(1)
loop as long as "idx" is non-NULL. And then if "idx" is NULL, we exit
early from the function (because we're just computing the hash, not
actually writing), before we look at the struct.
However, this does seem to confuse gcc 9.2.1's -Wmaybe-uninitialized
when compiled with "-flto -O2" (probably because with LTO it can now
realize that our call to hashfile_truncate() does not set the members
either). Let's zero-initialize the struct to tell the compiler, as well
as any readers of the code, that all is well.
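
The shape of the code in question, reduced to a sketch. The struct and
function names here are hypothetical, not the real hashfile API:

```c
#include <stddef.h>

struct sketch_checkpoint {
	size_t offset;
};

static void sketch_take_checkpoint(struct sketch_checkpoint *cp, size_t pos)
{
	cp->offset = pos;
}

size_t sketch_rollback_offset(int have_idx, size_t pos)
{
	/* Zero-initialized so every path sees defined members. */
	struct sketch_checkpoint cp = { 0 };

	if (have_idx)
		sketch_take_checkpoint(&cp, pos);
	if (!have_idx)
		return 0;	/* early exit: cp is never consulted */
	return cp.offset;
}
```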
Reported-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only remaining callers of has_sha1_file() actually have an object_id
already. They can use the "object" variant, rather than dereferencing
the hash themselves.
The code changes here were completely generated by the included
coccinelle patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.
The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).
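
A minimal sketch of the two idioms side by side, using hypothetical
sketch_* names and an assumed 20-byte hash rather than the real
definitions:

```c
#include <string.h>

#define SKETCH_RAWSZ 20	/* assumed hash size for this sketch */

struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

/* Three-way comparison: zero means "equal". */
int sketch_oidcmp(const struct sketch_oid *a, const struct sketch_oid *b)
{
	return memcmp(a->hash, b->hash, SKETCH_RAWSZ);
}

/* Boolean equality: non-zero means "equal". */
int sketch_oideq(const struct sketch_oid *a, const struct sketch_oid *b)
{
	return !sketch_oidcmp(a, b);
}
```

With these, "if (!sketch_oidcmp(a, b))" becomes "if (sketch_oideq(a, b))",
and "!!sketch_oidcmp(a, b)" becomes "!sketch_oideq(a, b)".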
This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.
I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.
* sb/object-store-grafts:
commit: allow lookup_commit_graft to handle arbitrary repositories
commit: allow prepare_commit_graft to handle arbitrary repositories
shallow: migrate shallow information into the object parser
path.c: migrate global git_path_* to take a repository argument
cache: convert get_graft_file to handle arbitrary repositories
commit: convert read_graft_file to handle arbitrary repositories
commit: convert register_commit_graft to handle arbitrary repositories
commit: convert commit_graft_pos() to handle arbitrary repositories
shallow: add repository argument to is_repository_shallow
shallow: add repository argument to check_shallow_file_for_update
shallow: add repository argument to register_shallow
shallow: add repository argument to set_alternate_shallow_file
commit: add repository argument to lookup_commit_graft
commit: add repository argument to prepare_commit_graft
commit: add repository argument to read_graft_file
commit: add repository argument to register_commit_graft
commit: add repository argument to commit_graft_pos
object: move grafts to object parser
object-store: move object access functions to object-store.h
Developer support update, by using BUG() macro instead of die() to
mark codepaths that should not happen more clearly.
* js/use-bug-macro:
BUG_exit_code: fix sparse "symbol not declared" warning
Convert remaining die*(BUG) messages
Replace all die("BUG: ...") calls by BUG() ones
run-command: use BUG() to report bugs, not die()
test-tool: help verifying BUG() code paths
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Precompute and store information necessary for ancestry traversal
in a separate file to optimize graph walking.
* ds/commit-graph:
commit-graph: implement "--append" option
commit-graph: build graph from starting commits
commit-graph: read only from specific pack-indexes
commit: integrate commit graph with commit parsing
commit-graph: close under reachability
commit-graph: add core.commitGraph setting
commit-graph: implement git commit-graph read
commit-graph: implement git-commit-graph write
commit-graph: implement write_commit_graph()
commit-graph: create git-commit-graph builtin
graph: add commit graph design document
commit-graph: add format document
csum-file: refactor finalize_hashfile() method
csum-file: rename hashclose() to finalize_hashfile()
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.
Let's just convert all remaining ones in one fell swoop.
This trick was performed by this invocation:
sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we want to use a hashfile on the temporary file for a lockfile, then
we need finalize_hashfile() to fully write the trailing hash but also keep
the file descriptor open.
Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
change that checks this flag before writing the checksum to the stream.
This differs from the previous behavior, where the checksum was written
if either CSUM_CLOSE or CSUM_FSYNC was provided.
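
The change in the condition can be sketched as two predicates. The
SKETCH_* flag values and function names are illustrative assumptions,
not copied from csum-file.h:

```c
/* Illustrative flag bits; the real values may differ. */
#define SKETCH_CSUM_CLOSE          1
#define SKETCH_CSUM_FSYNC          2
#define SKETCH_CSUM_HASH_IN_STREAM 4

/* Before: the trailing hash was written whenever we closed or fsynced. */
int sketch_old_writes_hash(unsigned flags)
{
	return !!(flags & (SKETCH_CSUM_CLOSE | SKETCH_CSUM_FSYNC));
}

/* After: writing the trailing hash is requested by its own flag. */
int sketch_new_writes_hash(unsigned flags)
{
	return !!(flags & SKETCH_CSUM_HASH_IN_STREAM);
}
```

A lockfile caller could then pass the hash flag without the close flag,
getting the trailer written while the descriptor stays open.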
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hashclose() method behaves very differently depending on the flags
parameter. In particular, the file descriptor is not always closed.
Perform a simple rename of "hashclose()" to "finalize_hashfile()" in
preparation for functional changes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
See previous patch for explanation.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the index_bulk_checkin function, and the static functions it
calls, to use pointers to struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert uses of the direct SHA-1 functions to use the_hash_algo instead.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename struct sha1file to struct hashfile, along with all of its related
functions.
The transformation in this commit was made by global search-and-replace.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many callers of read_in_full() expect to see the exact
number of bytes requested, but their error handling lumps
together true read errors and short reads due to unexpected
EOF.
We can give more specific error messages by separating these
cases (showing errno when appropriate, and otherwise
describing the short read).
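
As an illustration of the distinction, here is a POSIX-only sketch.
sketch_read_in_full() is a simplified stand-in for read_in_full(), and
the read_result enum is hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read until `count` bytes, EOF, or error; returns bytes read or -1. */
ssize_t sketch_read_in_full(int fd, void *buf, size_t count)
{
	char *p = buf;
	ssize_t total = 0;

	while (count > 0) {
		ssize_t n = read(fd, p, count);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;	/* EOF */
		p += n;
		count -= (size_t)n;
		total += n;
	}
	return total;
}

enum read_result { READ_OK, READ_ERROR, READ_SHORT };

enum read_result read_exactly(int fd, void *buf, size_t count)
{
	ssize_t n = sketch_read_in_full(fd, buf, count);

	if (n < 0)
		return READ_ERROR;	/* errno is meaningful: report it */
	if ((size_t)n != count)
		return READ_SHORT;	/* unexpected EOF: errno is not set */
	return READ_OK;
}
```

A caller can then print strerror(errno) only for READ_ERROR and describe
the byte counts for READ_SHORT.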
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct pack_idx_entry to use struct object_id by changing the
definition and applying the following semantic patch, plus the standard
object_id transforms:
@@
struct pack_idx_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct pack_idx_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
encode_in_pack_object_header() writes a variable-length
header to an output buffer, but it doesn't actually know
how long the buffer is. At first glance, this looks like it
might be possible to overflow.
In practice, this is probably impossible. The smallest
buffer we use is 10 bytes, which would hold the header for
an object up to 2^67 bytes. Obviously we're not likely to
see such an object, but we might worry that an object could
lie about its size (causing us to overflow before we realize
it does not actually have that many bytes). But the argument
is passed as a uintmax_t. Even on systems that have __int128
available, uintmax_t is typically restricted to 64-bit by
the ABI.
So it's unlikely that a system exists where this could be
exploited. Still, it's easy enough to use a normal out/len
pair and make sure we don't write too far. That protects the
hypothetical 128-bit system, makes it harder for callers to
accidentally specify a too-small buffer, and makes the
resulting code easier to audit.
Note that the one caller in fast-import tried to catch such
a case, but did so _after_ the call (at which point we'd
have already overflowed!). This check can now go away.
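
A sketch of the out/len approach, under the assumption that the header
stores the type plus the low 4 size bits in the first byte and 7 more
size bits in each continuation byte (which is where 4 + 9 * 7 = 67 bits
for 10 bytes comes from). sketch_encode_header() is a hypothetical name,
and it returns 0 on overflow, which is just this sketch's convention:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns bytes written, or 0 if the header does not fit in hdr_len. */
size_t sketch_encode_header(unsigned char *hdr, size_t hdr_len,
			    unsigned type, uintmax_t size)
{
	size_t n = 1;
	unsigned char c;

	if (!hdr_len)
		return 0;
	c = (unsigned char)((type << 4) | (size & 15));
	size >>= 4;
	while (size) {
		if (n == hdr_len)
			return 0;	/* check before writing, not after */
		*hdr++ = c | 0x80;	/* continuation bit */
		c = size & 0x7f;
		size >>= 7;
		n++;
	}
	*hdr = c;
	return n;
}
```

The bounds check happens before each additional byte is stored, so a
too-small buffer is detected without ever writing past its end.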
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are three codepaths that use a variable whose name is
pack_compression_level to affect how objects and deltas sent to a
packfile are compressed. Unlike zlib_compression_level that controls
the loose object compression, however, this variable was static to
each of these codepaths. Two of them read the pack.compression
configuration variable, using core.compression as the default, and
one of them also allowed overriding it from the command line.
The other codepath in bulk-checkin did not pay any attention to the
configuration.
Unify the configuration parsing to git_default_config(), where we
implement the parsing of core.loosecompression and core.compression
and make the former override the latter, by moving code to parse
pack.compression and also allow core.compression to give default to
this variable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally use 32-byte buffers to format git's "type size"
header fields. These should not generally overflow unless
you can produce some truly gigantic objects (and our types
come from our internal array of constant strings). But it is
a good idea to use xsnprintf to make sure this is the case.
Note that we slightly modify the interface to
write_sha1_file_prepare, which now uses "hdrlen" as an "in"
parameter as well as an "out" (on the way in it stores the
allocated size of the header, and on the way out it returns
the ultimate size of the header).
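
The in/out convention can be sketched with plain snprintf().
sketch_format_header() is a hypothetical helper that returns -1 on
truncation, whereas a die-on-overflow wrapper such as xsnprintf would
abort instead:

```c
#include <stdio.h>

/*
 * On entry *hdrlen is the capacity of hdr; on success it is updated to
 * the formatted length including the terminating NUL.
 */
int sketch_format_header(char *hdr, int *hdrlen,
			 const char *type, unsigned long size)
{
	int n = snprintf(hdr, (size_t)*hdrlen, "%s %lu", type, size);

	if (n < 0 || n >= *hdrlen)
		return -1;	/* would not fit: refuse rather than truncate */
	*hdrlen = n + 1;
	return 0;
}
```

For example, with a 32-byte buffer, type "blob" and size 1234, the
header is "blob 1234" and *hdrlen comes back as 10.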
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Identify parts of the code that know that we use SHA-1 hash to
name our objects too much, and use (1) symbolic constants instead
of hardcoded 20 as byte count and/or (2) use struct object_id
instead of unsigned char [20] for object names.
* bc/object-id:
apply: convert threeway_stage to object_id
patch-id: convert to use struct object_id
commit: convert parts to struct object_id
diff: convert struct combine_diff_path to object_id
bulk-checkin.c: convert to use struct object_id
zip: use GIT_SHA1_HEXSZ for trailers
archive.c: convert to use struct object_id
bisect.c: convert leaf functions to use struct object_id
define utility functions for object IDs
define a structure for object IDs
Clear the git_zstream variable at the start of git_deflate_init() etc.
so that callers don't have to do that.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CodingGuidelines states that the first #include in C files should be
git-compat-util.h or another header file that includes it, such as
cache.h or builtin.h.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old version fixes a maximum length on the buffer, which could be a problem
if one is not certain of the length of get_object_directory().
Using strbuf avoids the potential bug.
Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Sun He <sunheehnus@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The deflate loop in bulk-checkin::stream_to_pack expects to get all bytes
from a file that it requests to read in a single function call. But it
used xread(), which does not give that guarantee. Replace it by
read_in_full().
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This extends the earlier approach to stream a large file directly from the
filesystem to its own packfile, and allows "git add" to send large files
directly into a single pack. Older code used to spawn fast-import, but the
new bulk-checkin API replaces it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>