If the user writes a message via --compose, send-email will pick up
various headers like "From", "Subject", etc., and use them for other
patches as if they were specified on the command-line. But we don't
handle "To", "Cc", or "Bcc" this way; we just tell the user "those
aren't interpreted yet" and ignore them.
But it seems like an obvious thing to want, especially as the same
feature exists when the cover letter is generated separately by
format-patch. There it is gated behind the --to-cover option, but I
don't think we'd need the same control here; since we generate the
--compose template ourselves based on the existing input, if the user
leaves the lines unchanged then the behavior remains the same.
So let's fill in the implementation; like those other headers we already
handle, we just need to assign to the initial_* variables. The only
difference in this case is that they are arrays, so we'll feed them
through parse_address_line() to split them (just like we would when
reading a single string via prompting).
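
The splitting step can be sketched as follows. This is only an illustration, using Python's standard email.utils as a stand-in for the Perl helper parse_address_line(); the names split_address_header and initial_to below are hypothetical and are not the script's own code.

```python
from email.utils import getaddresses

def split_address_header(value):
    """Split one header value into a list of bare addresses.

    Stand-in for the splitting step described above; it is not the
    Perl parse_address_line() helper itself.
    """
    return [addr for _name, addr in getaddresses([value]) if addr]

# Hypothetical list-valued setting, analogous to an initial_* array.
initial_to = []
initial_to.extend(
    split_address_header("Alice <alice@example.com>, bob@example.com")
)
```

The point is only that a single "To:" line may carry several recipients, so it has to be split before being appended to a list-valued setting.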
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit b6049542b9.
Prior to that commit, we read the results of the user editing the
"--compose" message in a loop, picking out parts we cared about, and
streaming the result out to a ".final" file. That commit split the
reading/interpreting into two phases; we'd now read into a hash, and
then pick things out of the hash.
The goal was making the code more readable. And in some ways it did,
because the ugly regexes are confined to the reading phase. But it also
introduced several bugs, because now the two phases need to match each
other. In particular:
- we pick out headers like "Subject: foo" with a case-insensitive
regex, and then use the user-provided header name as the key in a
case-sensitive hash. So if the user wrote "subject: foo", we'd no
longer recognize it as a subject.
- the namespace for the hash keys conflates header names with meta
information like "body". If you put "body: foo" in your message, it
would be misinterpreted as the actual message body (nobody is likely
to do that in practice, but it seems like an unnecessary danger).
- the handling for to/cc/bcc is totally broken. The behavior before
that commit is to recognize and skip those headers, with a note to
the user that they are not yet handled. Not great, but OK. But
after the patch, the reading side now splits the addresses into a
perl array-ref. But the interpreting side doesn't handle this at
all, and blindly prints the stringified array-ref value. This leads
to garbage like:
    (mbox) Adding to: ARRAY (0x555b4345c428) from line 'To: ARRAY(0x555b4345c428)'
    error: unable to extract a valid address from: ARRAY (0x555b4345c428)
    What to do with this address? ([q]uit|[d]rop|[e]dit):
Probably not a huge deal, since nobody should even try to use those
headers in the first place (since they were not implemented). But
the new behavior is worse, and indicative of the sorts of problems
that come from having the two layers.
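
The first bug in the list above (case-insensitive matching feeding a case-sensitive hash) can be reproduced in a few lines. This is a hedged sketch in Python rather than the Perl from git-send-email; the regex, function names, and header list are assumptions made for illustration. The normalized lookup is shown only to make the mismatch visible; the fix described here is the revert, not key normalization.

```python
import re

# Reading phase: the match is case-insensitive...
HEADER_RE = re.compile(r"^(subject|from|in-reply-to):\s*(.*)$", re.IGNORECASE)

def read_headers(lines):
    """Store each matched header under the name exactly as the user typed it."""
    parsed = {}
    for line in lines:
        m = HEADER_RE.match(line)
        if m:
            parsed[m.group(1)] = m.group(2)
    return parsed

def subject_exact_case(parsed):
    """Interpreting phase: ...but the lookup is case-sensitive."""
    return parsed.get("Subject")

def subject_normalized(parsed):
    """Same lookup with keys lowercased, for comparison only."""
    lowered = {key.lower(): value for key, value in parsed.items()}
    return lowered.get("subject")

parsed = read_headers(["subject: foo"])
```

With this input, subject_exact_case(parsed) finds nothing while subject_normalized(parsed) finds the subject, which is the kind of disagreement between the two phases described above.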
The revert had a few conflicts, due to later work in this area from
15dc3b9161 (send-email: rename variable for clarity, 2018-03-04) and
d11c943c78 (send-email: support separate Reply-To address, 2018-03-04).
I've ported the changes from those commits over as part of the conflict
resolution.
The new tests show the bugs. Note the use of GIT_SEND_EMAIL_NOTTY in the
second one. Without it, the test is happy to reach outside the test
harness to the developer's actual terminal (when run with the buggy
state before this patch).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for git-send-email lists the headers handled specially
by --compose in a way that implies that this is the complete set of
headers that are special. But one more was added by d11c943c78
(send-email: support separate Reply-To address, 2018-03-04) and never
documented.
Let's add it, and reword the documentation slightly to avoid having to
specify the list of headers twice (as it is growing and will continue to
do so as we add new features).
If you read the code, you may notice that we also handle MIME-Version
specially, in that we'll avoid over-writing user-provided MIME headers.
I don't think this is worth mentioning, as it's what you'd expect to
happen (as opposed to the other headers, which are picked up to be used
in later emails). And certainly this feature existed when the
documentation was expanded in 01d3861217 (git-send-email.txt: describe
--compose better, 2009-03-16), and we chose not to mention it then.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Die gracefully when `git grep --no-index` is run outside of a Git
repository and the path is outside the directory tree.
If you are not in a Git repository and say:
    git grep --no-index search ..
You trigger a `BUG`:
    BUG: environment.c:213: git environment hasn't been setup
    Aborted (core dumped)
This happens because `..` is a valid path which is treated as a pathspec. Then
`pathspec` figures out that it is not in the current directory tree. The
`BUG` is triggered when `pathspec` tries to advise the user about how the
path is not in the current (non-existing) repository.
Reported-by: ks1322 ks1322 <ks1322@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-p4.py would attempt to put a symlink in LFS if its file extension
matched git-p4.largeFileExtensions.
Git LFS doesn't store symlinks because smudge/clean filters don't handle
symlinks. They never get passed to the filter process nor the
smudge/clean filters, nor could that occur without a change to the
protocol or command-line interface. Unless Git learned how to send them
to the filters, Git LFS would have a hard time using them in any useful
way.
Git LFS's goal is to move large files out of the repository history, and
symlinks are functionally limited to 4 KiB or a similar size on most
systems.
Signed-off-by: Matthew McClain <mmcclain@noprivs.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests in t7601 use "test -f" and "test ! -f" to see if a path
exists or is missing.
Use test_path_is_file and test_path_is_missing helper functions to
clarify these tests a bit better. This especially matters for the
"missing" case because "test ! -f F" will be happy if "F" exists as a
directory, but the intent of the test is that "F" should not exist, even
as a directory. The updated code expresses this better.
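
The difference between the two checks can be illustrated outside the shell. The sketch below is Python, not the test library's code: looks_missing_weak() mirrors "test ! -f", and is_missing() assumes that "missing" means nothing exists at the path at all. Both function names are hypothetical.

```python
import os
import tempfile

def looks_missing_weak(path):
    """Mirror of `test ! -f path`: true whenever path is not a regular file."""
    return not os.path.isfile(path)

def is_missing(path):
    """Closer to the stated intent: true only when nothing exists at path."""
    return not os.path.exists(path)

with tempfile.TemporaryDirectory() as tmp:
    f_path = os.path.join(tmp, "F")
    os.mkdir(f_path)  # "F" exists, but as a directory
    weak_result = looks_missing_weak(f_path)
    strict_result = is_missing(f_path)
```

Here the weak check reports "F" as missing even though a directory of that name exists, while the stricter check does not.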
Signed-off-by: Dorcas AnonoLitunya <anonolitunya@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git am` passes the value given to its `--whitespace` option through
to the underlying `git apply`, and the value is called <action> over
there. Fix the documentation for the command that calls the value
<option> to say <action> instead.
Note that the option help given by `git am -h` already calls the
value <action>, so there is no need to make a matching change there.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix "git merge-tree" to stop segfaulting when the --attr-source
option is used.
* jc/merge-ort-attr-index-fix:
merge-ort: initialize repo in index state
"git repack" learned "--max-cruft-size" to prevent cruft packs from
growing without bounds.
* tb/repack-max-cruft-size:
repack: free existing_cruft array after use
builtin/repack.c: avoid making cruft packs preferred
builtin/repack.c: implement support for `--max-cruft-size`
builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
t7700: split cruft-related tests to t7704
These error messages say "new_index" as if that spelling has some
significance to the end users (e.g. the file "$GIT_DIR/new_index"
has some issues), but that is not the case at all. The i18n folks
were made to include the word literally in the translated messages,
which was not a good idea at all. Spell it "new index", as we are
just telling the users that we failed to create a new index file.
The term is expected to be translated to the end-users' languages,
not left as if it were a literal file name.
This dates all the way back to the first re-implementation of the "git
commit" command in C (the scripted version did not have such wording
in its error messages), in f5bbc322 (Port git commit to C.,
2007-11-08).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As described in the CodingGuidelines document, a single line message
given to die() and its friends should not capitalize its first word,
and should not end with a full stop.
Signed-off-by: Naomi Ibe <naomi.ibeh69@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for geometric repacking mentions a "--unpacked" option
that supposedly changes how loose objects are rolled up. This option has
never existed, and the implied behaviour, namely to include all unpacked
objects into the resulting packfile, is in fact the default behaviour.
Correct the documentation to not mention this option.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `-g` switch is a shorthand for `--geometric=` and allows the user to
specify the geometric factor. The documentation is wrong, though, and
indicates that the syntax for the shorthand is `-g=<factor>`. In fact, the
option must be specified without the equals sign, as `-g<factor>`.
Fix the syntax accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test t5319.88 ("reader bounds-checks large offset table") can fail
intermittently. The failure mode looks like this:
1. An earlier test sets up "objects64", a directory that can be used
to produce a midx with a corrupted large-offsets table. To get the
large offsets, it corrupts the normal ".idx" file to have a fake
large offset, and then builds a midx from that.
That midx now has a large offset table, which is what we want. But
we also have a .idx on disk that has a corrupted entry. We'll call
the object with the corrupted large-offset "X".
2. In t5319.88, we further corrupt the midx by reducing the size of
the large-offset chunk (because our goal is to make sure we do not
do an out-of-bounds read on it).
3. We then enumerate all of the objects with "cat-file --batch-check
--batch-all-objects", expecting to see a complaint when we try to
show object X. We use --batch-all-objects because our objects64
repo doesn't actually have any refs (but if we check them all, one
of them will be the failing one). The default batch-check format
includes %(objecttype) and %(objectsize), both of which require us
to access the actual pack data (and thus require looking at the
offset).
4a. Usually, this succeeds. We try to output object X, do a lookup via
the midx for the type/size lookup, and run into the corrupt
large-offset table.
4b. But sometimes we hit a different error. If another object points
to X as a delta base, then trying to find the type of that object
requires walking the delta chain to the base entry (since only the
base has the concrete type; deltas themselves are either OFS_DELTA
or REF_DELTA).
Normally this would not require separate offset lookups at all, as
deltas are usually stored as OFS_DELTA, specifying the relative
offset to the base. But the corrupt idx created in step 1 is done
directly with "git pack-objects" and does not pass the
--delta-base-offset option, meaning we have REF_DELTA entries!
Those do have to consult an index to find the location of the base
object, and they use the pack .idx to do this. The same pack .idx
that we know is corrupted from step 1!
Git does notice the error, but it does so by seeing the corrupt
.idx file, not the corrupt midx file, and the error it reports is
different, causing the test to fail.
The set of objects created in the test is deterministic. But the delta
selection seems not to be (which is not too surprising, as it is
multi-threaded). I have seen the failure in Windows CI but haven't
reproduced it locally (not even with --stress). Re-running a failed
Windows CI job tends to work. But when I download and examine the trash
directory from a failed run, it shows a different set of deltas than I
get locally. But the exact source of non-determinism isn't that
important; our test should be robust against any order.
There are a few options to fix this:
a. It would be OK for the "objects64" setup to "unbreak" the .idx file
after generating the midx. But then it would be hard for subsequent
tests to reuse it, since it is the corrupted idx that forces the
midx to have a large offset table.
b. The "objects64" setup could use --delta-base-offset. This would fix
our problem, but earlier tests have many hard-coded offsets. Using
OFS_DELTA would change the locations of objects in the pack (this
might even be OK because I think most of the offsets are within the
.idx file, but it seems brittle and I'm afraid to touch it).
c. Our cat-file output is in oid order by default. Since we store
bases before deltas, if we went in pack order (using the
"--unordered" flag), we'd always see our corrupt X before any delta
which depends on it. But using "--unordered" means we skip the midx
entirely. That makes sense, since it is just enumerating all of
the packs, using the offsets found in their .idx files directly.
So it doesn't work for our test.
d. We could ask directly about object X, rather than enumerating all
of them. But that requires further hard-coding of the oid (both
sha1 and sha256) of object X. I'd prefer not to introduce more
brittleness.
e. We can use a --batch-check format that looks at the pack data, but
doesn't have to chase deltas. The problem in this case is
%(objecttype), which has to walk to the base. But %(objectsize)
does not; we can get the value directly from the delta itself.
Another option would be %(deltabase), where we report the REF_DELTA
name but don't look at its data.
I've gone with option (e) here. It's kind of subtle, but it's simple and
has no side effects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --merge-base X other args..." insisted that X must be a
commit and errored out when given an annotated tag that peels to a
commit, but we only need it to be a committish. This has been
corrected.
* ar/diff-index-merge-base-fix:
diff: fix --merge-base with annotated tags
In .gitmodules files, submodules are keyed by their names, and the
path to the submodule whose name is $name is specified by the
submodule.$name.path variable. There were a few codepaths that
mixed the name and path up when consulting the submodule database,
which have been corrected. It took a long time for these bugs to be found
as the name of a submodule initially is the same as its path, and
the problem does not surface until it is moved to a different path,
which apparently happens very rarely.
* js/submodule-fix-misuse-of-path-and-name:
t7420: test that we correctly handle renamed submodules
t7419: test that we correctly handle renamed submodules
t7419, t7420: use test_cmp_config instead of grepping .gitmodules
t7419: actually test the branch switching
submodule--helper: return error from set-url when modifying failed
submodule--helper: use submodule_from_path in set-{url,branch}
Leakfix.
* jk/commit-graph-leak-fixes:
commit-graph: clear oidset after finishing write
commit-graph: free write-context base_graph_name during cleanup
commit-graph: free write-context entries before overwriting
commit-graph: free graph struct that was not added to chain
commit-graph: delay base_graph assignment in add_graph_to_chain()
commit-graph: free all elements of graph chain
commit-graph: move slab-clearing to close_commit_graph()
merge: free result of repo_get_merge_bases()
commit-reach: free temporary list in get_octopus_merge_bases()
t6700: mark test as leak-free
Test coverage for trailers has been improved.
* la/trailer-test-and-doc-updates:
trailer doc: <token> is a <key> or <keyAlias>, not both
trailer doc: separator within key suppresses default separator
trailer doc: emphasize the effect of configuration variables
trailer --unfold help: prefer "reformat" over "join"
trailer --parse docs: add explanation for its usefulness
trailer --only-input: prefer "configuration variables" over "rules"
trailer --parse help: expose aliased options
trailer --no-divider help: describe usual "---" meaning
trailer: trailer location is a place, not an action
trailer doc: narrow down scope of --where and related flags
trailer: add tests to check defaulting behavior with --no-* flags
trailer test description: this tests --where=after, not --where=before
trailer tests: make test cases self-contained
The index stores file sizes using a uint32_t. This causes any file
whose size is a multiple of 2^32 to have a cached file size of zero.
Zero is a special value used by the racily-clean logic. This causes git
to rehash every file whose size is a multiple of 2^32 every time git
status or git commit is run.
This patch mitigates the problem by making all files whose size is a
multiple of 2^32 appear to have a size of 1<<31 instead of zero.
The value of 1<<31 is chosen to keep it as far away from zero
as possible to help prevent things getting mixed up with unpatched
versions of git.
An example would be to have a 2^32 sized file in the index of
patched git. Patched git would save the file as 2^31 in the cache.
An unpatched git would very much see the file has changed in size
and force it to rehash the file, which is safe. The file would
have to grow or shrink by exactly 2^31 and retain all of its
ctime, mtime, and other attributes for old git to not notice
the change.
This patch does not change the behavior of any file that is not
an exact multiple of 2^32.
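
The arithmetic can be sketched as follows. This is a Python model of the 32-bit truncation, not the C code from the patch; the function names are hypothetical, and the sketch assumes that a truly empty file keeps a cached size of zero.

```python
UINT32_MOD = 1 << 32

def truncated_size(st_size):
    """What a uint32_t field keeps of the real size: the low 32 bits."""
    return st_size % UINT32_MOD

def munged_size(st_size):
    """Cached size with the mitigation applied.

    Assumption of this sketch: only non-empty files whose low 32 bits
    are zero are remapped; an empty file still caches as zero.
    """
    low = truncated_size(st_size)
    if low == 0 and st_size != 0:
        return 1 << 31
    return low
```

A file of exactly 2^32 bytes therefore caches as 2^31 rather than zero, while a file of 2^32 + 5 bytes still caches as 5, unchanged from before.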
Signed-off-by: Jason D. Hatton <jhatton@globalfinishing.com>
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we're going to work with some large files which will
be at least 4 GiB in size. To take advantage of the sparseness
functionality on most Unix systems and avoid running the system out of
disk, it would be convenient to use truncate(2) to simply create a
sparse file of sufficient size.
However, the GNU truncate(1) utility isn't portable, so let's write a
tiny test helper that does the work for us.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
44451a2 (attr: teach "--attr-source=<tree>" global option to "git",
2023-05-06) provided the ability to pass in a treeish as the attr
source. In the context of serving Git repositories as bare repos like we
do at GitLab however, it would be easier to point --attr-source to HEAD
for all commands by setting it once.
Add a new config attr.tree that allows this.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The motivation for 44451a2e5e (attr: teach "--attr-source=<tree>" global
option to "git", 2023-05-06), was to make it possible to use
gitattributes with bare repositories.
To make it easier to read gitattributes in bare repositories however,
let's just make HEAD:.gitattributes the default. This is in line with
how mailmap works, 8c473cecfd (mailmap: default mailmap.blob in bare
repositories, 2012-12-13).
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GitHub CI workflow has learned to trigger Coverity check.
* js/ci-coverity:
coverity: detect and report when the token or project is incorrect
coverity: allow running on macOS
coverity: support building on Windows
coverity: allow overriding the Coverity project
coverity: cache the Coverity Build Tool
ci: add a GitHub workflow to submit Coverity scans
Just like OPT_FILENAME() does, "git grep -f <path>" should treat
the <path> relative to the original $cwd by paying attention to the
prefix the command is given.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git stash store" is meant to store what "git stash create"
produces, as these two are implementation details of the end-user
facing "git stash save" command. Even though it is clearly
documented as such, users would try silly things like "git stash
store HEAD" to render their stash unusable.
Worse yet, because "git stash drop" does not allow such a stash
entry to be removed, "git stash clear" would be the only way to
recover from such a mishap. Reuse the logic that allows "drop" to
refrain from working on such a stash entry to teach "store" to avoid
storing an object that is not a stash entry in the first place.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When mostly the same set of options are to be used to perform
multiple merges, one instance of the merge_options structure may
want to be created and used by copying from the same template
instance. We saw such a use recently in "git merge-tree".
Let's make the pattern official by introducing copy_merge_options()
as a supported way to make a copy of the structure, and also give
clear_merge_options() to release any resources held by a copied
instance. Currently we only make a shallow copy, so the former is a
mere structure assignment while the latter is a no-op, but this may
change in the future as the members of merge_options structure
evolve.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shuffle some bits across headers and sources to prepare for
libification effort.
* cw/prelim-cleanup:
parse: separate out parsing functions from config.h
config: correct bad boolean env value error message
wrapper: reduce scope of remove_or_warn()
hex-ll: separate out non-hash-algo functions
The "streaming" interface used for bulk-checkin codepath has been
narrowed to take only blob objects for now, with no real loss of
functionality.
* eb/limit-bulk-checkin-to-blobs:
bulk-checkin: only support blobs in index_bulk_checkin
Some references are special in the context of worktrees as they are
considered to be per-worktree instead of shared across all of the
worktrees. Most importantly, this includes "refs/worktree/" that have
explicitly been designed such that users can create per-worktree refs.
But there are also special references that have an associated meaning
like "refs/bisect/", which is used to track state of git-bisect(1).
These special per-worktree references are documented in git-worktree(1),
but one instance is missing. In a9be29c981 (sequencer: make refs
generated by the `label` command worktree-local, 2018-04-25), we have
converted "refs/rewritten/" to be a per-worktree reference as well.
These references are used by our sequencer infrastructure to generate
labels for rebased commits. So in order to allow for multiple concurrent
rebases to happen in different worktrees, these references need to be
tracked per worktree.
We forgot to update our documentation to mention these new per-worktree
references, which is fixed by this patch.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are no callers left, and we don't want anybody to add new ones (they
should use the not-unsafe version instead). So let's drop the function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The BIDX chunk tells us the offsets at which each commit's Bloom filters
can be found in the BDAT chunk. We compute the length of each filter by
checking the offsets of neighbors and subtracting them.
If the offsets are out of order, then we'll get a negative length, which
we then store as a very large unsigned value. This can cause us to read
out-of-bounds memory, as we access the hash data modulo "filter->len *
BITS_PER_WORD".
We can easily detect this case when loading the individual filters.
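
A minimal sketch of that check, in Python rather than the C of the patch. It assumes the BIDX chunk is an array of big-endian 32-bit cumulative end offsets, one per commit; filter_span() and wrapped_length() are hypothetical names.

```python
import struct

def filter_span(bidx, pos):
    """Return (start, length) of the filter for commit `pos`.

    Assumed layout: entry i is the end offset of filter i, and the
    start is the previous entry (or 0 for the first commit).
    """
    end = struct.unpack_from(">I", bidx, 4 * pos)[0]
    start = struct.unpack_from(">I", bidx, 4 * (pos - 1))[0] if pos > 0 else 0
    if end < start:
        raise ValueError(f"Bloom filter offsets out of order: {start} > {end}")
    return start, end - start

def wrapped_length(start, end):
    """What an unchecked subtraction yields when stored as an unsigned 32-bit value."""
    return (end - start) % (1 << 32)
```

For neighbors 4 and 2, wrapped_length(4, 2) is 2^32 - 2, which shows why the unchecked length is dangerous as a modulus for memory access.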
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We load the bloom_filter_indexes chunk using pair_chunk(), so we have no
idea how big it is. This can lead to out-of-bounds reads if it is
smaller than expected, since we index it based on the number of commits
found elsewhere in the graph file.
We can check the chunk size up front, like we do for CDAT and other
chunks with one fixed-size record per commit.
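
The up-front check amounts to comparing the chunk size against the commit count. A hedged Python sketch, assuming one 32-bit big-endian record per commit; the function names and the error text are illustrative, not the patch's own.

```python
import struct

RECORD_SIZE = 4  # assumed: one 32-bit entry per commit

def check_per_commit_chunk(chunk, num_commits, name="BIDX"):
    """Reject a chunk whose size does not match one record per commit."""
    expected = num_commits * RECORD_SIZE
    if len(chunk) != expected:
        raise ValueError(
            f"{name} chunk is the wrong size: {len(chunk)} != {expected}"
        )
    return chunk

def entry(chunk, pos):
    """Read record `pos`; safe once the size check above has passed."""
    return struct.unpack_from(">I", chunk, RECORD_SIZE * pos)[0]
```

Once the size is validated at load time, later lookups indexed by commit position need no further bounds checks.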
The test case demonstrates the problem. It actually won't segfault,
because we end up reading random data from the follow-on chunk (BDAT in
this case), and the bounds checks added in the previous patch complain.
But this is by no means assured, and you can craft a commit-graph file
with BIDX at the end (or a smaller BDAT) that does segfault.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When loading Bloom filters from a commit-graph file, we use the offset
values in the BIDX chunk to index into the memory mapped for the BDAT
chunk. But since we don't record how big the BDAT chunk is, we just
trust that the BIDX offsets won't cause us to read outside of the chunk
memory. A corrupted or malicious commit-graph file will cause us to
segfault (in practice this isn't a very interesting attack, since
commit-graph files are local-only, and the worst case is an
out-of-bounds read).
We can't fix this by checking the chunk size during parsing, since the
data in the BDAT chunk doesn't have a fixed size (that's why we need the
BIDX in the first place). So we'll fix it in two parts:
1. Record the BDAT chunk size during parsing, and then later check
that the BIDX offsets we look up are within bounds.
2. Because the offsets are relative to the end of the BDAT header, we
must also make sure that the BDAT chunk is at least as large as the
expected header size. Otherwise, we overflow when trying to move
past the header, even for an offset of "0". We can check this
early, during the parsing stage.
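
The two parts can be sketched like this. It is a Python illustration under stated assumptions, not the patch itself: the header is taken to be 12 bytes (three 32-bit fields), and check_bdat_header() and filter_bytes() are hypothetical names.

```python
BLOOMDATA_HEADER_SIZE = 12  # assumed: three 32-bit header fields

def check_bdat_header(chunk_size):
    """Part 2: done once at parse time, before any offset is used."""
    if chunk_size < BLOOMDATA_HEADER_SIZE:
        raise ValueError(f"BDAT chunk too small for its header: {chunk_size}")

def filter_bytes(bdat, start, end):
    """Part 1: done at lookup time, against the recorded chunk size.

    `start` and `end` are offsets relative to the end of the header.
    """
    data_size = len(bdat) - BLOOMDATA_HEADER_SIZE
    if start > data_size or end > data_size:
        raise ValueError(
            f"Bloom filter offsets out of bounds: {start}..{end} > {data_size}"
        )
    return bdat[BLOOMDATA_HEADER_SIZE + start:BLOOMDATA_HEADER_SIZE + end]
```

Without the header check, len(bdat) - BLOOMDATA_HEADER_SIZE would be negative in Python; in unsigned C arithmetic the analogous subtraction wraps around, which is why the check has to come first.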
The error messages are rather verbose, but since this is not something
you'd expect to see outside of severe bugs or corruption, it makes sense
to err on the side of too many details. Sadly we can't mention the
filename during the chunk-parsing stage, as we haven't set g->filename
at this point, nor passed it down through the stack.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the generation entry in a commit-graph doesn't fit, we instead insert
an offset into a generation overflow chunk. But since we don't record
the size of the chunk, we may read outside the chunk if the offset we
find on disk is malicious or corrupted.
We can't check the size of the chunk up-front; it will vary based on how
many entries need overflow. So instead, we'll do a bounds-check before
accessing the chunk memory. Unfortunately there is no error-return from
this function, so we'll just have to die(), which is what it does for
other forms of corruption.
As with other cases, we can drop the st_mult() call, since we know our
bounds-checked value will fit within a size_t.
Before this patch, the test here actually "works" because we read
garbage data from the next chunk. And since that garbage data happens
not to provide a generation number which changes the output, it appears
to work. We could construct a case that actually segfaults or produces
wrong output, but it would be a bit tricky. For our purposes it's
sufficient to check that we've detected the bounds error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We neither check nor record the size of the generations chunk we parse
from a commit-graph file. This should have one uint32_t for each commit
in the file; if it is smaller (due to corruption, etc), we may read
outside the mapped memory.
The included test segfaults without this patch, as it shrinks the size
considerably (and the chunk is near the end of the file, so we read off
the end of the array rather than accidentally reading another chunk).
We can fix this by checking the size up front (like we do for other
fixed-size chunks, like CDAT).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are loading a commit-graph chain, we check that each slice of the
chain points to the appropriate set of base graphs via its BASE chunk.
But since we don't record the size of the chunk, we may access
out-of-bounds memory if the file is corrupted.
Since we know the number of entries we expect to find (based on the
position within the commit-graph-chain file), we can just check the size
up front.
In theory this would also let us drop the st_mult() call a few lines
later when we actually access the memory, since we know that the
computed offset will fit in a size_t. But because the operands
"g->hash_len" and "n" have types "unsigned char" and "int", we'd have to
cast to size_t first. Leaving the st_mult() does that cast, and makes it
more obvious that we don't have an overflow problem.
Note that the test does not actually segfault before this patch, since
it just reads garbage from the chunk after BASE (and indeed, it even
rejects the file because that garbage does not have the expected hash
value). You could construct a file with BASE at the end that did
segfault, but corrupting the existing one is easy, and we can check
stderr for the expected message.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If an entry in a commit-graph file has more than 2 parents, the
fixed-size parent fields instead point to an offset within an "extra
edges" chunk. We blindly follow these, assuming that the chunk is
present and sufficiently large; this can lead to an out-of-bounds read
for a corrupt or malicious file.
We can fix this by recording the size of the chunk and adding a
bounds-check in fill_commit_in_graph(). There are a few tricky bits:
1. We'll switch from working with a pointer to an offset. This makes
some corner cases just fall out naturally:
a. If we did not find an EDGE chunk at all, our size will
correctly be zero (so everything is "out of bounds").
b. Comparing "size / 4" lets us make sure we have at least 4 bytes
to read, and we never compute a pointer more than one element
past the end of the array (computing a larger pointer is
probably OK in practice, but is technically undefined
behavior).
c. The current code casts to "uint32_t *". Replacing it with an
offset avoids any comparison between different types of pointer
(since the chunk is stored as "unsigned char *").
2. This is the first case in which fill_commit_in_graph() may return
anything but success. We need to make sure to roll back the
"parsed" flag (and any parents we might have added before running
out of buffer) so that the caller can cleanly fall back to
loading the commit object itself.
It's a little non-trivial to do this, and we might benefit from
factoring it out. But we can wait on that until we actually see a
second case where we return an error.
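
The offset-based check and the rollback can be sketched together. This is a Python illustration, not the C in fill_commit_in_graph(); it assumes 32-bit big-endian entries whose high bit marks the last parent in a list, and both function names are hypothetical.

```python
import struct

LAST_EDGE = 0x80000000  # assumed: high bit marks the final parent entry

def read_extra_edge(edge_chunk, index):
    """Return the 32-bit word at `index`, or None if it is out of bounds.

    Comparing against size // 4 guarantees four readable bytes, and an
    absent chunk (size zero) makes every index out of bounds.
    """
    if index >= len(edge_chunk) // 4:
        return None
    return struct.unpack_from(">I", edge_chunk, 4 * index)[0]

def load_extra_parents(edge_chunk, first_index):
    """Collect parent positions, or return None so the caller can fall back."""
    parents = []
    index = first_index
    while True:
        word = read_extra_edge(edge_chunk, index)
        if word is None:
            return None  # partial list is discarded, mirroring the rollback
        parents.append(word & ~LAST_EDGE & 0xFFFFFFFF)
        if word & LAST_EDGE:
            return parents
        index += 1
```

Returning None instead of a partial list plays the role of rolling back the "parsed" flag and any parents already added, so the caller can load the commit object itself.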
As a bonus, this lets us drop the st_mult() call. Since we've already
done a bounds check, we know there won't be any integer overflow (it
would imply our buffer is larger than a size_t can hold).
The included test does not actually segfault before this patch (though
you could construct a case where it does). Instead, it reads garbage
from the next chunk which results in it complaining about a bogus parent
id. This is sufficient for our needs, though (we care that the fallback
succeeds, and that stderr mentions the out-of-bounds read).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>