In git-upload-archive(1), we pass an array of arguments to
`write_archive()` to tell it what exactly to do. We don't ever clear the
vector though, causing a memory leak. Furthermore though, the call to
`write_archive()` may cause contents of the array to be modified, which
would cause us to leak memory to allocated strings held by it.
Fix the issue by having `write_archive()` create a shallow copy of
`argv` before parsing the arguments. Like this, we won't modify the
caller's array and can easily `strvec_clear()` it to plug these memory
leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This also made it clear that several .c files that depended upon path.h
were missing a #include for it; add the missing includes while at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
reftable: ensure git-compat-util.h is the first (indirect) include
diff.h: reduce unnecessary includes
object-store.h: reduce unnecessary includes
commit.h: reduce unnecessary includes
fsmonitor: reduce includes of cache.h
cache.h: remove unnecessary headers
treewide: remove cache.h inclusion due to previous changes
cache,tree: move basic name compare functions from read-cache to tree
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
hash-ll.h: split out of hash.h to remove dependency on repository.h
tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
dir.h: move DTYPE defines from cache.h
versioncmp.h: move declarations for versioncmp.c functions from cache.h
ws.h: move declarations for ws.c functions from cache.h
match-trees.h: move declarations for match-trees.c functions from cache.h
pkt-line.h: move declarations for pkt-line.c functions from cache.h
base85.h: move declarations for base85.c functions from cache.h
copy.h: move declarations for copy.c functions from cache.h
server-info.h: move declarations for server-info.c functions from cache.h
packfile.h: move pack_window and pack_entry from cache.h
...
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo). However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id. Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.
This allows hash.h and object.h to be fairly small, minimal headers. It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's usually a bad idea for a builtin's cmd_foo() to ignore the "prefix"
argument it gets, as it needs to prepend that string when accessing any
paths given by the user.
But if a builtin does not ask for the git wrapper to run repository
setup (via the RUN_SETUP or RUN_SETUP_GENTLY flags), then we know the
prefix will always be NULL (it is adjusting for the chdir() done during
repo setup, but there cannot be one if we did not set up the repo). In
those cases it's OK to ignore "prefix", but it's worth annotating for a
few reasons:
1. It serves as documentation to somebody reading the code about what
we expect.
2. If the flags in git.c ever change, the run-time assertion may help
detect the problem (though only if the command is run from a
subdirectory of the repository).
3. It notes to the compiler that we are OK ignoring "prefix". In
particular, this silences -Wunused-parameter. It _could_ also help
the compiler generate better code (because it will know the prefix
is NULL), but in practice this is quite unlikely to matter.
Note that I've only added this annotation to commands which triggered
-Wunused-parameter. It would be correct to add it to any builtin which
doesn't ask for RUN_SETUP, but most of the rest of them do the sensible
thing with "prefix" by passing it to parse_options(). So they're much
more likely to just work if they ever switched to RUN_SETUP, and aren't
worth annotating.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix various inconsistencies between command SYNOPSIS and the
corresponding -h output where our translatable labels didn't match
up.
In some cases we need to adjust the prose that follows the SYNOPSIS
accordingly, as it refers back to the changed label.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This pattern added [1] in seems to have been intentional, but since
[2] and [3] we've wanted do initialization of what's now the "struct
strvec" "args" and "env_array" members. Let's not trample on that
initialization here.
1. 1bc01efed1 (upload-archive: use start_command instead of fork,
2011-11-19)
2. c460c0ecdc (run-command: store an optional argv_array, 2014-05-15)
3. 9a583dc39e (run-command: add env_array, an optional argv_array for
env, 2014-10-19)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").
Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).
This patch converts all of the files in builtin/ to keep the diff to a
manageable size.
The conversion was done purely mechanically with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe '
s/ARGV_ARRAY/STRVEC/g;
s/argv_array/strvec/g;
'
and then selectively staging files with "git add builtin/". We'll deal
with any indentation/style fallouts separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe 's/argv-array.h/strvec.h/'
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Initialize archivers as soon as possible when running git-archive.
Various non-obvious behavior depends on having the archivers
initialized, such as determining the desired archival format from the
provided filename.
Since 08716b3c11 ("archive: refactor file extension format-guessing",
2011-06-21), archive_format_from_filename() has used the registered
archivers to match filenames (provided via --output) to archival
formats. However, when git-archive is executed with --remote, format
detection happens before the archivers have been registered. This causes
archives from remotes to always be generated as TAR files, regardless of
the actual filename (unless an explicit --format is provided).
This patch fixes that behavior; archival format is determined properly
from the output filename, even when --remote is used.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Normally upload-archive forks off upload-archive--writer to
do the real work, and relays any errors back over the
sideband channel. This is a good thing when the command is
properly invoked remotely via ssh or git-daemon. But it's
confusing to curious users who try "git upload-archive -h".
Let's catch this invocation early and give a real usage
message, rather than spewing "-h does not appear to be a git
repository" amidst packet-lines. The chance of a false
positive due to a real client asking for the repo "-h" is
quite small.
Likewise, we'll catch "-h" in upload-archive--writer. People
shouldn't be invoking it manually, but it doesn't hurt to
give a sane message if they do.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since b9605bc4f2 ("config: only read .git/config from configured
repos", 2016-09-12), we do not read from ".git/config" unless we
know we are in a repository. "git archive" however didn't do the
repository discovery and instead relied on the old behaviour.
Teach the command to run a "gentle" version of repository discovery
so that local configuration variables are honoured.
[jc: stole tests from peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packet_write() should be called packet_write_fmt() because it is a
printf-like function that takes a format string as first parameter.
packet_write_fmt() should be used for text strings only. Arbitrary
binary data should use a new packet_write() function that is introduced
in a subsequent patch.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we report an error to the client, we format it into a
fixed-size buffer using vsprintf(). This can't actually
overflow in practice, since we only format a very tame
subset of strings (mostly strerror() output). However, it's
hard to tell immediately, so let's just use a strbuf so
readers do not have to wonder.
We do add an allocation here, but the performance is not
important; the next step is to call die() anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:
1. The only variable-sized data in these lines is a ref
name, and refs tend to be a lot shorter than 1000
characters.
2. When sending ref lines, git-core always limits itself
to 1000 byte packets.
However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.
This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:
1. Other git implementations may have followed
protocol-common.txt and used a larger maximum size. We
don't bump into it in practice because it would involve
very long ref names.
2. We may want to increase the 1000-byte limit one day.
Since packets are transferred before any capabilities,
it's difficult to do this in a backwards-compatible
way. But if we bump the size of buffer the readers can
handle, eventually older versions of git will be
obsolete enough that we can justify bumping the
writers, as well. We don't have plans to do this
anytime soon, but there is no reason not to start the
clock ticking now.
Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage. That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.
This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.
As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.
Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility. However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.
This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.
By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch. The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current parsing scheme for upload-archive is to pack
arguments into a fixed-size buffer, separated by NULs, and
put a pointer to each argument in the buffer into a
fixed-size argv array.
This works fine, and the limits are high enough that nobody
reasonable is going to hit them, but it makes the code hard
to follow. Instead, let's just stuff the arguments into an
argv_array, which is much simpler. That lifts the "all
arguments must fit inside 4K together" limit.
We could also trivially lift the MAX_ARGS limitation (in
fact, we have to keep extra code to enforce it). But that
would mean a client could force us to allocate an arbitrary
amount of memory simply by sending us "argument" lines. By
limiting the MAX_ARGS, we limit an attacker to about 4
megabytes (64 times a maximum 64K packet buffer). That may
sound like a lot compared to the 4K limit, but it's not a
big deal compared to what git-archive will actually allocate
while working (e.g., to load blobs into memory). The
important thing is that it is bounded.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to the comment, enter_repo will modify its input.
However, this has not been the case since 1c64b48
(enter_repo: do not modify input, 2011-10-04). Drop the
now-useless copy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The POSIX-function fork is not supported on Windows. Use our
start_command API instead, respawning ourselves in a special
"writer" mode to follow the alternate code path.
Remove the NOT_MINGW-prereq for t5000, as git-archive --remote
now works.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The POSIX-function fork is not supported on Windows. Use our
start_command API instead.
As this is the last call-site that depends on the fork-stub in
compat/mingw.h, remove that as well.
Add an undocumented flag to git-archive that tells it that the
action originated from a remote, so features can be disabled.
Thanks to Jeff King for work on this part.
Remove the NOT_MINGW-prereq for t5000, as git-archive --remote
now works.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tar filters may be very expensive to run, so sites do
not want to expose them via upload-archive. This patch lets
users configure tar.<filter>.remote to turn them off.
By default, gzip filters are left on, as they are about as
expensive as creating zip archives.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The process for guessing an archive output format based on
the filename is something like this:
a. parse --output in cmd_archive; check the filename
against a static set of mapping heuristics (right now
it just matches ".zip" for zip files).
b. if found, stick a fake "--format=zip" at the beginning
of the arguments list (if the user did specify a
--format manually, the later option will override our
fake one)
c. if it's a remote call, ship the arguments to the remote
(including the fake), which will call write_archive on
their end
d. if it's local, ship the arguments to write_archive
locally
There are two problems:
1. The set of mappings is static and at too high a level.
The write_archive level is going to check config for
user-defined formats, some of which will specify
extensions. We need to delay lookup until those are
parsed, so we can match against them.
2. For a remote archive call, our set of mappings (or
formats) may not match the remote side's. This is OK in
practice right now, because all versions of git
understand "zip" and "tar". But as new formats are
added, there is going to be a mismatch between what the
client can do and what the remote server can do.
To fix (1), this patch refactors the location guessing to
happen at the write_archive level, instead of the
cmd_archive level. So instead of sticking a fake --format
field in the argv list, we actually pass a "name hint" down
the callchain; this hint is used at the appropriate time to
guess the format (if one hasn't been given already).
This patch leaves (2) unfixed. The name_hint is converted to
a "--format" option as before, and passed to the remote.
This means the local side's idea of how extensions map to
formats will take precedence.
Another option would be to pass the name hint to the remote
side and let the remote choose. This isn't a good idea for
two reasons:
1. There's no room in the protocol for passing that
information. We can pass a new argument, but older
versions of git on the server will choke on it.
2. Letting the remote side decide creates a silent
inconsistency in user experience. Consider the case
that the locally installed git knows about the "tar.gz"
format, but a remote server doesn't.
Running "git archive -o foo.tar.gz" will use the tar.gz
format. If we use --remote, and the local side chooses
the format, then we send "--format=tar.gz" to the
remote, which will complain about the unknown format.
But if we let the remote side choose the format, then
it will realize that it doesn't know about "tar.gz" and
output uncompressed tar without even issuing a warning.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shrinks the top-level directory a bit, and makes it much more
pleasant to use auto-completion on the thing. Instead of
[torvalds@nehalem git]$ em buil<tab>
Display all 180 possibilities? (y or n)
[torvalds@nehalem git]$ em builtin-sh
builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c
builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o
[torvalds@nehalem git]$ em builtin-shor<tab>
builtin-shortlog.c builtin-shortlog.o
[torvalds@nehalem git]$ em builtin-shortlog.c
you get
[torvalds@nehalem git]$ em buil<tab> [type]
builtin/ builtin.h
[torvalds@nehalem git]$ em builtin [auto-completes to]
[torvalds@nehalem git]$ em builtin/sh<tab> [type]
shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o
[torvalds@nehalem git]$ em builtin/sho [auto-completes to]
[torvalds@nehalem git]$ em builtin/shor<tab> [type]
shortlog.c shortlog.o
[torvalds@nehalem git]$ em builtin/shortlog.c
which doesn't seem all that different, but not having that annoying
break in "Display all 180 possibilities?" is quite a relief.
NOTE! If you do this in a clean tree (no object files etc), or using an
editor that has auto-completion rules that ignores '*.o' files, you
won't see that annoying 'Display all 180 possibilities?' message - it
will just show the choices instead. I think bash has some cut-off
around 100 choices or something.
So the reason I see this is that I'm using an odd editory, and thus
don't have the rules to cut down on auto-completion. But you can
simulate that by using 'ls' instead, or something similar.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>