Further shuffling of declarations across header files to streamline
file dependencies.
* cw/compat-util-header-cleanup:
git-compat-util: move alloc macros to git-compat-util.h
treewide: remove unnecessary includes for wrapper.h
kwset: move translation table from ctype
sane-ctype.h: create header for sane-ctype macros
git-compat-util: move wrapper.c funcs to its header
git-compat-util: move strbuf.c funcs to its header
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid the overhead of passing context to a callback function of
strbuf_expand() by using strbuf_expand_step() in a loop instead. It
requires explicit handling of %% and unrecognized placeholders, but is
simpler, more direct and avoids void pointers.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo). However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id. Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.
This allows hash.h and object.h to be fairly small, minimal headers. It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Michael J Gruber noticed that connection via the git:// protocol no
longer worked after a recent header clean-up. This was caused by
funny interaction of few gotchas. First, a necessary definition
#define DEFAULT_GIT_PORT 9418
was made invisible to a place where
const char *port = STR(DEFAULT_GIT_PORT);
was expecting to turn the integer into "9418" with a clever STR()
macro, and ended up stringifying it to
const char *port = "DEFAULT_GIT_PORT";
without giving any chance to compilers to notice such a mistake.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is another step towards letting us remove the include of cache.h in
strbuf.c. It does mean that we also need to add includes of abspath.h
in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More work towards -Wunused.
* jk/unused-post-2.39-part2: (21 commits)
help: mark unused parameter in git_unknown_cmd_config()
run_processes_parallel: mark unused callback parameters
userformat_want_item(): mark unused parameter
for_each_commit_graft(): mark unused callback parameter
rewrite_parents(): mark unused callback parameter
fetch-pack: mark unused parameter in callback function
notes: mark unused callback parameters
prio-queue: mark unused parameters in comparison functions
for_each_object: mark unused callback parameters
list-objects: mark unused callback parameters
mark unused parameters in signal handlers
run-command: mark error routine parameters as unused
mark "pointless" data pointers in callbacks
ref-filter: mark unused callback parameters
http-backend: mark unused parameters in virtual functions
http-backend: mark argc/argv unused
object-name: mark unused parameters in disambiguate callbacks
serve: mark unused parameters in virtual functions
serve: use repository pointer to get config
ls-refs: drop config caching
...
Signal handlers receive their signal number as a parameter, but many
don't care what it is (because they only handle one signal, or because
their action is the same regardless of the signal). Mark such parameters
to silence -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The undecorated arguments to the 'git-daemon' command provide a list of
directories. When at least one directory is specified, then 'git-daemon'
only serves requests that are within that directory list. The boolean
'--strict-paths' option makes the list more explicit in that
subdirectories are no longer included.
The existing documentation and error messages around this directory list
refer to it and its behavior as a "whitelist". The word "whitelist" has
cultural implications that are not inclusive. Thankfully, it is not
difficult to reword and avoid its use. In the process, we can define the
purpose of this directory list directly.
In Documentation/git-daemon.txt, rewrite the OPTIONS section around the
'<directory>' option. Add additional clarity to the other options that
refer to these directories.
Some error messages can also be improved in daemon.c. The
'--strict-paths' option requires '<directory>' arguments, so refer to
that section of the documentation directly. A logerror() call points out
that a requested directory is not in the specified directory list. We
can use "list" here without any loss of information.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename .env_array member to .env in the child_process structure.
* ab/env-array:
run-command API users: use "env" not "env_array" in comments & names
run-command API: rename "env_array" to "env"
Start following-up on the rename mentioned in c7c4bdeccf (run-command
API: remove "env" member, always use "env_array", 2021-11-25) of
"env_array" to "env".
The "env_array" name was picked in 19a583dc39 (run-command: add
env_array, an optional argv_array for env, 2014-10-19) because "env"
was taken. Let's not forever keep the oddity of "*_array" for this
"struct strvec", but not for its "args" sibling.
This commit is almost entirely made with a coccinelle rule[1]. The
only manual change here is in run-command.h to rename the struct
member itself and to change "env_array" to "env" in the
CHILD_PROCESS_INIT initializer.
The rest of this is all a result of applying [1]:
* make contrib/coccinelle/run_command.cocci.patch
* patch -p1 <contrib/coccinelle/run_command.cocci.patch
* git add -u
1. cat contrib/coccinelle/run_command.pending.cocci
@@
struct child_process E;
@@
- E.env_array
+ E.env
@@
struct child_process *E;
@@
- E->env_array
+ E->env
I've avoided changing any comments and derived variable names here,
that will all be done in the next commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Release the strbuf containing the interpolated path after copying it to
a stack buffer and before erroring out if it's too long.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a pattern of hardcoding an "argv" array size, populating it and
assigning to the "argv" member of "struct child_process" to instead
use "strvec_push()" to add data to the "args" member.
As noted in the preceding commit this moves us further towards being
able to remove the "argv" member in a subsequent commit
These callers could have used strvec_pushl(), but moving to
strvec_push() makes the diff easier to read, and keeps the arguments
aligned as before.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Migrate those run-command API users that assign directly to the "argv"
member to use a strvec_pushv() of "args" instead.
In these cases it did not make sense to further refactor these
callers, e.g. daemon.c could be made to construct the arguments closer
to handle(), but that would require moving the construction from its
cmd_main() and pass "argv" through two intermediate functions.
It would be possible for a change like this to introduce a regression
if we were doing:
cp.argv = argv;
argv[1] = "foo";
And changed the code, as is being done here, to:
strvec_pushv(&cp.args, argv);
argv[1] = "foo";
But as viewing this change with the "-W" flag reveals none of these
functions modify variable that's being pushed afterwards in a way that
would introduce such a logic error.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was added in 4981fe750b (pkt-line: share
buffer/descriptor reading implementation, 2013-02-23), but in
01f9ec64c8 (Use packet_reader instead of packet_read_line,
2018-12-29) the code that was using it was removed.
Since it's being removed we can in turn remove the "src" and "src_len"
arguments to packet_read(), all the remaining users just passed a
NULL/NULL pair to it.
That function is only a thin wrapper for packet_read_with_status()
which still needs those arguments, but for the thin packet_read()
convenience wrapper we can do away with it for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the hostinfo_init() function added in 01cec54e13 (daemon:
deglobalize hostname information, 2015-03-07) and instead initialize
the "struct hostinfo" with a macro.
This is the more idiomatic pattern in the codebase, and doesn't leave
us wondering when we see the *_init() function if this struct needs
more complex initialization than a macro can provide.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git daemon" has been tightened against systems that take backslash
as directory separator.
* rs/daemon-sanitize-dir-sep:
daemon: sanitize all directory separators
When sanitizing client-supplied strings on Windows, also strip off
backslashes, not just slashes.
Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").
Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code which split an argv_array call across multiple lines, like:
argv_array_pushl(&args, "one argument",
"another argument", "and more",
NULL);
was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:
strvec_pushl(&args, "one argument",
"another argument", "and more",
NULL);
Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:
git jump grep 'strvec_.*,$'
and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).
This patch converts remaining files from the first half of the alphabet,
to keep the diff to a manageable size.
The conversion was done purely mechanically with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe '
s/ARGV_ARRAY/STRVEC/g;
s/argv_array/strvec/g;
'
and then selectively staging files with "git add '[abcdefghjkl]*'".
We'll deal with any indentation/style fallouts separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent introduction of "--log-destination" option to "git daemon"
did not work well when the daemon was run under "--inetd" mode.
* lw/daemon-log-destination:
daemon.c: fix condition for redirecting stderr
Since the --log-destination option was added in 0c591cacb ("daemon: add
--log-destination=(stderr|syslog|none)", 2018-02-04) with the explicit
goal of allowing logging to stderr when running in inetd mode, we should
not always redirect stderr to /dev/null in inetd mode, but rather only
when stderr is not being used for logging.
Signed-off-by: Lucas Werkmeister <mail@lucaswerkmeister.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The log from "git daemon" can be redirected with a new option; one
relevant use case is to send the log to standard error (instead of
syslog) when running it from inetd.
* lw/daemon-log-destination:
daemon: add --log-destination=(stderr|syslog|none)
This new option can be used to override the implicit --syslog of
--inetd, or to disable all logging. (While --detach also implies
--syslog, --log-destination=stderr with --detach is useless since
--detach disassociates the process from the original stderr.) --syslog
is retained as an alias for --log-destination=syslog.
--log-destination always overrides implicit --syslog regardless of
option order. This is different than the “last one wins” logic that
applies to some implicit options elsewhere in Git, but should hopefully
be less confusing. (I also don’t know if *all* implicit options in Git
follow “last one wins”.)
The combination of --inetd with --log-destination=stderr is useful, for
instance, when running `git daemon` as an instanced systemd service
(with associated socket unit). In this case, log messages sent via
syslog are received by the journal daemon, but run the risk of being
processed at a time when the `git daemon` process has already exited
(especially if the process was very short-lived, e.g. due to client
error), so that the journal daemon can no longer read its cgroup and
attach the message to the correct systemd unit (see systemd/systemd#2913
[1]). Logging to stderr instead can solve this problem, because systemd
can connect stderr directly to the journal daemon, which then already
knows which unit is associated with this stream.
[1]: https://github.com/systemd/systemd/issues/2913
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Lucas Werkmeister <mail@lucaswerkmeister.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git-daemon gets a pktline request, we strip off any
trailing newline, replacing it with a NUL. Clients prior to
5ad312bede (in git v1.4.0) would send:
git-upload-pack repo.git\n
and we need to strip it off to understand their request.
After 5ad312bede, we send the host attribute but no newline,
like:
git-upload-pack repo.git\0host=example.com\0
Both of these are parsed correctly by git-daemon. But if
some client were to combine the two:
git-upload-pack repo.git\n\0host=example.com\0
we don't parse it correctly. The problem is that we use the
"len" variable to record the position of the NUL separator,
but then decrement it when we strip the newline. So we start
with:
git-upload-pack repo.git\n\0host=example.com\0
^-- len
and end up with:
git-upload-pack repo.git\0\0host=example.com\0
^-- len
This is arguably correct, since "len" tells us the length of
the initial string, but we don't actually use it for that.
What we do use it for is finding the offset of the extended
attributes; they used to be at len+1, but are now at len+2.
We can solve that by just leaving "len" where it is. We
don't have to care about the length of the shortened string,
since we just treat it like a C string.
No version of Git ever produced such a string, but it seems
like the daemon code meant to handle this case (and it seems
like a reasonable thing for somebody to do in a 3rd-party
implementation).
Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we receive a request with extended attributes after the
NUL, we try to write those attributes to the log. We do so
with a "%s" format specifier, which will only show
characters up to the first NUL.
That's enough for printing a "host=" specifier. But since
dfe422d04d (daemon: recognize hidden request arguments,
2017-10-16) we may have another NUL, followed by protocol
parameters, and those are not logged at all.
Let's cut out the attempt to show the whole string, and
instead log when we parse individual attributes. We could
leave the "extended attributes (%d bytes) exist" part of the
log, which in theory could alert us to attributes that fail
to parse. But anything we don't parse as a "host=" parameter
gets blindly added to the "protocol" attribute, so we'd see
it in that part of the log.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If receive a request like:
git-upload-pack /foo.git\0host=localhost
we mark the offset of the NUL byte as "len", and then log
the bytes after the NUL with a "%.*s" placeholder, using
"pktlen - len" as the length, and "line + len + 1" as the
start of the string.
This is off-by-one, since the start of the string skips past
the separating NUL byte, but the adjusted length includes
it. Fortunately this doesn't actually read past the end of
the buffer, since "%.*s" will stop when it hits a NUL. And
regardless of what is in the buffer, packet_read() will
always add an extra NUL terminator for safety.
As an aside, the git.git client sends an extra NUL after a
"host" field, too, so we'd generally hit that one first, not
the one added by packet_read(). You can see this in the test
output which reports 15 bytes, even though the string has
only 14 bytes of visible data. But the point is that even a
client sending unusual data could not get us to read past
the end of the buffer, so this is purely a cosmetic fix.
Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A normal request to git-daemon is structured as
"command path/to/repo\0host=..\0" and due to a bug introduced in
49ba83fb6 (Add virtualization support to git-daemon, 2006-09-19) we
aren't able to place any extra arguments (separated by NULs) besides the
host otherwise the parsing of those arguments would enter an infinite
loop. This bug was fixed in 73bb33a94 (daemon: Strictly parse the
"extra arg" part of the command, 2009-06-04) but a check was put in
place to disallow extra arguments so that new clients wouldn't trigger
this bug in older servers.
In order to get around this limitation teach git-daemon to recognize
additional request arguments hidden behind a second NUL byte. Requests
can then be structured like:
"command path/to/repo\0host=..\0\0version=1\0key=value\0". git-daemon
can then parse out the extra arguments and set 'GIT_PROTOCOL'
accordingly.
By placing these extra arguments behind a second NUL byte we can skirt
around both the infinite loop bug in 49ba83fb6 (Add virtualization
support to git-daemon, 2006-09-19) as well as the explicit disallowing
of extra arguments introduced in 73bb33a94 (daemon: Strictly parse the
"extra arg" part of the command, 2009-06-04) because both of these
versions of git-daemon check for a single NUL byte after the host
argument before terminating the argument parsing.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gethostname(2) may not NUL terminate the buffer if hostname does
not fit; unfortunately there is no easy way to see if our buffer
was too small, but at least this will make sure we will not end up
using garbage past the end of the buffer.
* dt/xgethostname-nul-termination:
xgethostname: handle long hostnames
use HOST_NAME_MAX to size buffers for gethostname(2)
POSIX limits the length of host names to HOST_NAME_MAX. Export the
fallback definition from daemon.c and use this constant to make all
buffers used with gethostname(2) big enough for any possible result
and a terminating NUL.
Inspired-by: David Turner <dturner@twosigma.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: David Turner <dturner@twosigma.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our struct child_process already has its own argv_array.
Let's use that to avoid having to format options into
separate buffers.
Note that we'll need to declare the child process outside of
the run_service_command() helper to do this. But that opens
up a further simplification, which is that the helper can
append to our argument list, saving each caller from
specifying "." manually.
Signed-off-by: Jeff King <peff@peff.net>
"git daemon" used fixed-length buffers to turn URL to the
repository the client asked for into the server side directory
path, using snprintf() to avoid overflowing these buffers, but
allowed possibly truncated paths to the directory. This has been
tightened to reject such a request that causes overlong path to be
required to serve.
* jk/daemon-path-ok-check-truncation:
daemon: detect and reject too-long paths
The smudge/clean filter API expect an external process is spawned
to filter the contents for each path that has a filter defined. A
new type of "process" filter API has been added to allow the first
request to run the filter for a path to spawn a single process, and
all filtering need is served by this single process for multiple
paths, reducing the process creation overhead.
* ls/filter-process:
contrib/long-running-filter: add long running filter example
convert: add filter.<driver>.process option
convert: prepare filter.<driver>.process option
convert: make apply_filter() adhere to standard Git error handling
pkt-line: add functions to read/write flush terminated packet streams
pkt-line: add packet_write_gently()
pkt-line: add packet_flush_gently()
pkt-line: add packet_write_fmt_gently()
pkt-line: extract set_packet_header()
pkt-line: rename packet_write() to packet_write_fmt()
run-command: add clean_on_exit_handler
run-command: move check_pipe() from write_or_die to run_command
convert: modernize tests
convert: quote filter names in error messages
"git daemon" used fixed-length buffers to turn URL to the
repository the client asked for into the server side directory
path, using snprintf() to avoid overflowing these buffers, but
allowed possibly truncated paths to the directory. This has been
tightened to reject such a request that causes overlong path to be
required to serve.
* jk/daemon-path-ok-check-truncation:
daemon: detect and reject too-long paths
When we are checking the path via path_ok(), we use some
fixed PATH_MAX buffers. We write into them via snprintf(),
so there's no possibility of overflow, but it does mean we
may silently truncate the path, leading to potentially
confusing errors when the partial path does not exist.
We're better off to reject the path explicitly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packet_write() should be called packet_write_fmt() because it is a
printf-like function that takes a format string as first parameter.
packet_write_fmt() should be used for text strings only. Arbitrary
binary data should use a new packet_write() function that is introduced
in a subsequent patch.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>