Commit graph

24 commits

Author SHA1 Message Date
Diomidis Spinellis
1819ad327b grep: fix multibyte regex handling under macOS
The commit 29de20504e (Makefile: fix default regex settings on
Darwin, 2013-05-11) fixed t0070-fundamental.sh under Darwin (macOS) by
adopting Git's regex library.  However, this library is compiled with
NO_MBSUPPORT, which causes git-grep to work incorrectly on multibyte
(e.g. UTF-8) files.  Current macOS versions pass t0070-fundamental.sh
with the native macOS regex library, which also supports multibyte
characters.

Adjust the Makefile to use the native regex library, and call
setlocale(3) to set CTYPE according to the user's preference.
The setlocale call is required on all platforms, but in platforms
supporting gettext(3), setlocale was called as a side-effect of
initializing gettext.  Therefore, move the CTYPE setlocale call from
gettext.c to common-main.c and the corresponding locale.h include
into git-compat-util.h.

Thanks to the global initialization of CTYPE setlocale, the test-tool
regex command now works correctly with supported multibyte regexes, and
is used to set the MB_REGEX test prerequisite by assessing a platform's
support for them.

Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-26 11:45:52 -07:00
Ævar Arnfjörð Bjarmason
0cc05b044f usage.c: add a non-fatal bug() function to go with BUG()
Add a bug() function to use in cases where we'd like to indicate a
runtime BUG(), but would like to defer the BUG() call because we're
possibly accumulating more bug() callers to exhaustively indicate what
went wrong.

We already have this sort of facility in various parts of the
codebase, just in the form of ad-hoc re-inventions of the
functionality that this new API provides. E.g. this will be used to
replace optbug() in parse-options.c, and the 'error("BUG:[...]' we do
in a loop in builtin/receive-pack.c.

Unlike the code this replaces we'll log to trace2 with this new bug()
function (as with other usage.c functions, including BUG()), we'll
also be able to avoid calls to xstrfmt() in some cases, as the bug()
function itself accepts variadic sprintf()-like arguments.

Any caller to bug() can follow up such calls with BUG_if_bug(),
which will BUG() out (i.e. abort()) if there were any preceding calls
to bug(), callers can also decide not to call BUG_if_bug() and leave
the resulting BUG() invocation until exit() time. There are currently
no bug() API users that don't call BUG_if_bug() themselves after a
for-loop, but allowing for not calling BUG_if_bug() keeps the API
flexible. As the tests and documentation here show we'll catch missing
BUG_if_bug() invocations in our exit() wrapper.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Ævar Arnfjörð Bjarmason
19d75948ef common-main.c: move non-trace2 exit() behavior out of trace2.c
Change the exit() wrapper added in ee4512ed48 (trace2: create new
combined trace facility, 2019-02-22) so that we'll split up the trace2
logging concerns from wanting to wrap the "exit()" function itself for
other purposes.

This makes more sense structurally, as we won't seem to conflate
non-trace2 behavior with the trace2 code. I'd previously added an
explanation for this in 368b584315 (common-main.c: call exit(), don't
return, 2021-12-07), that comment is being adjusted here.

Now the only thing we'll do if we're not using trace2 is to truncate
the "code" argument to the lowest 8 bits.

We only need to do that truncation on non-POSIX systems, but in
ee4512ed48 that "if defined(__MINGW32__)" code added in
47e3de0e79 (MinGW: truncate exit()'s argument to lowest 8 bits,
2009-07-05) was made to run everywhere. It might be good for clarify
to narrow that down by an "ifdef" again, but I'm not certain that in
the interim we haven't had some other non-POSIX systems rely the
behavior. On a POSIX system taking the lowest 8 bits is implicit, see
exit(3)[1] and wait(2)[2]. Let's leave a comment about that instead.

1. https://man7.org/linux/man-pages/man3/exit.3.html
2. https://man7.org/linux/man-pages/man2/wait.2.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:30 -07:00
Junio C Hamano
da81d473fc Merge branch 'en/keep-cwd'
Many git commands that deal with working tree files try to remove a
directory that becomes empty (i.e. "git switch" from a branch that
has the directory to another branch that does not would attempt
remove all files in the directory and the directory itself).  This
drops users into an unfamiliar situation if the command was run in
a subdirectory that becomes subject to removal due to the command.
The commands have been taught to keep an empty directory if it is
the directory they were started in to avoid surprising users.

* en/keep-cwd:
  t2501: simplify the tests since we can now assume desired behavior
  dir: new flag to remove_dir_recurse() to spare the original_cwd
  dir: avoid incidentally removing the original_cwd in remove_path()
  stash: do not attempt to remove startup_info->original_cwd
  rebase: do not attempt to remove startup_info->original_cwd
  clean: do not attempt to remove startup_info->original_cwd
  symlinks: do not include startup_info->original_cwd in dir removal
  unpack-trees: add special cwd handling
  unpack-trees: refuse to remove startup_info->original_cwd
  setup: introduce startup_info->original_cwd
  t2501: add various tests for removing the current working directory
2022-01-05 14:01:28 -08:00
Elijah Newren
e6f8861bd4 setup: introduce startup_info->original_cwd
Removing the current working directory causes all subsequent git
commands run from that directory to get confused and fail with a message
about being unable to read the current working directory:

    $ git status
    fatal: Unable to read current working directory: No such file or directory

Non-git commands likely have similar warnings or even errors, e.g.

    $ bash -c 'echo hello'
    shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
    hello

This confuses end users, particularly since the command they get the
error from is not the one that caused the problem; the problem came from
the side-effect of some previous command.

We would like to avoid removing the current working directory of our
parent process; towards this end, introduce a new variable,
startup_info->original_cwd, that tracks the current working directory
that we inherited from our parent process.  For convenience of later
comparisons, we prefer that this new variable store a path relative to
the toplevel working directory (thus much like 'prefix'), except without
the trailing slash.

Subsequent commits will make use of this new variable.

Acked-by: Derrick Stolee <stolee@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:33:12 -08:00
Ævar Arnfjörð Bjarmason
368b584315 common-main.c: call exit(), don't return
Change the main() function to call "exit()" instead of ending with a
"return" statement. The "exit()" function is our own wrapper that
calls trace2_cmd_exit_fl() for us, from git-compat-util.h:

	#define exit(code) exit(trace2_cmd_exit_fl(__FILE__, __LINE__, (code)))

That "exit()" wrapper has been in use ever since ee4512ed48 (trace2:
create new combined trace facility, 2019-02-22).

This changes nothing about how we "exit()", as we'd invoke
"trace2_cmd_exit_fl()" in both cases due to the wrapper, this change
makes it easier to reason about this code, as we're now always
obviously relying on our "exit()" wrapper.

There is already code immediately downstream of our "main()" which has
a hard reliance on that, e.g. the various "exit()" calls downstream of
"cmd_main()" in "git.c".

We even had a comment in "t/helper/test-trace2.c" that seemed to be
confused about how the "exit()" wrapper interacted with uses of
"return", even though it was introduced in the same trace2 series in
a15860dca3 (trace2: t/helper/test-trace2, t0210.sh, t0211.sh,
t0212.sh, 2019-02-22), after the aforementioned ee4512ed48. Perhaps
it pre-dated the "exit()" wrapper?

This change makes the "trace2_cmd_exit()" macro orphaned, we now
always use "trace2_cmd_exit_fl()" directly, but let's keep that
simpler example in place. Even if we're unlikely to get another
"main()" other than the one in our "common-main.c", there's some value
in having the API documentation and example discuss a simpler version
that doesn't require an "exit()" wrapper macro.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-07 12:29:57 -08:00
Jeff King
5732f2b1ef common-main: delay trace2 initialization
We initialize the trace2 system in the common main() function so that
all programs (even ones that aren't builtins) will enable tracing. But
trace2 startup is relatively heavy-weight, as we have to actually read
on-disk config to decide whether to trace. This can cause unexpected
interactions with other common-main initialization. For instance, we'll
end up in the config code before calling initialize_the_repository(),
and the usual invariant that the_repository is never NULL will not hold.

Let's push the trace2 initialization further down in common-main, to
just before we execute cmd_main(). The other parts of the initialization
are much more self-contained and less likely to call library code that
depends on those kinds of invariants.

Originally the trace2 code tried to start as early as possible to get
accurate timings. But the timer initialization was split out from the
config reading in a089724958 (trace2: refactor setting process starting
time, 2019-04-15), so there shouldn't be any impact from this patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-06 13:09:01 -07:00
Jeff Hostetler
26c6f251d7 trace2: report peak memory usage of the process
Teach Windows version of git to report peak memory usage
during exit() processing.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16 13:37:07 +09:00
Jeff Hostetler
a7bc01eb25 trace2: find exec-dir before trace2 initialization
Teach Git to resolve the executable directory before initializing
Trace2.  This allows the system configuration directory to be
discovered earlier (because it is sometimes relative to the prefix
or runtime-prefix).

This will be used by the next commit to allow trace2 settings to
be loaded from the system config.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16 13:37:06 +09:00
Jeff Hostetler
a089724958 trace2: refactor setting process starting time
Create trace2_initialize_clock() and call from main() to capture
process start time in isolation and before other sub-systems are
ready.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16 13:37:06 +09:00
Jeff Hostetler
353d3d77f4 trace2: collect Windows-specific process information
Add platform-specific interface to log information about the current
process.

On Windows, this interface is used to indicate whether the git process
is running under a debugger and list names of the process ancestors.

Information for other platforms is left for a future effort.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22 15:27:59 -08:00
Jeff Hostetler
ee4512ed48 trace2: create new combined trace facility
Create a new unified tracing facility for git.  The eventual intent is to
replace the current trace_printf* and trace_performance* routines with a
unified set of git_trace2* routines.

In addition to the usual printf-style API, trace2 provides higer-level
event verbs with fixed-fields allowing structured data to be written.
This makes post-processing and analysis easier for external tools.

Trace2 defines 3 output targets.  These are set using the environment
variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT".  These may be
set to "1" or to an absolute pathname (just like the current GIT_TRACE).

* GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command
  summary data.

* GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE.
  It extends the output with columns for the command process, thread,
  repo, absolute and relative elapsed times.  It reports events for
  child process start/stop, thread start/stop, and per-thread function
  nesting.

* GIT_TR2_EVENT is a new structured format. It writes event data as a
  series of JSON records.

Calls to trace2 functions log to any of the 3 output targets enabled
without the need to call different trace_printf* or trace_performance*
routines.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22 15:27:59 -08:00
Junio C Hamano
92034a9cd5 Merge branch 'dj/runtime-prefix'
A build-time option has been added to allow Git to be told to refer
to its associated files relative to the main binary, in the same
way that has been possible on Windows for quite some time, for
Linux, BSDs and Darwin.

* dj/runtime-prefix:
  Makefile: quote $INSTLIBDIR when passing it to sed
  Makefile: remove unused @@PERLLIBDIR@@ substitution variable
  mingw/msvc: use the new-style RUNTIME_PREFIX helper
  exec_cmd: provide a new-style RUNTIME_PREFIX helper for Windows
  exec_cmd: RUNTIME_PREFIX on some POSIX systems
  Makefile: add Perl runtime prefix support
  Makefile: generate Perl header from template file
2018-05-08 15:59:17 +09:00
Stefan Beller
d807c4a01d exec_cmd: rename to use dash in file name
This is more consistent with the project style. The majority of Git's
source files use dashes in preference to underscores in their file names.

Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11 18:11:00 +09:00
Dan Jacques
226c0ddd0d exec_cmd: RUNTIME_PREFIX on some POSIX systems
Enable Git to resolve its own binary location using a variety of
OS-specific and generic methods, including:

- procfs via "/proc/self/exe" (Linux)
- _NSGetExecutablePath (Darwin)
- KERN_PROC_PATHNAME sysctl on BSDs.
- argv0, if absolute (all, including Windows).

This is used to enable RUNTIME_PREFIX support for non-Windows systems,
notably Linux and Darwin. When configured with RUNTIME_PREFIX, Git will
do a best-effort resolution of its executable path and automatically use
this as its "exec_path" for relative helper and data lookups, unless
explicitly overridden.

Small incidental formatting cleanup of "exec_cmd.c".

Signed-off-by: Dan Jacques <dnj@google.com>
Thanks-to: Robbie Iannucci <iannucci@google.com>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 18:10:28 +09:00
Nguyễn Thái Ngọc Duy
b2f0eceecf repository: initialize the_repository in main()
This simplifies initialization of struct repository and anything
inside. Easier to read. Easier to add/remove fields.

Everything will go through main() common-main.c so this should cover all
programs, including t/helper.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05 11:14:03 -08:00
Brandon Williams
1a600b7555 attr: use hashmap for attribute dictionary
The current implementation of the attribute dictionary uses a custom
hashtable.  This modernizes the dictionary by converting it to the builtin
'hashmap' structure.

Also, in order to enable a threaded API in the future add an
accompanying mutex which must be acquired prior to accessing the
dictionary of interned attributes.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:53 -08:00
Jeff King
6854a8f5c9 common-main: stop munging argv[0] path
Since 650c44925 (common-main: call git_extract_argv0_path(),
2016-07-01), the argv[0] that is seen in cmd_main() of
individual programs is always the basename of the
executable, as common-main strips off the full path. This
can produce confusing results for git-daemon, which wants to
re-exec itself.

For instance, if the program was originally run as
"/usr/lib/git/git-daemon", it will try just re-execing
"git-daemon", which will find the first instance in $PATH.
If git's exec-path has not been prepended to $PATH, we may
find the git-daemon from a different version (or no
git-daemon at all).

Normally this isn't a problem. Git commands are run as "git
daemon", the git wrapper puts the exec-path at the front of
$PATH, and argv[0] is already "daemon" anyway. But running
git-daemon via its full exec-path, while not really a
recommended method, did work prior to 650c44925. Let's make
it work again.

The real goal of 650c44925 was not to munge argv[0], but to
reliably set the argv0_path global. The only reason it
munges at all is that one caller, the git.c wrapper,
piggy-backed on that computation to find the command
basename.  Instead, let's leave argv[0] untouched in
common-main, and have git.c do its own basename computation.

While we're at it, let's drop the return value from
git_extract_argv0_path(). It was only ever used in this one
callsite, and its dual purposes is what led to this
confusion in the first place.

Note that by changing the interface, the compiler can
confirm for us that there are no other callers storing the
return value. But the compiler can't tell us whether any of
the cmd_main() functions (besides git.c) were relying on the
basename munging. However, we can observe that prior to
650c44925, no other cmd_main() functions did that munging,
and no new cmd_main() functions have been introduced since
then. So we can't be regressing any of those cases.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-29 11:01:48 -08:00
Johannes Schindelin
08aade7080 mingw: declare main()'s argv as const
In 84d32bf (sparse: Fix mingw_main() argument number/type errors,
2013-04-27), we addressed problems identified by the 'sparse' tool where
argv was declared inconsistently. The way we addressed it was by casting
from the non-const version to the const-version.

This patch is long overdue, fixing compat/mingw.h's declaration to
make the "argv" parameter const.  This also allows us to lose the
"const" trickery introduced earlier to common-main.c:main().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-06 08:11:47 -07:00
Jeff King
5ce5f5fa5a common-main: call git_setup_gettext()
This should be part of every program, as otherwise users do
not get translated error messages. However, some external
commands forgot to do so (e.g., git-credential-store). This
fixes them, and eliminates the repeated code in programs
that did remember to use it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 15:09:10 -07:00
Jeff King
12e0437f23 common-main: call restore_sigpipe_to_default()
This is another safety/sanity setup that should be in force
everywhere, but which we only applied in git.c. This did
catch most cases, since even external commands are typically
run via "git ..." (and the restoration applies to
sub-processes, too). But there were cases we missed, such as
somebody calling git-upload-pack directly via ssh, or
scripts which use dashed external commands directly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 15:09:10 -07:00
Jeff King
57f5d52a94 common-main: call sanitize_stdfds()
This is setup that should be done in every program for
safety, but we never got around to adding it everywhere (so
builtins benefited from the call in git.c, but any external
commands did not). Putting it in the common main() gives us
this safety everywhere.

Note that the case in daemon.c is a little funny. We wait
until we know whether we want to daemonize, and then either:

 - call daemonize(), which will close stdio and reopen it to
   /dev/null under the hood

 - sanitize_stdfds(), to fix up any odd cases

But that is way too late; the point of sanitizing is to give
us reliable descriptors on 0/1/2, and we will already have
executed code, possibly called die(), etc. The sanitizing
should be the very first thing that happens.

With this patch, git-daemon will sanitize first, and can
remove the call in the non-daemonize case. It does mean that
daemonize() may just end up closing the descriptors we
opened, but that's not a big deal (it's not wrong to do so,
nor is it really less optimal than the case where our parent
process redirected us from /dev/null ahead of time).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 15:09:10 -07:00
Jeff King
650c449250 common-main: call git_extract_argv0_path()
Every program which links against libgit.a must call this
function, or risk hitting an assert() in system_path() that
checks whether we have configured argv0_path (though only
when RUNTIME_PREFIX is defined, so essentially only on
Windows).

Looking at the diff, you can see that putting it into the
common main() saves us having to do it individually in each
of the external commands. But what you can't see are the
cases where we _should_ have been doing so, but weren't
(e.g., git-credential-store, and all of the t/helper test
programs).

This has been an accident-waiting-to-happen for a long time,
but wasn't triggered until recently because it involves one
of those programs actually calling system_path(). That
happened with git-credential-store in v2.8.0 with ae5f677
(lazily load core.sharedrepository, 2016-03-11). The
program:

  - takes a lock file, which...

  - opens a tempfile, which...

  - calls adjust_shared_perm to fix permissions, which...

  - lazy-loads the config (as of ae5f677), which...

  - calls system_path() to find the location of
    /etc/gitconfig

On systems with RUNTIME_PREFIX, this means credential-store
reliably hits that assert() and cannot be used.

We never noticed in the test suite, because we set
GIT_CONFIG_NOSYSTEM there, which skips the system_path()
lookup entirely.  But if we were to tweak git_config() to
find /etc/gitconfig even when we aren't going to open it,
then the test suite shows multiple failures (for
credential-store, and for some other test helpers). I didn't
include that tweak here because it's way too specific to
this particular call to be worth carrying around what is
essentially dead code.

The implementation is fairly straightforward, with one
exception: there is exactly one caller (git.c) that actually
cares about the result of the function, and not the
side-effect of setting up argv0_path. We can accommodate
that by simply replacing the value of argv[0] in the array
we hand down to cmd_main().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 15:09:10 -07:00
Jeff King
3f2e2297b9 add an extra level of indirection to main()
There are certain startup tasks that we expect every git
process to do. In some cases this is just to improve the
quality of the program (e.g., setting up gettext()). In
others it is a requirement for using certain functions in
libgit.a (e.g., system_path() expects that you have called
git_extract_argv0_path()).

Most commands are builtins and are covered by the git.c
version of main(). However, there are still a few external
commands that use their own main(). Each of these has to
remember to include the correct startup sequence, and we are
not always consistent.

Rather than just fix the inconsistencies, let's make this
harder to get wrong by providing a common main() that can
run this standard startup.

We basically have two options to do this:

 - the compat/mingw.h file already does something like this by
   adding a #define that replaces the definition of main with a
   wrapper that calls mingw_startup().

   The upside is that the code in each program doesn't need
   to be changed at all; it's rewritten on the fly by the
   preprocessor.

   The downside is that it may make debugging of the startup
   sequence a bit more confusing, as the preprocessor is
   quietly inserting new code.

 - the builtin functions are all of the form cmd_foo(),
   and git.c's main() calls them.

   This is much more explicit, which may make things more
   obvious to somebody reading the code. It's also more
   flexible (because of course we have to figure out _which_
   cmd_foo() to call).

   The downside is that each of the builtins must define
   cmd_foo(), instead of just main().

This patch chooses the latter option, preferring the more
explicit approach, even though it is more invasive. We
introduce a new file common-main.c, with the "real" main. It
expects to call cmd_main() from whatever other objects it is
linked against.

We link common-main.o against anything that links against
libgit.a, since we know that such programs will need to do
this setup. Note that common-main.o can't actually go inside
libgit.a, as the linker would not pick up its main()
function automatically (it has no callers).

The rest of the patch is just adjusting all of the various
external programs (mostly in t/helper) to use cmd_main().
I've provided a global declaration for cmd_main(), which
means that all of the programs also need to match its
signature. In particular, many functions need to switch to
"const char **" instead of "char **" for argv. This effect
ripples out to a few other variables and functions, as well.

This makes the patch even more invasive, but the end result
is much better. We should be treating argv strings as const
anyway, and now all programs conform to the same signature
(which also matches the way builtins are defined).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 15:09:10 -07:00