git/compat
Ævar Arnfjörð Bjarmason 2d3491b117 tr2: log N parent process names on Linux
In 2f732bf15e (tr2: log parent process name, 2021-07-21) we started
logging parent process names, but only logged all parents on Windows.
on Linux only the name of the immediate parent process was logged.

Extend the functionality added there to also log full parent chain on
Linux.

This requires us to lookup "/proc/<getppid()>/stat" instead of
"/proc/<getppid()>/comm". The "comm" file just contains the name of the
process, but the "stat" file has both that information, and the parent
PID of that process, see procfs(5). We parse out the parent PID of our
own parent, and recursively walk the chain of "/proc/*/stat" files all
the way up the chain. A parent PID of 0 indicates the end of the
chain.

It's possible given the semantics of Linux's PID files that we end up
getting an entirely nonsensical chain of processes. It could happen if
e.g. we have a chain of processes like:

    1 (init) => 321 (bash) => 123 (git)

Let's assume that "bash" was started a while ago, and that as shown
the OS has already cycled back to using a lower PID for us than our
parent process. In the time it takes us to start up and get to
trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP) our parent
process might exit, and be replaced by an entirely different process!

We'd racily look up our own getppid(), but in the meantime our parent
would exit, and Linux would have cycled all the way back to starting
an entirely unrelated process as PID 321.

If that happens we'll just silently log incorrect data in our ancestry
chain. Luckily we don't need to worry about this except in this
specific cycling scenario, as Linux does not have PID
randomization. It appears it once did through a third-party feature,
but that it was removed around 2006[1]. For anyone worried about this
edge case raising PID_MAX via "/proc/sys/kernel/pid_max" will mitigate
it, but not eliminate it.

One thing we don't need to worry about is getting into an infinite
loop when walking "/proc/*/stat". See 353d3d77f4 (trace2: collect
Windows-specific process information, 2019-02-22) for the related
Windows code that needs to deal with that, and [2] for an explanation
of that edge case.

Aside from potential race conditions it's also a bit painful to
correctly parse the process name out of "/proc/*/stat". A simpler
approach is to use fscanf(), see [3] for an implementation of that,
but as noted in the comment being added here it would fail in the face
of some weird process names, so we need our own parse_proc_stat() to
parse it out.

With this patch the "ancestry" chain for a trace2 event might look
like this:

    $ GIT_TRACE2_EVENT=/dev/stdout ~/g/git/git version | grep ancestry | jq -r .ancestry
    [
      "bash",
      "screen",
      "systemd"
    ]

And in the case of naughty process names like the following. This uses
perl's ability to use prctl(PR_SET_NAME, ...). See
Perl/perl5@7636ea95c5 (Set the legacy process name with prctl() on
assignment to $0 on Linux, 2010-04-15)[4]:

    $ perl -e '$0 = "(naughty\nname)"; system "GIT_TRACE2_EVENT=/dev/stdout ~/g/git/git version"' | grep ancestry | jq -r .ancestry
    [
      "sh",
      "(naughty\nname)",
      "bash",
      "screen",
      "systemd"
    ]

1. https://grsecurity.net/news#grsec2110
2. https://lore.kernel.org/git/48a62d5e-28e2-7103-a5bb-5db7e197a4b9@jeffhostetler.com/
3. https://lore.kernel.org/git/87o8agp29o.fsf@evledraar.gmail.com/
4. 7636ea95c5

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 11:08:00 -07:00
..
linux tr2: log N parent process names on Linux 2021-09-07 11:08:00 -07:00
nedmalloc Fix spelling errors in no-longer-updated-from-upstream modules 2019-11-10 16:00:55 +09:00
poll mingw: workaround for hangs when sending STDIN 2020-02-27 14:23:29 -08:00
regex compat/regex: move stdlib.h up in inclusion chain 2020-04-27 11:21:16 -07:00
simple-ipc simple-ipc: correct ifdefs when NO_PTHREADS is defined 2021-05-21 07:55:00 +09:00
stub tr2: make process info collection platform-generic 2021-07-22 13:35:20 -07:00
vcbuild ci(vs-build): stop passing the iconv library location explicitly 2020-12-04 12:03:15 -08:00
win32 run-command: trigger PATH lookup properly on Cygwin 2020-03-27 11:06:17 -07:00
access.c git-compat-util: work around for access(X_OK) under root 2019-04-25 17:49:44 +09:00
apple-common-crypto.h imap-send: use HMAC() function provided by OpenSSL 2016-04-08 11:45:47 -07:00
basename.c compat/basename.c: provide a dirname() compatibility function 2016-01-12 10:40:54 -08:00
bswap.h compat/bswap.h: don't assume MSVC is little-endian 2020-11-11 11:24:47 -08:00
compiler.h bugreport: add compiler info 2020-04-16 15:23:42 -07:00
fileno.c git-compat-util: work around for access(X_OK) under root 2019-04-25 17:49:44 +09:00
fopen.c git_fopen: fix a sparse 'not declared' warning 2017-05-26 12:33:55 +09:00
hstrerror.c compat/hstrerror: convert sprintf to snprintf 2015-09-25 10:18:18 -07:00
inet_ntop.c compat/inet_ntop: fix off-by-one in inet_ntop4 2015-09-25 10:18:18 -07:00
inet_pton.c Drop system includes from inet_pton/inet_ntop compatibility wrappers 2012-02-05 16:32:33 -08:00
memmem.c
mingw.c mingw: align symlinks-related rmdir() behavior with Linux 2021-08-02 15:10:58 -07:00
mingw.h Makefile: add OPEN_RETURNS_EINTR knob 2021-02-26 14:15:51 -08:00
mkdir.c compat: some mkdir() do not like a slash at the end 2012-08-24 09:48:51 -07:00
mkdtemp.c Fix gitmkdtemp: correct test for mktemp() return value 2010-02-25 12:08:22 -08:00
mmap.c compat: make sure git_mmap is not expected to write 2018-10-25 18:51:03 +09:00
msvc.c win32: use our own dirent.h 2010-11-23 16:06:50 -08:00
msvc.h msvc: add pragmas for common warnings 2019-06-25 10:46:57 -07:00
obstack.c compat/obstack: fix -Wcast-function-type warnings 2019-01-17 11:13:38 -08:00
obstack.h obstack: avoid computing offsets from NULL pointer 2020-01-28 23:13:25 -08:00
open.c Makefile: add OPEN_RETURNS_EINTR knob 2021-02-26 14:15:51 -08:00
pread.c
precompose_utf8.c precompose_utf8: make precompose_string_if_needed() public 2021-04-05 17:30:04 -07:00
precompose_utf8.h precompose_utf8: make precompose_string_if_needed() public 2021-04-05 17:30:04 -07:00
qsort_s.c compat: add qsort_s() 2017-01-23 11:02:34 -08:00
setenv.c use st_add and st_mult for allocation size computation 2016-02-22 14:51:09 -08:00
sha1-chunked.c sha1: allow limiting the size of the data passed to SHA1_Update() 2015-11-05 10:35:11 -08:00
sha1-chunked.h sha1: allow limiting the size of the data passed to SHA1_Update() 2015-11-05 10:35:11 -08:00
snprintf.c MSVC: vsnprintf in Visual Studio 2015 doesn't need SNPRINTF_SIZE_CORR any more 2016-03-30 11:13:01 -07:00
stat.c compat: convert modes to use portable file type values 2014-12-04 11:58:36 -08:00
strcasestr.c
strdup.c compat: move strdup(3) replacement to its own file 2016-09-07 10:41:45 -07:00
strlcpy.c
strtoimax.c Add strtoimax() compatibility function. 2011-11-02 13:06:30 -07:00
strtoumax.c
terminal.c strvec: convert more callers away from argv_array name 2020-07-28 15:02:18 -07:00
terminal.h terminal: add a new function to read a single keystroke 2020-01-15 12:06:17 -08:00
unsetenv.c Revert "compat/unsetenv.c: Fix a sparse warning" 2013-07-21 15:09:56 -07:00
win32.h mingw: rename WIN32 cpp macro to GIT_WINDOWS_NATIVE 2013-05-08 12:14:35 -07:00
win32mmap.c mmap(win32): avoid expensive fstat() call 2016-04-22 15:01:16 -07:00
winansi.c mingw: work around incorrect standard handles 2019-11-23 11:17:01 +09:00