git/t/helper/test-run-command.c

463 lines
11 KiB
C
Raw Normal View History

/*
* test-run-command.c: test run command API.
*
* (C) 2009 Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
*
* This code is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include "test-tool.h"
#include "run-command.h"
#include "strvec.h"
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
#include "strbuf.h"
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
#include "parse-options.h"
#include "string-list.h"
#include "thread-utils.h"
#include "wildmatch.h"
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
static int number_callbacks;
static int parallel_next(struct child_process *cp,
struct strbuf *err,
void *cb,
void **task_cb UNUSED)
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
{
struct child_process *d = cb;
if (number_callbacks >= 4)
return 0;
strvec_pushv(&cp->args, d->args.v);
run-command: add an "ungroup" option to run_process_parallel() Extend the parallel execution API added in c553c72eed6 (run-command: add an asynchronous parallel child processor, 2015-12-15) to support a mode where the stdout and stderr of the processes isn't captured and output in a deterministic order, instead we'll leave it to the kernel and stdio to sort it out. This gives the API same functionality as GNU parallel's --ungroup option. As we'll see in a subsequent commit the main reason to want this is to support stdout and stderr being connected to the TTY in the case of jobs=1, demonstrated here with GNU parallel: $ parallel --ungroup 'test -t {} && echo TTY || echo NTTY' ::: 1 2 TTY TTY $ parallel 'test -t {} && echo TTY || echo NTTY' ::: 1 2 NTTY NTTY Another is as GNU parallel's documentation notes a potential for optimization. As demonstrated in next commit our results with "git hook run" will be similar, but generally speaking this shows that if you want to run processes in parallel where the exact order isn't important this can be a lot faster: $ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 >/dev/null ' Benchmark 1: parallel seq ::: 10000000 >/dev/null Time (mean ± σ): 220.2 ms ± 9.3 ms [User: 124.9 ms, System: 96.1 ms] Range (min … max): 212.3 ms … 230.5 ms 3 runs Benchmark 2: parallel --ungroup seq ::: 10000000 >/dev/null Time (mean ± σ): 154.7 ms ± 0.9 ms [User: 136.2 ms, System: 25.1 ms] Range (min … max): 153.9 ms … 155.7 ms 3 runs Summary 'parallel --ungroup seq ::: 10000000 >/dev/null ' ran 1.42 ± 0.06 times faster than 'parallel seq ::: 10000000 >/dev/null ' A large part of the juggling in the API is to make the API safer for its maintenance and consumers alike. For the maintenance of the API we e.g. avoid malloc()-ing the "pp->pfd", ensuring that SANITIZE=address and other similar tools will catch any unexpected misuse. For API consumers we take pains to never pass the non-NULL "out" buffer to an API user that provided the "ungroup" option. The resulting code in t/helper/test-run-command.c isn't typical of such a user, i.e. they'd typically use one mode or the other, and would know whether they'd provided "ungroup" or not. We could also avoid the strbuf_init() for "buffered_output" by having "struct parallel_processes" use a static PARALLEL_PROCESSES_INIT initializer, but let's leave that cleanup for later. Using a global "run_processes_parallel_ungroup" variable to enable this option is rather nasty, but is being done here to produce as minimal of a change as possible for a subsequent regression fix. This change is extracted from a larger initial version[1] which ends up with a better end-state for the API, but in doing so needed to modify all existing callers of the API. Let's defer that for now, and narrowly focus on what we need for fixing the regression in the subsequent commit. It's safe to do this with a global variable because: A) hook.c is the only user of it that sets it to non-zero, and before we'll get any other API users we'll refactor away this method of passing in the option, i.e. re-roll [1]. B) Even if hook.c wasn't the only user we don't have callers of this API that concurrently invoke this parallel process starting API itself in parallel. As noted above "A" && "B" are rather nasty, and we don't want to live with those caveats long-term, but for now they should be an acceptable compromise. 1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 08:48:19 +00:00
if (err)
strbuf_addstr(err, "preloaded output of a child\n");
else
fprintf(stderr, "preloaded output of a child\n");
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
number_callbacks++;
return 1;
}
static int no_job(struct child_process *cp UNUSED,
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
struct strbuf *err,
void *cb UNUSED,
void **task_cb UNUSED)
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
{
run-command: add an "ungroup" option to run_process_parallel() Extend the parallel execution API added in c553c72eed6 (run-command: add an asynchronous parallel child processor, 2015-12-15) to support a mode where the stdout and stderr of the processes isn't captured and output in a deterministic order, instead we'll leave it to the kernel and stdio to sort it out. This gives the API same functionality as GNU parallel's --ungroup option. As we'll see in a subsequent commit the main reason to want this is to support stdout and stderr being connected to the TTY in the case of jobs=1, demonstrated here with GNU parallel: $ parallel --ungroup 'test -t {} && echo TTY || echo NTTY' ::: 1 2 TTY TTY $ parallel 'test -t {} && echo TTY || echo NTTY' ::: 1 2 NTTY NTTY Another is as GNU parallel's documentation notes a potential for optimization. As demonstrated in next commit our results with "git hook run" will be similar, but generally speaking this shows that if you want to run processes in parallel where the exact order isn't important this can be a lot faster: $ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 >/dev/null ' Benchmark 1: parallel seq ::: 10000000 >/dev/null Time (mean ± σ): 220.2 ms ± 9.3 ms [User: 124.9 ms, System: 96.1 ms] Range (min … max): 212.3 ms … 230.5 ms 3 runs Benchmark 2: parallel --ungroup seq ::: 10000000 >/dev/null Time (mean ± σ): 154.7 ms ± 0.9 ms [User: 136.2 ms, System: 25.1 ms] Range (min … max): 153.9 ms … 155.7 ms 3 runs Summary 'parallel --ungroup seq ::: 10000000 >/dev/null ' ran 1.42 ± 0.06 times faster than 'parallel seq ::: 10000000 >/dev/null ' A large part of the juggling in the API is to make the API safer for its maintenance and consumers alike. For the maintenance of the API we e.g. avoid malloc()-ing the "pp->pfd", ensuring that SANITIZE=address and other similar tools will catch any unexpected misuse. For API consumers we take pains to never pass the non-NULL "out" buffer to an API user that provided the "ungroup" option. The resulting code in t/helper/test-run-command.c isn't typical of such a user, i.e. they'd typically use one mode or the other, and would know whether they'd provided "ungroup" or not. We could also avoid the strbuf_init() for "buffered_output" by having "struct parallel_processes" use a static PARALLEL_PROCESSES_INIT initializer, but let's leave that cleanup for later. Using a global "run_processes_parallel_ungroup" variable to enable this option is rather nasty, but is being done here to produce as minimal of a change as possible for a subsequent regression fix. This change is extracted from a larger initial version[1] which ends up with a better end-state for the API, but in doing so needed to modify all existing callers of the API. Let's defer that for now, and narrowly focus on what we need for fixing the regression in the subsequent commit. It's safe to do this with a global variable because: A) hook.c is the only user of it that sets it to non-zero, and before we'll get any other API users we'll refactor away this method of passing in the option, i.e. re-roll [1]. B) Even if hook.c wasn't the only user we don't have callers of this API that concurrently invoke this parallel process starting API itself in parallel. As noted above "A" && "B" are rather nasty, and we don't want to live with those caveats long-term, but for now they should be an acceptable compromise. 1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 08:48:19 +00:00
if (err)
strbuf_addstr(err, "no further jobs available\n");
else
fprintf(stderr, "no further jobs available\n");
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
return 0;
}
static int task_finished(int result UNUSED,
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
struct strbuf *err,
void *pp_cb UNUSED,
void *pp_task_cb UNUSED)
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
{
run-command: add an "ungroup" option to run_process_parallel() Extend the parallel execution API added in c553c72eed6 (run-command: add an asynchronous parallel child processor, 2015-12-15) to support a mode where the stdout and stderr of the processes isn't captured and output in a deterministic order, instead we'll leave it to the kernel and stdio to sort it out. This gives the API same functionality as GNU parallel's --ungroup option. As we'll see in a subsequent commit the main reason to want this is to support stdout and stderr being connected to the TTY in the case of jobs=1, demonstrated here with GNU parallel: $ parallel --ungroup 'test -t {} && echo TTY || echo NTTY' ::: 1 2 TTY TTY $ parallel 'test -t {} && echo TTY || echo NTTY' ::: 1 2 NTTY NTTY Another is as GNU parallel's documentation notes a potential for optimization. As demonstrated in next commit our results with "git hook run" will be similar, but generally speaking this shows that if you want to run processes in parallel where the exact order isn't important this can be a lot faster: $ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 >/dev/null ' Benchmark 1: parallel seq ::: 10000000 >/dev/null Time (mean ± σ): 220.2 ms ± 9.3 ms [User: 124.9 ms, System: 96.1 ms] Range (min … max): 212.3 ms … 230.5 ms 3 runs Benchmark 2: parallel --ungroup seq ::: 10000000 >/dev/null Time (mean ± σ): 154.7 ms ± 0.9 ms [User: 136.2 ms, System: 25.1 ms] Range (min … max): 153.9 ms … 155.7 ms 3 runs Summary 'parallel --ungroup seq ::: 10000000 >/dev/null ' ran 1.42 ± 0.06 times faster than 'parallel seq ::: 10000000 >/dev/null ' A large part of the juggling in the API is to make the API safer for its maintenance and consumers alike. For the maintenance of the API we e.g. avoid malloc()-ing the "pp->pfd", ensuring that SANITIZE=address and other similar tools will catch any unexpected misuse. For API consumers we take pains to never pass the non-NULL "out" buffer to an API user that provided the "ungroup" option. The resulting code in t/helper/test-run-command.c isn't typical of such a user, i.e. they'd typically use one mode or the other, and would know whether they'd provided "ungroup" or not. We could also avoid the strbuf_init() for "buffered_output" by having "struct parallel_processes" use a static PARALLEL_PROCESSES_INIT initializer, but let's leave that cleanup for later. Using a global "run_processes_parallel_ungroup" variable to enable this option is rather nasty, but is being done here to produce as minimal of a change as possible for a subsequent regression fix. This change is extracted from a larger initial version[1] which ends up with a better end-state for the API, but in doing so needed to modify all existing callers of the API. Let's defer that for now, and narrowly focus on what we need for fixing the regression in the subsequent commit. It's safe to do this with a global variable because: A) hook.c is the only user of it that sets it to non-zero, and before we'll get any other API users we'll refactor away this method of passing in the option, i.e. re-roll [1]. B) Even if hook.c wasn't the only user we don't have callers of this API that concurrently invoke this parallel process starting API itself in parallel. As noted above "A" && "B" are rather nasty, and we don't want to live with those caveats long-term, but for now they should be an acceptable compromise. 1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 08:48:19 +00:00
if (err)
strbuf_addstr(err, "asking for a quick stop\n");
else
fprintf(stderr, "asking for a quick stop\n");
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
return 1;
}
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
struct testsuite {
struct string_list tests, failed;
int next;
int quiet, immediate, verbose, verbose_log, trace, write_junit_xml;
};
#define TESTSUITE_INIT { \
.tests = STRING_LIST_INIT_DUP, \
.failed = STRING_LIST_INIT_DUP, \
}
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
static int next_test(struct child_process *cp, struct strbuf *err, void *cb,
void **task_cb)
{
struct testsuite *suite = cb;
const char *test;
if (suite->next >= suite->tests.nr)
return 0;
test = suite->tests.items[suite->next++].string;
strvec_pushl(&cp->args, "sh", test, NULL);
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
if (suite->quiet)
strvec_push(&cp->args, "--quiet");
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
if (suite->immediate)
strvec_push(&cp->args, "-i");
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
if (suite->verbose)
strvec_push(&cp->args, "-v");
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
if (suite->verbose_log)
strvec_push(&cp->args, "-V");
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
if (suite->trace)
strvec_push(&cp->args, "-x");
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
if (suite->write_junit_xml)
strvec_push(&cp->args, "--write-junit-xml");
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
strbuf_addf(err, "Output of '%s':\n", test);
*task_cb = (void *)test;
return 1;
}
static int test_finished(int result, struct strbuf *err, void *cb,
void *task_cb)
{
struct testsuite *suite = cb;
const char *name = (const char *)task_cb;
if (result)
string_list_append(&suite->failed, name);
strbuf_addf(err, "%s: '%s'\n", result ? "FAIL" : "SUCCESS", name);
return 0;
}
static int test_failed(struct strbuf *out, void *cb, void *task_cb)
{
struct testsuite *suite = cb;
const char *name = (const char *)task_cb;
string_list_append(&suite->failed, name);
strbuf_addf(out, "FAILED TO START: '%s'\n", name);
return 0;
}
static const char * const testsuite_usage[] = {
"test-run-command testsuite [<options>] [<pattern>...]",
NULL
};
static int testsuite(int argc, const char **argv)
{
struct testsuite suite = TESTSUITE_INIT;
int max_jobs = 1, i, ret = 0;
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
DIR *dir;
struct dirent *d;
struct option options[] = {
OPT_BOOL('i', "immediate", &suite.immediate,
"stop at first failed test case(s)"),
OPT_INTEGER('j', "jobs", &max_jobs, "run <N> jobs in parallel"),
OPT_BOOL('q', "quiet", &suite.quiet, "be terse"),
OPT_BOOL('v', "verbose", &suite.verbose, "be verbose"),
OPT_BOOL('V', "verbose-log", &suite.verbose_log,
"be verbose, redirected to a file"),
OPT_BOOL('x', "trace", &suite.trace, "trace shell commands"),
OPT_BOOL(0, "write-junit-xml", &suite.write_junit_xml,
"write JUnit-style XML files"),
OPT_END()
};
struct run_process_parallel_opts opts = {
.get_next_task = next_test,
.start_failure = test_failed,
.task_finished = test_finished,
.data = &suite,
};
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
argc = parse_options(argc, argv, NULL, options,
testsuite_usage, PARSE_OPT_STOP_AT_NON_OPTION);
if (max_jobs <= 0)
max_jobs = online_cpus();
dir = opendir(".");
if (!dir)
die("Could not open the current directory");
while ((d = readdir(dir))) {
const char *p = d->d_name;
if (*p != 't' || !isdigit(p[1]) || !isdigit(p[2]) ||
!isdigit(p[3]) || !isdigit(p[4]) || p[5] != '-' ||
!ends_with(p, ".sh"))
continue;
/* No pattern: match all */
if (!argc) {
string_list_append(&suite.tests, p);
continue;
}
for (i = 0; i < argc; i++)
if (!wildmatch(argv[i], p, 0)) {
string_list_append(&suite.tests, p);
break;
}
}
closedir(dir);
if (!suite.tests.nr)
die("No tests match!");
if (max_jobs > suite.tests.nr)
max_jobs = suite.tests.nr;
fprintf(stderr, "Running %"PRIuMAX" tests (%d at a time)\n",
(uintmax_t)suite.tests.nr, max_jobs);
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
opts.processes = max_jobs;
run_processes_parallel(&opts);
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
if (suite.failed.nr > 0) {
ret = 1;
fprintf(stderr, "%"PRIuMAX" tests failed:\n\n",
(uintmax_t)suite.failed.nr);
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
for (i = 0; i < suite.failed.nr; i++)
fprintf(stderr, "\t%s\n", suite.failed.items[i].string);
}
string_list_clear(&suite.tests, 0);
string_list_clear(&suite.failed, 0);
return ret;
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
}
static uint64_t my_random_next = 1234;
static uint64_t my_random(void)
{
uint64_t res = my_random_next;
my_random_next = my_random_next * 1103515245 + 12345;
return res;
}
static int quote_stress_test(int argc, const char **argv)
{
/*
* We are running a quote-stress test.
* spawn a subprocess that runs quote-stress with a
* special option that echoes back the arguments that
* were passed in.
*/
char special[] = ".?*\\^_\"'`{}()[]<>@~&+:;$%"; // \t\r\n\a";
int i, j, k, trials = 100, skip = 0, msys2 = 0;
struct strbuf out = STRBUF_INIT;
struct strvec args = STRVEC_INIT;
struct option options[] = {
OPT_INTEGER('n', "trials", &trials, "number of trials"),
OPT_INTEGER('s', "skip", &skip, "skip <n> trials"),
OPT_BOOL('m', "msys2", &msys2, "test quoting for MSYS2's sh"),
OPT_END()
};
const char * const usage[] = {
"test-tool run-command quote-stress-test <options>",
NULL
};
argc = parse_options(argc, argv, NULL, options, usage, 0);
setenv("MSYS_NO_PATHCONV", "1", 0);
for (i = 0; i < trials; i++) {
struct child_process cp = CHILD_PROCESS_INIT;
size_t arg_count, arg_offset;
int ret = 0;
strvec_clear(&args);
if (msys2)
strvec_pushl(&args, "sh", "-c",
"printf %s\\\\0 \"$@\"", "skip", NULL);
else
strvec_pushl(&args, "test-tool", "run-command",
"quote-echo", NULL);
arg_offset = args.nr;
if (argc > 0) {
trials = 1;
arg_count = argc;
for (j = 0; j < arg_count; j++)
strvec_push(&args, argv[j]);
} else {
arg_count = 1 + (my_random() % 5);
for (j = 0; j < arg_count; j++) {
char buf[20];
size_t min_len = 1;
size_t arg_len = min_len +
(my_random() % (ARRAY_SIZE(buf) - min_len));
for (k = 0; k < arg_len; k++)
buf[k] = special[my_random() %
ARRAY_SIZE(special)];
buf[arg_len] = '\0';
strvec_push(&args, buf);
}
}
if (i < skip)
continue;
strvec_pushv(&cp.args, args.v);
strbuf_reset(&out);
if (pipe_command(&cp, NULL, 0, &out, 0, NULL, 0) < 0)
return error("Failed to spawn child process");
for (j = 0, k = 0; j < arg_count; j++) {
const char *arg = args.v[j + arg_offset];
if (strcmp(arg, out.buf + k))
ret = error("incorrectly quoted arg: '%s', "
"echoed back as '%s'",
arg, out.buf + k);
k += strlen(out.buf + k) + 1;
}
if (k != out.len)
ret = error("got %d bytes, but consumed only %d",
(int)out.len, (int)k);
if (ret) {
fprintf(stderr, "Trial #%d failed. Arguments:\n", i);
for (j = 0; j < arg_count; j++)
fprintf(stderr, "arg #%d: '%s'\n",
(int)j, args.v[j + arg_offset]);
strbuf_release(&out);
strvec_clear(&args);
return ret;
}
if (i && (i % 100) == 0)
fprintf(stderr, "Trials completed: %d\n", (int)i);
}
strbuf_release(&out);
strvec_clear(&args);
return 0;
}
static int quote_echo(int argc, const char **argv)
{
while (argc > 1) {
fwrite(argv[1], strlen(argv[1]), 1, stdout);
fputc('\0', stdout);
argv++;
argc--;
}
return 0;
}
static int inherit_handle(const char *argv0)
{
struct child_process cp = CHILD_PROCESS_INIT;
char path[PATH_MAX];
int tmp;
/* First, open an inheritable handle */
xsnprintf(path, sizeof(path), "out-XXXXXX");
tmp = xmkstemp(path);
strvec_pushl(&cp.args,
"test-tool", argv0, "inherited-handle-child", NULL);
cp.in = -1;
cp.no_stdout = cp.no_stderr = 1;
if (start_command(&cp) < 0)
die("Could not start child process");
/* Then close it, and try to delete it. */
close(tmp);
if (unlink(path))
die("Could not delete '%s'", path);
if (close(cp.in) < 0 || finish_command(&cp) < 0)
die("Child did not finish");
return 0;
}
static int inherit_handle_child(void)
{
struct strbuf buf = STRBUF_INIT;
if (strbuf_read(&buf, 0, 0) < 0)
die("Could not read stdin");
printf("Received %s\n", buf.buf);
strbuf_release(&buf);
return 0;
}
int cmd__run_command(int argc, const char **argv)
{
struct child_process proc = CHILD_PROCESS_INIT;
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
int jobs;
int ret;
struct run_process_parallel_opts opts = {
.data = &proc,
};
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
if (argc > 1 && !strcmp(argv[1], "testsuite"))
return testsuite(argc - 1, argv + 1);
if (!strcmp(argv[1], "inherited-handle"))
return inherit_handle(argv[0]);
if (!strcmp(argv[1], "inherited-handle-child"))
return inherit_handle_child();
test-tool run-command: learn to run (parts of) the testsuite Git for Windows jumps through hoops to provide a development environment that allows to build Git and to run its test suite. To that end, an entire MSYS2 system, including GNU make and GCC is offered as "the Git for Windows SDK". It does come at a price: an initial download of said SDK weighs in with several hundreds of megabytes, and the unpacked SDK occupies ~2GB of disk space. A much more native development environment on Windows is Visual Studio. To help contributors use that environment, we already have a Makefile target `vcxproj` that generates a commit with project files (and other generated files), and Git for Windows' `vs/master` branch is continuously re-generated using that target. The idea is to allow building Git in Visual Studio, and to run individual tests using a Portable Git. The one missing thing is a way to run the entire test suite: neither `make` nor `prove` are required to run Git, therefore Git for Windows does not support those commands in the Portable Git. To help with that, add a simple test helper that exercises the `run_processes_parallel()` function to allow for running test scripts in parallel (which is really necessary, especially on Windows, as Git's test suite takes such a long time to run). This will also come in handy for the upcoming change to our Azure Pipeline: we will use this helper in a Portable Git to test the Visual Studio build of Git. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 15:09:33 +00:00
if (argc >= 2 && !strcmp(argv[1], "quote-stress-test"))
return !!quote_stress_test(argc - 1, argv + 1);
if (argc >= 2 && !strcmp(argv[1], "quote-echo"))
return !!quote_echo(argc - 1, argv + 1);
if (argc < 3)
return 1;
while (!strcmp(argv[1], "env")) {
if (!argv[2])
die("env specifier without a value");
strvec_push(&proc.env, argv[2]);
argv += 2;
argc -= 2;
}
if (argc < 3) {
ret = 1;
goto cleanup;
}
strvec_pushv(&proc.args, (const char **)argv + 2);
if (!strcmp(argv[1], "start-command-ENOENT")) {
if (start_command(&proc) < 0 && errno == ENOENT) {
ret = 0;
goto cleanup;
}
fprintf(stderr, "FAIL %s\n", argv[1]);
return 1;
}
if (!strcmp(argv[1], "run-command")) {
ret = run_command(&proc);
goto cleanup;
}
run-command: add an "ungroup" option to run_process_parallel() Extend the parallel execution API added in c553c72eed6 (run-command: add an asynchronous parallel child processor, 2015-12-15) to support a mode where the stdout and stderr of the processes isn't captured and output in a deterministic order, instead we'll leave it to the kernel and stdio to sort it out. This gives the API same functionality as GNU parallel's --ungroup option. As we'll see in a subsequent commit the main reason to want this is to support stdout and stderr being connected to the TTY in the case of jobs=1, demonstrated here with GNU parallel: $ parallel --ungroup 'test -t {} && echo TTY || echo NTTY' ::: 1 2 TTY TTY $ parallel 'test -t {} && echo TTY || echo NTTY' ::: 1 2 NTTY NTTY Another is as GNU parallel's documentation notes a potential for optimization. As demonstrated in next commit our results with "git hook run" will be similar, but generally speaking this shows that if you want to run processes in parallel where the exact order isn't important this can be a lot faster: $ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 >/dev/null ' Benchmark 1: parallel seq ::: 10000000 >/dev/null Time (mean ± σ): 220.2 ms ± 9.3 ms [User: 124.9 ms, System: 96.1 ms] Range (min … max): 212.3 ms … 230.5 ms 3 runs Benchmark 2: parallel --ungroup seq ::: 10000000 >/dev/null Time (mean ± σ): 154.7 ms ± 0.9 ms [User: 136.2 ms, System: 25.1 ms] Range (min … max): 153.9 ms … 155.7 ms 3 runs Summary 'parallel --ungroup seq ::: 10000000 >/dev/null ' ran 1.42 ± 0.06 times faster than 'parallel seq ::: 10000000 >/dev/null ' A large part of the juggling in the API is to make the API safer for its maintenance and consumers alike. For the maintenance of the API we e.g. avoid malloc()-ing the "pp->pfd", ensuring that SANITIZE=address and other similar tools will catch any unexpected misuse. For API consumers we take pains to never pass the non-NULL "out" buffer to an API user that provided the "ungroup" option. The resulting code in t/helper/test-run-command.c isn't typical of such a user, i.e. they'd typically use one mode or the other, and would know whether they'd provided "ungroup" or not. We could also avoid the strbuf_init() for "buffered_output" by having "struct parallel_processes" use a static PARALLEL_PROCESSES_INIT initializer, but let's leave that cleanup for later. Using a global "run_processes_parallel_ungroup" variable to enable this option is rather nasty, but is being done here to produce as minimal of a change as possible for a subsequent regression fix. This change is extracted from a larger initial version[1] which ends up with a better end-state for the API, but in doing so needed to modify all existing callers of the API. Let's defer that for now, and narrowly focus on what we need for fixing the regression in the subsequent commit. It's safe to do this with a global variable because: A) hook.c is the only user of it that sets it to non-zero, and before we'll get any other API users we'll refactor away this method of passing in the option, i.e. re-roll [1]. B) Even if hook.c wasn't the only user we don't have callers of this API that concurrently invoke this parallel process starting API itself in parallel. As noted above "A" && "B" are rather nasty, and we don't want to live with those caveats long-term, but for now they should be an acceptable compromise. 1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 08:48:19 +00:00
if (!strcmp(argv[1], "--ungroup")) {
argv += 1;
argc -= 1;
opts.ungroup = 1;
run-command: add an "ungroup" option to run_process_parallel() Extend the parallel execution API added in c553c72eed6 (run-command: add an asynchronous parallel child processor, 2015-12-15) to support a mode where the stdout and stderr of the processes isn't captured and output in a deterministic order, instead we'll leave it to the kernel and stdio to sort it out. This gives the API same functionality as GNU parallel's --ungroup option. As we'll see in a subsequent commit the main reason to want this is to support stdout and stderr being connected to the TTY in the case of jobs=1, demonstrated here with GNU parallel: $ parallel --ungroup 'test -t {} && echo TTY || echo NTTY' ::: 1 2 TTY TTY $ parallel 'test -t {} && echo TTY || echo NTTY' ::: 1 2 NTTY NTTY Another is as GNU parallel's documentation notes a potential for optimization. As demonstrated in next commit our results with "git hook run" will be similar, but generally speaking this shows that if you want to run processes in parallel where the exact order isn't important this can be a lot faster: $ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 >/dev/null ' Benchmark 1: parallel seq ::: 10000000 >/dev/null Time (mean ± σ): 220.2 ms ± 9.3 ms [User: 124.9 ms, System: 96.1 ms] Range (min … max): 212.3 ms … 230.5 ms 3 runs Benchmark 2: parallel --ungroup seq ::: 10000000 >/dev/null Time (mean ± σ): 154.7 ms ± 0.9 ms [User: 136.2 ms, System: 25.1 ms] Range (min … max): 153.9 ms … 155.7 ms 3 runs Summary 'parallel --ungroup seq ::: 10000000 >/dev/null ' ran 1.42 ± 0.06 times faster than 'parallel seq ::: 10000000 >/dev/null ' A large part of the juggling in the API is to make the API safer for its maintenance and consumers alike. For the maintenance of the API we e.g. avoid malloc()-ing the "pp->pfd", ensuring that SANITIZE=address and other similar tools will catch any unexpected misuse. For API consumers we take pains to never pass the non-NULL "out" buffer to an API user that provided the "ungroup" option. The resulting code in t/helper/test-run-command.c isn't typical of such a user, i.e. they'd typically use one mode or the other, and would know whether they'd provided "ungroup" or not. We could also avoid the strbuf_init() for "buffered_output" by having "struct parallel_processes" use a static PARALLEL_PROCESSES_INIT initializer, but let's leave that cleanup for later. Using a global "run_processes_parallel_ungroup" variable to enable this option is rather nasty, but is being done here to produce as minimal of a change as possible for a subsequent regression fix. This change is extracted from a larger initial version[1] which ends up with a better end-state for the API, but in doing so needed to modify all existing callers of the API. Let's defer that for now, and narrowly focus on what we need for fixing the regression in the subsequent commit. It's safe to do this with a global variable because: A) hook.c is the only user of it that sets it to non-zero, and before we'll get any other API users we'll refactor away this method of passing in the option, i.e. re-roll [1]. B) Even if hook.c wasn't the only user we don't have callers of this API that concurrently invoke this parallel process starting API itself in parallel. As noted above "A" && "B" are rather nasty, and we don't want to live with those caveats long-term, but for now they should be an acceptable compromise. 1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 08:48:19 +00:00
}
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
jobs = atoi(argv[2]);
strvec_clear(&proc.args);
strvec_pushv(&proc.args, (const char **)argv + 3);
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 00:04:10 +00:00
if (!strcmp(argv[1], "run-command-parallel")) {
opts.get_next_task = parallel_next;
} else if (!strcmp(argv[1], "run-command-abort")) {
opts.get_next_task = parallel_next;
opts.task_finished = task_finished;
} else if (!strcmp(argv[1], "run-command-no-jobs")) {
opts.get_next_task = no_job;
opts.task_finished = task_finished;
} else {
ret = 1;
fprintf(stderr, "check usage\n");
goto cleanup;
}
opts.processes = jobs;
run_processes_parallel(&opts);
ret = 0;
cleanup:
child_process_clear(&proc);
return ret;
}