Commit graph

8 commits

Author SHA1 Message Date
Lars Schneider 2841e8f81c convert: add "status=delayed" to filter process protocol
Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.

Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.

Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:50:41 -07:00
Lars Schneider 1757333410 t0021: write "OUT <size>" only on success
"rot13-filter.pl" always writes "OUT <size>" to the debug log at the end
of a response.

This works perfectly for the existing responses "abort", "error", and
"success". A new response "delayed", that will be introduced in a
subsequent patch, accepts the input without giving the filtered result
right away. At this point we cannot know the size of the response.
Therefore, we do not write "OUT <size>" for "delayed" responses.

To simplify the code we do not write "OUT <size>" for "abort" and
"error" responses either as their size is always zero.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-29 11:23:47 -07:00
Lars Schneider e1ec4721d6 t0021: make debug log file name configurable
The "rot13-filter.pl" helper wrote its debug logs always to "rot13-filter.log".
Make this configurable by defining the log file as first parameter of
"rot13-filter.pl".

This is useful if "rot13-filter.pl" is configured multiple times similar to the
subsequent patch 'convert: add "status=delayed" to filter process protocol'.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-26 10:36:00 -07:00
Junio C Hamano 08721a056b Merge branch 'ls/filter-process'
Doc update.

* ls/filter-process:
  t0021: fix flaky test
  docs: warn about possible '=' in clean/smudge filter process values
2016-12-27 00:11:42 -08:00
Lars Schneider c6b0831c9c docs: warn about possible '=' in clean/smudge filter process values
A pathname value in a clean/smudge filter process "key=value" pair can
contain the '=' character (introduced in edcc858). Make the user aware
of this issue in the docs, add a corresponding test case, and fix the
issue in filter process value parser of the example implementation in
contrib.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06 11:29:52 -08:00
Jeff King 4821494ebc t0021: fix filehandle usage on older perl
The rot13-filter.pl script calls methods on implicitly
defined filehandles (STDOUT, and the result of an open()
call).  Prior to perl 5.13, these methods are not
automatically loaded, and perl will complain with:

  Can't locate object method "flush" via package "IO::Handle"

Let's explicitly load IO::File (which inherits from
IO::Handle). That's more than we need for just "flush", but
matches what perl has done since:

  http://perl5.git.perl.org/perl.git/commit/15e6cdd91beb4cefae4b65e855d68cf64766965d

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-02 19:36:29 -07:00
Jeff King f272696a35 t0021: use $PERL_PATH for rot13-filter.pl
The rot13-filter.pl script hardcodes "#!/usr/bin/perl", and
does not respect $PERL_PATH at all. That is a problem if the
system does not have perl at that path, or if it has a perl
that is too old to run a complicated script like the
rot13-filter (but PERL_PATH points to a more modern one).

We can fix this by using write_script() to create a new copy
of the script with the correct #!-line. In theory we could
move the whole script inside t0021-conversion.sh rather than
having it as an auxiliary file, but it's long enough that
it just makes things harder to read.

As a bonus, we can stop using the full path to the script in
the filter-process config we add (because the trash
directory is in our PATH). Not only is this shorter, but it
sidesteps any shell-quoting issues. The original was broken
when $TEST_DIRECTORY contained a space, because it was
interpolated in the outer script.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-02 19:36:29 -07:00
Lars Schneider edcc85814c convert: add filter.<driver>.process option
Git's clean/smudge mechanism invokes an external filter process for
every single blob that is affected by a filter. If Git filters a lot of
blobs then the startup time of the external filter processes can become
a significant part of the overall Git execution time.

In a preliminary performance test this developer used a clean/smudge
filter written in golang to filter 12,000 files. This process took 364s
with the existing filter mechanism and 5s with the new mechanism. See
details here: https://github.com/github/git-lfs/pull/1382

This patch adds the `filter.<driver>.process` string option which, if
used, keeps the external filter process running and processes all blobs
with the packet format (pkt-line) based protocol over standard input and
standard output. The full protocol is explained in detail in
`Documentation/gitattributes.txt`.

A few key decisions:

* The long running filter process is referred to as filter protocol
  version 2 because the existing single shot filter invocation is
  considered version 1.
* Git sends a welcome message and expects a response right after the
  external filter process has started. This ensures that Git will not
  hang if a version 1 filter is incorrectly used with the
  filter.<driver>.process option for version 2 filters. In addition,
  Git can detect this kind of error and warn the user.
* The status of a filter operation (e.g. "success" or "error) is set
  before the actual response and (if necessary!) re-set after the
  response. The advantage of this two step status response is that if
  the filter detects an error early, then the filter can communicate
  this and Git does not even need to create structures to read the
  response.
* All status responses are pkt-line lists terminated with a flush
  packet. This allows us to send other status fields with the same
  protocol in the future.

Helped-by: Martin-Louis Bright <mlbright@gmail.com>
Reviewed-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:45:52 -07:00