development/git - HydraGit

mirror of https://github.com/git/git synced 2024-08-24 02:11:08 +00:00

Author	SHA1	Message	Date
Johannes Schindelin	2d1142a3e8	Sync with 2.26.3 * maint-2.26: Git 2.26.3 Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path	2021-02-12 15:50:04 +01:00
Matheus Tavares	684dd4c2b4	checkout: fix bug that makes checkout follow symlinks in leading path Before checking out a file, we have to confirm that all of its leading components are real existing directories. And to reduce the number of lstat() calls in this process, we cache the last leading path known to contain only directories. However, when a path collision occurs (e.g. when checking out case-sensitive files in case-insensitive file systems), a cached path might have its file type changed on disk, leaving the cache on an invalid state. Normally, this doesn't bring any bad consequences as we usually check out files in index order, and therefore, by the time the cached path becomes outdated, we no longer need it anyway (because all files in that directory would have already been written). But, there are some users of the checkout machinery that do not always follow the index order. In particular: checkout-index writes the paths in the same order that they appear on the CLI (or stdin); and the delayed checkout feature -- used when a long-running filter process replies with "status=delayed" -- postpones the checkout of some entries, thus modifying the checkout order. When we have to check out an out-of-order entry and the lstat() cache is invalid (due to a previous path collision), checkout_entry() may end up using the invalid data and thrusting that the leading components are real directories when, in reality, they are not. In the best case scenario, where the directory was replaced by a regular file, the user will get an error: "fatal: unable to create file 'foo/bar': Not a directory". But if the directory was replaced by a symlink, checkout could actually end up following the symlink and writing the file at a wrong place, even outside the repository. Since delayed checkout is affected by this bug, it could be used by an attacker to write arbitrary files during the clone of a maliciously crafted repository. Some candidate solutions considered were to disable the lstat() cache during unordered checkouts or sort the entries before passing them to the checkout machinery. But both ideas include some performance penalty and they don't future-proof the code against new unordered use cases. Instead, we now manually reset the lstat cache whenever we successfully remove a directory. Note: We are not even checking whether the directory was the same as the lstat cache points to because we might face a scenario where the paths refer to the same location but differ due to case folding, precomposed UTF-8 issues, or the presence of `..` components in the path. Two regression tests, with case-collisions and utf8-collisions, are also added for both checkout-index and delayed checkout. Note: to make the previously mentioned clone attack unfeasible, it would be sufficient to reset the lstat cache only after the remove_subtree() call inside checkout_entry(). This is the place where we would remove a directory whose path collides with the path of another entry that we are currently trying to check out (possibly a symlink). However, in the interest of a thorough fix that does not leave Git open to similar-but-not-identical attack vectors, we decided to intercept all `rmdir()` calls in one fell swoop. This addresses CVE-2021-21300. Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>	2021-02-12 15:47:02 +01:00
brian m. carlson	13e7ed6a3a	builtin/checkout: compute checkout metadata for checkouts Provide commit metadata for checkout code paths that use unpack_trees and friends. When we're checking out a commit, use the commit information, but don't provide commit information if we're checking out from the index, since there need not be any particular commit associated with the index, and even if there is one, we can't know what it is. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-16 11:37:02 -07:00
brian m. carlson	c397aac02f	convert: provide additional metadata to filters Now that we have the codebase wired up to pass any additional metadata to filters, let's collect the additional metadata that we'd like to pass. The two main places we pass this metadata are checkouts and archives. In these two situations, reading HEAD isn't a valid option, since HEAD isn't updated for checkouts until after the working tree is written and archives can accept an arbitrary tree. In other situations, HEAD will usually reflect the refname of the branch in current use. We pass a smaller amount of data in other cases, such as git cat-file, where we can really only logically know about the blob. This commit updates only the parts of the checkout code where we don't use unpack_trees. That function and callers of it will be handled in a future commit. In the archive code, we leak a small amount of memory, since nothing we pass in the archiver argument structure is freed. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-16 11:37:02 -07:00
Johannes Schindelin	3306f6524d	mingw: handle GITPERLLIB in t0021 in a Windows-compatible way Git's assumption that all path lists are colon-separated is not only wrong on Windows, it is not even an assumption that is compatible with POSIX. In the interest of time, let's not try to fix this properly but simply work around the obvious breakage on Windows, where the MSYS2 Bash used by Git for Windows to interpret the Git's Unix shell scripts will automagically convert path lists in the environment to semicolon-separated lists of Windows paths (with drive letter and the corresponding colon and all that jazz). In other words, we simply look whether there is a semicolon in GITPERLLIB and split by semicolons if found instead of colons. This is not fool-proof, of course, as the path list could consist of a single path. But that is not the case in Git for Windows' test suite, there are always two paths in GITPERLLIB. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-10 14:00:54 -08:00
Christian Couder	cb1c64b4a8	Git/Packet: clarify that packet_required_key_val_read allows EOF The function calls itself "required", but it does not die when it sees an unexpected EOF. Let's rename it to "packet_key_val_read()". Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-22 16:23:55 +09:00
Christian Couder	0fe8d516bb	Git/Packet.pm: extract parts of t0021/rot13-filter.pl for reuse And while at it let's simplify t0021/rot13-filter.pl by using Git/Packet.pm. This will make it possible to reuse packet related functions in other test scripts. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-07 10:26:01 +09:00
Christian Couder	f11c8ce1f6	t0021/rot13-filter: add capability functions These function help read and write capabilities. To make them more generic and make it easy to reuse them, the following changes are made: - we don't require capabilities to come in a fixed order, - we allow duplicates, - we check that the remote supports the capabilities we advertise, - we don't check if the remote declares any capability we don't know about. The reason behind the last change is that the protocol should work using only the capabilities that both ends support, and it should not stop working if one end starts to advertise a new capability. Despite those changes, we can still require a set of capabilities, and die if one of them is not supported. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-07 09:54:41 +09:00
Christian Couder	4a9ef1bbc1	t0021/rot13-filter: refactor checking final lf As checking for a lf character at the end of a buffer will be useful in another function, let's refactor this functionality into a small remove_final_lf_or_die() helper function. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-07 09:54:41 +09:00
Christian Couder	25cbfe3465	t0021/rot13-filter: add packet_initialize() Let's refactor the code to initialize communication into its own packet_initialize() function, so that we can reuse this functionality in following patches. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-07 09:54:41 +09:00
Christian Couder	00df039faa	t0021/rot13-filter: improve error message If there is no new line at the end of something it receives, the packet_txt_read() function die()s, but it's difficult to debug without much context. Let's give a bit more information when that happens. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-07 09:54:41 +09:00
Christian Couder	ed17d26245	t0021/rot13-filter: improve 'if .. elsif .. else' style Before further refactoring the "t0021/rot13-filter.pl" script, let's modernize the style of its 'if .. elsif .. else' clauses to improve its readability by making it more similar to our other perl scripts. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-07 09:54:41 +09:00
Christian Couder	2c9ea595a7	t0021/rot13-filter: refactor packet reading functions To make it possible in a following commit to move packet reading and writing functions into a Packet.pm module, let's refactor these functions, so they don't handle printing debug output and exiting. While at it let's create packet_required_key_val_read() to still handle erroring out in a common case. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-07 09:54:41 +09:00
Christian Couder	0a26882621	t0021/rot13-filter: fix list comparison Since `edcc8581` ("convert: add filter.<driver>.process option", 2016-10-16) when t0021/rot13-filter.pl was created, list comparison in this perl script have been quite broken. packet_txt_read() returns a 2-element list, and the right hand side of "eq" also has a list with (two, elements), but "eq" takes the last element of the list on each side, and compares them. The first elements (0 or 1) on the right hand side lists do not matter, which means we do not require to see a flush at the end of the version -- a simple empty string or an EOF would do, which is definitely not what we want. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-07 09:54:41 +09:00
Lars Schneider	2841e8f81c	convert: add "status=delayed" to filter process protocol Some `clean` / `smudge` filters may require a significant amount of time to process a single blob (e.g. the Git LFS smudge filter might perform network requests). During this process the Git checkout operation is blocked and Git needs to wait until the filter is done to continue with the checkout. Teach the filter process protocol, introduced in `edcc8581` ("convert: add filter.<driver>.process option", 2016-10-16), to accept the status "delayed" as response to a filter request. Upon this response Git continues with the checkout operation. After the checkout operation Git calls "finish_delayed_checkout" which queries the filter for remaining blobs. If the filter is still working on the completion, then the filter is expected to block. If the filter has completed all remaining blobs then an empty response is expected. Git has a multiple code paths that checkout a blob. Support delayed checkouts only in `clone` (in unpack-trees.c) and `checkout` operations for now. The optimization is most effective in these code paths as all files of the tree are processed. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-30 13:50:41 -07:00
Lars Schneider	1757333410	t0021: write "OUT <size>" only on success "rot13-filter.pl" always writes "OUT <size>" to the debug log at the end of a response. This works perfectly for the existing responses "abort", "error", and "success". A new response "delayed", that will be introduced in a subsequent patch, accepts the input without giving the filtered result right away. At this point we cannot know the size of the response. Therefore, we do not write "OUT <size>" for "delayed" responses. To simplify the code we do not write "OUT <size>" for "abort" and "error" responses either as their size is always zero. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-29 11:23:47 -07:00
Lars Schneider	e1ec4721d6	t0021: make debug log file name configurable The "rot13-filter.pl" helper wrote its debug logs always to "rot13-filter.log". Make this configurable by defining the log file as first parameter of "rot13-filter.pl". This is useful if "rot13-filter.pl" is configured multiple times similar to the subsequent patch 'convert: add "status=delayed" to filter process protocol'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-26 10:36:00 -07:00
Junio C Hamano	08721a056b	Merge branch 'ls/filter-process' Doc update. * ls/filter-process: t0021: fix flaky test docs: warn about possible '=' in clean/smudge filter process values	2016-12-27 00:11:42 -08:00
Lars Schneider	c6b0831c9c	docs: warn about possible '=' in clean/smudge filter process values A pathname value in a clean/smudge filter process "key=value" pair can contain the '=' character (introduced in `edcc858`). Make the user aware of this issue in the docs, add a corresponding test case, and fix the issue in filter process value parser of the example implementation in contrib. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-12-06 11:29:52 -08:00
Jeff King	4821494ebc	t0021: fix filehandle usage on older perl The rot13-filter.pl script calls methods on implicitly defined filehandles (STDOUT, and the result of an open() call). Prior to perl 5.13, these methods are not automatically loaded, and perl will complain with: Can't locate object method "flush" via package "IO::Handle" Let's explicitly load IO::File (which inherits from IO::Handle). That's more than we need for just "flush", but matches what perl has done since: http://perl5.git.perl.org/perl.git/commit/15e6cdd91beb4cefae4b65e855d68cf64766965d Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-11-02 19:36:29 -07:00
Jeff King	f272696a35	t0021: use $PERL_PATH for rot13-filter.pl The rot13-filter.pl script hardcodes "#!/usr/bin/perl", and does not respect $PERL_PATH at all. That is a problem if the system does not have perl at that path, or if it has a perl that is too old to run a complicated script like the rot13-filter (but PERL_PATH points to a more modern one). We can fix this by using write_script() to create a new copy of the script with the correct #!-line. In theory we could move the whole script inside t0021-conversion.sh rather than having it as an auxiliary file, but it's long enough that it just makes things harder to read. As a bonus, we can stop using the full path to the script in the filter-process config we add (because the trash directory is in our PATH). Not only is this shorter, but it sidesteps any shell-quoting issues. The original was broken when $TEST_DIRECTORY contained a space, because it was interpolated in the outer script. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-11-02 19:36:29 -07:00
Lars Schneider	edcc85814c	convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-10-17 11:45:52 -07:00

22 commits