development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-30 14:03:28 +00:00

Author	SHA1	Message	Date
Eric Sunshine	73c768dae9	chainlint: annotate original test definition rather than token stream When chainlint detects problems in a test, such as a broken &&-chain, it prints out the test with "?!FOO?!" annotations inserted at each problem location. However, rather than annotating the original test definition, it instead dumps out a parsed token representation of the test. Since it lacks comments, indentations, here-doc bodies, and so forth, this tokenized representation can be difficult for the test author to digest and relate back to the original test definition. However, now that each parsed token carries positional information, the location of a detected problem can be pinpointed precisely in the original test definition. Therefore, take advantage of this information to annotate the test definition itself rather than annotating the parsed token stream, thus making it easier for a test author to relate a problem back to the source. Maintaining the positional meta-information associated with each detected problem requires a slight change in how the problems are managed internally. In particular, shell syntax such as: msg="total: $(cd data; wc -w .txt) words" requires the lexical analyzer to recursively invoke the parser in order to detect problems within the $(...) expression inside the double-quoted string. In this case, the recursive parse context will detect the broken &&-chain between the `cd` and `wc` commands, returning the token stream: cd data ; ?!AMP?! wc -w .txt However, the parent parse context will see everything inside the double-quotes as a single string token: "total: $(cd data ; ?!AMP?! wc -w .txt) words" losing whatever positional information was attached to the ";" token where the problem was detected. One way to preserve the positional information of a detected problem in a recursive parse context within a string would be to attach the positional information to the annotation textually; for instance: "total: $(cd data ; ?!AMP:21:22?! wc -w .txt) words" and then extract the positional information when annotating the original test definition. However, a cleaner and much simpler approach is to maintain the list of detected problems separately rather than embedding the problems as annotations directly in the parsed token stream. Not only does this ensure that positional information within recursive parse contexts is not lost, but it keeps the token stream free from non-token pollution, which may simplify implementation of validations added in the future since they won't have to handle non-token "?!FOO!?" items specially. Finally, the chainlint self-test "expect" files need a few mechanical adjustments now that the original test definitions are emitted rather than the parsed token stream. In particular, the following items missing from the historic parsed-token output are now preserved verbatim: * indentation (and whitespace, in general) * comments * here-doc bodies * here-doc tag quoting (i.e. "\EOF") * line-splices (i.e. "\" at the end of a line) Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-08 15:10:49 -05:00
Eric Sunshine	d00113ec34	t/Makefile: apply chainlint.pl to existing self-tests Now that chainlint.pl is functional, take advantage of the existing chainlint self-tests to validate its operation. (While at it, stop validating chainlint.sed against the self-tests since it will soon be retired.) Due to chainlint.sed implementation limitations leaking into the self-test "expect" files, a few of them require minor adjustment to make them compatible with chainlint.pl which does not share those limitations. First, because `sed` does not provide any sort of real recursion, chainlint.sed only emulates recursion into subshells, and each level of recursion leads to a multiplicative increase in complexity of the `sed` rules. To avoid substantial complexity, chainlint.sed, therefore, only emulates subshell recursion one level deep. Any subshell deeper than that is passed through as-is, which means that &&-chains are not checked in deeper subshells. chainlint.pl, on the other hand, employs a proper recursive descent parser, thus checks subshells to any depth and correctly flags broken &&-chains in deep subshells. Second, due to sed's line-oriented nature, chainlint.sed, by necessity, folds multi-line quoted strings into a single line. chainlint.pl, on the other hand, employs a proper lexical analyzer which preserves quoted strings as-is, including embedded newlines. Furthermore, the output of chainlint.sed and chainlint.pl do not match precisely in terms of whitespace. However, since the purpose of the self-checks is to verify that the ?!AMP?! annotations are being correctly added, minor whitespace differences are immaterial. For this reason, rather than adjusting whitespace in all existing self-test "expect" files to match the new linter's output, the `check-chainlint` target ignores whitespace differences. Since `diff -w` is not POSIX, `check-chainlint` attempts to employ `git diff -w`, and only falls back to non-POSIX `diff -w` (and `-u`) if `git diff` is not available. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-01 10:07:40 -07:00
Eric Sunshine	34ba05c296	chainlint.sed: stop throwing away here-doc tags The purpose of chainlint is to highlight problems it finds in test code by inserting annotations at the location of each problem. Arbitrarily eliding bits of the code it is checking is not helpful, yet this is exactly what chainlint.sed does by cavalierly and unnecessarily dropping the here-doc operator and tag; i.e. `cat <<TAG` becomes simply `cat` in the output. This behavior can make it more difficult for the test writer to align the annotated output of chainlint.sed with the original test code. Address this by retaining here-doc tags. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-13 14:15:29 -08:00
Eric Sunshine	5be30d0cd3	chainlint.sed: drop subshell-closing ">" annotation chainlint.sed inserts a ">" annotation at the beginning of a line to signal that its heuristics have identified an end-of-subshell. This was useful as a debugging aid during development of the script, but it has no value to test writers and might even confuse them into thinking that the linter is misbehaving by inserting line-noise into the shell code it is validating. Moreover, its presence also potentially makes it difficult to reuse the chainlint self-test "expect" output should a more capable linter ever be developed. Therefore, drop the ">" annotation. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-13 14:15:29 -08:00
Eric Sunshine	db8c7a1cc0	chainlint.sed: improve ?!AMP?! placement accuracy When chainlint.sed detects a broken &&-chain, it places an ?!AMP?! annotation at the beginning of the line. However, this is an unusual location for programmers accustomed to error messages (from compilers, for instance) indicating the exact point of the problem. Therefore, relocate the ?!AMP?! annotation to the end of the line in order to better direct the programmer's attention to the source of the problem. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-13 14:15:29 -08:00
Eric Sunshine	22e3e0241a	chainlint: recognize multi-line quoted strings more robustly chainlint.sed recognizes multi-line quoted strings within subshells: echo "abc def" >out && so it can avoid incorrectly classifying lines internal to the string as breaking the &&-chain. To identify the first line of a multi-line string, it checks if the line contains a single quote. However, this is fragile and can be easily fooled by a line containing multiple strings: echo "xyz" "abc def" >out && Make detection more robust by checking for an odd number of quotes rather than only a single one. (Escaped quotes are not handled, but support may be added later.) The original multi-line string recognizer rather cavalierly threw away all but the final quote, whereas the new one is careful to retain all quotes, so the "expected" output of a couple existing chainlint tests is updated to account for this new behavior. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 12:22:12 -07:00
Eric Sunshine	d93871143f	chainlint: let here-doc and multi-line string commence on same line After swallowing a here-doc, chainlint.sed assumes that no other processing needs to be done on the line aside from checking for &&-chain breakage; likewise, after folding a multi-line quoted string. However, it's conceivable (even if unlikely in practice) that both a here-doc and a multi-line quoted string might commence on the same line: cat <<\EOF && echo "foo bar" data EOF Support this case by sending the line (after swallowing and folding) through the normal processing sequence rather than jumping directly to the check for broken &&-chain. This change also allows other somewhat pathological cases to be handled, such as closing a subshell on the same line starting a here-doc: ( cat <<-\INPUT) data INPUT or, for instance, opening a multi-line $(...) expression on the same line starting a here-doc: x=$(cat <<-\END && data END echo "x") among others. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 12:22:12 -07:00

7 commits