Because `sed` is line-oriented, for ease of implementation, when
chainlint.sed encounters an opening subshell in which the first command
is cuddled with the "(", it splits the line into two lines: one
containing only "(", and the other containing whatever follows "(".
This allows chainlint.sed to get by with a single set of regular
expressions for matching shell statements rather than having to
duplicate each expression (one set for matching non-cuddled statements,
and one set for matching cuddled statements).
However, although syntactically and semantically immaterial, this
transformation has no value to test authors and might even confuse them
into thinking that the linter is misbehaving by inserting (whitespace)
line-noise into the shell code it is validating. Moreover, it also
allows an implementation detail of chainlint.sed to seep into the
chainlint self-test "expect" files, which potentially makes it difficult
to reuse the self-tests should a more capable chainlint ever be
developed.
To address these concerns, stop splitting cuddled "(..." into two lines.
Note that, as an implementation artifact, due to sed's line-oriented
nature, this change inserts a blank line at output time just before the
"(..." line is emitted. It would be possible to suppress this blank line
but doing so would add a fair bit of complexity to chainlint.sed.
Therefore, rather than suppressing the extra blank line, the Makefile's
`check-chainlint` target which runs the chainlint self-tests is instead
modified to ignore blank lines when comparing chainlint output against
the self-test "expect" output. This is a reasonable compromise for two
reasons. First, the purpose of the chainlint self-tests is to verify
that the ?!AMP?! annotations are being correctly added; precise
whitespace is immaterial. Second, by necessity, chainlint.sed itself
already throws away all blank lines within subshells since, when
checking for a broken &&-chain, it needs to check the final _statement_
in a subshell, not the final _line_ (which might be blank), thus it has
never made any attempt to precisely reproduce blank lines in its output.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When checking for broken a &&-chain, chainlint.sed knows that the final
statement in a subshell should not end with `&&`, so it takes care to
make a distinction between the final line which is an actual statement
and any lines which may be mere comments preceding the closing ')'. As
such, it swallows comment lines so that they do not interfere with the
&&-chain check.
However, since `sed` does not provide any sort of real recursion,
chainlint.sed only checks &&-chains in subshells one level deep; it
doesn't do any checking in deeper subshells or in `{...}` blocks within
subshells. Furthermore, on account of potential implementation
complexity, it doesn't check &&-chains within `case` arms.
Due to an oversight, it also doesn't swallow comments inside deep
subshells, `{...}` blocks, or `case` statements, which makes its output
inconsistent (swallowing comments in some cases but not others).
Unfortunately, this inconsistency seeps into the chainlint self-test
"expect" files, which potentially makes it difficult to reuse the
self-tests should a more capable chainlint ever be developed. Therefore,
teach chainlint.sed to consistently swallow comments in all cases.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The purpose of chainlint is to highlight problems it finds in test code
by inserting annotations at the location of each problem. Arbitrarily
eliding bits of the code it is checking is not helpful, yet this is
exactly what chainlint.sed does by cavalierly and unnecessarily dropping
the here-doc operator and tag; i.e. `cat <<TAG` becomes simply `cat` in
the output. This behavior can make it more difficult for the test writer
to align the annotated output of chainlint.sed with the original test
code. Address this by retaining here-doc tags.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tighten here-doc recognition to prevent it from being fooled by text
which looks like a here-doc operator but happens merely to be the
content of a string, such as this real-world case from t7201:
echo "<<<<<<< ours" &&
echo ourside &&
echo "=======" &&
echo theirside &&
echo ">>>>>>> theirs"
This problem went unnoticed because chainlint.sed is not a real parser,
but rather applies heuristics to pretend to understand shell code. In
this case, it saw what it thought was a here-doc operator (`<< ours`),
and fell off the end of the test looking for the closing tag "ours"
which it never found, thus swallowed the remainder of the test without
checking it for &&-chain breakage.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
chainlint.sed inserts a ">" annotation at the beginning of a line to
signal that its heuristics have identified an end-of-subshell. This was
useful as a debugging aid during development of the script, but it has
no value to test writers and might even confuse them into thinking that
the linter is misbehaving by inserting line-noise into the shell code it
is validating. Moreover, its presence also potentially makes it
difficult to reuse the chainlint self-test "expect" output should a more
capable linter ever be developed. Therefore, drop the ">" annotation.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
>From inception, when chainlint.sed encountered a line using semicolon to
separate commands rather than `&&`, it would insert a ?!SEMI?!
annotation at the beginning of the line rather ?!AMP?! even though the
&&-chain is also broken by the semicolon. Given a line such as:
?!SEMI?! cmd1; cmd2 &&
the ?!SEMI?! annotation makes it easier to see what the problem is than
if the output had been:
?!AMP?! cmd1; cmd2 &&
which might confuse the test author into thinking that the linter is
broken (since the line clearly ends with `&&`).
However, now that the ?!AMP?! an ?!SEMI?! annotations are inserted at
the point of breakage rather than at the beginning of the line, and
taking into account that both represent a broken &&-chain, there is
little reason to distinguish between the two. Using ?!AMP?! alone is
sufficient to point the test author at the problem. For instance, in:
cmd1; ?!AMP?! cmd2 &&
cmd3
it is clear that the &&-chain is broken between `cmd1` and `cmd2`.
Likewise, in:
cmd1 && cmd2 ?!AMP?!
cmd3
it is clear that the &&-chain is broken between `cmd2` and `cmd3`.
Finally, in:
cmd1; ?!AMP?! cmd2 ?!AMP?!
cmd3
it is clear that the &&-chain is broken between each command.
Hence, there is no longer a good reason to make a distinction between a
broken &&-chain due to a semicolon and a broken chain due to a missing
`&&` at end-of-line. Therefore, drop the ?!SEMI?! annotation and use
?!AMP?! exclusively.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
chainlint.sed flags ";" when used as a command terminator since it
breaks the &&-chain, thus can allow failures to go undetected. However,
when a command terminated by ";" is the last command in the body of a
compound statement, such as `command-2` in:
if test $# -gt 1
then
command-1 &&
command-2;
fi
then the ";" is harmless and the exit code from `command-2` is passed
through untouched and becomes the exit code of the compound statement,
as if the ";" was not present. Therefore, tolerate a trailing ";" in
this position rather than complaining about broken &&-chain.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When chainlint.sed detects commands separated by a semicolon rather than
by `&&`, it places a ?!SEMI?! annotation at the beginning of the line.
However, this is an unusual location for programmers accustomed to error
messages (from compilers, for instance) indicating the exact point of
the problem. Therefore, relocate the ?!SEMI?! annotation to the location
of the semicolon in order to better direct the programmer's attention to
the source of the problem.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When chainlint.sed detects a broken &&-chain, it places an ?!AMP?!
annotation at the beginning of the line. However, this is an unusual
location for programmers accustomed to error messages (from compilers,
for instance) indicating the exact point of the problem. Therefore,
relocate the ?!AMP?! annotation to the end of the line in order to
better direct the programmer's attention to the source of the problem.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The purpose of chainlint.sed is to detect &&-chain breakage only within
subshells (one level deep); it doesn't bother checking for top-level
&&-chain breakage since the &&-chain checker built into t/test-lib.sh
should detect broken &&-chains outside of subshells by making them
magically exit with code 117.
Unfortunately, one of the chainlint.sed self-tests has overly intimate
knowledge of this particular division of responsibilities and only cares
about what chainlint.sed itself will produce, while ignoring the fact
that a more all-encompassing linter would complain about a broken
&&-chain outside the subshell. This makes it difficult to re-use the
test with a more capable chainlint implementation should one ever be
developed. Therefore, adjust the test and its "expected" output to
avoid being specific to the tunnel-vision of this one implementation.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The purpose of chainlint.sed is to detect &&-chain breakage only within
subshells (one level deep); it doesn't bother checking for top-level
&&-chain breakage since the &&-chain checker built into t/test-lib.sh
should detect broken &&-chains outside of subshells by making them
magically exit with code 117. However, this division of labor may not
always be the case if a more capable chainlint implementation is ever
developed. Beyond that, due to being sed-based and due to its use of
heuristics, chainlint.sed has several limitations (such as being unable
to detect &&-chain breakage in subshells more than one level deep since
it only manually emulates recursion into a subshell).
Some of the comments in the chainlint self-tests unnecessarily reflect
the limitations of chainlint.sed even though those limitations are not
what is being tested. Therefore, simplify and generalize the comments to
explain only what is being tested, thus ensuring that they won't become
outdated if a more capable chainlint is ever developed.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The chainlint self-test code snippets are supposed to represent the body
of a test_expect_success() or test_expect_failure(), yet the contents of
a few tests would have caused the shell to report syntax errors had they
been real test bodies due to the mix of single- and double-quotes.
Although chainlint.sed, with its simplistic heuristics, is blind to this
problem, a future more robust chainlint implementation might not have
such a limitation. Therefore, stop mixing quote types haphazardly in
those tests and unify quoting throughout. While at it, drop chunks of
tests which merely repeat what is already tested elsewhere but with
alternative quotes.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The chainlint self-test code snippets are supposed to represent the body
of a test_expect_success() or test_expect_failure(), yet the contents of
these tests would have caused the shell to report syntax errors had they
been real test bodies. Although chainlint.sed, with its simplistic
heuristics, is blind to these syntactic problems, a future more robust
chainlint implementation might not have such a limitation, so make these
snippets syntactically valid.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A here-doc tag can be quoted ('EOF'/"EOF") or escaped (\EOF) to suppress
interpolation within the body. chainlint recognizes single-quoted and
escaped tags, but does not know about double-quoted tags. For
completeness, teach it to recognize double-quoted tags, as well.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This extract from contrib/subtree/t7900 triggered a false positive due
to three chainlint limitations:
* recognizing only a "blessed" set of here-doc tag names in a subshell
("EOF", "EOT", "INPUT_END"), of which "TXT" is not a member
* inability to recognize multi-line $(...) when the first statement of
the body is cuddled with the opening "$("
* inability to recognize multiple constructs on a single line, such as
opening a multi-line $(...) and starting a here-doc
Now that all of these shortcomings have been addressed, turn this rather
pathological bit of shell coding into a chainlint test case.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
chainlint.sed recognizes multi-line quoted strings within subshells:
echo "abc
def" >out &&
so it can avoid incorrectly classifying lines internal to the string as
breaking the &&-chain. To identify the first line of a multi-line
string, it checks if the line contains a single quote. However, this is
fragile and can be easily fooled by a line containing multiple strings:
echo "xyz" "abc
def" >out &&
Make detection more robust by checking for an odd number of quotes
rather than only a single one.
(Escaped quotes are not handled, but support may be added later.)
The original multi-line string recognizer rather cavalierly threw away
all but the final quote, whereas the new one is careful to retain all
quotes, so the "expected" output of a couple existing chainlint tests is
updated to account for this new behavior.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After swallowing a here-doc, chainlint.sed assumes that no other
processing needs to be done on the line aside from checking for &&-chain
breakage; likewise, after folding a multi-line quoted string. However,
it's conceivable (even if unlikely in practice) that both a here-doc and
a multi-line quoted string might commence on the same line:
cat <<\EOF && echo "foo
bar"
data
EOF
Support this case by sending the line (after swallowing and folding)
through the normal processing sequence rather than jumping directly to
the check for broken &&-chain.
This change also allows other somewhat pathological cases to be handled,
such as closing a subshell on the same line starting a here-doc:
(
cat <<-\INPUT)
data
INPUT
or, for instance, opening a multi-line $(...) expression on the same
line starting a here-doc:
x=$(cat <<-\END &&
data
END
echo "x")
among others.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For multi-line $(...) expressions nested within subshells, chainlint.sed
only recognizes:
x=$(
echo foo &&
...
but it is not unlikely that test authors may also cuddle the command
with the opening "$(", so support that style, as well:
x=$(echo foo &&
...
The closing ")" is already correctly recognized when cuddled or not.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A here-doc tag can be quoted ('EOF') or escaped (\EOF) to suppress
interpolation within the body. Although, chainlint recognizes escaped
tags, it does not know about quoted tags. For completeness, teach it to
recognize quoted tags, as well.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
chainlint.sed swallows top-level here-docs to avoid being fooled by
content which might look like start-of-subshell. It likewise swallows
here-docs in subshells to avoid marking content lines as breaking the
&&-chain, and to avoid being fooled by content which might look like
end-of-subshell, start-of-nested-subshell, or other specially-recognized
constructs.
At the time of implementation, it was believed that it was not possible
to support arbitrary here-doc tag names since 'sed' provides no way to
stash the opening tag name in a variable for later comparison against a
line signaling end-of-here-doc. Consequently, tag names are hard-coded,
with "EOF" being the only tag recognized at the top-level, and only
"EOF", "EOT", and "INPUT_END" being recognized within subshells. Also,
special care was taken to avoid being confused by here-docs nested
within other here-docs.
In practice, this limited number of hard-coded tag names has been "good
enough" for the 13000+ existing Git test, despite many of those tests
using tags other than the recognized ones, since the bodies of those
here-docs do not contain content which would fool the linter.
Nevertheless, the situation is not ideal since someone writing new
tests, and choosing a name not in the "blessed" set could potentially
trigger a false-positive.
To address this shortcoming, upgrade chainlint.sed to handle arbitrary
here-doc tag names, both at the top-level and within subshells.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.
In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.
In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.
In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.
In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.
In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.
In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.
In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.
In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>