Commit graph

39 commits

Author SHA1 Message Date
Patrick Steinhardt 647b5e0998 tests: adjust whitespace in chainlint expectations
The "check-chainlint" target runs automatically when running tests and
performs self-checks to verify that the chainlinter itself produces the
expected output. Originally, the chainlinter was implemented via sed,
but the infrastructure has been rewritten in fb41727b7e (t: retire
unused chainlint.sed, 2022-09-01) to use a Perl script instead.

The rewrite caused some slight whitespace changes in the output that are
ultimately not of much importance. In order to be able to assert that
the actual chainlinter errors match our expectations we thus have to
ignore whitespace characters when diffing them. As the `-w` flag is not
in POSIX we try to use `git diff -w --no-index` before we fall back to
`diff -w -u`.

To accomodate for cases where the host system has no Git installation we
use the locally-compiled version of Git. This can result in problems
though when the Git project's repository is using extensions that the
locally-compiled version of Git doesn't understand. It will refuse to
run and thus cause the checks to fail.

Instead of improving the detection logic, fix our ".expect" files so
that we do not need any post-processing at all anymore. This allows us
to drop the `-w` flag when diffing so that we can always use diff(1)
now.

Note that we keep some of the post-processing of `chainlint.pl` output
intact to strip leading line numbers generated by the script. Having
these would cause a rippling effect whenever we add a new test that
sorts into the middle of existing tests and would require us to
renumerate all subsequent lines, which seems rather pointless.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15 08:36:14 -08:00
Eric Sunshine 2b61c8dc88 tests: diagnose unclosed here-doc in chainlint.pl
An unclosed here-doc in a test is a problem, because it silently gobbles
up any remaining commands. Since 99a64e4b73 (tests: lint for run-away
here-doc, 2017-03-22) we detect this by piggy-backing on the internal
chainlint checker in test-lib.sh.

However, it would be nice to detect it in chainlint.pl, for a few
reasons:

  - the output from chainlint.pl is much nicer; it can show the exact
    spot of the error, rather than a vague "somewhere in this test you
    broke the &&-chain or had a bad here-doc" message.

  - the implementation in test-lib.sh runs for each test snippet. And
    since it requires a subshell, the extra cost is small but not zero.
    If chainlint.pl can reliably find the problem, we can optimize the
    test-lib.sh code.

The chainlint.pl code never intended to find here-doc problems. But
since it has to parse them anyway (to avoid reporting problems inside
here-docs), most of what we need is already there. We can detect the
problem when we fail to find the missing end-tag in swallow_heredocs().
The extra change in scan_heredoc_tag() stores the location of the start
of the here-doc, which lets us mark it as the source of the error in the
output (see the new tests for examples).

[jk: added commit message and tests]

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-30 13:07:29 -07:00
Eric Sunshine 73c768dae9 chainlint: annotate original test definition rather than token stream
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.

However, now that each parsed token carries positional information, the
location of a detected problem can be pinpointed precisely in the
original test definition. Therefore, take advantage of this information
to annotate the test definition itself rather than annotating the parsed
token stream, thus making it easier for a test author to relate a
problem back to the source.

Maintaining the positional meta-information associated with each
detected problem requires a slight change in how the problems are
managed internally. In particular, shell syntax such as:

    msg="total: $(cd data; wc -w *.txt) words"

requires the lexical analyzer to recursively invoke the parser in order
to detect problems within the $(...) expression inside the double-quoted
string. In this case, the recursive parse context will detect the broken
&&-chain between the `cd` and `wc` commands, returning the token stream:

    cd data ; ?!AMP?! wc -w *.txt

However, the parent parse context will see everything inside the
double-quotes as a single string token:

    "total: $(cd data ; ?!AMP?! wc -w *.txt) words"

losing whatever positional information was attached to the ";" token
where the problem was detected.

One way to preserve the positional information of a detected problem in
a recursive parse context within a string would be to attach the
positional information to the annotation textually; for instance:

    "total: $(cd data ; ?!AMP:21:22?! wc -w *.txt) words"

and then extract the positional information when annotating the original
test definition.

However, a cleaner and much simpler approach is to maintain the list of
detected problems separately rather than embedding the problems as
annotations directly in the parsed token stream. Not only does this
ensure that positional information within recursive parse contexts is
not lost, but it keeps the token stream free from non-token pollution,
which may simplify implementation of validations added in the future
since they won't have to handle non-token "?!FOO!?" items specially.

Finally, the chainlint self-test "expect" files need a few mechanical
adjustments now that the original test definitions are emitted rather
than the parsed token stream. In particular, the following items missing
from the historic parsed-token output are now preserved verbatim:

    * indentation (and whitespace, in general)

    * comments

    * here-doc bodies

    * here-doc tag quoting (i.e. "\EOF")

    * line-splices (i.e. "\" at the end of a line)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 15:10:49 -05:00
Eric Sunshine 56066523ed t/chainlint: add more chainlint.pl self-tests
During the development of chainlint.pl, numerous new self-tests were
created to verify correct functioning beyond the checks already
represented by the existing self-tests. The new checks fall into several
categories:

* behavior of the lexical analyzer for complex cases, such as line
  splicing, token pasting, entering and exiting string contexts inside
  and outside of test script bodies; for instance:

    test_expect_success 'title' '
      x=$(echo "something" |
        sed -e '\''s/\\/\\\\/g'\'' -e '\''s/[[/.*^$]/\\&/g'\''
    '

* behavior of the parser for all compound grammatical constructs, such
  as `if...fi`, `case...esac`, `while...done`, `{...}`, etc., and for
  other legal shell grammatical constructs not covered by existing
  chainlint.sed self-tests, as well as complex cases, such as:

    OUT=$( ((large_git 1>&3) | :) 3>&1 ) &&

* detection of problems, such as &&-chain breakage, from top-level to
  any depth since the existing self-tests do not cover any top-level
  context and only cover subshells one level deep due to limitations of
  chainlint.sed

* address blind spots in chainlint.sed (such as not detecting a broken
  &&-chain on a one-line for-loop in a subshell[1]) which chainlint.pl
  correctly detects

* real-world cases which tripped up chainlint.pl during its development

[1]: https://lore.kernel.org/git/dce35a47012fecc6edc11c68e91dbb485c5bc36f.1661663880.git.gitgitgadget@gmail.com/

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:07:41 -07:00
Eric Sunshine ae0c55abf8 chainlint.pl: allow || echo to signal failure upstream of a pipe
The use of `|| return` (or `|| exit`) to signal failure within a loop
isn't effective when the loop is upstream of a pipe since the pipe
swallows all upstream exit codes and returns only the exit code of the
final command in the pipeline.

To work around this limitation, tests may adopt an alternative strategy
of signaling failure by emitting text which would never be emitted in
the non-failing case. For instance:

    while condition
    do
        command1 &&
        command2 ||
        echo "impossible text"
    done |
    sort >actual &&

Such usage indicates deliberate thought about failure cases by the test
author, thus flagging them as missing `|| return` (or `|| exit`) is not
helpful. Therefore, take this case into consideration when checking for
explicit loop termination.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:07:41 -07:00
Eric Sunshine fd4094c3ca chainlint.pl: complain about loops lacking explicit failure handling
Shell `for` and `while` loops do not terminate automatically just
because a command fails within the loop body. Instead, the loop
continues to iterate and eventually returns the exit status of the final
command of the final iteration, which may not be the command which
failed, thus it is possible for failures to go undetected. Consequently,
it is important for test authors to explicitly handle failure within the
loop body by terminating the loop manually upon failure. This can be
done by returning a non-zero exit code from within the loop body
(i.e. `|| return 1`) or exiting (i.e. `|| exit 1`) if the loop is within
a subshell, or by manually checking `$?` and taking some appropriate
action. Therefore, add logic to detect and complain about loops which
lack explicit `return` or `exit`, or `$?` check.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:07:41 -07:00
Eric Sunshine 832c68b3c2 chainlint.pl: don't flag broken &&-chain if failure indicated explicitly
There are quite a few tests which print an error messages and then
explicitly signal failure with `false`, `return 1`, or `exit 1` as the
final command in an `if` branch. In these cases, the tests don't bother
maintaining the &&-chain between `echo` and the explicit "test failed"
indicator. Since such constructs are manually signaling failure, their
&&-chain breakage is legitimate and safe -- both for the command
immediately preceding `false`, `return`, or `exit`, as well as for all
preceding commands in the `if` branch. Therefore, stop flagging &&-chain
breakage in these sorts of cases.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:07:41 -07:00
Eric Sunshine a8f30ee050 chainlint.pl: don't flag broken &&-chain if $? handled explicitly
There are cases in which tests capture and check a command's exit code
explicitly without employing test_expect_code(). They do so by
intentionally breaking the &&-chain since it would be impossible to
capture "$?" in the failing case if the `status=$?` assignment was part
of the &&-chain. Since such constructs are manually checking the exit
code, their &&-chain breakage is legitimate and safe, thus should not be
flagged. Therefore, stop flagging &&-chain breakage in such cases.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:07:41 -07:00
Eric Sunshine aabc3258a1 chainlint.pl: don't require & background command to end with &&
The exit status of the `&` asynchronous operator which starts a command
in the background is unconditionally zero, and the few places in the
test scripts which launch commands asynchronously are not interested in
the exit status of the `&` operator (though they often capture the
background command's PID). As such, there is little value in complaining
about broken &&-chain for a command launched in the background, and
doing so would only make busy-work for test authors. Therefore, take
this special case into account when checking for &&-chain breakage.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:07:40 -07:00
Eric Sunshine d00113ec34 t/Makefile: apply chainlint.pl to existing self-tests
Now that chainlint.pl is functional, take advantage of the existing
chainlint self-tests to validate its operation. (While at it, stop
validating chainlint.sed against the self-tests since it will soon be
retired.)

Due to chainlint.sed implementation limitations leaking into the
self-test "expect" files, a few of them require minor adjustment to make
them compatible with chainlint.pl which does not share those
limitations.

First, because `sed` does not provide any sort of real recursion,
chainlint.sed only emulates recursion into subshells, and each level of
recursion leads to a multiplicative increase in complexity of the `sed`
rules. To avoid substantial complexity, chainlint.sed, therefore, only
emulates subshell recursion one level deep. Any subshell deeper than
that is passed through as-is, which means that &&-chains are not checked
in deeper subshells. chainlint.pl, on the other hand, employs a proper
recursive descent parser, thus checks subshells to any depth and
correctly flags broken &&-chains in deep subshells.

Second, due to sed's line-oriented nature, chainlint.sed, by necessity,
folds multi-line quoted strings into a single line. chainlint.pl, on the
other hand, employs a proper lexical analyzer which preserves quoted
strings as-is, including embedded newlines.

Furthermore, the output of chainlint.sed and chainlint.pl do not match
precisely in terms of whitespace. However, since the purpose of the
self-checks is to verify that the ?!AMP?! annotations are being
correctly added, minor whitespace differences are immaterial. For this
reason, rather than adjusting whitespace in all existing self-test
"expect" files to match the new linter's output, the `check-chainlint`
target ignores whitespace differences. Since `diff -w` is not POSIX,
`check-chainlint` attempts to employ `git diff -w`, and only falls back
to non-POSIX `diff -w` (and `-u`) if `git diff` is not available.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:07:40 -07:00
Eric Sunshine 35ebb1e37b chainlint.pl: don't require return|exit|continue to end with &&
In order to check for &&-chain breakage, each time TestParser encounters
a new command, it checks whether the previous command ends with `&&`,
and -- with a couple exceptions -- signals breakage if it does not. The
first exception is that a command may validly end with `||`, which is
commonly employed as `command || return 1` at the very end of a loop
body to terminate the loop early. The second is that piping one
command's output with `|` to another command does not constitute a
&&-chain break (the exit status of the pipe is the exit status of the
final command in the pipe).

However, it turns out that there are a few additional cases found in the
wild in which it is likely safe for `&&` to be missing even when other
commands follow. For instance:

    while {condition-1}
    do
        test {condition-2} || return 1 # or `exit 1` within a subshell
        more-commands
    done

    while {condition-1}
    do
        test {condition-2} || continue
        more-commands
    done

Such cases indicate deliberate thought about failure modes by the test
author, thus flagging them as breaking the &&-chain is not helpful.
Therefore, take these special cases into consideration when checking for
&&-chain breakage.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:07:40 -07:00
Eric Sunshine d73f5cfa89 chainlint.sed: stop splitting "(..." into separate lines "(" and "..."
Because `sed` is line-oriented, for ease of implementation, when
chainlint.sed encounters an opening subshell in which the first command
is cuddled with the "(", it splits the line into two lines: one
containing only "(", and the other containing whatever follows "(".
This allows chainlint.sed to get by with a single set of regular
expressions for matching shell statements rather than having to
duplicate each expression (one set for matching non-cuddled statements,
and one set for matching cuddled statements).

However, although syntactically and semantically immaterial, this
transformation has no value to test authors and might even confuse them
into thinking that the linter is misbehaving by inserting (whitespace)
line-noise into the shell code it is validating. Moreover, it also
allows an implementation detail of chainlint.sed to seep into the
chainlint self-test "expect" files, which potentially makes it difficult
to reuse the self-tests should a more capable chainlint ever be
developed.

To address these concerns, stop splitting cuddled "(..." into two lines.

Note that, as an implementation artifact, due to sed's line-oriented
nature, this change inserts a blank line at output time just before the
"(..." line is emitted. It would be possible to suppress this blank line
but doing so would add a fair bit of complexity to chainlint.sed.
Therefore, rather than suppressing the extra blank line, the Makefile's
`check-chainlint` target which runs the chainlint self-tests is instead
modified to ignore blank lines when comparing chainlint output against
the self-test "expect" output. This is a reasonable compromise for two
reasons. First, the purpose of the chainlint self-tests is to verify
that the ?!AMP?! annotations are being correctly added; precise
whitespace is immaterial. Second, by necessity, chainlint.sed itself
already throws away all blank lines within subshells since, when
checking for a broken &&-chain, it needs to check the final _statement_
in a subshell, not the final _line_ (which might be blank), thus it has
never made any attempt to precisely reproduce blank lines in its output.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:29 -08:00
Eric Sunshine 31da22d1fd chainlint.sed: swallow comments consistently
When checking for broken a &&-chain, chainlint.sed knows that the final
statement in a subshell should not end with `&&`, so it takes care to
make a distinction between the final line which is an actual statement
and any lines which may be mere comments preceding the closing ')'. As
such, it swallows comment lines so that they do not interfere with the
&&-chain check.

However, since `sed` does not provide any sort of real recursion,
chainlint.sed only checks &&-chains in subshells one level deep; it
doesn't do any checking in deeper subshells or in `{...}` blocks within
subshells. Furthermore, on account of potential implementation
complexity, it doesn't check &&-chains within `case` arms.

Due to an oversight, it also doesn't swallow comments inside deep
subshells, `{...}` blocks, or `case` statements, which makes its output
inconsistent (swallowing comments in some cases but not others).
Unfortunately, this inconsistency seeps into the chainlint self-test
"expect" files, which potentially makes it difficult to reuse the
self-tests should a more capable chainlint ever be developed. Therefore,
teach chainlint.sed to consistently swallow comments in all cases.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:29 -08:00
Eric Sunshine 34ba05c296 chainlint.sed: stop throwing away here-doc tags
The purpose of chainlint is to highlight problems it finds in test code
by inserting annotations at the location of each problem. Arbitrarily
eliding bits of the code it is checking is not helpful, yet this is
exactly what chainlint.sed does by cavalierly and unnecessarily dropping
the here-doc operator and tag; i.e. `cat <<TAG` becomes simply `cat` in
the output. This behavior can make it more difficult for the test writer
to align the annotated output of chainlint.sed with the original test
code. Address this by retaining here-doc tags.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:29 -08:00
Eric Sunshine 22597af97d chainlint.sed: don't mistake << word in string as here-doc operator
Tighten here-doc recognition to prevent it from being fooled by text
which looks like a here-doc operator but happens merely to be the
content of a string, such as this real-world case from t7201:

    echo "<<<<<<< ours" &&
    echo ourside &&
    echo "=======" &&
    echo theirside &&
    echo ">>>>>>> theirs"

This problem went unnoticed because chainlint.sed is not a real parser,
but rather applies heuristics to pretend to understand shell code. In
this case, it saw what it thought was a here-doc operator (`<< ours`),
and fell off the end of the test looking for the closing tag "ours"
which it never found, thus swallowed the remainder of the test without
checking it for &&-chain breakage.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:29 -08:00
Eric Sunshine 5be30d0cd3 chainlint.sed: drop subshell-closing ">" annotation
chainlint.sed inserts a ">" annotation at the beginning of a line to
signal that its heuristics have identified an end-of-subshell. This was
useful as a debugging aid during development of the script, but it has
no value to test writers and might even confuse them into thinking that
the linter is misbehaving by inserting line-noise into the shell code it
is validating. Moreover, its presence also potentially makes it
difficult to reuse the chainlint self-test "expect" output should a more
capable linter ever be developed. Therefore, drop the ">" annotation.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:29 -08:00
Eric Sunshine 0d7131763e chainlint.sed: drop unnecessary distinction between ?!AMP?! and ?!SEMI?!
>From inception, when chainlint.sed encountered a line using semicolon to
separate commands rather than `&&`, it would insert a ?!SEMI?!
annotation at the beginning of the line rather ?!AMP?! even though the
&&-chain is also broken by the semicolon. Given a line such as:

    ?!SEMI?! cmd1; cmd2 &&

the ?!SEMI?! annotation makes it easier to see what the problem is than
if the output had been:

    ?!AMP?! cmd1; cmd2 &&

which might confuse the test author into thinking that the linter is
broken (since the line clearly ends with `&&`).

However, now that the ?!AMP?! an ?!SEMI?! annotations are inserted at
the point of breakage rather than at the beginning of the line, and
taking into account that both represent a broken &&-chain, there is
little reason to distinguish between the two. Using ?!AMP?! alone is
sufficient to point the test author at the problem. For instance, in:

    cmd1; ?!AMP?! cmd2 &&
    cmd3

it is clear that the &&-chain is broken between `cmd1` and `cmd2`.
Likewise, in:

    cmd1 && cmd2 ?!AMP?!
    cmd3

it is clear that the &&-chain is broken between `cmd2` and `cmd3`.
Finally, in:

    cmd1; ?!AMP?! cmd2 ?!AMP?!
    cmd3

it is clear that the &&-chain is broken between each command.

Hence, there is no longer a good reason to make a distinction between a
broken &&-chain due to a semicolon and a broken chain due to a missing
`&&` at end-of-line. Therefore, drop the ?!SEMI?! annotation and use
?!AMP?! exclusively.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:29 -08:00
Eric Sunshine 3865a7e36d chainlint.sed: tolerate harmless ";" at end of last line in block
chainlint.sed flags ";" when used as a command terminator since it
breaks the &&-chain, thus can allow failures to go undetected. However,
when a command terminated by ";" is the last command in the body of a
compound statement, such as `command-2` in:

    if test $# -gt 1
    then
        command-1 &&
        command-2;
    fi

then the ";" is harmless and the exit code from `command-2` is passed
through untouched and becomes the exit code of the compound statement,
as if the ";" was not present. Therefore, tolerate a trailing ";" in
this position rather than complaining about broken &&-chain.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:29 -08:00
Eric Sunshine fbd992b61b chainlint.sed: improve ?!SEMI?! placement accuracy
When chainlint.sed detects commands separated by a semicolon rather than
by `&&`, it places a ?!SEMI?! annotation at the beginning of the line.
However, this is an unusual location for programmers accustomed to error
messages (from compilers, for instance) indicating the exact point of
the problem. Therefore, relocate the ?!SEMI?! annotation to the location
of the semicolon in order to better direct the programmer's attention to
the source of the problem.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:29 -08:00
Eric Sunshine db8c7a1cc0 chainlint.sed: improve ?!AMP?! placement accuracy
When chainlint.sed detects a broken &&-chain, it places an ?!AMP?!
annotation at the beginning of the line. However, this is an unusual
location for programmers accustomed to error messages (from compilers,
for instance) indicating the exact point of the problem. Therefore,
relocate the ?!AMP?! annotation to the end of the line in order to
better direct the programmer's attention to the source of the problem.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:29 -08:00
Eric Sunshine 0cca54c706 t/chainlint/one-liner: avoid overly intimate chainlint.sed knowledge
The purpose of chainlint.sed is to detect &&-chain breakage only within
subshells (one level deep); it doesn't bother checking for top-level
&&-chain breakage since the &&-chain checker built into t/test-lib.sh
should detect broken &&-chains outside of subshells by making them
magically exit with code 117.

Unfortunately, one of the chainlint.sed self-tests has overly intimate
knowledge of this particular division of responsibilities and only cares
about what chainlint.sed itself will produce, while ignoring the fact
that a more all-encompassing linter would complain about a broken
&&-chain outside the subshell. This makes it difficult to re-use the
test with a more capable chainlint implementation should one ever be
developed. Therefore, adjust the test and its "expected" output to
avoid being specific to the tunnel-vision of this one implementation.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:28 -08:00
Eric Sunshine 1ad0780a77 t/chainlint/*.test: generalize self-test commentary
The purpose of chainlint.sed is to detect &&-chain breakage only within
subshells (one level deep); it doesn't bother checking for top-level
&&-chain breakage since the &&-chain checker built into t/test-lib.sh
should detect broken &&-chains outside of subshells by making them
magically exit with code 117. However, this division of labor may not
always be the case if a more capable chainlint implementation is ever
developed. Beyond that, due to being sed-based and due to its use of
heuristics, chainlint.sed has several limitations (such as being unable
to detect &&-chain breakage in subshells more than one level deep since
it only manually emulates recursion into a subshell).

Some of the comments in the chainlint self-tests unnecessarily reflect
the limitations of chainlint.sed even though those limitations are not
what is being tested. Therefore, simplify and generalize the comments to
explain only what is being tested, thus ensuring that they won't become
outdated if a more capable chainlint is ever developed.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:28 -08:00
Eric Sunshine 5459bc1bbb t/chainlint/*.test: fix invalid test cases due to mixing quote types
The chainlint self-test code snippets are supposed to represent the body
of a test_expect_success() or test_expect_failure(), yet the contents of
a few tests would have caused the shell to report syntax errors had they
been real test bodies due to the mix of single- and double-quotes.
Although chainlint.sed, with its simplistic heuristics, is blind to this
problem, a future more robust chainlint implementation might not have
such a limitation. Therefore, stop mixing quote types haphazardly in
those tests and unify quoting throughout. While at it, drop chunks of
tests which merely repeat what is already tested elsewhere but with
alternative quotes.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:28 -08:00
Eric Sunshine 925d7304d2 t/chainlint/*.test: don't use invalid shell syntax
The chainlint self-test code snippets are supposed to represent the body
of a test_expect_success() or test_expect_failure(), yet the contents of
these tests would have caused the shell to report syntax errors had they
been real test bodies. Although chainlint.sed, with its simplistic
heuristics, is blind to these syntactic problems, a future more robust
chainlint implementation might not have such a limitation, so make these
snippets syntactically valid.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 14:15:28 -08:00
Eric Sunshine 3042b6bb59 chainlint: match "quoted" here-doc tags
A here-doc tag can be quoted ('EOF'/"EOF") or escaped (\EOF) to suppress
interpolation within the body. chainlint recognizes single-quoted and
escaped tags, but does not know about double-quoted tags. For
completeness, teach it to recognize double-quoted tags, as well.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 10:57:38 -07:00
Eric Sunshine 4f69176feb chainlint: add test of pathological case which triggered false positive
This extract from contrib/subtree/t7900 triggered a false positive due
to three chainlint limitations:

* recognizing only a "blessed" set of here-doc tag names in a subshell
  ("EOF", "EOT", "INPUT_END"), of which "TXT" is not a member

* inability to recognize multi-line $(...) when the first statement of
  the body is cuddled with the opening "$("

* inability to recognize multiple constructs on a single line, such as
  opening a multi-line $(...) and starting a here-doc

Now that all of these shortcomings have been addressed, turn this rather
pathological bit of shell coding into a chainlint test case.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 12:22:12 -07:00
Eric Sunshine 22e3e0241a chainlint: recognize multi-line quoted strings more robustly
chainlint.sed recognizes multi-line quoted strings within subshells:

    echo "abc
        def" >out &&

so it can avoid incorrectly classifying lines internal to the string as
breaking the &&-chain. To identify the first line of a multi-line
string, it checks if the line contains a single quote. However, this is
fragile and can be easily fooled by a line containing multiple strings:

    echo "xyz" "abc
        def" >out &&

Make detection more robust by checking for an odd number of quotes
rather than only a single one.

(Escaped quotes are not handled, but support may be added later.)

The original multi-line string recognizer rather cavalierly threw away
all but the final quote, whereas the new one is careful to retain all
quotes, so the "expected" output of a couple existing chainlint tests is
updated to account for this new behavior.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 12:22:12 -07:00
Eric Sunshine d93871143f chainlint: let here-doc and multi-line string commence on same line
After swallowing a here-doc, chainlint.sed assumes that no other
processing needs to be done on the line aside from checking for &&-chain
breakage; likewise, after folding a multi-line quoted string. However,
it's conceivable (even if unlikely in practice) that both a here-doc and
a multi-line quoted string might commence on the same line:

    cat <<\EOF && echo "foo
    bar"
    data
    EOF

Support this case by sending the line (after swallowing and folding)
through the normal processing sequence rather than jumping directly to
the check for broken &&-chain.

This change also allows other somewhat pathological cases to be handled,
such as closing a subshell on the same line starting a here-doc:

    (
        cat <<-\INPUT)
        data
        INPUT

or, for instance, opening a multi-line $(...) expression on the same
line starting a here-doc:

    x=$(cat <<-\END &&
        data
        END
        echo "x")

among others.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 12:22:12 -07:00
Eric Sunshine 06fc5c9f90 chainlint: recognize multi-line $(...) when command cuddled with "$("
For multi-line $(...) expressions nested within subshells, chainlint.sed
only recognizes:

    x=$(
        echo foo &&
        ...

but it is not unlikely that test authors may also cuddle the command
with the opening "$(", so support that style, as well:

    x=$(echo foo &&
        ...

The closing ")" is already correctly recognized when cuddled or not.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 12:22:11 -07:00
Eric Sunshine 7e32a31b21 chainlint: match 'quoted' here-doc tags
A here-doc tag can be quoted ('EOF') or escaped (\EOF) to suppress
interpolation within the body. Although, chainlint recognizes escaped
tags, it does not know about quoted tags. For completeness, teach it to
recognize quoted tags, as well.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 12:22:11 -07:00
Eric Sunshine c2c29cc03e chainlint: match arbitrary here-docs tags rather than hard-coded names
chainlint.sed swallows top-level here-docs to avoid being fooled by
content which might look like start-of-subshell. It likewise swallows
here-docs in subshells to avoid marking content lines as breaking the
&&-chain, and to avoid being fooled by content which might look like
end-of-subshell, start-of-nested-subshell, or other specially-recognized
constructs.

At the time of implementation, it was believed that it was not possible
to support arbitrary here-doc tag names since 'sed' provides no way to
stash the opening tag name in a variable for later comparison against a
line signaling end-of-here-doc. Consequently, tag names are hard-coded,
with "EOF" being the only tag recognized at the top-level, and only
"EOF", "EOT", and "INPUT_END" being recognized within subshells. Also,
special care was taken to avoid being confused by here-docs nested
within other here-docs.

In practice, this limited number of hard-coded tag names has been "good
enough" for the 13000+ existing Git test, despite many of those tests
using tags other than the recognized ones, since the bodies of those
here-docs do not contain content which would fool the linter.
Nevertheless, the situation is not ideal since someone writing new
tests, and choosing a name not in the "blessed" set could potentially
trigger a false-positive.

To address this shortcoming, upgrade chainlint.sed to handle arbitrary
here-doc tag names, both at the top-level and within subshells.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 12:22:11 -07:00
Eric Sunshine 950079b7b6 t/chainlint: add chainlint "specialized" test cases
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.

In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 09:15:15 -07:00
Eric Sunshine 1f718b0b78 t/chainlint: add chainlint "complex" test cases
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.

In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 09:15:15 -07:00
Eric Sunshine 24c8618064 t/chainlint: add chainlint "cuddled" test cases
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.

In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 09:15:14 -07:00
Eric Sunshine ebcbbe060f t/chainlint: add chainlint "loop" and "conditional" test cases
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.

In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 09:15:14 -07:00
Eric Sunshine bb4efbc5df t/chainlint: add chainlint "nested subshell" test cases
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.

In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 09:15:14 -07:00
Eric Sunshine 90a880393a t/chainlint: add chainlint "one-liner" test cases
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.

In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 09:15:14 -07:00
Eric Sunshine 7b90679012 t/chainlint: add chainlint "whitespace" test cases
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.

In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 09:15:14 -07:00
Eric Sunshine 5238710eb4 t/chainlint: add chainlint "basic" test cases
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.

In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 09:15:14 -07:00