#!/usr/bin/env perl
#
# Copyright (c) 2021-2022 Eric Sunshine <sunshine@sunshineco.com>
#
# This tool scans shell scripts for test definitions and checks those tests for
# problems, such as broken &&-chains, which might hide bugs in the tests
# themselves or in behaviors being exercised by the tests.
#
# Input arguments are pathnames of shell scripts containing test definitions,
# or globs referencing a collection of scripts. For each problem discovered,
# the pathname of the script containing the test is printed along with the test
# name and the test body with a `?!LINT: ...?!` annotation at the location of
# each detected problem, where "..." is an explanation of the problem. Returns
# zero if no problems are discovered, otherwise non-zero.
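#
# For example (a hypothetical illustration; the script and test names are
# invented, and the exact report layout is determined by the emitting code
# below), a test body containing:
#
#     mkdir sub
#     cd sub
#
# might be reported as:
#
#     t9999-example.sh: 'demonstrate a broken &&-chain'
#     mkdir sub ?!LINT: missing '&&'?!
#     cd sub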

use warnings;
use strict;
use Config;
use File::Glob;
use Getopt::Long;

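# Command-line option state; the precise meanings are inferred from the
# variable names and later usage: number of parallel jobs (-1 meaning "choose
# automatically"), whether to report statistics, and whether to emit all
# checked tests rather than only problematic ones.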
my $jobs = -1;
my $show_stats;
my $emit_all;

# Lexer tokenizes POSIX shell scripts. It is roughly modeled after section 2.3
# "Token Recognition" of POSIX chapter 2 "Shell Command Language". Although
# similar to lexical analyzers for other languages, this one differs in a few
# substantial ways due to quirks of the shell command language.
#
# For instance, in many languages, newline is just whitespace like space or
# TAB, but in shell a newline is a command separator, thus a distinct lexical
# token. A newline is significant and returned as a distinct token even at the
# end of a shell comment.
#
# In other languages, `1+2` would typically be scanned as three tokens
# (`1`, `+`, and `2`), but in shell it is a single token. However, the similar
# `1 + 2`, which embeds whitespace, is scanned as three tokens in shell, as well.
# In shell, several characters with special meaning lose that meaning when not
# surrounded by whitespace. For instance, the negation operator `!` is special
# when standing alone surrounded by whitespace; whereas in `foo!uucp` it is
# just a plain character in the longer token "foo!uucp". In many other
# languages, `"string"/foo:'string'` might be scanned as five tokens ("string",
# `/`, `foo`, `:`, and 'string'), but in shell, it is just a single token.
#
# The lexical analyzer for the shell command language is also somewhat unusual
# in that it recursively invokes the parser to handle the body of `$(...)`
# expressions which can contain arbitrary shell code. Such expressions may be
# encountered both inside and outside of double-quoted strings.
#
# The lexical analyzer is responsible for consuming shell here-doc bodies which
# extend from the line following a `<<TAG` operator until a line consisting
# solely of `TAG`. Here-doc consumption begins when a newline is encountered.
# It is legal for multiple here-doc `<<TAG` operators to be present on a single
# line, in which case their bodies must be present one following the next, and
# are consumed in the (left-to-right) order the `<<TAG` operators appear on the
# line. A special complication is that the bodies of all here-docs must be
# consumed when the newline is encountered even if the parse context depth has
# changed. For instance, in `cat <<A && x=$(cat <<B &&\n`, bodies of here-docs
# "A" and "B" must be consumed even though "A" was introduced outside the
# recursive parse context in which "B" was introduced and in which the newline
# is encountered.
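#
# For example (an illustrative sketch of the behaviors described above), the
# line:
#
#     if test $# -gt 0 && cat <<EOF >out
#
# is tokenized as `if`, `test`, `$#`, `-gt`, `0`, `&&`, `cat`, `<<EOF`, `>`,
# `out`, and a final newline token, after which the lexer immediately consumes
# the here-doc body up to a line consisting solely of `EOF`.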
package Lexer;

sub new {
	my ($class, $parser, $s) = @_;
	bless {
		parser => $parser,
		buff => $s,
		lineno => 1,
		heretags => []
	} => $class;
}

sub scan_heredoc_tag {
	my $self = shift @_;
	${$self->{buff}} =~ /\G(-?)/gc;
	my $indented = $1;
	my $token = $self->scan_token();
	return "<<$indented" unless $token;
	my $tag = $token->[0];
	$tag =~ s/['"\\]//g;
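	# a tag introduced by `<<-` is recorded with a leading TAB so that
	# swallow_heredocs() knows to permit indentation before the closing tag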
	$$token[0] = $indented ? "\t$tag" : "$tag";
	push(@{$self->{heretags}}, $token);
	return "<<$indented$tag";
}

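# Scan a control or redirection operator, possibly two characters long (such
# as `&&`, `||`, `>>`, or `;;`); `<<` hands off to here-doc tag scanning.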
sub scan_op {
	my ($self, $c) = @_;
	my $b = $self->{buff};
	return $c unless $$b =~ /\G(.)/sgc;
	my $cc = $c . $1;
	return scan_heredoc_tag($self) if $cc eq '<<';
	return $cc if $cc =~ /^(?:&&|\|\||>>|;;|<&|>&|<>|>\|)$/;
	pos($$b)--;
	return $c;
}

sub scan_sqstring {
	my $self = shift @_;
	${$self->{buff}} =~ /\G([^']*'|.*\z)/sgc;
	my $s = $1;
	$self->{lineno} += () = $s =~ /\n/sg;
	return "'" . $s;
}

sub scan_dqstring {
	my $self = shift @_;
	my $b = $self->{buff};
	my $s = '"';
	while (1) {
		# slurp up non-special characters
		$s .= $1 if $$b =~ /\G([^"\$\\]+)/gc;
		# handle special characters
		last unless $$b =~ /\G(.)/sgc;
		my $c = $1;
		$s .= '"', last if $c eq '"';
		$s .= '$' . $self->scan_dollar(), next if $c eq '$';
		if ($c eq '\\') {
			$s .= '\\', last unless $$b =~ /\G(.)/sgc;
			$c = $1;
			$self->{lineno}++, next if $c eq "\n"; # line splice
			# backslash escapes only $, `, ", \ in dq-string
			$s .= '\\' unless $c =~ /^[\$`"\\]$/;
			$s .= $c;
			next;
		}
		die("internal error scanning dq-string '$c'\n");
	}
	$self->{lineno} += () = $s =~ /\n/sg;
	return $s;
}

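# Consume input up to and including the matching closing delimiter, tracking
# nesting depth; used for the bodies of `$((...))` and `${...}` expressions.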
sub scan_balanced {
	my ($self, $c1, $c2) = @_;
	my $b = $self->{buff};
	my $depth = 1;
	my $s = $c1;
	while ($$b =~ /\G([^\Q$c1$c2\E]*(?:[\Q$c1$c2\E]|\z))/gc) {
		$s .= $1;
		$depth++, next if $s =~ /\Q$c1\E$/;
		$depth--;
		last if $depth == 0;
	}
	$self->{lineno} += () = $s =~ /\n/sg;
	return $s;
}

sub scan_subst {
	my $self = shift @_;
	my @tokens = $self->{parser}->parse(qr/^\)$/);
	$self->{parser}->next_token(); # closing ")"
	return @tokens;
}

sub scan_dollar {
	my $self = shift @_;
	my $b = $self->{buff};
	return $self->scan_balanced('(', ')') if $$b =~ /\G\((?=\()/gc; # $((...))
	return '(' . join(' ', map {$_->[0]} $self->scan_subst()) . ')' if $$b =~ /\G\(/gc; # $(...)
	return $self->scan_balanced('{', '}') if $$b =~ /\G\{/gc; # ${...}
	return $1 if $$b =~ /\G(\w+)/gc; # $var
	return $1 if $$b =~ /\G([@*#?$!0-9-])/gc; # $*, $1, $$, etc.
	return '';
}

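# Consume the bodies of all here-docs queued by scan_heredoc_tag(), in the
# order their tags appeared on the line just ended. Each body's content and
# starting line are recorded for the parser; a here-doc whose closing tag is
# never found consumes the remaining input and is flagged as a problem.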
sub swallow_heredocs {
	my $self = shift @_;
	my $b = $self->{buff};
	my $tags = $self->{heretags};
	while (my $tag = shift @$tags) {
		my $start = pos($$b);
		my $indent = $$tag[0] =~ s/^\t// ? '\\s*' : '';
		$$b =~ /(?:\G|\n)$indent\Q$$tag[0]\E(?:\n|\z)/gc;
		if (pos($$b) > $start) {
			my $body = substr($$b, $start, pos($$b) - $start);
			$self->{parser}->{heredocs}->{$$tag[0]} = {
				content => substr($body, 0, length($body) - length($&)),
				start_line => $self->{lineno},
			};
			$self->{lineno} += () = $body =~ /\n/sg;
			next;
		}
		push(@{$self->{parser}->{problems}}, ['HEREDOC', $tag]);
		$$b =~ /(?:\G|\n).*\z/gc; # consume rest of input
		my $body = substr($$b, $start, pos($$b) - $start);
		$self->{lineno} += () = $body =~ /\n/sg;
		last;
	}
}

sub scan_token {
	my $self = shift @_;
	my $b = $self->{buff};
	my $token = '';
	my ($start, $startln);
RESTART:
	$startln = $self->{lineno};
	$$b =~ /\G[ \t]+/gc; # skip whitespace (but not newline)
	$start = pos($$b) || 0;
	$self->{lineno}++, return ["\n", $start, pos($$b), $startln, $startln] if $$b =~ /\G#[^\n]*(?:\n|\z)/gc; # comment
	while (1) {
		# slurp up non-special characters
		$token .= $1 if $$b =~ /\G([^\\;&|<>(){}'"\$\s]+)/gc;
		# handle special characters
		last unless $$b =~ /\G(.)/sgc;
		my $c = $1;
		pos($$b)--, last if $c =~ /^[ \t]$/; # whitespace ends token
		pos($$b)--, last if length($token) && $c =~ /^[;&|<>(){}\n]$/;
		$token .= $self->scan_sqstring(), next if $c eq "'";
		$token .= $self->scan_dqstring(), next if $c eq '"';
		$token .= $c . $self->scan_dollar(), next if $c eq '$';
		$self->{lineno}++, $self->swallow_heredocs(), $token = $c, last if $c eq "\n";
		$token = $self->scan_op($c), last if $c =~ /^[;&|<>]$/;
		$token = $c, last if $c =~ /^[(){}]$/;
		if ($c eq '\\') {
			$token .= '\\', last unless $$b =~ /\G(.)/sgc;
			$c = $1;
			$self->{lineno}++, next if $c eq "\n" && length($token); # line splice
			$self->{lineno}++, goto RESTART if $c eq "\n"; # line splice
			$token .= '\\' . $c;
			next;
		}
		die("internal error scanning character '$c'\n");
	}
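	# each token is a tuple: [lexeme, start offset, end offset, start line, end line]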
	return length($token) ? [$token, $start, pos($$b), $startln, $self->{lineno}] : undef;
}

# ShellParser parses POSIX shell scripts (with minor extensions for Bash). It
# is a recursive descent parser very roughly modeled after section 2.10 "Shell
# Grammar" of POSIX chapter 2 "Shell Command Language".
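#
# An illustrative usage sketch (hypothetical; the parse() entry point and the
# driver code are defined further below, beyond this excerpt, so the exact
# calling convention is an assumption based on the internal calls seen here):
#
#     my $script = "test_expect_success 'works' 'true'\n";
#     my $parser = ShellParser->new(\$script);
#     my @tokens = $parser->parse();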
package ShellParser;

sub new {
	my ($class, $s) = @_;
	my $self = bless {
		buff => [],
		stop => [],
		output => [],
		heredocs => {},
		insubshell => 0,
	} => $class;
	$self->{lexer} = Lexer->new($self, $s);
	return $self;
}

sub next_token {
	my $self = shift @_;
	return pop(@{$self->{buff}}) if @{$self->{buff}};
	return $self->{lexer}->scan_token();
}

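# Push tokens back onto the lookahead buffer so that a subsequent
# next_token() call returns them again; peek() is built atop this.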
sub untoken {
	my $self = shift @_;
	push(@{$self->{buff}}, @_);
}

sub peek {
	my $self = shift @_;
	my $token = $self->next_token();
	return undef unless defined($token);
	$self->untoken($token);
	return $token;
}

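# Return true if the given token should terminate the current parse context,
# i.e. if end-of-input was reached or the token matches the stop pattern atop
# the stop-pattern stack.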
sub stop_at {
	my ($self, $token) = @_;
	return 1 unless defined($token);
	my $stop = ${$self->{stop}}[-1] if @{$self->{stop}};
	return defined($stop) && $token->[0] =~ $stop;
}

sub expect {
	my ($self, $expect) = @_;
	my $token = $self->next_token();
	return $token if defined($token) && $token->[0] eq $expect;
	push(@{$self->{output}}, "?!ERR?! expected '$expect' but found '" . (defined($token) ? $token->[0] : "<end-of-input>") . "'\n");
	$self->untoken($token) if defined($token);
	return ();
}

sub optional_newlines {
	my $self = shift @_;
	my @tokens;
	while (my $token = $self->peek()) {
		last unless $token->[0] eq "\n";
		push(@tokens, $self->next_token());
	}
	return @tokens;
}

sub parse_group {
	my $self = shift @_;
	return ($self->parse(qr/^}$/),
		$self->expect('}'));
}

sub parse_subshell {
	my $self = shift @_;
	$self->{insubshell}++;
	my @tokens = ($self->parse(qr/^\)$/),
		      $self->expect(')'));
	$self->{insubshell}--;
	return @tokens;
}

sub parse_case_pattern {
	my $self = shift @_;
	my @tokens;
	while (defined(my $token = $self->next_token())) {
		push(@tokens, $token);
		last if $token->[0] eq ')';
	}
	return @tokens;
}

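# Parse a `case` statement, whose general shape is:
#
#     case WORD in pattern) body ;; pattern) body ;; ... esac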
sub parse_case {
|
|
|
|
my $self = shift @_;
|
|
|
|
my @tokens;
|
|
|
|
push(@tokens,
|
|
|
|
$self->next_token(), # subject
|
|
|
|
$self->optional_newlines(),
|
|
|
|
$self->expect('in'),
|
|
|
|
$self->optional_newlines());
|
|
|
|
while (1) {
|
|
|
|
my $token = $self->peek();
|
chainlint: latch start/end position of each token
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.
To address this shortcoming, an upcoming change will make it print out
an annotated copy of the original test definition rather than the
tokenized representation. In order to do so, it will need to know the
start and end positions of each token in the original test definition.
As preparation, upgrade TestParser::scan_token() to latch the start and
end position of the token being scanned, and return that information
along with the token itself. A subsequent change will take advantage of
this positional information.
In terms of implementation, TestParser::scan_token() is retrofitted to
return a tuple consisting of the token's lexeme and its start and end
positions, rather than returning just the lexeme. However, an
alternative would be to define a class which represents a token:
package Token;
sub new {
my ($class, $lexeme, $start, $end) = @_;
bless [$lexeme, $start, $end] => $class;
}
sub as_string {
my $self = shift @_;
return $self->[0];
}
sub compare {
my ($x, $y) = @_;
if (UNIVERSAL::isa($y, 'Token')) {
return $x->[0] cmp $y->[0];
}
return $x->[0] cmp $y;
}
use overload (
'""' => 'as_string',
'cmp' => 'compare'
);
The major benefit of the class-based approach is that it is entirely
non-invasive; it requires no additional changes to the rest of the
script since a Token converts automatically to a string, which is what
scan_token() historically returned.
The big downside to the Token approach, however, is that it is _slow_;
on this developer's (old) machine, it increases user-time by an
unacceptable seven seconds when scanning all test scripts in the
project. Hence, the simple tuple approach is employed instead since it
adds only a fraction of a second user-time.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 19:08:29 +00:00
|
|
|
last unless defined($token) && $token->[0] ne 'esac';
|
2022-09-01 00:29:41 +00:00
|
|
|
push(@tokens,
|
|
|
|
$self->parse_case_pattern(),
|
|
|
|
$self->optional_newlines(),
|
|
|
|
$self->parse(qr/^(?:;;|esac)$/)); # item body
|
|
|
|
$token = $self->peek();
|
chainlint: latch start/end position of each token
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.
To address this shortcoming, an upcoming change will make it print out
an annotated copy of the original test definition rather than the
tokenized representation. In order to do so, it will need to know the
start and end positions of each token in the original test definition.
As preparation, upgrade TestParser::scan_token() to latch the start and
end position of the token being scanned, and return that information
along with the token itself. A subsequent change will take advantage of
this positional information.
In terms of implementation, TestParser::scan_token() is retrofitted to
return a tuple consisting of the token's lexeme and its start and end
positions, rather than returning just the lexeme. However, an
alternative would be to define a class which represents a token:
package Token;
sub new {
my ($class, $lexeme, $start, $end) = @_;
bless [$lexeme, $start, $end] => $class;
}
sub as_string {
my $self = shift @_;
return $self->[0];
}
sub compare {
my ($x, $y) = @_;
if (UNIVERSAL::isa($y, 'Token')) {
return $x->[0] cmp $y->[0];
}
return $x->[0] cmp $y;
}
use overload (
'""' => 'as_string',
'cmp' => 'compare'
);
The major benefit of the class-based approach is that it is entirely
non-invasive; it requires no additional changes to the rest of the
script since a Token converts automatically to a string, which is what
scan_token() historically returned.
The big downside to the Token approach, however, is that it is _slow_;
on this developer's (old) machine, it increases user-time by an
unacceptable seven seconds when scanning all test scripts in the
project. Hence, the simple tuple approach is employed instead since it
adds only a fraction of a second user-time.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 19:08:29 +00:00
|
|
|
last unless defined($token) && $token->[0] ne 'esac';
|
2022-09-01 00:29:41 +00:00
|
|
|
push(@tokens,
|
|
|
|
$self->expect(';;'),
|
|
|
|
$self->optional_newlines());
|
|
|
|
}
|
|
|
|
push(@tokens, $self->expect('esac'));
|
|
|
|
return @tokens;
|
|
|
|
}
|
|
|
|
|
|
|
|
sub parse_for {
|
|
|
|
my $self = shift @_;
|
|
|
|
my @tokens;
|
|
|
|
push(@tokens,
|
|
|
|
$self->next_token(), # variable
|
|
|
|
$self->optional_newlines());
|
|
|
|
my $token = $self->peek();
|
chainlint: latch start/end position of each token
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.
To address this shortcoming, an upcoming change will make it print out
an annotated copy of the original test definition rather than the
tokenized representation. In order to do so, it will need to know the
start and end positions of each token in the original test definition.
As preparation, upgrade TestParser::scan_token() to latch the start and
end position of the token being scanned, and return that information
along with the token itself. A subsequent change will take advantage of
this positional information.
In terms of implementation, TestParser::scan_token() is retrofitted to
return a tuple consisting of the token's lexeme and its start and end
positions, rather than returning just the lexeme. However, an
alternative would be to define a class which represents a token:
package Token;
sub new {
my ($class, $lexeme, $start, $end) = @_;
bless [$lexeme, $start, $end] => $class;
}
sub as_string {
my $self = shift @_;
return $self->[0];
}
sub compare {
my ($x, $y) = @_;
if (UNIVERSAL::isa($y, 'Token')) {
return $x->[0] cmp $y->[0];
}
return $x->[0] cmp $y;
}
use overload (
'""' => 'as_string',
'cmp' => 'compare'
);
The major benefit of the class-based approach is that it is entirely
non-invasive; it requires no additional changes to the rest of the
script since a Token converts automatically to a string, which is what
scan_token() historically returned.
The big downside to the Token approach, however, is that it is _slow_;
on this developer's (old) machine, it increases user-time by an
unacceptable seven seconds when scanning all test scripts in the
project. Hence, the simple tuple approach is employed instead since it
adds only a fraction of a second user-time.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 19:08:29 +00:00
|
|
|
if (defined($token) && $token->[0] eq 'in') {
|
2022-09-01 00:29:41 +00:00
|
|
|
push(@tokens,
|
|
|
|
$self->expect('in'),
|
|
|
|
$self->optional_newlines());
|
|
|
|
}
|
|
|
|
push(@tokens,
|
|
|
|
$self->parse(qr/^do$/), # items
|
|
|
|
$self->expect('do'),
|
|
|
|
$self->optional_newlines(),
|
|
|
|
$self->parse_loop_body(),
|
|
|
|
$self->expect('done'));
|
|
|
|
return @tokens;
|
|
|
|
}
|
|
|
|
|
|
|
|
sub parse_if {
|
|
|
|
my $self = shift @_;
|
|
|
|
my @tokens;
|
|
|
|
while (1) {
|
|
|
|
push(@tokens,
|
|
|
|
$self->parse(qr/^then$/), # if/elif condition
|
|
|
|
$self->expect('then'),
|
|
|
|
$self->optional_newlines(),
|
|
|
|
$self->parse(qr/^(?:elif|else|fi)$/)); # if/elif body
|
|
|
|
my $token = $self->peek();
|
chainlint: latch start/end position of each token
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.
To address this shortcoming, an upcoming change will make it print out
an annotated copy of the original test definition rather than the
tokenized representation. In order to do so, it will need to know the
start and end positions of each token in the original test definition.
As preparation, upgrade TestParser::scan_token() to latch the start and
end position of the token being scanned, and return that information
along with the token itself. A subsequent change will take advantage of
this positional information.
In terms of implementation, TestParser::scan_token() is retrofitted to
return a tuple consisting of the token's lexeme and its start and end
positions, rather than returning just the lexeme. However, an
alternative would be to define a class which represents a token:
package Token;
sub new {
my ($class, $lexeme, $start, $end) = @_;
bless [$lexeme, $start, $end] => $class;
}
sub as_string {
my $self = shift @_;
return $self->[0];
}
sub compare {
my ($x, $y) = @_;
if (UNIVERSAL::isa($y, 'Token')) {
return $x->[0] cmp $y->[0];
}
return $x->[0] cmp $y;
}
use overload (
'""' => 'as_string',
'cmp' => 'compare'
);
The major benefit of the class-based approach is that it is entirely
non-invasive; it requires no additional changes to the rest of the
script since a Token converts automatically to a string, which is what
scan_token() historically returned.
The big downside to the Token approach, however, is that it is _slow_;
on this developer's (old) machine, it increases user-time by an
unacceptable seven seconds when scanning all test scripts in the
project. Hence, the simple tuple approach is employed instead since it
adds only a fraction of a second user-time.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 19:08:29 +00:00
|
|
|
last unless defined($token) && $token->[0] eq 'elif';
|
2022-09-01 00:29:41 +00:00
|
|
|
push(@tokens, $self->expect('elif'));
|
|
|
|
}
|
|
|
|
my $token = $self->peek();
|
chainlint: latch start/end position of each token
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.
To address this shortcoming, an upcoming change will make it print out
an annotated copy of the original test definition rather than the
tokenized representation. In order to do so, it will need to know the
start and end positions of each token in the original test definition.
As preparation, upgrade TestParser::scan_token() to latch the start and
end position of the token being scanned, and return that information
along with the token itself. A subsequent change will take advantage of
this positional information.
In terms of implementation, TestParser::scan_token() is retrofitted to
return a tuple consisting of the token's lexeme and its start and end
positions, rather than returning just the lexeme. However, an
alternative would be to define a class which represents a token:
package Token;
sub new {
my ($class, $lexeme, $start, $end) = @_;
bless [$lexeme, $start, $end] => $class;
}
sub as_string {
my $self = shift @_;
return $self->[0];
}
sub compare {
my ($x, $y) = @_;
if (UNIVERSAL::isa($y, 'Token')) {
return $x->[0] cmp $y->[0];
}
return $x->[0] cmp $y;
}
use overload (
'""' => 'as_string',
'cmp' => 'compare'
);
The major benefit of the class-based approach is that it is entirely
non-invasive; it requires no additional changes to the rest of the
script since a Token converts automatically to a string, which is what
scan_token() historically returned.
The big downside to the Token approach, however, is that it is _slow_;
on this developer's (old) machine, it increases user-time by an
unacceptable seven seconds when scanning all test scripts in the
project. Hence, the simple tuple approach is employed instead since it
adds only a fraction of a second user-time.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 19:08:29 +00:00
|
|
|
if (defined($token) && $token->[0] eq 'else') {
|
2022-09-01 00:29:41 +00:00
|
|
|
push(@tokens,
|
|
|
|
$self->expect('else'),
|
|
|
|
$self->optional_newlines(),
|
|
|
|
$self->parse(qr/^fi$/)); # else body
|
|
|
|
}
|
|
|
|
push(@tokens, $self->expect('fi'));
|
|
|
|
return @tokens;
|
|
|
|
}
|
|
|
|
|
|
|
|
sub parse_loop_body {
|
|
|
|
my $self = shift @_;
|
|
|
|
return $self->parse(qr/^done$/);
|
|
|
|
}
|
|
|
|
|
|
|
|
sub parse_loop {
|
|
|
|
my $self = shift @_;
|
|
|
|
return ($self->parse(qr/^do$/), # condition
|
|
|
|
$self->expect('do'),
|
|
|
|
$self->optional_newlines(),
|
|
|
|
$self->parse_loop_body(),
|
|
|
|
$self->expect('done'));
|
|
|
|
}
|
|
|
|
|
|
|
|
sub parse_func {
|
|
|
|
my $self = shift @_;
|
|
|
|
return ($self->expect('('),
|
|
|
|
$self->expect(')'),
|
|
|
|
$self->optional_newlines(),
|
|
|
|
$self->parse_cmd()); # body
|
|
|
|
}
|
|
|
|
|
|
|
|
sub parse_bash_array_assignment {
|
|
|
|
my $self = shift @_;
|
|
|
|
my @tokens = $self->expect('(');
|
|
|
|
while (defined(my $token = $self->next_token())) {
|
|
|
|
push(@tokens, $token);
		last if $token->[0] eq ')';
	}
	return @tokens;
}
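This scanner simply slurps tokens through the closing ')', which suffices
for Bash array assignments such as this hypothetical example (recognized
by parse_cmd() below when a word ending in "=" is followed by "("):

    files=(one.txt two.txt)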

my %compound = (
	'{' => \&parse_group,
	'(' => \&parse_subshell,
	'case' => \&parse_case,
	'for' => \&parse_for,
	'if' => \&parse_if,
	'until' => \&parse_loop,
	'while' => \&parse_loop);
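The %compound table drives parse_cmd()'s dispatch: when a command begins
with one of these tokens, the corresponding handler consumes the entire
compound statement as a single command, so a hypothetical subshell such as
the following is swallowed whole:

    (
        cd data &&
        wc -w *.txt
    )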

sub parse_cmd {
	my $self = shift @_;
	my $cmd = $self->next_token();
	return () unless defined($cmd);
	return $cmd if $cmd->[0] eq "\n";

	my $token;
	my @tokens = $cmd;
	if ($cmd->[0] eq '!') {
		push(@tokens, $self->parse_cmd());
		return @tokens;
	} elsif (my $f = $compound{$cmd->[0]}) {
		push(@tokens, $self->$f());
	} elsif (defined($token = $self->peek()) && $token->[0] eq '(') {
		if ($cmd->[0] !~ /\w=$/) {
			push(@tokens, $self->parse_func());
			return @tokens;
		}
		my @array = $self->parse_bash_array_assignment();
		$tokens[-1]->[0] .= join(' ', map {$_->[0]} @array);
		$tokens[-1]->[2] = $array[$#array][2] if @array;
	}

	while (defined(my $token = $self->next_token())) {
		$self->untoken($token), last if $self->stop_at($token);
		push(@tokens, $token);
		last if $token->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
	}
	push(@tokens, $self->next_token()) if $tokens[-1]->[0] ne "\n" && defined($token = $self->peek()) && $token->[0] eq "\n";
	return @tokens;
}
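Hence, each parse_cmd() call gobbles one command plus its trailing
separator. For instance, this hypothetical input is consumed as two
commands, the first ending at "&&" (with its trailing newline attached)
and the second at the final newline:

    git commit &&
    git push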

sub accumulate {
	my ($self, $tokens, $cmd) = @_;
	push(@$tokens, @$cmd);
}

sub parse {
	my ($self, $stop) = @_;
	push(@{$self->{stop}}, $stop);
	goto DONE if $self->stop_at($self->peek());
	my @tokens;
	while (my @cmd = $self->parse_cmd()) {
		$self->accumulate(\@tokens, \@cmd);
		last if $self->stop_at($self->peek());
	}
DONE:
	pop(@{$self->{stop}});
	return @tokens;
}

# TestParser is a subclass of ShellParser which, beyond parsing shell script
# code, is also imbued with semantic knowledge of test construction, and checks
# tests for common problems (such as broken &&-chains) which might hide bugs in
# the tests themselves or in behaviors being exercised by the tests. As such,
# TestParser is only called upon to parse test bodies, not the top-level
# scripts in which the tests are defined.
package TestParser;

use base 'ShellParser';
chainlint: annotate original test definition rather than token stream
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.
However, now that each parsed token carries positional information, the
location of a detected problem can be pinpointed precisely in the
original test definition. Therefore, take advantage of this information
to annotate the test definition itself rather than annotating the parsed
token stream, thus making it easier for a test author to relate a
problem back to the source.
Maintaining the positional meta-information associated with each
detected problem requires a slight change in how the problems are
managed internally. In particular, shell syntax such as:
msg="total: $(cd data; wc -w *.txt) words"
requires the lexical analyzer to recursively invoke the parser in order
to detect problems within the $(...) expression inside the double-quoted
string. In this case, the recursive parse context will detect the broken
&&-chain between the `cd` and `wc` commands, returning the token stream:
cd data ; ?!AMP?! wc -w *.txt
However, the parent parse context will see everything inside the
double-quotes as a single string token:
"total: $(cd data ; ?!AMP?! wc -w *.txt) words"
losing whatever positional information was attached to the ";" token
where the problem was detected.
One way to preserve the positional information of a detected problem in
a recursive parse context within a string would be to attach the
positional information to the annotation textually; for instance:
"total: $(cd data ; ?!AMP:21:22?! wc -w *.txt) words"
and then extract the positional information when annotating the original
test definition.
However, a cleaner and much simpler approach is to maintain the list of
detected problems separately rather than embedding the problems as
annotations directly in the parsed token stream. Not only does this
ensure that positional information within recursive parse contexts is
not lost, but it keeps the token stream free from non-token pollution,
which may simplify implementation of validations added in the future
since they won't have to handle non-token "?!FOO?!" items specially.
Finally, the chainlint self-test "expect" files need a few mechanical
adjustments now that the original test definitions are emitted rather
than the parsed token stream. In particular, the following items missing
from the historic parsed-token output are now preserved verbatim:
* indentation (and whitespace, in general)
* comments
* here-doc bodies
* here-doc tag quoting (i.e. "\EOF")
* line-splices (i.e. "\" at the end of a line)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 19:08:30 +00:00

sub new {
	my $class = shift @_;
	my $self = $class->SUPER::new(@_);
	$self->{problems} = [];
	return $self;
}

sub find_non_nl {
	my $tokens = shift @_;
	my $n = shift @_;
	$n = $#$tokens if !defined($n);
	$n-- while $n >= 0 && $$tokens[$n]->[0] eq "\n";
	return $n;
}

sub ends_with {
	my ($tokens, $needles) = @_;
	my $n = find_non_nl($tokens);
	for my $needle (reverse(@$needles)) {
		return undef if $n < 0;
		$n = find_non_nl($tokens, $n), next if $needle eq "\n";
		return undef if $$tokens[$n]->[0] !~ $needle;
		$n--;
	}
	return 1;
}

sub match_ending {
	my ($tokens, $endings) = @_;
	for my $needles (@$endings) {
		next if @$tokens < scalar(grep {$_ ne "\n"} @$needles);
		return 1 if ends_with($tokens, $needles);
	}
	return undef;
}
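match_ending() asks whether the parsed token stream ends with any of the
given needle sequences; the checks below use it (via @safe_endings) to
recognize that a command list ending in this hypothetical fashion has
already handled failure explicitly:

    update_index || exit 1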

sub parse_loop_body {
	my $self = shift @_;
	my @tokens = $self->SUPER::parse_loop_body(@_);
	# did loop signal failure via "|| return" or "|| exit"?
	return @tokens if !@tokens || grep {$_->[0] =~ /^(?:return|exit|\$\?)$/} @tokens;
	# did loop upstream of a pipe signal failure via "|| echo 'impossible
	# text'" as the final command in the loop body?
	return @tokens if ends_with(\@tokens, [qr/^\|\|$/, "\n", qr/^echo$/, qr/^.+$/]);
	# flag missing "return/exit" handling explicit failure in loop body
	my $n = find_non_nl(\@tokens);
	push(@{$self->{problems}}, [$self->{insubshell} ? 'LOOPEXIT' : 'LOOPRETURN', $tokens[$n]]);
	return @tokens;
}
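Since a failing command does not abort its enclosing loop, the loop body
itself must signal failure; a hypothetical body which satisfies this check
(inside a subshell, "|| exit 1" would be expected instead, hence the
LOOPEXIT versus LOOPRETURN distinction above):

    while read name
    do
        git tag "$name" || return 1
    done <names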

my @safe_endings = (
	[qr/^(?:&&|\|\||\||&)$/],
	[qr/^(?:exit|return)$/, qr/^(?:\d+|\$\?)$/],
	[qr/^(?:exit|return)$/, qr/^(?:\d+|\$\?)$/, qr/^;$/],
	[qr/^(?:exit|return|continue)$/],
	[qr/^(?:exit|return|continue)$/, qr/^;$/]);

sub accumulate {
	my ($self, $tokens, $cmd) = @_;
	my $problems = $self->{problems};

	# no previous command to check for missing "&&"
	goto DONE unless @$tokens;

	# new command is empty line; can't yet check if previous is missing "&&"
	goto DONE if @$cmd == 1 && $$cmd[0]->[0] eq "\n";

	# did previous command end with "&&", "|", "|| return" or similar?
	goto DONE if match_ending($tokens, \@safe_endings);

	# if this command handles "$?" specially, then okay for previous
	# command to be missing "&&"
	for my $token (@$cmd) {
		goto DONE if $token->[0] =~ /\$\?/;
	}

	# if this command is "false", "return 1", or "exit 1" (which signal
	# failure explicitly), then okay for all preceding commands to be
	# missing "&&"
	if ($$cmd[0]->[0] =~ /^(?:false|return|exit)$/) {
		@$problems = grep {$_->[0] ne 'AMP'} @$problems;
		goto DONE;
	}

	# flag missing "&&" at end of previous command
	my $n = find_non_nl($tokens);
	push(@$problems, ['AMP', $tokens->[$n]]) unless $n < 0;

DONE:
	$self->SUPER::accumulate($tokens, $cmd);
}
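Taken together, these checks flag, for instance, the hypothetical test
below, in which the first `echo` lacks "&&"; accumulate() records an 'AMP'
problem at the end of that command:

    test_expect_success 'count lines' '
        echo one >file
        echo two >>file &&
        test_line_count = 2 file
    '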

# ScriptParser is a subclass of ShellParser which identifies individual test
# definitions within test scripts, and passes each test body through TestParser
# to identify possible problems. ScriptParser detects test definitions not only
# at the top-level of test scripts but also within compound commands such as
# loops and function definitions.
package ScriptParser;

use base 'ShellParser';

sub new {
	my $class = shift @_;
	my $self = $class->SUPER::new(@_);
$self->{ntests} = 0;
|
chainlint: don't be fooled by "?!...?!" in test body
As originally implemented, chainlint did not collect structured
information about detected problems. Instead, it merely emitted raw
parse tokens (not the original test text), along with a "?!...?!"
annotation directly into the output stream each time a problem was
discovered. In order to report statistics (in --stats mode) and to
adjust its exit code to indicate success or failure, it merely counts
the number of times "?!...?!" appears in the output stream. An obvious
shortcoming of this approach is that it can be fooled by a legitimate
"?!...?!" sequence in the body of a test (though, only if an actual
problem is detected in the test).
The situation did not improve when 7c04aa7390 (chainlint: colorize
problem annotations and test delimiters, 2022-09-13) colored the
annotations after-the-fact by searching for "?!...?!" in the output
stream and inserting color codes. As above, a shortcoming is that this
approach can incorrectly color a legitimate "?!...?!" sequence in a test
body as if it is an error.
However, when 73c768dae9 (chainlint: annotate original test definition
rather than token stream, 2022-11-08) taught chainlint to output the
original test text verbatim, it started collecting structured
information about detected problems.
Now that it is available, take advantage of the structured problem
information to deterministically count the number of problems detected
and to color the annotations directly, rather than scanning the output
stream for "?!...?!" and performing these operations after-the-fact.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-10 04:10:11 +00:00
|
|
|
$self->{nerrs} = 0;
|
t: add skeleton chainlint.pl
Although chainlint.sed usefully identifies broken &&-chains in tests, it
has several shortcomings which include:
* only detects &&-chain breakage in subshells (one-level deep)
* does not check for broken top-level &&-chains; that task is left to
the "magic exit code 117" checker built into test-lib.sh, however,
that detection does not extend to `{...}` blocks, `$(...)`
expressions, or compound statements such as `if...fi`,
`while...done`, `case...esac`
* uses heuristics, which makes it (potentially) fallible and difficult
to tweak to handle additional real-world cases
* written in `sed` and employs advanced `sed` operators which are
probably not well-known to many programmers, thus the pool of people
who can maintain it is likely small
* manually simulates recursion into subshells which makes it much more
difficult to reason about than, say, a traditional top-down parser
* checks each test as the test is run, which can get expensive for
tests which are run repeatedly by functions or loops since their
bodies will be checked over and over (tens or hundreds of times)
unnecessarily
To address these shortcomings, begin implementing a more functional and
precise test linter which understands shell syntax and semantics rather
than employing heuristics, thus is able to recognize structural problems
with tests beyond broken &&-chains.
The new linter is written in Perl, thus should be more accessible to a
wider audience, and is structured as a traditional top-down parser which
makes it much easier to reason about, and allows it to inspect compound
statements within test bodies to any depth.
Furthermore, it can check all test definitions in the entire project in
a single invocation rather than having to be invoked once per test, and
each test definition is checked only once no matter how many times the
test is actually run.
At this stage, the new linter is just a skeleton containing boilerplate
which handles command-line options, collects and reports statistics, and
feeds its arguments -- paths of test scripts -- to a (presently)
do-nothing script parser for validation. Subsequent changes will flesh
out the functionality.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 00:29:39 +00:00
|
|
|
return $self;
|
|
|
|
}

# extract the raw content of a token, which may be a single string or a
# composition of multiple strings and non-string character runs; for instance,
# `"test body"` unwraps to `test body`; `word"a b"42'c d'` to `worda b42c d`
sub unwrap {
	my $token = (@_ ? shift @_ : $_)->[0];
	# simple case: 'sqstring' or "dqstring"
	return $token if $token =~ s/^'([^']*)'$/$1/;
	return $token if $token =~ s/^"([^"]*)"$/$1/;

	# composite case
	my ($s, $q, $escaped);
	while (1) {
		# slurp up non-special characters
		$s .= $1 if $token =~ /\G([^\\'"]*)/gc;
		# handle special characters
		last unless $token =~ /\G(.)/sgc;
		my $c = $1;
		$q = undef, next if defined($q) && $c eq $q;
		$q = $c, next if !defined($q) && $c =~ /^['"]$/;
		if ($c eq '\\') {
			last unless $token =~ /\G(.)/sgc;
			$c = $1;
			$s .= '\\' if $c eq "\n"; # preserve line splice
		}
		$s .= $c;
	}
	return $s;
}
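
# Usage sketch (hypothetical token tuples of the form [lexeme, start, end];
# the positions are made up and ignored by unwrap(), which reads only slot 0):
#
#   unwrap(["'hello world'", 0, 13]);        # yields: hello world
#   unwrap(['word"a b"42\'c d\'', 0, 16]);   # yields: worda b42c d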

sub format_problem {
	local $_ = shift;
	/^AMP$/ && return "missing '&&'";
	/^LOOPRETURN$/ && return "missing '|| return 1'";
	/^LOOPEXIT$/ && return "missing '|| exit 1'";
	/^HEREDOC$/ && return 'unclosed heredoc';
	die("unrecognized problem type '$_'\n");
}
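
# For example, format_problem('AMP') yields "missing '&&'"; an unrecognized
# label aborts outright, so a newly added problem type cannot silently go
# unreported.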

sub check_test {
	my $self = shift @_;
	my $title = unwrap(shift @_);
	my $body = shift @_;
	my $lineno = $body->[3];
	$body = unwrap($body);
	if ($body eq '-') {
		# test body was supplied via here-doc rather than in-line string
		my $herebody = shift @_;
		$body = $herebody->{content};
		$lineno = $herebody->{start_line};
	}
	$self->{ntests}++;
	my $parser = TestParser->new(\$body);
	my @tokens = $parser->parse();
	my $problems = $parser->{problems};
	$self->{nerrs} += @$problems;
	return unless $emit_all || @$problems;
	my $c = main::fd_colors(1);
	# colorize annotations if stdout is a terminal; otherwise fall back to
	# plain "?!...?!" markers
	my ($erropen, $errclose) = -t 1 ? ("$c->{rev}$c->{red}", $c->{reset}) : ('?!', '?!');
	# splice an annotation into the original test body at the location of
	# each detected problem, in ascending order of position
	my $start = 0;
	my $checked = '';
	for (sort {$a->[1]->[2] <=> $b->[1]->[2]} @$problems) {
		my ($label, $token) = @$_;
		my $pos = $token->[2];
		my $err = format_problem($label);
		$checked .= substr($body, $start, $pos - $start);
		$checked .= ' ' unless $checked =~ /\s$/;
		$checked .= "${erropen}LINT: $err$errclose";
		$checked .= ' ' unless $pos >= length($body) ||
			substr($body, $pos, 1) =~ /^\s/;
		$start = $pos;
	}
	$checked .= substr($body, $start);
	# number each line, drop a numbered-but-empty first line, and dim the
	# line numbers so they don't compete with the test text
	$checked =~ s/^/$lineno++ . ' '/mge;
	$checked =~ s/^\d+ \n//;
	$checked =~ s/^\d+/$c->{dim}$&$c->{reset}/mg;
	$checked .= "\n" unless $checked =~ /\n$/;
	push(@{$self->{output}}, "$c->{blue}# chainlint: $title$c->{reset}\n$checked");
}
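
# Illustrative (made-up) annotation output for a test body with a broken
# &&-chain, as emitted to a non-terminal (the line numbers reflect the test's
# position within its script):
#
#   # chainlint: rev-parse works
#   12 git reset --hard &&
#   13 cd subdir ?!LINT: missing '&&'?!
#   14 git rev-parse HEAD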

sub parse_cmd {
	my $self = shift @_;
	my @tokens = $self->SUPER::parse_cmd();
	return @tokens unless @tokens && $tokens[0]->[0] =~ /^test_expect_(?:success|failure)$/;
	my $n = $#tokens;
	# skip back over trailing statement terminators and chain operators
	$n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
	my $herebody;
	# if the test body is a here-doc, look up its content by tag so
	# check_test() can examine it
	if ($n >= 2 && $tokens[$n-1]->[0] eq '-' && $tokens[$n]->[0] =~ /^<<-?(.+)$/) {
		$herebody = $self->{heredocs}->{$1};
		$n--;
	}
	$self->check_test($tokens[1], $tokens[2], $herebody) if $n == 2; # title body
	$self->check_test($tokens[2], $tokens[3], $herebody) if $n > 2;  # prereq title body
	return @tokens;
}
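
# For instance (hedged sketch), given a definition such as:
#
#   test_expect_success SYMLINKS 'title' '
#	body
#   '
#
# the token lexemes begin ("test_expect_success", "SYMLINKS", "'title'",
# "'...body...'"), so the prereq/title/body form hands $tokens[2] and
# $tokens[3] to check_test(); without a prereq, $tokens[1] and $tokens[2]
# are used instead.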

# main contains high-level functionality for processing command-line switches,
# feeding input test scripts to ScriptParser, and reporting results.
package main;

my $getnow = sub { return time(); };
my $interval = sub { return time() - shift; };
if (eval {require Time::HiRes; Time::HiRes->import(); 1;}) {
	$getnow = sub { return [Time::HiRes::gettimeofday()]; };
	$interval = sub { return Time::HiRes::tv_interval(shift); };
}
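
# Usage sketch: latch a start time, then compute the elapsed interval later;
# granularity degrades to whole seconds when Time::HiRes is unavailable.
#
#   my $start_time = $getnow->();
#   # ... perform work ...
#   printf(STDERR "%.2fs elapsed\n", $interval->($start_time));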

# Restore TERM if test framework set it to "dumb" so 'tput' will work; do this
# outside of get_colors() since under 'ithreads' all threads use %ENV of main
# thread and ignore %ENV changes in subthreads.
$ENV{TERM} = $ENV{USER_TERM} if $ENV{USER_TERM};

my @NOCOLORS = (bold => '', rev => '', dim => '', reset => '', blue => '', green => '', red => '');
my %COLORS = ();
sub get_colors {
	return \%COLORS if %COLORS;
	if (exists($ENV{NO_COLOR})) {
		%COLORS = @NOCOLORS;
		return \%COLORS;
	}
	if ($ENV{TERM} =~ /xterm|xterm-\d+color|xterm-new|xterm-direct|nsterm|nsterm-\d+color|nsterm-direct/) {
		%COLORS = (bold => "\e[1m",
			   rev => "\e[7m",
			   dim => "\e[2m",
			   reset => "\e[0m",
			   blue => "\e[34m",
			   green => "\e[32m",
			   red => "\e[31m");
		return \%COLORS;
	}
	if (system("tput sgr0 >/dev/null 2>&1") == 0 &&
	    system("tput bold >/dev/null 2>&1") == 0 &&
	    system("tput rev >/dev/null 2>&1") == 0 &&
	    system("tput dim >/dev/null 2>&1") == 0 &&
	    system("tput setaf 1 >/dev/null 2>&1") == 0) {
		%COLORS = (bold => `tput bold`,
			   rev => `tput rev`,
			   dim => `tput dim`,
			   reset => `tput sgr0`,
			   blue => `tput setaf 4`,
			   green => `tput setaf 2`,
			   red => `tput setaf 1`);
		return \%COLORS;
	}
	%COLORS = @NOCOLORS;
	return \%COLORS;
}
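
# For example, running with NO_COLOR set in the environment disables all
# coloring even on a capable terminal (this appears to follow the informal
# convention documented at https://no-color.org/).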

my %FD_COLORS = ();
sub fd_colors {
	my $fd = shift;
	return $FD_COLORS{$fd} if exists($FD_COLORS{$fd});
	$FD_COLORS{$fd} = -t $fd ? get_colors() : {@NOCOLORS};
	return $FD_COLORS{$fd};
}
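
# Usage sketch: color maps are resolved per file descriptor, so colorized
# reports on stdout and colorized statistics on stderr are each suppressed
# independently when redirected to a file:
#
#   my $c = fd_colors(1);                    # stdout palette
#   print("$c->{red}problem$c->{reset}\n");  # plain text if stdout is a file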

sub ncores {
	# Windows
	if (exists($ENV{NUMBER_OF_PROCESSORS})) {
		my $ncpu = $ENV{NUMBER_OF_PROCESSORS};
		return $ncpu > 0 ? $ncpu : 1;
	}
	# Linux / MSYS2 / Cygwin / WSL
	if (open my $fh, '<', '/proc/cpuinfo') {
		my $cpuinfo = do { local $/; <$fh> };
		close($fh);
		if ($cpuinfo =~ /^n?cpus active\s*:\s*(\d+)/m) {
			return $1 if $1 > 0;
		}
		my @matches = ($cpuinfo =~ /^(processor|CPU)[\s\d]*:/mg);
		return @matches ? scalar(@matches) : 1;
	}
	# macOS & BSD
	if ($^O =~ /(?:^darwin$|bsd)/) {
		my $ncpu = qx/sysctl -n hw.ncpu/;
		return $ncpu > 0 ? $ncpu : 1;
	}
	return 1;
}
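
# A plausible use (the option-processing code that consumes this value lies
# outside this excerpt): defaulting the size of the parallel worker pool to
# the detected core count.
#
#   my $jobs = ncores();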

sub show_stats {
	my ($start_time, $stats) = @_;
	my $walltime = $interval->($start_time);
	my ($usertime) = times();
	my ($total_workers, $total_scripts, $total_tests, $total_errs) = (0, 0, 0, 0);
	my $c = fd_colors(2);
	print(STDERR $c->{green});
	for (@$stats) {
		my ($worker, $nscripts, $ntests, $nerrs) = @$_;
		print(STDERR "worker $worker: $nscripts scripts, $ntests tests, $nerrs errors\n");
		$total_workers++;
		$total_scripts += $nscripts;
		$total_tests += $ntests;
		$total_errs += $nerrs;
	}
	printf(STDERR "total: %d workers, %d scripts, %d tests, %d errors, %.2fs/%.2fs (wall/user)$c->{reset}\n", $total_workers, $total_scripts, $total_tests, $total_errs, $walltime, $usertime);
}
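
# Sample statistics report (illustrative numbers):
#
#   worker 0: 121 scripts, 6540 tests, 0 errors
#   worker 1: 120 scripts, 6489 tests, 2 errors
#   total: 2 workers, 241 scripts, 13029 tests, 2 errors, 0.85s/1.61s (wall/user)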
|
|
|
|
|
|
|
|
sub check_script {
|
|
|
|
my ($id, $next_script, $emit) = @_;
|
|
|
|
my ($nscripts, $ntests, $nerrs) = (0, 0, 0);
|
|
|
|
while (my $path = $next_script->()) {
|
|
|
|
$nscripts++;
|
|
|
|
my $fh;
|
chainlint.pl: force CRLF conversion when opening input files
The lexer in chainlint.pl can't handle CRLF line endings; it complains
about an internal error in scan_token() if we see one. For example, in
our Windows CI environment:
$ perl chainlint.pl chainlint/for-loop.test | cat -v
Thread 2 terminated abnormally: internal error scanning character '^M'
This doesn't break "make check-chainlint" (yet), because we assemble a
concatenated input by passing the contents of each file through "sed".
And the "sed" we use will strip out the CRLFs. But the next patch is
going to rework this a bit, which does break check-chainlint on Windows.
Plus it's probably nicer to folks on Windows who might work on chainlint
itself and write new tests.
In theory we could fix the parser to handle this, but it's not really
worth the trouble. We should be able to ask the input layer to translate
the line endings for us. In fact, I'd expect this to happen by default,
as perl's documentation claims Win32 uses the ":unix:crlf" PERLIO layer
by default ("unix" here just refers to using read/write syscalls, and
then "crlf" layers the translation on top). However, this doesn't seem
to be the case in our Windows CI environment. I didn't dig into the
exact reason, but it is perhaps because we are using an msys build of
perl rather than a "true" Win32 build.
At any rate, it is easy-ish to just ask explicitly for the conversion.
In the above example, setting PERLIO=crlf in the environment is enough
to make it work. Curiously, though, this doesn't work when invoking
chainlint via "make". Again, I didn't dig into it, but it may have to do
with msys programs calling Windows programs or vice versa.
We can make it work consistently by just explicitly asking for CRLF
translation when we open the files. This will even work on non-Windows
platforms, though we wouldn't really expect to find CRLF files there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-10 08:37:30 +00:00
|
|
|
unless (open($fh, "<:unix:crlf", $path)) {
|
t: add skeleton chainlint.pl
Although chainlint.sed usefully identifies broken &&-chains in tests, it
has several shortcomings which include:
* only detects &&-chain breakage in subshells (one-level deep)
* does not check for broken top-level &&-chains; that task is left to
the "magic exit code 117" checker built into test-lib.sh, however,
that detection does not extend to `{...}` blocks, `$(...)`
expressions, or compound statements such as `if...fi`,
`while...done`, `case...esac`
* uses heuristics, which makes it (potentially) fallible and difficult
to tweak to handle additional real-world cases
* written in `sed` and employs advanced `sed` operators which are
probably not well-known to many programmers, thus the pool of people
who can maintain it is likely small
* manually simulates recursion into subshells which makes it much more
difficult to reason about than, say, a traditional top-down parser
* checks each test as the test is run, which can get expensive for
tests which are run repeatedly by functions or loops since their
bodies will be checked over and over (tens or hundreds of times)
unnecessarily
To address these shortcomings, begin implementing a more functional and
precise test linter which understands shell syntax and semantics rather
than employing heuristics, thus is able to recognize structural problems
with tests beyond broken &&-chains.
The new linter is written in Perl, thus should be more accessible to a
wider audience, and is structured as a traditional top-down parser which
makes it much easier to reason about, and allows it to inspect compound
statements within test bodies to any depth.
Furthermore, it can check all test definitions in the entire project in
a single invocation rather than having to be invoked once per test, and
each test definition is checked only once no matter how many times the
test is actually run.
At this stage, the new linter is just a skeleton containing boilerplate
which handles command-line options, collects and reports statistics, and
feeds its arguments -- paths of test scripts -- to a (presently)
do-nothing script parser for validation. Subsequent changes will flesh
out the functionality.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 00:29:39 +00:00
|
|
|
$emit->("?!ERR?! $path: $!\n");
|
|
|
|
next;
|
|
|
|
}
|
|
|
|
my $s = do { local $/; <$fh> };
|
|
|
|
close($fh);
|
|
|
|
my $parser = ScriptParser->new(\$s);
|
|
|
|
1 while $parser->parse_cmd();
|
|
|
|
if (@{$parser->{output}}) {
|
2022-09-13 04:01:47 +00:00
|
|
|
my $c = fd_colors(1);
|
			my $s = join('', @{$parser->{output}});
			$emit->("$c->{bold}$c->{blue}# chainlint: $path$c->{reset}\n" . $s);
		}
		$ntests += $parser->{ntests};
		# Count problems via the structured problem information collected
		# by the parser, rather than scanning the output stream for
		# "?!...?!" annotations after the fact; stream scanning can be
		# fooled (both when counting problems and when colorizing the
		# annotations, as earlier versions were) by a legitimate
		# "?!...?!" sequence appearing in the body of a test.
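		# For instance (illustrative; "run_tool" is a placeholder), the old
		# stream scan would have reported two problems for this test -- the
		# real broken &&-chain plus the legitimate "?!FOO?!" echoed by the
		# test itself:
		#
		#   test_expect_success 'emit ?!FOO?! marker' '
		#           echo "?!FOO?!" >expect &&
		#           run_tool >actual
		#           test_cmp expect actual
		#   '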
		$nerrs += $parser->{nerrs};
	}
	return [$id, $nscripts, $ntests, $nerrs];
}

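# Reduce the per-worker statistics to an overall exit status: non-zero if any
# worker detected at least one problem, zero otherwise.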
sub exit_code {
	my $stats = shift @_;
	for (@$stats) {
		my ($worker, $nscripts, $ntests, $nerrs) = @$_;
		return 1 if $nerrs;
	}
	return 0;
}

Getopt::Long::Configure(qw{bundling});
GetOptions(
	"emit-all!" => \$emit_all,
	"jobs|j=i" => \$jobs,
"stats|show-stats!" => \$show_stats) or die("option error\n");
|
2022-09-01 00:29:44 +00:00
|
|
|
$jobs = ncores() if $jobs < 1;
|
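
# Example invocations (illustrative; the pathnames are placeholders):
#
#   perl chainlint.pl t/t0000-basic.sh
#   perl chainlint.pl --stats -j4 "t/t[0-9]*.sh"
#
# A non-positive or omitted --jobs falls back to one worker per CPU core, and
# a quoted glob is expanded internally by File::Glob::bsd_glob(), so the
# second form works even if the shell does not expand it.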
my $start_time = $getnow->();
my @stats;

my @scripts;
push(@scripts, File::Glob::bsd_glob($_)) for (@ARGV);
unless (@scripts) {
	show_stats($start_time, \@stats) if $show_stats;
	exit;
}
$jobs = @scripts if @scripts < $jobs;

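# Fall back to a single-threaded run when only one job is requested or when
# this perl lacks (or cannot load) ithreads; the lone worker then pulls
# pathnames directly from @scripts and prints straight to stdout.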
unless ($jobs > 1 &&
|
|
|
|
$Config{useithreads} && eval {
|
2022-09-01 00:29:44 +00:00
|
|
|
require threads; threads->import();
|
|
|
|
require Thread::Queue; Thread::Queue->import();
|
|
|
|
1;
|
|
|
|
}) {
|
|
|
|
push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
|
|
|
|
show_stats($start_time, \@stats) if $show_stats;
|
|
|
|
exit(exit_code(\@stats));
|
|
|
|
}
|
|
|
|
|
|
|
|
my $script_queue = Thread::Queue->new();
my $output_queue = Thread::Queue->new();

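# Adapters handed to check_script(): each worker pulls pathnames from the
# shared input queue and enqueues its formatted output rather than printing
# directly, which would interleave output from concurrent workers.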
sub next_script { return $script_queue->dequeue(); }
sub emit { $output_queue->enqueue(@_); }

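# The monitor is the only thread which prints; it drains the output queue
# until the queue has been end()ed and emptied, at which point dequeue()
# returns undef and the loop terminates.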
sub monitor {
	while (my $s = $output_queue->dequeue()) {
		print($s);
	}
}

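# Spawn the monitor plus $jobs workers; the monitor returns nothing ("void"
# context), whereas each worker returns a statistics tuple ("list" context)
# when joined.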
my $mon = threads->create({'context' => 'void'}, \&monitor);
threads->create({'context' => 'list'}, \&check_script, $_, \&next_script, \&emit) for 1..$jobs;

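# Queue every pathname, then mark the input queue complete; once the backlog
# drains, dequeue() returns undef to each blocked worker, which then exits.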
$script_queue->enqueue(@scripts);
$script_queue->end();

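# Collect the per-worker statistics; the monitor is skipped here since it is
# joined separately after the output queue has been closed.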
for (threads->list()) {
	push(@stats, $_->join()) unless $_ == $mon;
}

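# All workers have finished, so no further output can arrive; close the
# output queue and wait for the monitor to flush whatever remains.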
$output_queue->end();
$mon->join();

show_stats($start_time, \@stats) if $show_stats;
exit(exit_code(\@stats));