The "diff" drivers specified by the "diff" attribute attached to
paths can now specify which algorithm (e.g. histogram) to use.
* jc/diff-algo-attribute:
diff: teach diff to read algorithm from diff driver
diff: consolidate diff algorithm option parsing
It can be useful to specify diff algorithms per file type. For example,
one may want to use the minimal diff algorithm for .json files, another
for .c files, etc.
The diff machinery already checks attributes for a diff driver. Teach
the diff driver parser a new type "algorithm" to look for in the
config, which will be used if a driver has been specified through the
attributes.
Enforce precedence of the diff algorithm by favoring the command line
option, then looking at the driver attributes & config combination, then
finally the diff.algorithm config.
To enforce precedence order, use a new `ignore_driver_algorithm` member
during options parsing to indicate the diff algorithm was set via command
line args.
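The precedence described above can be sketched as follows. This is a
minimal illustration in Python, not the C code in diff.c; the function
name and its parameters are hypothetical, and the "myers" fallback is
an assumption standing in for the built-in default.

```python
from typing import Optional


def pick_diff_algorithm(
    cli_algorithm: Optional[str],
    driver_algorithm: Optional[str],
    config_algorithm: Optional[str],
    default: str = "myers",
) -> str:
    """Return the algorithm to use, highest-precedence source first."""
    # 1. An explicit command-line option always wins.
    if cli_algorithm is not None:
        return cli_algorithm
    # 2. Next, the algorithm configured for the driver that the "diff"
    #    attribute selected for this path.
    if driver_algorithm is not None:
        return driver_algorithm
    # 3. Finally, the repository-wide diff.algorithm setting.
    if config_algorithm is not None:
        return config_algorithm
    return default


print(pick_diff_algorithm(None, "histogram", "minimal"))
```

In this sketch a value of None means "not specified at that level".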
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new kind of class was added in Java 17 -- sealed classes.[1] This
feature includes several new keywords that may appear in a declaration
of a class: the modifiers "sealed" and "non-sealed" before the name of
the class, and a clause after the name of the class marked by the
keyword "permits".
The current set of regular expressions in userdiff.c already allows the
modifier "sealed" and the "permits" clause, but not the modifier
"non-sealed", which is the first hyphenated keyword in Java.[2] Allow
hyphen in the words that precede the name of type to match the
"non-sealed" modifier.
In new input file "java-sealed" for the test t4018-diff-funcname.sh, use
a Java code comment for the marker "RIGHT". This workaround is needed,
because the name of the sealed class appears on the line of code that
has the "ChangeMe" marker.
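The effect of allowing a hyphen in the words that precede the type
name can be illustrated with a small sketch. The patterns are
simplified, hypothetical stand-ins written for Python's re module;
they are not the expressions in userdiff.c.

```python
import re

# Words that may precede the type keyword: without and with a hyphen.
WORD_OLD = r"[A-Za-z_]+"
WORD_NEW = r"[A-Za-z_-]+"


def class_header(word: str) -> re.Pattern:
    # Zero or more modifier words, a keyword for the kind of type,
    # then the rest of the line.
    return re.compile(
        rf"^[ \t]*(?:{word}[ \t]+)*(?:class|enum|interface)[ \t]+.*$"
    )


OLD = class_header(WORD_OLD)
NEW = class_header(WORD_NEW)

line = "public non-sealed class Square extends Shape {"
print(bool(OLD.match(line)), bool(NEW.match(line)))
```

Only the hyphen-aware variant accepts the "non-sealed" line, while both
accept a line that uses the "sealed" modifier.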
[1] Detailed description in "JEP 409: Sealed Classes"
https://openjdk.org/jeps/409
[2] "JEP draft: Keyword Management for the Java Language"
https://openjdk.org/jeps/8223002
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new kind of class was added in Java 16 -- records.[1] The syntax of
records is similar to regular classes with one important distinction:
the name of the record class is followed by a mandatory list of
components. The list is enclosed in parentheses, it may be empty, and
it may immediately follow the name of the class or type parameters, if
any, with or without separating whitespace. For example:
    public record Example(int i, String s) {
    }
    public record WithTypeParameters<A, B>(A a, B b, String s) {
    }
    record SpaceBeforeComponents (String comp1, int comp2) {
    }
Support records in the builtin userdiff pattern for Java. Add "record"
to the alternatives of keywords for kinds of class.
Matching the various possibilities for the type parameters and/or the
list of components of a record is already covered by the preceding
patch.
[1] detailed description is available in "JEP 395: Records"
https://openjdk.org/jeps/395
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A class or interface in Java can have type parameters[1] following the name
in the declared type, surrounded by angle brackets (paired less than and
greater than signs).[2] The type parameters -- `A` and `B` in the
examples -- may follow the class name immediately:
    public class ParameterizedClass<A, B> {
    }
or may be separated by whitespace:
    public class SpaceBeforeTypeParameters <A, B> {
    }
A part of the builtin userdiff pattern for Java matches declarations of
classes, enums, and interfaces. The regular expression requires at
least one whitespace character after the name of the declared type.
This disallows matching for opening angle bracket of type parameters
immediately after the name of the type. Mandatory whitespace after the
name of the type also disallows using the pattern in repositories with a
fairly common code style that puts braces for the body of a class on
separate lines:
    class WithLineBreakBeforeOpeningBrace
    {
    }
Support matching Java code in more diverse code styles and declarations
of classes and interfaces with type parameters immediately following the
name of the type in the builtin userdiff pattern for Java. Do so by
just matching anything until the end of the line after the keywords for
the kind of type being declared.
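A rough sketch of the difference, using simplified and hypothetical
patterns (Python's re module, not the expressions in userdiff.c): the
first requires whitespace after the type name, the second accepts
anything up to the end of the line after the keyword.

```python
import re

# Requires at least one whitespace character after the type name.
OLD = re.compile(
    r"^[ \t]*(?:[A-Za-z_]+[ \t]+)*(?:class|enum|interface)[ \t]+"
    r"[A-Za-z][A-Za-z0-9_$]*[ \t]+.*$"
)
# Accepts anything until the end of the line after the keyword.
NEW = re.compile(
    r"^[ \t]*(?:[A-Za-z_]+[ \t]+)*(?:class|enum|interface)[ \t]+.*$"
)

lines = [
    "public class ParameterizedClass<A, B> {",
    "public class SpaceBeforeTypeParameters <A, B> {",
    "class WithLineBreakBeforeOpeningBrace",
]
for line in lines:
    print(bool(OLD.match(line)), bool(NEW.match(line)), line)
```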
[1] Since Java 5 released in 2004.
[2] Detailed description is available in the Java Language
Specification, sections "Type Variables" and "Parameterized Types":
https://docs.oracle.com/javase/specs/jls/se17/html/jls-4.html#jls-4.4
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The contents of the .gitattributes files may evolve over time, but "git
check-attr" always checks attributes against them in the working tree
and/or in the index. It may be beneficial to optionally allow the users
to check attributes taken from a commit other than HEAD against paths.
Add a new flag `--source` which will allow users to check the
attributes against a commit (actually any tree-ish would do). When the
user uses this flag, we go through the stack of .gitattributes files but
instead of reading them from the working tree and/or the index, we
read the blobs from the provided tree-ish object. This allows the
command to also be used in bare repositories.
Since we use a tree-ish object, the user can pass "--source
HEAD:subdirectory" and all the attributes will be looked up as if
subdirectory was the root directory of the repository.
We cannot simply use the `<rev>:<path>` syntax without the `--source`
flag, similar to how it is used in `git show` because any non-flag
parameter before `--` is treated as an attribute and any parameter after
`--` is treated as a pathname.
The change involves creating a new function `read_attr_from_blob`, which
given the path reads the blob for the path against the provided source and
parses the attributes line by line. This function is plugged into
`read_attr()` function wherein we go through the stack of attributes
files.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Co-authored-by: toon@iotcl.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since f12fa9ee6c (userdiff: add and use for_each_userdiff_driver(),
2021-04-08), lookup of userdiffs is done with a generic
for_each_userdiff_driver(). But the name lookup doesn't use the "type"
field, of course.
We can't get rid of that field from the generic interface because it is
used by t/helper/test-userdiff.c. So mark it as unused in this instance
to silence -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The xfuncname pattern finds func/class declarations
in diffs to display as a hunk header. The word_regex
pattern finds individual tokens in Kotlin code to generate
appropriate diffs.
This patch adds xfuncname regex and word_regex for Kotlin
language.
Signed-off-by: Jaydeep P Das <jaydeepjd.8914@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the "struct userdiff_driver" assignments to use designated
initializers. Keep the PATTERNS() and IPATTERN() convenience macros to
avoid churn, but define them in terms of designated initializers.
For the "driver_true" and "driver_false" let's have the compiler
implicitly initialize most of the fields, but let's leave a redundant
".binary = 0" for "driver_true" to make it obvious that it's the
opposite of the ".binary = 1" for "driver_false".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The implementation of digit-separating single-quotes introduced a
note-worthy regression: the change of a character literal with a
digit would splice the digit and the closing single-quote. For
example, the change from 'a' to '2' is now tokenized as
'[-a'-]{+2'+} instead of '[-a-]{+2+}'.
The options to fix the regression are:
- Tighten the regular expression such that the single-quote can only
occur between digits (that would match the official syntax).
- Remove support for digit separators.
I chose to remove support, because
- I have not seen a lot of code make use of digit separators.
- If code does use digit separators, then the numbers are typically
long. If a change in one of the segments occurs, it is actually
better visible if only that segment is highlighted as the word
that changed instead of the whole long number.
This choice does introduce another minor regression, though, which
is highlighted in the test case: when a change occurs in the second
or later segment of a hexadecimal number where the segment begins
with a digit, but also has letters, the segment is mistaken as
consisting of a number and an identifier. I can live with that.
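The trade-off can be illustrated with a toy tokenizer. The two
patterns below are deliberately simplified stand-ins for the cpp word
regex (assumptions made for illustration, not copied from userdiff.c);
they differ only in whether a single-quote may continue a number.

```python
import re

# A number may contain single-quotes (digit-separator support).
WITH_SEPARATOR = re.compile(
    r"[0-9][0-9a-fA-FxX.']*|[A-Za-z_][A-Za-z0-9_]*|\S"
)
# A number may not contain single-quotes (support removed).
WITHOUT_SEPARATOR = re.compile(
    r"[0-9][0-9a-fA-FxX.]*|[A-Za-z_][A-Za-z0-9_]*|\S"
)


def tokens(pattern: re.Pattern, text: str) -> list:
    """Split text into word-diff tokens according to pattern."""
    return pattern.findall(text)


for text in ("'a'", "'2'", "1'000'000"):
    print(text, tokens(WITH_SEPARATOR, text), tokens(WITHOUT_SEPARATOR, text))
```

With separator support, the closing quote of '2' is glued to the digit;
without it, a long number is split into its segments.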
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since C++20, the language has a generalized comparison operator <=>.
Teach the cpp driver not to separate it into <= and > tokens.
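A small sketch of the tokenization difference, using hypothetical and
much reduced operator lists. Note one assumption specific to this
illustration: Python's re picks the first alternative that matches, so
"<=>" has to be listed before "<="; a POSIX regex picks the longest
match, so the order of alternatives matters less there.

```python
import re

# Reduced word regexes: identifiers, integers, a few operators, and a
# catch-all for any other non-space character.
OLD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|<=|>=|==|!=|\S")
NEW = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|<=>|<=|>=|==|!=|\S")

print(OLD.findall("a <=> b"))
print(NEW.findall("a <=> b"))
```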
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since C++14, the single-quote can be used as digit separator:
    3.141'592'654
    1'000'000
    0xdead'beaf
Make it known to the word regex of the cpp driver, so that numbers are
not split into separate tokens at the single-quotes.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Generally, word regexes can be written such that they match tokens
liberally and need not model the actual syntax because it can be assumed
that the regex will only be applied to syntactically correct text.
The regex for cpp (C/C++) is too liberal, though. It regards these
sequences as single tokens:
    1+2
    1.5-e+2+f
and the following amalgams as one token:
    .l as in str.length
    .f as in str.find
    .e as in str.erase
Tighten the regex in the following way:
- Accept + and - only in one position in the exponent. + and - are no
longer regarded as the sign of a number and are treated by the
catch-all that is not visible in the driver's regex.
- Accept a leading decimal point only when it is followed by a digit.
For readability, factor hex and binary numbers into their own term.
As a drive-by, this fixes that floating point numbers such as 12E5
(with upper-case E) were split into two tokens.
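The behaviour described above can be reproduced with a toy tokenizer.
Both patterns are simplified assumptions written for Python's re
module: LIBERAL mimics a number term that accepts signs, dots and "e"
anywhere, and TIGHT follows the two rules listed above. Neither is the
exact expression in userdiff.c.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# Signs, dots and 'e' may appear anywhere in a "number".
LIBERAL = re.compile(IDENT + r"|[-+0-9.e]+[fFlL]?|\S")

# Hex and binary numbers in their own term; a sign only inside the
# exponent; a leading decimal point only when a digit follows.
HEX_BIN = r"0[xXbB][0-9a-fA-F]+[lLuU]*"
DECIMAL = r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?[fFlLuU]*"
TIGHT = re.compile(IDENT + "|" + HEX_BIN + "|" + DECIMAL + r"|\S")

for text in ("1+2", "str.length", "12E5"):
    print(text, LIBERAL.findall(text), TIGHT.findall(text))
```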
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"enum" keyword will be introduced in PHP 8.1.
https://wiki.php.net/rfc/enumerations
Signed-off-by: USAMI Kenta <tadsan@zonu.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remind developers that the userdiff patterns should be kept simple
and permissive, assuming that the contents they apply are always
syntactically correct.
* jc/userdiff-pattern-hint:
userdiff: comment on the builtin patterns
Currently, the git diff hunk headers show the wrong method signature if the
method has a qualified return type, an array return type, or a generic return
type because the regex doesn't allow dots (.), [], or < and > in the return
type. Also, type parameter declarations couldn't be matched.
Add several t4018 tests asserting the right hunk headers for different cases:
- enum constant change
- change in generic method with bounded type parameters
- change in generic method with wildcard
- field change in a nested class
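The kind of return types involved can be illustrated with a sketch.
The two patterns are simplified, hypothetical approximations written
for Python's re module, not the expressions in userdiff.c: OLD allows
only plain identifiers before the method name, NEW additionally allows
dots, brackets, angle brackets, commas, "?" and "&".

```python
import re

OLD = re.compile(
    r"^[ \t]*(?:[A-Za-z_][A-Za-z_0-9]*[ \t]+)+"
    r"[A-Za-z_][A-Za-z_0-9]*[ \t]*\([^;]*$"
)
NEW = re.compile(
    r"^[ \t]*(?:[A-Za-z_<>&][\]\[?&<>.,A-Za-z_0-9]*[ \t]+)+"
    r"[A-Za-z_][A-Za-z_0-9]*[ \t]*\([^;]*$"
)

signatures = [
    "public java.util.List<String> names() {",
    "public String[] split(String s) {",
    "public <T extends Comparable<T>> T max(List<T> xs) {",
]
for line in signatures:
    print(bool(OLD.match(line)), bool(NEW.match(line)), line)
```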
Signed-off-by: Tassilo Horn <tsdh@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remind developers that they do not need to go overboard to implement
patterns to prepare for invalid constructs. They only have to be
sufficiently permissive, assuming that the payload is syntactically
correct, and that may allow them to be simpler.
Text stolen mostly from, and further improved by, Johannes Sixt.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Records were added in C# 9.
Code example:
    public record Person(string FirstName, string LastName);
For more information, see:
* https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-9
Signed-off-by: Julian Verdurmen <julian.verdurmen@outlook.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A bit of code clean-up and a lot of test clean-up around userdiff
area.
* ab/userdiff-tests:
blame tests: simplify userdiff driver test
blame tests: don't rely on t/t4018/ directory
userdiff: remove support for "broken" tests
userdiff tests: list builtin drivers via test-tool
userdiff tests: explicitly test "default" pattern
userdiff: add and use for_each_userdiff_driver()
userdiff style: normalize pascal regex declaration
userdiff style: declare patterns with consistent style
userdiff style: re-order drivers in alphabetical order
Add a diff driver for Scheme-like languages which recognizes top level
and local `define` forms, whether it is a function definition, binding,
syntax definition or a user-defined `define-xyzzy` form.
Also supports R6RS `library` forms, `module` forms along with class and
struct declarations used in Racket (PLT Scheme).
Alternate "def" syntax such as those in Gerbil Scheme are also
supported, like defstruct, defsyntax and so on.
The rationale for picking `define` forms for the hunk headers is because
it is usually the only significant form for defining the structure of
the program, and it is a common pattern for schemers to have local
function definitions to hide their visibility, so it is not only the top
level `define`'s that are of interest. Schemers also extend the language
with macros to provide their own define forms (for example, something
like a `define-test-suite`) which is also captured in the hunk header.
Since it is common practice to extend syntax with variants of a form
like `module+`, `class*` etc, those have been supported as well.
The word regex is a best-effort attempt to conform to R7RS[1] valid
identifiers, symbols and numbers.
[1] https://small.r7rs.org/attachment/r7rs.pdf (section 2.1)
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the userdiff_find_by_namelen() function so that a new
for_each_userdiff_driver() API function does most of the work.
This will be useful for the same reason we've got other for_each_*()
API functions as part of various APIs, and will be used in a follow-up
commit.
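The shape of such a refactoring, a lookup by name built on top of a
generic iteration function, can be sketched as below. This is a
Python analogy only; the Driver class, the driver list and the
convention that a true return value stops the iteration are
assumptions of this sketch, not the C API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Driver:
    name: str


# Hypothetical list standing in for the builtin drivers.
BUILTIN_DRIVERS = [Driver("cpp"), Driver("java"), Driver("python")]


def for_each_driver(callback: Callable[[Driver], bool]) -> bool:
    """Call callback for every driver; stop early when it returns True."""
    for driver in BUILTIN_DRIVERS:
        if callback(driver):
            return True
    return False


def find_by_name(name: str) -> Optional[Driver]:
    """Look up a driver by name using the generic iteration."""
    found: List[Driver] = []

    def check(driver: Driver) -> bool:
        if driver.name == name:
            found.append(driver)
            return True
        return False

    for_each_driver(check)
    return found[0] if found else None


print(find_by_name("java"))
```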
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Declare the pascal pattern consistently with how we declare the
others, not having "\n" on one line by itself, but as part of the
pattern, and when there are alterations have the "|" at the start, not
end of the line.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change those patterns which were declared with a regex on the same
line as the "PATTERNS()" line to put that regex on the next line, and
add missing "/* -- */" separator comments between the pattern and
word_regex.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Address some old code smell and move around the built-in userdiff
drivers so they're both in alphabetical order, and now in the same
order they appear in the gitattributes(5) documentation.
The two started drifting in be58e70dba (diff: unify external diff and
funcname parsing code, 2008-10-05), and then even further in
80c49c3de2 (color-words: make regex configurable via attributes,
2009-01-17) when the "cpp" pattern was added.
There are no functional changes here; as --color-moved will show,
only existing lines are moved.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support POSIX, bashism and mixed function declarations, all four
compound command types, trailing comments and mixed whitespace.
Even though Bash allows locale-dependent characters in function names
<https://unix.stackexchange.com/a/245336/3645>, only detect function
names with characters allowed by POSIX.1-2017
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_235>
for simplicity. This should cover the vast majority of use cases, and
produces system-agnostic results.
Since a word pattern has to be specified, but there is no easy way to
know the default word pattern, use the default `IFS` characters for a
starter. A later patch can improve this.
Signed-off-by: Victor Engmark <victor@engmark.name>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The regex used for the CSS builtin diff driver in git is only
able to show hunk headers for lines that start with a number,
a letter or an underscore.
However, the regex fails to detect classes (which start with a .), ids
(which start with a #), :root and attribute-value based selectors (for
example [class*="col-"]), as well as @-based block-level statements
like @page, @keyframes and @media, since all of them start with a
special character.
Allow the selectors and block level statements to begin with these
special characters.
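A minimal sketch of the change, with hypothetical patterns written for
Python's re module that only model the first character of a selector
line; they are not the full CSS driver pattern.

```python
import re

# Only a letter, digit or underscore may start the line.
OLD = re.compile(r"^[_a-z0-9].*$", re.IGNORECASE)
# Additionally allow one leading ':', '[', '@', '.' or '#'.
NEW = re.compile(r"^[:\[@.#]?[_a-z0-9].*$", re.IGNORECASE)

selectors = [
    ".container {",
    "#main {",
    ":root {",
    '[class*="col-"] {',
    "@media screen {",
]
for line in selectors:
    print(bool(OLD.match(line)), bool(NEW.match(line)), line)
```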
Signed-off-by: Sohom Datta <sohom.datta@learner.manipal.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
PHP permits functions to be defined like
    final public function foo() { }
    abstract protected function bar() { }
but our hunk header pattern does not recognize these decorations.
Add "final" and "abstract" to the list of function modifiers.
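As an illustration, the two patterns below differ only in the list of
modifiers. They are simplified sketches for Python's re module and
should be read as assumptions, not as the exact PHP driver pattern.

```python
import re

OLD = re.compile(
    r"^[\t ]*(?:(?:public|protected|private|static)[\t ]+)*function.*$"
)
NEW = re.compile(
    r"^[\t ]*(?:(?:public|protected|private|static|abstract|final)[\t ]+)*"
    r"function.*$"
)

declarations = [
    "final public function foo() { }",
    "abstract protected function bar() { }",
]
for line in declarations:
    print(bool(OLD.match(line)), bool(NEW.match(line)), line)
```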
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Javier Spagnoletti <phansys@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The third part of the Fortran xfuncname regex wants to match the
beginning of a subroutine or function, so it allows for all characters
except `'`, `"` or whitespace before the keyword 'function' or
'subroutine'. This is meant to match the 'recursive', 'elemental' or
'pure' keywords, as well as function return types, and to prevent
matches inside strings.
However, the negated set does not contain the `!` comment character,
so a line with an end-of-line comment containing the keyword 'function' or
'subroutine' followed by another word is mistakenly chosen as a hunk header.
Improve the regex by adding `!` to the negated set.
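The false match and its fix can be reproduced with a reduced sketch.
The patterns below model only the part of the regex discussed here and
are hypothetical simplifications for Python's re module.

```python
import re

# Words before the keyword may contain anything except quotes and
# whitespace.
OLD = re.compile(
    r"^[ \t]*(?:[^'\" \t]+[ \t]+)*(?:subroutine|function)[ \t]+[A-Za-z]",
    re.IGNORECASE,
)
# Same, but '!' is excluded as well, so a comment cannot be skipped over.
NEW = re.compile(
    r"^[ \t]*(?:[^!'\" \t]+[ \t]+)*(?:subroutine|function)[ \t]+[A-Za-z]",
    re.IGNORECASE,
)

comment_line = "call foo(x) ! the function returns y"
print(bool(OLD.match(comment_line)), bool(NEW.match(comment_line)))
```

Real declarations such as "pure function area(r)" are still accepted
by both variants.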
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Fortran userdiff patterns, introduced in 909a5494f8 (userdiff.c: add
builtin fortran regex patterns, 2010-09-10), predate the test
infrastructure for xfuncname patterns, introduced in bfa7d01413 (t4018:
an infrastructure to test hunk headers, 2014-03-21).
Add tests for the Fortran xfuncname patterns. The test
't/t4018/fortran-comment-keyword' documents a shortcoming of the regex
that is fixed in a subsequent commit.
While at it, add descriptive comments for the different parts of the
regex.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's typical to find Markdown documentation alongside source code, and
having better context for documentation changes is useful; see also
commit 69f9c87d4 (userdiff: add support for Fountain documents,
2015-07-21).
The pattern is based on the CommonMark specification 0.29, section 4.2
<https://spec.commonmark.org/> but doesn't match empty headings, as
seeing them in a hunk header is unlikely to be useful.
Only ATX headings are supported, as detecting setext headings would
require printing the line before a pattern matches, or matching a
multiline pattern. The word-diff pattern is the same as the pattern for
HTML, because many Markdown parsers accept inline HTML.
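A sketch of what matching non-empty ATX headings looks like. The
pattern is an approximation written for Python's re module under the
assumptions stated in the comments; it is not claimed to be the
expression added to userdiff.c.

```python
import re

# Up to three spaces of indentation, one to six '#', at least one
# space or tab, then some non-blank heading text.
ATX_HEADING = re.compile(r"^ {0,3}#{1,6}[ \t]+\S.*$")

samples = [
    "# Title",
    "   ### Indented heading",
    "    # four spaces: a code block",
    "####### seven hashes",
    "#",
    "#hashtag",
]
for line in samples:
    print(bool(ATX_HEADING.match(line)), repr(line))
```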
Signed-off-by: Ash Holland <ash@sorrel.sh>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We return the length to a subset of a string using an "int *"
out-parameter. This is fine most of the time, as we'd expect config keys
to be relatively short, but it could behave oddly if we had a gigantic
config key. A more appropriate type is size_t.
Let's switch over, which lets our callers use size_t as appropriate
(they are bound by our type because they must pass the out-parameter as
a pointer). This is mostly just a cleanup to make it clear this code
handles long strings correctly. In practice, our config parser already
chokes on long key names (because of a similar int/size_t mixup!).
When doing an int/size_t conversion, we have to be careful that nobody
was trying to assign a negative value to the variable. I manually
confirmed that for each case here. They tend to just feed the result to
xmemdupz() or similar; in a few cases I adjusted the parameter types for
helper functions to make sure the size_t is preserved.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The regex failed to compile on FreeBSD.
Also add /* -- */ mark to separate the two regex entries given to
the PATTERNS() macro, to make it consistent with patterns for other
content types.
Signed-off-by: Ed Maste <emaste@FreeBSD.org>
Reviewed-by: Jeff King <peff@peff.net>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The userdiff machinery has been taught that "async def" is another
way to begin a "function" in Python.
* jh/userdiff-python-async:
userdiff: support Python async functions
Python's async functions (declared with "async def" rather than "def")
were not being displayed in hunk headers. This commit teaches git about
the async function syntax, and adds tests for the Python userdiff regex.
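For illustration, the change amounts to an optional "async" prefix
before "def". The two patterns below are simplified assumptions for
Python's re module rather than quotes from userdiff.c.

```python
import re

OLD = re.compile(r"^[ \t]*(?:class|def)[ \t].*$")
NEW = re.compile(r"^[ \t]*(?:class|(?:async[ \t]+)?def)[ \t].*$")

line = "async def fetch(url):"
print(bool(OLD.match(line)), bool(NEW.match(line)))
```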
Signed-off-by: Josh Holland <anowlcalledjosh@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for xfuncname in the Elixir[1] language, a Ruby-like
language that runs on the Erlang[2] Virtual Machine (BEAM).
[1]: https://elixir-lang.org
[2]: https://www.erlang.org
Signed-off-by: Łukasz Niemier <lukasz@niemier.pl>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While reviewing some dts diffs recently I noticed that the hunk header
logic was failing to find the containing node. This is because the regex
doesn't consider properties that may span multiple lines, i.e.
    property = <something>,
               <something_else>;
and it got hung up on comments inside nodes that look like the root node
because they start with '/*'. Add tests for these cases and update the
regex to find them. Maybe detecting the root node is too complicated but
forcing it to be a slash with any amount of whitespace up to an open
bracket seemed OK. I tried to detect that a comment is in-between the
two parts but I wasn't happy so I just dropped it.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Linux kernel receives many patches to the devicetree files each
release. The hunk header for those patches typically shows nothing,
making it difficult to figure out what node is being modified without
applying the patch or opening the file and seeking to the context. Let's
add a builtin 'dts' pattern to git so that users can get better diff
output on dts files when they use the diff=dts driver.
The regex has been constructed based on the spec at devicetree.org[1]
and with some help from Johannes Sixt.
[1] https://github.com/devicetree-org/devicetree-specification/releases/latest
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patterns "git diff/grep" use to extract funcname and word
boundaries for Rust have been added.
* ml/userdiff-rust:
userdiff: two simplifications of patterns for rust
userdiff: add built-in pattern for rust
- Do not enforce (but assume) syntactic correctness of language
constructs that go into hunk headers: we only want to ensure that
the keywords actually are words and not just the initial part of
some identifier.
- In the word regex, match numbers only when they begin with a digit,
but then be liberal in what follows, assuming that the text that is
matched is syntactically correct.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Octave pattern is almost the same as matlab, except
that '%%%' and '##' can also be used to begin code sections,
in addition to '%%' that is understood by both. Octave
pattern is merged into Matlab pattern. Test cases for
the hunk header patterns of matlab and octave under
t/t4018 are added.
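A sketch of a merged hunk header pattern that accepts both the shared
and the Octave-only section markers. It is a hypothetical
approximation for Python's re module, not the expression in
userdiff.c.

```python
import re

# Function or classdef lines, or a section marker ('%%', '%%%' or '##')
# at the start of the line followed by whitespace.
HEADER = re.compile(
    r"^\s*(?:classdef|function)\s.*$|^(?:%%%?|##)\s.*$"
)

samples = [
    "%% Section shared by both",
    "%%% Octave section",
    "## Octave section",
    "function y = f(x)",
    "% ordinary comment",
]
for line in samples:
    print(bool(HEADER.match(line)), line)
```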
Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>