The following commit will make use of a Getopt::Long feature which is
only present in Perl >= 5.8.1. Document that as the minimum version we
support.
Many of our Perl scripts will continue to run with 5.8.0 but this change
allows us to adjust them as needed without breaking any promises to our
users.
The Perl requirement was last changed in d48b284183 (perl: bump the
required Perl version to 5.8 from 5.6.[21], 2010-09-24). At that time,
5.8.0 was 8 years old. It is now over 21 years old.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We try to flush the output from diff-highlight whenever we see a blank
line. That lets you see the output for each commit as soon as it is
generated, even if Git is still chugging away at a diff, or traversing
to find the next commit.
However, our "blank line" match checks length($_). That won't ever be
true, because we haven't chomped the line ending. As a result, we never
flush. Instead, let's use a simple regex which handles line endings in
with the end-of-line marker.
This has been broken since the initial version in 927a13fe87 (contrib:
add diff highlight script, 2011-10-18). Probably nobody noticed because:
- most output is big enough, or comes fast enough, that it flushes
anyway. And it can be difficult to notice the difference between
"show a commit, then pause" and "pause, then show two commits". I
only noticed because I was viewing "git log" output on a repo with a
very slow textconv filter.
- if stdout is going to the terminal (and not another pager like
less), then the flush isn't necessary. So any manual testing would
show it appearing to work.
You can easily see the difference with something like:
echo '* diff=slow' >>.gitattributes
git -c diff.slow.textconv='sleep 1; cat' \
-c pager.log='diff-highlight | less' \
log -p
That should generate one commit every second or so (more if it touches
multiple files), but without this patch it waits for many seconds before
generating several pages of output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This changes the indent from
"<tab><sp><sp><sp><sp><sp><sp><sp><sp>"
to
"<tab><tab>"
so that the statement lines up with the rest of the block.
Signed-off-by: Norman Rasmussen <norman@rasmussen.co.za>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use File::Spec->devnull() for output redirection to avoid messages
when Windows version of Perl is first in path. The message 'The
system cannot find the path specified.' is displayed each time git is
run to get colors.
Signed-off-by: Chris. Webster <chris@webstech.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes a corner case where diff-highlight may
scramble some diffs when combined with --graph.
Commit 7e4ffb4c17 (diff-highlight: add support for --graph
output, 2016-08-29) taught diff-highlight to skip past the
graph characters at the start of each line with this regex:
($COLOR?\|$COLOR?\s+)*
I.e., any series of pipes separated by and followed by
arbitrary whitespace. We need to match more than just a
single space because the commit in question may be indented
to accommodate other parts of the graph drawing. E.g.:
* commit 1234abcd
| ...
| diff --git ...
has only a single space, but for the last commit before a
fork:
| | |
| * | commit 1234abcd
| |/ ...
| | diff --git
the diff lines have more spaces between the pipes and the
start of the diff.
However, when we soak up all of those spaces with the
$GRAPH regex, we may accidentally include the leading space
for a context line. That means we may consider the actual
contents of a context line as part of the diff syntax. In
other words, something like this:
normal context line
-old line
+new line
-this is a context line with a leading dash
would cause us to see that final context line as a removal
line, and we'd end up showing the hunk in the wrong order:
normal context line
-old line
-this is a context line with a leading dash
+new line
Instead, let's a be a little more clever about parsing the
graph. We'll look for the actual "*" line that marks the
start of a commit, and record the indentation we see there.
Then we can skip past that indentation when checking whether
the line is a hunk header, removal, addition, etc.
There is one tricky thing: the indentation in bytes may be
different for various lines of the graph due to coloring.
E.g., the "*" on a commit line is generally shown without
color, but on the actual diff lines, it will be replaced
with a colorized "|" character, adding several bytes. We
work around this here by counting "visible" bytes. This is
unfortunately a bit more expensive, making us about twice as
slow to handle --graph output. But since this is meant to be
used interactively anyway, it's tolerably fast (and the
non-graph case is unaffected).
One alternative would be to search for hunk header lines and
use their indentation (since they'd have the same colors as
the diff lines which follow). But that just opens up
different corner cases. If we see:
| | @@ 1,2 1,3 @@
we cannot know if this is a real diff that has been
indented due to the graph, or if it's a context line that
happens to look like a diff header. We can only be sure of
the indent on the "*" lines, since we know those don't
contain arbitrary data (technically the user could include a
bunch of extra indentation via --format, but that's rare
enough to disregard).
Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current flush() helper only shows the queued diff but
does not clear the queue. This is conceptually a bug, but it
works because we only call it once at the end of the
program.
Let's teach it to clear the queue, which will let us use it
in more places (one for now, but more in future patches).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our tests send git's output directly to files or pipes, so
there will never be any color. Let's do at least one --color
test to make sure that we can handle this case (which we
currently can, but will be an easy thing to mess up when we
touch the graph code in a future patch).
We'll just cover the --graph case, since this is much more
complex than the earlier cases (i.e., if it manages to
highlight, then the non-graph case definitely would).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The graph test in t9400 covers the case of two simultaneous
branches, but all of the commits during this time are on the
right-hand branch. So we test a graph structure like:
| |
| * commit ...
| |
but we never see the reverse, a commit on the left-hand
branch:
| |
* | commit ...
| |
Since this is an easy thing to get wrong when touching the
graph-matching code, let's cover it by adding one more
commit with its timestamp interleaved with the other branch.
Note that we need to pass --date-order to convince Git to
show it this way (since --topo-order tries to keep lines of
history separate).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generate a bunch of one-line files whose contents match
their names, and then generate our commits by cat-ing those
files. Let's just echo the contents directly, which saves
some processes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The exact ordering output by Git may depend on the commit
timestamps, so let's make sure they're actually
monotonically increasing, and not all the same (or worse,
subject to how long the test script takes to run).
Let's use test_tick to make sure this is stable. Note that
we actually have to rearrange the order of the branches to
match the expected graph structure (which means that
previously we might racily have been testing a slightly
different output, though the test is written in such a way
that we'd still pass).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We actually branch "A" off of "D". The sample "--graph"
output is right, but the left-to-right diagram is
misleading. Let's fix it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that `make` produces a file, we should have a clean target to remove
it.
Signed-off-by: Daniel Watkins <daniel@daniel-watkins.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff-so-fancy project is also written in perl, and most
of its users pipe diffs through both diff-highlight and
diff-so-fancy. It would be nice if this could be done in a
single script. So let's pull most of diff-highlight's code
into its own module which can be used by diff-so-fancy.
In addition, we'll abstract a few basic items like reading
from stdio so that a script using the module can do more
processing before or after diff-highlight handles the lines.
See the README update for more details.
One small downside is that the diff-highlight script must
now be built using the Makefile. There are ways around this,
but it quickly gets into perl arcana. Let's go with the
simple solution. As a bonus, our Makefile now respects the
PERL_PATH variable if it is set.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The algorithm in diff-highlight only understands how to look
at two sides of a diff; it cannot correctly handle combined
diffs with multiple preimages. Often highlighting does not
trigger at all for these diffs because the line counts do
not match up. E.g., if we see:
- ours
-theirs
++resolved
we would not bother highlighting; it naively looks like a
single line went away, and then a separate hunk added
another single line.
But of course there are exceptions. E.g., if the other side
deleted the line, we might see:
- ours
++resolved
which looks like we dropped " ours" and added "+resolved".
This is only a small highlighting glitch (we highlight the
space and the "+" along with the content), but it's also the
tip of the iceberg. Even if we learned to find the true
content here (by noticing we are in a 3-way combined diff
and marking _two_ characters from the front of the line as
uninteresting), there are other more complicated cases where
we really do need to handle a 3-way hunk.
Let's just punt for now; we can recognize combined diffs by
the presence of extra "@" symbols in the hunk header, and
treat them as non-diff content.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have a test suite for diff highlight, we can
show off the improvements from 8d00662 (diff-highlight: do
not split multibyte characters, 2015-04-03).
While we're at it, we can also add another case that
_doesn't_ work: combining code points are treated as their
own unit, which means that we may stick colors between them
and the character they are modifying (with the result that
the color is not shown in an xterm, though it's possible
that other terminals err the other way, and show the color
but not the accent). There's no fix here, but let's
document it as a failure.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These are the same as in the normal t/.gitignore, with the
exception of ".prove", as our Makefile does not support it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"diff-highlight" (in contrib/) used to show byte-by-byte
differences, which meant that multi-byte characters can be chopped
in the middle. It learned to pay attention to character boundaries
(assuming the UTF-8 payload).
* jk/colors:
diff-highlight: do not split multibyte characters
When the input is UTF-8 and Perl is operating on bytes instead of
characters, a diff that changes one multibyte character to another
that shares an initial byte sequence will result in a broken diff
display as the common byte sequence prefix will be separated from
the rest of the bytes in the multibyte character.
For example, if a single line contains only the unicode character
U+C9C4 (encoded as UTF-8 0xEC, 0xA7, 0x84) and that line is then
changed to the unicode character U+C9C0 (encoded as UTF-8 0xEC,
0xA7, 0x80), when operating on bytes diff-highlight will show only
the single byte change from 0x84 to 0x80 thus creating invalid UTF-8
and a broken diff display.
Fix this by putting Perl into character mode when splitting the line
and then back into byte mode after the split is finished.
The utf8::xxx functions require Perl 5.8 so we require that as well.
Also, since we are mucking with code in the split_line function, we
change a '*' quantifier to a '+' quantifier when matching the $COLOR
expression which has the side effect of speeding everything up while
eliminating useless '' elements in the returned array.
Reported-by: Yi EungJun <semtlenori@gmail.com>
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"diff-highlight" filter (in contrib/) allows its color output
to be customized via configuration variables.
* jk/colors:
parse_color: drop COLOR_BACKGROUND macro
diff-highlight: allow configurable colors
parse_color: recognize "no$foo" to clear the $foo attribute
parse_color: support 24-bit RGB values
parse_color: refactor color storage
Until now, the highlighting colors were hard-coded in the
script (as "reverse" and "noreverse"), and you had to edit
the script to change them. This patch teaches diff-highlight
to read from color.diff-highlight.* to set them.
In addition, it expands the possiblities considerably by
adding two features:
1. Old/new lines can be colored independently (so you can
use a color scheme that complements existing line
coloring).
2. Normal, unhighlighted parts of the lines can be colored,
too. Technically this can be done by separately
configuring color.diff.old/new and matching it to your
diff-highlight colors. But you may want a different
look for your highlighted diffs versus your regular
diffs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While using diff-highlight with other tools, I have discovered that Python
ignores SIGPIPE by default. Unfortunately, this also means that tools
attempting to launch a pager under Python--and don't realize this is
happening--means that the subprocess inherits this setting. In this case, it
means diff-highlight will be launched with SIGPIPE being ignored. Let's work
with those broken scripts by restoring the default SIGPIPE handler.
Signed-off-by: John Szakmeister <john@szakmeister.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff-highlight script works on heuristics, so it can be
wrong. Let's document some of the wrong-ness in case
somebody feels like working on it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently we only bother highlighting single-line hunks. The
rationale was that the purpose of highlighting is to point
out small changes between two similar lines that are
otherwise hard to see. However, that meant we missed similar
cases where two lines were changed together, like:
-foo(buf);
-bar(buf);
+foo(obj->buf);
+bar(obj->buf);
Each of those changes is simple, and would benefit from
highlighting (the "obj->" parts in this case).
This patch considers whole hunks at a time. For now, we
consider only the case where the hunk has the same number of
removed and added lines, and assume that the lines from each
segment correspond one-to-one. While this is just a
heuristic, in practice it seems to generate sensible
results (especially because we now omit highlighting on
completely-changed lines, so when our heuristic is wrong, we
tend to avoid highlighting at all).
Based on an original idea and implementation by Michał
Kiedrowicz.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current code structure assumes that we will only look at
a pair of lines at any given time, and that the end result
should always be to output that pair. However, we want to
eventually handle multi-line hunks, which will involve
collating pairs of removed/added lines. Let's refactor the
code to return highlighted pairs instead of printing them.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you have a change like:
-foo
+bar
we end up highlighting the entirety of both lines (since the
whole thing is changed). But the point of diff highlighting
is to pinpoint the specific change in a pair of lines that
are mostly identical. In this case, the highlighting is just
noise, since there is nothing to pinpoint, and we are better
off doing nothing.
The implementation looks for "interesting" pairs by checking
to see whether they actually have a matching prefix or
suffix that does not simply consist of colorization and
whitespace. However, the implementation makes it easy to
plug in other heuristics, too, like:
1. Depending on the source material, the set of "boring"
characters could be tweaked to include language-specific
stuff (like braces or semicolons for C).
2. Instead of saying "an interesting line has at least one
character of prefix or suffix", we could require that
less than N percent of the line be highlighted.
The simple "ignore whitespace, and highlight if there are
any matched characters" implemented by this patch seems to
give good results on git.git. I'll leave experimentation
with other heuristics to somebody who has a dataset that
does not look good with the current code.
Based on an original idea and implementation by Michał
Kiedrowicz.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These perl features can catch bugs, and we shouldn't be
violating any of the strict rules or creating any warnings,
so let's turn them on.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a simple and stupid script for highlighting
differing parts of lines in a unified diff. See the README
for a discussion of the limitations.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>