Commit graph

124 commits

Author SHA1 Message Date
Junio C Hamano
9bf8e0c73d Merge branch 'pw/unquote-path-in-git-pm'
Code refactoring.

* pw/unquote-path-in-git-pm:
  t9700: add tests for Git::unquote_path()
  Git::unquote_path(): throw an exception on bad path
  Git::unquote_path(): handle '\a'
  add -i: move unquote_path() to Git.pm
2017-07-10 13:42:50 -07:00
Phillip Wood
1d542a5487 add -i: move unquote_path() to Git.pm
Move unquote_path() from git-add--interactive to Git.pm so it can be
used by other scripts. Note this is a straight copy, it does not
handle '\a'. That will be fixed in the next commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 08:02:20 -07:00
Junio C Hamano
54e6ce5960 Merge branch 'jk/add-p-commentchar-fix'
"git add -p" were updated in 2.12 timeframe to cope with custom
core.commentchar but the implementation was buggy and a
metacharacter like $ and * did not work.

* jk/add-p-commentchar-fix:
  add--interactive: quote commentChar regex
  add--interactive: handle EOF in prompt_yesno
2017-06-26 14:09:31 -07:00
Jeff King
d85d7ecb80 add--interactive: quote commentChar regex
Since c9d961647 (i18n: add--interactive: mark
edit_hunk_manually message for translation, 2016-12-14),
when the user asks to edit a hunk manually, we respect
core.commentChar in generating the edit instructions.
However, when we then strip out comment lines, we use a
simple regex like:

  /^$commentChar/

If your chosen comment character is a regex metacharacter,
then that will behave in a confusing manner ("$", for
instance, would only eliminate blank lines, not actual
comment lines).

We can fix that by telling perl not to respect
metacharacters.

Reported-by: Christian Rösch <christian@croesch.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-21 14:06:20 -07:00
Jeff King
d5addcf522 add--interactive: handle EOF in prompt_yesno
The prompt_yesno function loops indefinitely waiting for a
"y" or "n" response. But it doesn't handle EOF, meaning
that we can end up in an infinite loop of reading EOF from
stdin. One way to simulate that is with:

  echo e | GIT_EDITOR='echo corrupt >' git add -p

Let's break out of the loop and propagate the undef to the
caller. Without modifying the callers that effectively turns
it into a "no" response. This is reasonable for both of the
current callers, and it leaves room for any future caller to
check for undef explicitly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-21 14:06:09 -07:00
Jeff King
1fa8a66bf7 add--interactive: drop diff.indentHeuristic handling
Now that diff.indentHeuristic is handled automatically by the plumbing
commands, there's no need to propagate it manually.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-09 12:24:35 +09:00
Junio C Hamano
34130cc06b Merge branch 'va/i18n-perl-scripts'
Message fix.

* va/i18n-perl-scripts:
  git-add--interactive.perl: add missing dot in a message
2017-04-19 21:37:17 -07:00
Ralf Thielow
0301f1fd92 git-add--interactive.perl: add missing dot in a message
One message appears twice in the translations and the only
difference is a dot at the end.  So add this dot to make
the messages being identical.

Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-13 18:01:15 -07:00
Junio C Hamano
153e0d762c Merge branch 'jk/add-i-use-pathspecs'
"git add -p <pathspec>" unnecessarily expanded the pathspec to a
list of individual files that matches the pathspec by running "git
ls-files <pathspec>", before feeding it to "git diff-index" to see
which paths have changes, because historically the pathspec
language supported by "diff-index" was weaker.  These days they are
equivalent and there is no reason to internally expand it.  This
helps both performance and avoids command line argument limit on
some platforms.

* jk/add-i-use-pathspecs:
  add--interactive: do not expand pathspecs with ls-files
2017-03-17 13:50:26 -07:00
Jeff King
7288e12cce add--interactive: do not expand pathspecs with ls-files
When we want to get the list of modified files, we first
expand any user-provided pathspecs with "ls-files", and then
feed the resulting list of paths as arguments to
"diff-index" and "diff-files". If your pathspec expands into
a large number of paths, you may run into one of two
problems:

  1. The OS may complain about the size of the argument
     list, and refuse to run. For example:

       $ (ulimit -s 128 && git add -p drivers)
       Can't exec "git": Argument list too long at .../git-add--interactive line 177.
       Died at .../git-add--interactive line 177.

     That's on the linux.git repository, which has about 20K
     files in the "drivers" directory (none of them modified
     in this case). The "ulimit -s" trick is necessary to
     show the problem on Linux even for such a gigantic set
     of paths. Other operating systems have much smaller
     limits (e.g., a real-world case was seen with only 5K
     files on OS X).

  2. Even when it does work, it's really slow. The pathspec
     code is not optimized for huge numbers of paths. Here's
     the same case without the ulimit:

       $ time git add -p drivers
       No changes.

       real	0m16.559s
       user	0m53.140s
       sys	0m0.220s

We can improve this by skipping "ls-files" completely, and
just feeding the original pathspecs to the diff commands.
This solution was discussed in 2010:

  http://public-inbox.org/git/20100105041438.GB12574@coredump.intra.peff.net/

but at the time the diff code's pathspecs were more
primitive than those used by ls-files (e.g., they did not
support globs). Making the change would have caused a
user-visible regression, so we didn't.

Since then, the pathspec code has been unified, and the diff
commands natively understand pathspecs like '*.c'.

This patch implements that solution. That skips the
argument-list limits, and the result runs much faster:

  $ time git add -p drivers
  No changes.

  real	0m0.149s
  user	0m0.116s
  sys	0m0.080s

There are two new tests. The first just exercises the
globbing behavior to confirm that we are not causing a
regression there. The second checks the actual argument
behavior using GIT_TRACE. We _could_ do it with the "ulimit
-s" trick, as above. But that would mean the test could only
run where "ulimit -s" works. And tests of that sort are
expensive, because we have to come up with enough files to
actually bust the limit (we can't just shrink the "128" down
infinitely, since it is also the in-program stack size).

Finally, two caveats and possibilities for future work:

  a. This fixes one argument-list expansion, but there may
     be others. In fact, it's very likely that if you run
     "git add -i" and select a large number of modified
     files that the script would try to feed them all to a
     single git command.

     In practice this is probably fine. The real issue here
     is that the argument list was growing with the _total_
     number of files, not the number of modified or selected
     files.

  b. If the repository contains filenames with literal wildcard
     characters (e.g., "foo*"), the original code expanded
     them via "ls-files" and then fed those wildcard names
     to "diff-index", which would have treated them as
     wildcards. This was a bug, which is now fixed (though
     unless you really go through some contortions with
     ":(literal)", it's likely that your original pathspec
     would match whatever the accidentally-expanded wildcard
     would anyway).

     So this takes us one step closer to working correctly
     with files whose names contain wildcard characters, but
     it's likely that others remain (e.g., if "git add -i"
     feeds the selected paths to "git add").

Reported-by: Wincent Colaiuta <win@wincent.com>
Reported-by: Mislav Marohnić <mislav.marohnic@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-14 13:27:23 -07:00
Jeff King
c852bd54bd add--interactive: fix missing file prompt for patch mode with "-i"
When invoked as "git add -i", each menu interactive menu
option prompts the user to select a list of files. This
includes the "patch" option, which gets the list before
starting the hunk-selection loop.

As "git add -p", it behaves differently, and jumps straight
to the hunk selection loop.

Since 0539d5e6d (i18n: add--interactive: mark patch prompt
for translation, 2016-12-14), the "add -i" case mistakenly
jumps to straight to the hunk-selection loop. Prior to that
commit the distinction between the two cases was managed by
the $patch_mode variable. That commit used $patch_mode for
something else, and moved the old meaning to the "$cmd"
variable.  But it forgot to update the $patch_mode check
inside patch_update_cmd() which controls the file-list
behavior.

The simplest fix would be to change that line to check $cmd.
But while we're here, let's use a less obscure name for this
flag: $patch_mode_only, a boolean which tells whether we are
in full-interactive mode or only in patch-mode.

Reported-by: Henrik Grubbström <grubba@grubba.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 10:10:38 -08:00
Junio C Hamano
c0588fd61a Merge branch 'rt/align-add-i-help-text'
Doc update.

* rt/align-add-i-help-text:
  git add -i: replace \t with blanks in the help message
2017-02-24 10:48:08 -08:00
Ralf Thielow
e519eccdf4 git add -i: replace \t with blanks in the help message
Within the help message of 'git add -i', the 'diff' command uses one
tab character and blanks to create the space between the name and the
description while the others use blanks only.  So if the tab size is
not at 4 characters, this description will not be in range.
Replace the tab character with blanks.

Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-22 12:51:00 -08:00
Junio C Hamano
2ced5f2c2d Merge branch 'jc/retire-compaction-heuristics'
"git diff" and its family had two experimental heuristics to shift
the contents of a hunk to make the patch easier to read.  One of
them turns out to be better than the other, so leave only the
"--indent-heuristic" option and remove the other one.

* jc/retire-compaction-heuristics:
  diff: retire "compaction" heuristics
2017-01-10 15:24:27 -08:00
Junio C Hamano
1d73f8e86d Merge branch 'va/i18n-perl-scripts'
Porcelain scripts written in Perl are getting internationalized.

* va/i18n-perl-scripts:
  i18n: difftool: mark warnings for translation
  i18n: send-email: mark composing message for translation
  i18n: send-email: mark string with interpolation for translation
  i18n: send-email: mark warnings and errors for translation
  i18n: send-email: mark strings for translation
  i18n: add--interactive: mark status words for translation
  i18n: add--interactive: remove %patch_modes entries
  i18n: add--interactive: mark edit_hunk_manually message for translation
  i18n: add--interactive: i18n of help_patch_cmd
  i18n: add--interactive: mark patch prompt for translation
  i18n: add--interactive: mark plural strings
  i18n: clean.c: match string with git-add--interactive.perl
  i18n: add--interactive: mark strings with interpolation for translation
  i18n: add--interactive: mark simple here-documents for translation
  i18n: add--interactive: mark strings for translation
  Git.pm: add subroutines for commenting lines
2016-12-27 00:11:40 -08:00
Junio C Hamano
3cde4e02ee diff: retire "compaction" heuristics
When a patch inserts a block of lines, whose last lines are the
same as the existing lines that appear before the inserted block,
"git diff" can choose any place between these existing lines as the
boundary between the pre-context and the added lines (adjusting the
end of the inserted block as appropriate) to come up with variants
of the same patch, and some variants are easier to read than others.

We have been trying to improve the choice of this boundary, and Git
2.11 shipped with an experimental "compaction-heuristic".  Since
then another attempt to improve the logic further resulted in a new
"indent-heuristic" logic.  It is agreed that the latter gives better
result overall, and the former outlived its usefulness.

Retire "compaction", and keep "indent" as an experimental feature.
The latter hopefully will be turned on by default in a future
release, but that should be done as a separate step.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-23 12:32:22 -08:00
Vasco Almeida
55aa04423f i18n: add--interactive: mark status words for translation
Mark words 'nothing', 'unchanged' and 'binary' used to display what has
been staged or not, in "git add -i" status command.

Alternatively one could mark N__('nothing') no-op in order to
xgettext(1) extract the string and then trigger the translation at run
time only with __($print->{FILE}), but that has the side effect of triggering
retrieval of translations for the changes indicator too (e.g. +2/-1)
which may or may not be a problem.

To avoid that potential problem, mark only where there is certain to
trigger translation only of those words but in this case we must also
retrieve the translation for the eq tests, since the value assigned was
of the translation, not the English source.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14 11:00:05 -08:00
Vasco Almeida
ef84426308 i18n: add--interactive: remove %patch_modes entries
Remove unnecessary entries from %patch_modes. After the i18n conversion,
these entries are not used anymore.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14 11:00:05 -08:00
Vasco Almeida
c9d9616471 i18n: add--interactive: mark edit_hunk_manually message for translation
Mark message of edit_hunk_manually displayed in the editing file when
user chooses 'e' option.  The message had to be unfolded to allow
translation of the $participle verb.

Some messages end up being exactly the same for some use cases, but
left it for easier change in the future, e.g., wanting to change wording
of one particular use case.

The comment character is now used according to the git configuration
core.commentchar.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14 11:00:05 -08:00
Vasco Almeida
186b52c52a i18n: add--interactive: i18n of help_patch_cmd
Mark help message of help_patch_cmd for translation.  The message must
be unfolded to be free of variables so we can have high quality
translations.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14 11:00:05 -08:00
Vasco Almeida
0539d5e6d5 i18n: add--interactive: mark patch prompt for translation
Mark prompt message assembled in place for translation, unfolding each
use case for each entry in the %patch_modes hash table.

Previously, this script relied on whether $patch_mode was set to run the
command patch_update_cmd() or show status and loop the main loop. Now,
it uses $cmd to indicate we must run patch_update_cmd() and $patch_mode
is used to tell which flavor of the %patch_modes are we on.  This is
introduced in order to be able to mark and unfold the message prompt
knowing in which context we are.

The tracking of context was done previously by point %patch_mode_flavour
hash table to the correct entry of %patch_modes, focusing only on value
of %patch_modes. Now, we are also interested in the key ('staged',
'stash', 'checkout_head', ...).

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14 11:00:05 -08:00
Vasco Almeida
c4a85c3b8e i18n: add--interactive: mark plural strings
Mark plural strings for translation.  Unfold each action case in one
entire sentence.

Pass new keyword for xgettext to extract.

Update test to include new subroutine __n() for plural strings handling.

Update documentation to include a description of the new __n()
subroutine.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14 11:00:05 -08:00
Vasco Almeida
13c58c1754 i18n: add--interactive: mark strings with interpolation for translation
Since at this point Git::I18N.perl lacks support for Perl i18n
placeholder substitution, use of sprintf following die or error_msg is
necessary for placeholder substitution take place.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14 11:00:04 -08:00
Vasco Almeida
5fa832640a i18n: add--interactive: mark simple here-documents for translation
Mark messages in here-documents without interpolation for translation.

The here-document delimiter \EOF, which is the same as 'EOF', indicates
that the text is to be treated literally without interpolation of its
content.  Unfortunately xgettext is not able to extract here-documents
delimited with \EOF but it is with delimiter enclosed in single quotes.
So change \EOF to 'EOF', although in this case does not make
difference what variation of here-document to use since there is nothing
to interpolate.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14 11:00:04 -08:00
Vasco Almeida
258e7790b3 i18n: add--interactive: mark strings for translation
Mark simple strings (without interpolation) for translation.

Brackets around first parameter of ternary operator is necessary because
otherwise xgettext fails to extract strings marked for translation from
the rest of the file.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14 11:00:04 -08:00
Michael Haggerty
433860f3d0 diff: improve positioning of add/delete blocks in diffs
Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:

    --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
    +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
    @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
     }

     if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
    +if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {
                            $smtp_server = $_;

The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:

    --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
    +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
    @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
            $initial_reply_to =~ s/(^\s+|\s+$)//g;
     }

    +if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
     if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {

This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.

The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.

Commit d634d61ed6 improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2)
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.

This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.

Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:

1. It measures the following characteristics of a proposed splitting
   position in a `struct split_measurement`:

   * the number of blank lines above the proposed split
   * whether the line directly after the split is blank
   * the number of blank lines following that line
   * the indentation of the nearest non-blank line above the split
   * the indentation of the line directly below the split
   * the indentation of the nearest non-blank line after that line

2. It combines the measured attributes using a bunch of
   empirically-optimized weighting factors to derive a `struct
   split_score` that measures the "badness" of splitting the text at
   that position.

3. It combines the `split_score` for the top and the bottom of the
   slider at each of its possible positions, and selects the position
   that has the best `split_score`.

I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:

| repository            | count |      Git 2.9.0 |     compaction | compaction-fixed |       indent-1 |       indent-2 |
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| afnetworking          |   109 |    89  (81.7%) |    37  (33.9%) |      37  (33.9%) |     2   (1.8%) |     2   (1.8%) |
| alamofire             |    30 |    18  (60.0%) |    14  (46.7%) |      15  (50.0%) |     0   (0.0%) |     0   (0.0%) |
| angular               |   184 |   127  (69.0%) |    39  (21.2%) |      23  (12.5%) |     5   (2.7%) |     5   (2.7%) |
| animate               |   313 |     2   (0.6%) |     2   (0.6%) |       2   (0.6%) |     2   (0.6%) |     2   (0.6%) |
| ant                   |   380 |   356  (93.7%) |   152  (40.0%) |     148  (38.9%) |    15   (3.9%) |    15   (3.9%) | *
| bugzilla              |   306 |   263  (85.9%) |   109  (35.6%) |      99  (32.4%) |    14   (4.6%) |    15   (4.9%) | *
| corefx                |   126 |    91  (72.2%) |    22  (17.5%) |      21  (16.7%) |     6   (4.8%) |     6   (4.8%) |
| couchdb               |    78 |    44  (56.4%) |    26  (33.3%) |      28  (35.9%) |     6   (7.7%) |     6   (7.7%) | *
| cpython               |   937 |   158  (16.9%) |    50   (5.3%) |      49   (5.2%) |     5   (0.5%) |     5   (0.5%) | *
| discourse             |   160 |    95  (59.4%) |    42  (26.2%) |      36  (22.5%) |    18  (11.2%) |    13   (8.1%) |
| docker                |   307 |   194  (63.2%) |   198  (64.5%) |     253  (82.4%) |     8   (2.6%) |     8   (2.6%) | *
| electron              |   163 |   132  (81.0%) |    38  (23.3%) |      39  (23.9%) |     6   (3.7%) |     6   (3.7%) |
| git                   |   536 |   470  (87.7%) |    73  (13.6%) |      78  (14.6%) |    16   (3.0%) |    16   (3.0%) | *
| gitflow               |   127 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| ionic                 |   133 |    89  (66.9%) |    29  (21.8%) |      38  (28.6%) |     1   (0.8%) |     1   (0.8%) |
| ipython               |   482 |   362  (75.1%) |   167  (34.6%) |     169  (35.1%) |    11   (2.3%) |    11   (2.3%) | *
| junit                 |   161 |   147  (91.3%) |    67  (41.6%) |      66  (41.0%) |     1   (0.6%) |     1   (0.6%) | *
| lighttable            |    15 |     5  (33.3%) |     0   (0.0%) |       2  (13.3%) |     0   (0.0%) |     0   (0.0%) |
| magit                 |    88 |    75  (85.2%) |    11  (12.5%) |       9  (10.2%) |     1   (1.1%) |     0   (0.0%) |
| neural-style          |    28 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| nodejs                |   781 |   649  (83.1%) |   118  (15.1%) |     111  (14.2%) |     4   (0.5%) |     5   (0.6%) | *
| phpmyadmin            |   491 |   481  (98.0%) |    75  (15.3%) |      48   (9.8%) |     2   (0.4%) |     2   (0.4%) | *
| react-native          |   168 |   130  (77.4%) |    79  (47.0%) |      81  (48.2%) |     0   (0.0%) |     0   (0.0%) |
| rust                  |   171 |   128  (74.9%) |    30  (17.5%) |      27  (15.8%) |    16   (9.4%) |    14   (8.2%) |
| spark                 |   186 |   149  (80.1%) |    52  (28.0%) |      52  (28.0%) |     2   (1.1%) |     2   (1.1%) |
| tensorflow            |   115 |    66  (57.4%) |    48  (41.7%) |      48  (41.7%) |     5   (4.3%) |     5   (4.3%) |
| test-more             |    19 |    15  (78.9%) |     2  (10.5%) |       2  (10.5%) |     1   (5.3%) |     1   (5.3%) | *
| test-unit             |    51 |    34  (66.7%) |    14  (27.5%) |       8  (15.7%) |     2   (3.9%) |     2   (3.9%) | *
| xmonad                |    23 |    22  (95.7%) |     2   (8.7%) |       2   (8.7%) |     1   (4.3%) |     1   (4.3%) | *
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| totals                |  6668 |  4391  (65.9%) |  1496  (22.4%) |    1491  (22.4%) |   150   (2.2%) |   144   (2.2%) |
| totals (training set) |  4552 |  3195  (70.2%) |  1053  (23.1%) |    1061  (23.3%) |    86   (1.9%) |    88   (1.9%) |
| totals (test set)     |  2116 |  1196  (56.5%) |   443  (20.9%) |     430  (20.3%) |    64   (3.0%) |    56   (2.6%) |

In this table, the numbers are the count and percentage of human-rated
sliders that the corresponding algorithm got *wrong*. The columns are

* "repository" - the name of the repository used. I used the diffs
  between successive non-merge commits on the HEAD branch of the
  corresponding repository.

* "count" - the number of sliders that were human-rated. I chose most,
  but not all, sliders to rate from those among which the various
  algorithms gave different answers.

* "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.

* "compaction" - the heuristic used by `git diff --compaction-heuristic`
  in Git 2.9.0.

* "compaction-fixed" - the heuristic used by `git diff
  --compaction-heuristic` after the fixes from earlier in this patch
  series. Note that the results are not dramatically different than
  those for "compaction". Both produce non-ideal diffs only about 1/3 as
  often as the default `git diff`.

* "indent-1" - the new `--indent-heuristic` algorithm, using the first
  set of weighting factors, determined as described above.

* "indent-2" - the new `--indent-heuristic` algorithm, using the final
  set of weighting factors, determined as described below.

* `*` - indicates that repo was part of training set used to determine
  the first set of weighting factors.

The fact that the heuristic performed nearly as well on the test set as
on the training set in column "indent-1" is a good indication that the
heuristic was not over-trained. Given that fact, I ran a second round of
optimization, using the entire corpus as the training set. The resulting
set of weights gave the results in column "indent-2". These are the
weights included in this patch.

The final result gives consistently and significantly better results
across the whole corpus than either `git diff` or `git diff
--compaction-heuristic`. It makes only about 1/30 as many errors as the
former and about 1/10 as many errors as the latter. (And a good fraction
of the remaining errors are for diffs that involve weirdly-formatted
code, sometimes apparently machine-generated.)

The tools that were used to do this optimization and analysis, along
with the human-generated data values, are recorded in a separate project
[1].

This patch adds a new command-line option `--indent-heuristic`, and a
new configuration setting `diff.indentHeuristic`, that activate this
heuristic. This interface is only meant for testing purposes, and should
be finalized before including this change in any release.

[1] https://github.com/mhagger/diff-slider-tools

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19 10:25:11 -07:00
Jeff King
46e3d17f57 add--interactive: respect diff.compactionHeuristic
We use plumbing to generate the diff, so it doesn't
automatically pick up UI config like compactionHeuristic.
Let's forward it on, since interactive adding is porcelain.

Note that we only need to handle the "true" case. There's no
point in passing --no-compaction-heuristic when the variable
is false, since nothing else could have turned it on.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-16 11:38:58 -07:00
Jeff King
01143847db add--interactive: allow custom diff highlighting programs
The patch hunk selector of add--interactive knows how ask
git for colorized diffs, and correlate them with the
uncolored diffs we apply. But there's not any way for
somebody who uses a diff-filter tool like contrib's
diff-highlight to see their normal highlighting.

This patch lets users define an arbitrary shell command to
pipe the colorized diff through. The exact output shouldn't
matter (since we just show the result to humans) as long as
it is line-compatible with the original diff (so that
hunk-splitting can split the colorized version, too).

I left two minor issues with the new system that I don't
think are worth fixing right now, but could be done later:

  1. We only filter colorized diffs. Theoretically a user
     could want to filter a non-colorized diff, but I find
     it unlikely in practice. Users who are doing things
     like diff-highlighting are likely to want color, too.

  2. add--interactive will re-colorize a diff which has been
     hand-edited, but it won't have run through the filter.
     Fixing this is conceptually easy (just pipe the diff
     through the filter), but practically hard to do without
     using tempfiles (it would need to feed data to and read
     the result from the filter without deadlocking; this
     raises portability questions with respect to Windows).

I've punted on both issues for now, and if somebody really
cares later, they can do a patch on top.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-28 10:53:54 -08:00
Junio C Hamano
c1fa3e21bc Merge branch 'ak/add-i-empty-candidates'
The interactive "show a list and let the user choose from it"
interface "add -i" used showed and prompted to the user even when
the candidate list was empty, against which the only "choice" the
user could have made was to choose nothing.

* ak/add-i-empty-candidates:
  add -i: return from list_and_choose if there is no candidate
2015-02-17 10:15:20 -08:00
Alexander Kuleshov
a9c4641df7 add -i: return from list_and_choose if there is no candidate
The list_and_choose() helper is given a prompt and a list, asks the
user to make selection from the list, and then returns a list of
items chosen.  Even when it is given an empty list as the original
candidate set to choose from, it gave a prompt to the user, who can
only say "I am done choosing".

Return an empty result when the input is an empty list without
bothering the user.  The existing caller must already have a logic
to say "Nothing to do" or an equivalent when the returned list is
empty (i.e. the user chose to select nothing) if it is necessary, so
no change to the callers is necessary.

This fixes the case where "add untracked" is asked in "git add -i"
and there is no untracked files in the working tree.  We used to give
an empty list of files to choose from with a prompt, but with this
change, we no longer do.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-22 14:44:36 -08:00
Jeff King
a8bec7abcc add--interactive: leave main loop on read error
The main hunk loop for add--interactive will loop if it does
not get a known input. This is a good thing if the user
typed some invalid input. However, if we have an
uncorrectable read error, we'll end up looping infinitely.
We can fix this by noticing read errors (i.e., <STDIN>
returns undef) and breaking out of the loop.

One easy way to trigger this is if you have an editor that
does not take over the terminal (e.g., one that spawns a
window in an existing process and waits), start the editor
with the hunk-edit command, and hit ^C to send SIGINT. The
editor process dies due to SIGINT, but the perl
add--interactive process does not (perl suspends SIGINT for
the duration of our system() call).

We return to the main loop, but further reads from stdin
don't work. The SIGINT _also_ killed our parent git process,
which orphans our process group, meaning that further reads
from the terminal will always fail. We loop infinitely,
getting EIO on each read.

Note that there are several other spots where we read from
stdin, too. However, in each of those cases, we do something
sane when the read returns undef (breaking out of the loop,
taking the input as "no", etc). They don't need similar
treatment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-15 10:12:20 -08:00
Junio C Hamano
d59c12d7ad Merge branch 'jl/nor-or-nand-and'
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.

* jl/nor-or-nand-and:
  code and test: fix misuses of "nor"
  comments: fix misuses of "nor"
  contrib: fix misuses of "nor"
  Documentation: fix misuses of "nor"
2014-04-08 12:00:28 -07:00
Justin Lebar
235e8d5914 code and test: fix misuses of "nor"
Signed-off-by: Justin Lebar <jlebar@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-31 15:29:33 -07:00
Simon Ruderich
b294097bc6 git-add--interactive: warn if module for interactive.singlekey is missing
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03 14:11:18 -08:00
Jeff King
954312a3ff add-interactive: handle unborn branch in patch mode
The list_modified function already knows how to handle an
unborn branch by diffing against the empty tree. However,
the diff we perform to get the actual hunks does not. Let's
use the same logic for both diffs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-25 14:54:17 -07:00
Johannes Sixt
df17e77c0a add--interactive: fix external command invocation on Windows
Back in 21e9757e (Hack git-add--interactive to make it work with
ActiveState Perl, 2007-08-01), the invocation of external commands was
changed to use qx{} on Windows. The rationale was that the command
interpreter on Windows is not a POSIX shell, but rather Windows's CMD.
That patch was wrong to include 'msys' in the check whether to use qx{}
or not: 'msys' identifies MSYS perl as shipped with Git for Windows,
which does not need the special treatment; qx{} should be used only with
ActiveState perl, which is identified by 'MSWin32'.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-04 10:35:25 -07:00
Junio C Hamano
e5c2909782 add -i: add extra options at the right place in "diff" command line
Appending "--diff-algorithm=histogram" at the end of canned command
line for various modes of "diff" is correct for most of them but not
for "stash" that has a non-option already wired in, like so:

	'stash' => {
		DIFF => 'diff-index -p HEAD',

Appending an extra option after non-option may happen to work due to
overly lax command line parser, but that is not something we should
rely on.  Instead, splice in the extra argument immediately after the
command name (i.e. 'diff-index', 'diff-files', etc.).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-23 13:39:39 -07:00
John Keeping
2cc0f53b53 add--interactive: respect diff.algorithm
When staging hunks interactively it is sometimes useful to use an
alternative diff algorithm which splits the changes into hunks in a more
logical manner.  This is not possible because the plumbing commands
called by add--interactive ignore the "diff.algorithm" configuration
option (as they should).

Since add--interactive is a porcelain command it should respect this
configuration variable.  To do this, make it read diff.algorithm and
pass its value to the underlying diff-index and diff-files invocations.

At this point, do not add options to "git add", "git reset" or "git
checkout" (all of which can call git-add--interactive).  If a user
wants to override the value on the command line they can use:

	git -c diff.algorithm=$ALGO ...

Signed-off-by: John Keeping <john@keeping.me.uk>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-12 13:41:19 -07:00
Stefano Lattarini
41ccfdd9c9 Correct common spelling mistakes in comments and tests
Most of these were found using Lucas De Marchi's codespell tool.

Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-12 13:38:40 -07:00
Thomas Badie
70969f775d git-add--interactive.perl: Remove two unused variables
The patch 8f0bef6 refactored this script and made the variable $fh
unneeded in subs diff_applies and patch_update_file, but forgot to
remove them.

Signed-off-by: Thomas Badie <badie@lrde.epita.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-06-25 10:06:09 -07:00
Jeff King
4066bd6797 add--interactive: ignore unmerged entries in patch mode
When "add -p" sees an unmerged entry, it shows the combined
diff and then immediately skips the hunk. This can be
confusing in a variety of ways, depending on whether there
are other changes to stage (in which case you get the
superfluous combined diff output in between other hunks) or
not (in which case you get the combined diff and the program
exits immediately, rather than seeing "No changes").

The current behavior was not planned, and is just what the
implementation happens to do. Instead, let's explicitly
remove unmerged entries from our list of modified files, and
print a warning that we are ignoring them.

We can cheaply find which entries are unmerged by adding
"--raw" output to the "diff-files --numstat" we already run.
There is one non-obvious thing we must change when parsing
this combined output. Before this patch, when we saw a
numstat line for a file that did not have index changes, we
would create a new record with 'unchanged' in the 'INDEX'
field.  Because "--raw" comes before "--numstat", we must
move this special-case down to the raw-line case (and it is
sufficient to move it rather than handle it in both places,
since any file which has a --numstat will also have a --raw
entry).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-05 09:01:03 -07:00
Thomas Rast
b5cc003253 add -i: ignore terminal escape sequences
On the author's terminal, the up-arrow input sequence is ^[[A, and
thus fat-fingering an up-arrow into 'git checkout -p' is quite
dangerous: git-add--interactive.perl will ignore the ^[ and [
characters and happily treat A as "discard everything".

As a band-aid fix, use Term::Cap to get all terminal capabilities.
Then use the heuristic that any capability value that starts with ^[
(i.e., \e in perl) must be a key input sequence.  Finally, given an
input that starts with ^[, read more characters until we have read a
full escape sequence, then return that to the caller.  We use a
timeout of 0.5 seconds on the subsequent reads to avoid getting stuck
if the user actually input a lone ^[.

Since none of the currently recognized keys start with ^[, the net
result is that the sequence as a whole will be ignored and the help
displayed.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-17 20:44:17 -07:00
Junio C Hamano
933e44d3a0 "add -p": work-around an old laziness that does not coalesce hunks
Since 0beee4c (git-add--interactive: remove hunk coalescing, 2008-07-02),
"git add--interactive" behaves lazily and passes overlapping hunks to the
underlying "git apply" without coalescing.  This was partially corrected
by 7a26e65 (its partial revert, 2009-05-16), but overlapping hunks are
still passed when the patch is edited.

Teach "git apply" the --allow-overlap option that disables a safety
feature that avoids misapplication of patches by not applying patches
to overlapping hunks, and pass this option form "add -p" codepath.

Do not even advertise the option, as this is merely a workaround, and the
correct fix is to make "add -p" correctly coalesce adjacent patch hunks.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-29 15:27:59 -07:00
Junio C Hamano
9dce832354 add--interactive.perl: factor out repeated --recount option
Depending on the direction and the target of patch application, we would
need to pass --cached and --reverse to underlying "git apply".  Also we
only pass --check when we are not applying but just checking.

But we always pass --recount since 8cbd431 (git-add--interactive: replace
hunk recounting with apply --recount, 2008-07-02).  Instead of repeating
the same --recount over and over again, move it to a single place that
actually runs the command, namely, "run_git_apply" subroutine.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-29 15:27:59 -07:00
Junio C Hamano
f5ea3f2bb3 add -p: 'q' should really quit
The "quit" command was added in 9a7a1e0 (git add -p: new "quit" command at
the prompt, 2009-04-10) to allow the user to say that hunks other than
what have already been chosen are undesirable, and exit the interactive
loop immediately.  It forgot that there may be an undecided hunk before
the current one.  In such a case, the interactive loop still goes back to
the beginning.

Clear all the USE bit for undecided hunks and exit the loop.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-29 15:25:03 -07:00
Junio C Hamano
1aa6583cc2 Merge branch 'jl/add-p-reverse-message'
* jl/add-p-reverse-message:
  Correct help blurb in checkout -p and friends
2010-11-29 17:52:34 -08:00
Jonathan "Duke" Leto
7b8c705188 Correct help blurb in checkout -p and friends
When git checkout -p from the index or HEAD is run in edit mode, the
help message about removing '-' and '+' lines was backwards. Because it
is reverse applying the patch, the meanings of '-' and '+' are reversed.

Signed-off-by: Jonathan "Duke" Leto <jonathan@leto.net>
Acked-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-28 15:28:39 -07:00
Ævar Arnfjörð Bjarmason
3328acedc6 perl: use "use warnings" instead of -w
Change the Perl scripts to turn on lexical warnings instead of setting
the global $^W variable via the -w switch.

The -w sets warnings for all code that interpreter runs, while "use
warnings" is lexically scoped. The former is probably not what the
authors wanted.

As an auxiliary benefit it's now possible to build Git with:

    PERL_PATH='/usr/bin/env perl'

Which would previously result in failures, since "#!/usr/bin/env perl -w"
doesn't work as a shebang.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-27 12:37:56 -07:00
Ævar Arnfjörð Bjarmason
d48b284183 perl: bump the required Perl version to 5.8 from 5.6.[21]
Formalize our dependency on perl 5.8, bumped from 5.6.[12]. We already
used the three-arg form of open() which was introduced in 5.6.1, but
t/t9700/test.pl explicitly depended on 5.6.2.

However git-add--interactive.pl has been failing on the 5.6 line since
it was introduced in v1.5.0-rc0~12^2~2 back in 2006 due to this open
syntax:

    sub run_cmd_pipe {
           my $fh = undef;
           open($fh, '-|', @_) or die;
           return <$fh>;
    }

Which when executed dies on "Can't use an undefined value as
filehandle reference". Several of our tests also fail on 5.6 (even
more when compiled with NO_PERL_MAKEMAKER=1):

    t2016-checkout-patch.sh
    t3904-stash-patch.sh
    t3701-add-interactive.sh
    t7105-reset-patch.sh
    t7501-commit.sh
    t9700-perl-git.sh

Our code is bitrotting on 5.6 with no-one interested in fixing it, and
pinning us to such an ancient release of Perl is keeping us from using
useful features introduced in the 5.8 release.

The 5.6 series is now over 10 years old, and the 5.6.2 maintenance
release almost 7. 5.8 on the other hand is more than 8 years old.

All the modern Unix-like operating systems have now upgraded to it or
a later version, and 5.8 packages are available for old IRIX, AIX
Solaris and Tru64 systems.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Tor Arntsen <tor@spacetec.no>
Acked-by: Randal L. Schwartz <merlyn@stonehenge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-27 12:37:41 -07:00
Jonathan Nieder
74e42ce122 add-interactive: Clarify “remaining hunks in the file”
The "a" and "d" commands to ‘add --patch’ (accept/reject rest of file)
interact with "j", "g", and "/" (skip some hunks) in a perhaps
confusing way: after accepting or rejecting all _later_ hunks in the
file, they return to the earlier, skipped hunks and prompt the user
about them again.

This behavior can be very useful in practice.  One can still accept or
reject _all_ undecided hunks in a file by using the "g" command to
move to hunk #1 first.

Reported-by: Frédéric Brière <fbriere@fbriere.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-06-13 10:05:02 -07:00