Commit 3aef54e8b8 ("diff: munmap() file contents before running external
diff") introduced calls to diff_free_filespec_data in
run_external_diff, which may pass NULL pointers.
Fix this and prevent any such bugs in the future by making
`diff_free_filespec_data(NULL)` a no-op.
Fixes: 3aef54e8b8 ("diff: munmap() file contents before running external diff")
Signed-off-by: Jinoh Kang <luke1337@theori.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.
* mk/diff-ignore-regex:
diff: add -I<regex> that ignores matching changes
merge-base, xdiff: zero out xpparam_t structures
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported. Peff suggested we adopt the following names[1]:
- hashmap_clear() - remove all entries and de-allocate any
hashmap-specific data, but be ready for reuse
- hashmap_clear_and_free() - ditto, but free the entries themselves
- hashmap_partial_clear() - remove all entries but don't deallocate
table
- hashmap_partial_clear_and_free() - ditto, but free the entries
This patch provides the new names and converts all existing callers over
to the new naming scheme.
[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression. This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement). The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.
Use the 'ignore' field of xdchange_t for marking a change as ignored or
not. Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I. These two
options can also be used together in the same git invocation (they are
complementary to each other).
Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".
Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only skip diffstats when both oids are valid and identical. This check
was causing both false-positives (files included in diffstats with no
actual changes (0 lines modified) and false-negatives (showing 0 lines
modified in stats when files had actually changed).
Also replaced same_contents with may_differ to avoid confusion.
Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git status --short" quoted a path with SP in it when tracked, but
not those that are untracked, ignored or unmerged. They are all
shown quoted consistently.
* jc/quote-path-cleanup:
quote: turn 'nodq' parameter into a set of flags
quote: rename misnamed sq_lookup[] to cq_lookup[]
wt-status: consistently quote paths in "status --short" output
quote_path: code clarification
quote_path: optionally allow quoting a path with SP in it
quote_path: give flags parameter to quote_path()
quote_path: rename quote_path_relative() to quote_path()
quote_c_style() and its friend quote_two_c_style() both take an
optional "please omit the double quotes around the quoted body"
parameter. Turn it into a flag word, assign one bit out of it,
and call it CQUOTE_NODQ bit.
No behaviour change intended.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Yet another subcommand of "git submodule" is getting rewritten in C.
* ss/submodule-summary-in-c:
submodule: port submodule subcommand 'summary' from shell to C
t7421: introduce a test script for verifying 'summary' output
submodule: rename helper functions to avoid ambiguity
submodule: remove extra line feeds between callback struct and macro
"git diff --stat -w" showed 0-line changes for paths whose changes
were only whitespaces, which was not intuitive. We now omit such
paths from the stat output.
* mr/diff-hide-stat-wo-textual-change:
diff: teach --stat to ignore uninteresting modifications
The output from the "diff" family of the commands had abbreviated
object names of blobs involved in the patch, but its length was not
affected by the --abbrev option. Now it is.
* dd/diff-customize-index-line-abbrev:
diff: index-line: respect --abbrev in object's name
t4013: improve diff-post-processor logic
The patch-id computation did not ignore the "incomplete last line"
marker like whitespaces.
* rs/patch-id-with-incomplete-line:
patch-id: ignore newline at end of file in diff_flush_patch_id()
A handful of Git's commands respect `--abbrev' for customizing length
of abbreviation of object names.
For diff-family, Git supports 2 different options for 2 different
purposes, `--full-index' for showing diff-patch object's name in full,
and `--abbrev' to customize the length of object names in diff-raw and
diff-tree header lines, without any options to customise the length of
object names in diff-patch format. When working with diff-patch format,
we only have two options, either full index, or default abbrev length.
Although, that behaviour is documented, it doesn't stop users from
trying to use `--abbrev' with the hope of customising diff-patch's
objects' name's abbreviation.
Let's allow the blob object names shown on the "index" line to be
abbreviated to arbitrary length given via the "--abbrev" option.
To preserve backward compatibility with old script that specify both
`--full-index' and `--abbrev', always show full object id
if `--full-index' is specified.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When options such as --ignore-space-change are in use, files with
modifications can have no interesting textual changes worth showing. In
such cases, "git diff --stat" shows 0 lines of additions and deletions.
Teach "git diff --stat" not to show such a path in its output, which
would be more natural.
However, we don't want to prevent the display of all files that have 0
effective diffs since they could be the result of a rename, permission
change, or other similar operation that may still be of interest so we
special case additions and deletions as they are always interesting.
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whitespace is ignored when calculating patch IDs. This is done by
removing all whitespace from diff lines before hashing them, including
a newline at the end of a file. If that newline is missing, however,
diff reports that fact in a separate line containing "\ No newline at
end of file\n", and this marker is hashed like a context line.
This goes against our goal of making patch IDs independent of
whitespace. Use the same heuristic that 2485eab55c (git-patch-id: do
not trip over "no newline" markers, 2011-02-17) added to git patch-id
instead and skip diff lines that start with a backslash and a space
and are longer than twelve characters.
Reported-by: Tilman Vogel <tilman.vogel@web.de>
Initial-test-by: Tilman Vogel <tilman.vogel@web.de>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The helper functions: show_submodule_summary(),
prepare_submodule_summary() and print_submodule_summary() are used by
the builtin_diff() function in diff.c to generate a summary of
submodules in the context of a diff. Functions with similar names are to
be introduced in the upcoming port of submodule's summary subcommand.
So, rename the helper functions to '*_diff_submodule_summary()' to avoid
ambiguity.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").
Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).
This patch converts remaining files from the first half of the alphabet,
to keep the diff to a manageable size.
The conversion was done purely mechanically with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe '
s/ARGV_ARRAY/STRVEC/g;
s/argv_array/strvec/g;
'
and then selectively staging files with "git add '[abcdefghjkl]*'".
We'll deal with any indentation/style fallouts separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe 's/argv-array.h/strvec.h/'
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce memory usage during "diff --quiet" in a worktree with too
many stat-unmatched paths.
* jk/diff-memuse-optim-with-stat-unmatch:
diff: discard blob data from stat-unmatched pairs
When performing a tree-level diff against the working tree, we may find
that our index stat information is dirty, so we queue a filepair to be
examined later. If the actual content hasn't changed, we call this a
stat-unmatch; the stat information was out of date, but there's no
actual diff. Normally diffcore_std() would detect and remove these
identical filepairs via diffcore_skip_stat_unmatch(). However, when
"--quiet" is used, we want to stop the diff as soon as we see any
changes, so we check for stat-unmatches immediately in diff_change().
That check may require us to actually load the file contents into the
pair of diff_filespecs. If we find that the pair isn't a stat-unmatch,
then no big deal; we'd likely load the contents later anyway to generate
a patch, do rename detection, etc, so we want to hold on to it. But if
it is a stat-unmatch, then we have no more use for that data; the whole
point is that we're going discard the pair. However, we never free the
allocated diff_filespec data.
In most cases, keeping that data isn't a problem. We don't expect a lot
of stat-unmatch entries, and since we're using --quiet, we'd quit as
soon as we saw such a real change anyway. However, there are extreme
cases where it makes a big difference:
1. We'd generally mmap() the working tree half of the pair. And since
the OS may limit the total number of maps, we can run afoul of this
in large repositories. E.g.:
$ cd linux
$ git ls-files | wc -l
67959
$ sysctl vm.max_map_count
vm.max_map_count = 65530
$ git ls-files | xargs touch ;# everything is stat-dirty!
$ git diff --quiet
fatal: mmap failed: Cannot allocate memory
It should be unusual to have so many files stat-dirty, but it's
possible if you've just run a script like "sed -i" or similar.
After this patch, the above correctly exits with code 0.
2. Even if you don't hit mmap limits, the index half of the pair will
have been pulled from the object database into heap memory. Again
in a clone of linux.git, running:
$ git ls-files | head -n 10000 | xargs touch
$ git diff --quiet
peaks at 145MB heap before this patch, and 94MB after.
This patch solves the problem by freeing any diff_filespec data we
picked up during the "--quiet" stat-unmatch check in diff_changes.
Nobody is going to need that data later, so there's no point holding on
to it. There are a few things to note:
- we could skip queueing the pair entirely, which could in theory save
a little work. But there's not much to save, as we need a
diff_filepair to feed to diff_filespec_check_stat_unmatch() anyway.
And since we cache the result of the stat-unmatch checks, a later
call to diffcore_skip_stat_unmatch() call will quickly skip over
them. The diffcore code also counts up the number of stat-unmatched
pairs as it removes them. It's doubtful any callers would care about
that in combination with --quiet, but we'd have to reimplement the
logic here to be on the safe side. So it's not really worth the
trouble.
- I didn't write a test, because we always produce the correct output
unless we run up against system mmap limits, which are both
unportable and expensive to test against. Measuring peak heap
would be interesting, but our perf suite isn't yet capable of that.
- note that diff without "--quiet" does not suffer from the same
problem. In diffcore_skip_stat_unmatch(), we detect the stat-unmatch
entries and drop them immediately, so we're not carrying their data
around.
- you _can_ still trigger the mmap limit problem if you truly have
that many files with actual changes. But it's rather unlikely. The
stat-unmatch check avoids loading the file contents if the sizes
don't match, so you'd need a pretty trivial change in every single
file. Likewise, inexact rename detection might load the data for
many files all at once. But you'd need not just 64k changes, but
that many deletions and additions. The most likely candidate is
perhaps break-detection, which would load the data for all pairs and
keep it around for the content-level diff. But again, you'd need 64k
actually changed files in the first place.
So it's still possible to trigger this case, but it seems like "I
accidentally made all my files stat-dirty" is the most likely case
in the real world.
Reported-by: Jan Christoph Uhde <Jan@UhdeJc.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `diff.relative` boolean option set to `true` shows only changes in
the current directory/value specified by the `path` argument of the
`relative` option and shows pathnames relative to the aforementioned
directory.
Teach `--no-relative` to override earlier `--relative`
Add for git-format-patch(1) options documentation `--relative` and
`--no-relative`
Signed-off-by: Laurent Arnoud <laurent@spkdev.net>
Acked-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" in a partial clone learned to avoid lazy loading blob
objects in more casese when they are not needed.
* jt/avoid-prefetch-when-able-in-diff:
diff: restrict when prefetching occurs
diff: refactor object read
diff: make diff_populate_filespec_options struct
promisor-remote: accept 0 as oid_nr in function
Commit 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-08)
optimized "diff" by prefetching blobs in a partial clone, but there are
some cases wherein blobs do not need to be prefetched. In these cases,
any command that uses the diff machinery will unnecessarily fetch blobs.
diffcore_std() may read blobs when it calls the following functions:
(1) diffcore_skip_stat_unmatch() (controlled by the config variable
diff.autorefreshindex)
(2) diffcore_break() and diffcore_merge_broken() (for break-rewrite
detection)
(3) diffcore_rename() (for rename detection)
(4) diffcore_pickaxe() (for detecting addition/deletion of specified
string)
Instead of always prefetching blobs, teach diffcore_skip_stat_unmatch(),
diffcore_break(), and diffcore_rename() to prefetch blobs upon the first
read of a missing object. This covers (1), (2), and (3): to cover the
rest, teach diffcore_std() to prefetch if the output type is one that
includes blob data (and hence blob data will be required later anyway),
or if it knows that (4) will be run.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the object reads in diff_populate_filespec() to have the first
object read not be in an if/else branch, because in a future patch, a
retry will be added to that first object read.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The behavior of diff_populate_filespec() currently can be customized
through a bitflag, but a subsequent patch requires it to support a
non-boolean option. Replace the bitflag with an options struct.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are 3 callers to promisor_remote_get_direct() that first check if
the number of objects to be fetched is equal to 0. Fold that check into
promisor_remote_get_direct(), and in doing so, be explicit as to what
promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success,
immediately).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have the codebase wired up to pass any additional metadata
to filters, let's collect the additional metadata that we'd like to
pass.
The two main places we pass this metadata are checkouts and archives.
In these two situations, reading HEAD isn't a valid option, since HEAD
isn't updated for checkouts until after the working tree is written and
archives can accept an arbitrary tree. In other situations, HEAD will
usually reflect the refname of the branch in current use.
We pass a smaller amount of data in other cases, such as git cat-file,
where we can really only logically know about the blob.
This commit updates only the parts of the checkout code where we don't
use unpack_trees. That function and callers of it will be handled in a
future commit.
In the archive code, we leak a small amount of memory, since nothing we
pass in the archiver argument structure is freed.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a variety of situations where a filter process can make use of
some additional metadata. For example, some people find the ident
filter too limiting and would like to include the commit or the branch
in their smudged files. This information isn't available during
checkout as HEAD hasn't been updated at that point, and it wouldn't be
available in archives either.
Let's add a way to pass this metadata down to the filter. We pass the
blob we're operating on, the treeish (preferring the commit over the
tree if one exists), and the ref we're operating on. Note that we won't
pass this information in all cases, such as when renormalizing or when
we're performing diffs, since it doesn't make sense in those cases.
The data we currently get from the filter process looks like the
following:
command=smudge
pathname=git.c
0000
With this change, we'll get data more like this:
command=smudge
pathname=git.c
refname=refs/tags/v2.25.1
treeish=c522f061d551c9bb8684a7c3859b2ece4499b56b
blob=7be7ad34bd053884ec48923706e70c81719a8660
0000
There are a couple things to note about this approach. For operations
like checkout, treeish will always be a commit, since we cannot check
out individual trees, but for other operations, like archive, we can end
up operating on only a particular tree, so we'll provide only a tree as
the treeish. Similar comments apply for refname, since there are a
variety of cases in which we won't have a ref.
This commit wires up the code to print this information, but doesn't
pass any of it at this point. In a future commit, we'll have various
code paths pass the actual useful data down.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some codepaths were given a repository instance as a parameter to
work in the repository, but passed the_repository instance to its
callees, which has been cleaned up (somewhat).
* mt/use-passed-repo-more-in-funcs:
sha1-file: allow check_object_signature() to handle any repo
sha1-file: pass git_hash_algo to hash_object_file()
sha1-file: pass git_hash_algo to write_object_file_prepare()
streaming: allow open_istream() to handle any repo
pack-check: use given repo's hash_algo at verify_packfile()
cache-tree: use given repo's hash_algo at verify_one()
diff: make diff_populate_filespec() honor its repo argument
We parse diff.wsErrorHighlight in git_diff_ui_config(), meaning that it
doesn't take effect for plumbing commands, only for porcelains like
git-diff itself. This is mildly annoying as it means scripts like
add--interactive, which produce a user-visible diff with color, don't
respect the option.
We could teach that script to parse the config and pass it along as
--ws-error-highlight to the diff plumbing. But there's a simpler
solution.
It should be reasonably safe for plumbing to respect this option, as it
only kicks in when color is otherwise enabled. And anybody parsing
colorized output must already deal with the fact that color.diff.* may
change the exact output they see; those options have been part of
git_diff_basic_config() since its inception in 9a1805a872 (add a "basic"
diff config callback, 2008-01-04).
So we can just move it to the "basic" config, which fixes
add--interactive, along with any other script in the same boat, with a
very low risk of hurting any plumbing users.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff_populate_filespec() takes a struct repository argument but it
doesn't get passed down to read_object_file().
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The beginning of rewriting "git add -i" in C.
* js/builtin-add-i:
built-in add -i: implement the `help` command
built-in add -i: use color in the main loop
built-in add -i: support `?` (prompt help)
built-in add -i: show unique prefixes of the commands
built-in add -i: implement the main loop
built-in add -i: color the header in the `status` command
built-in add -i: implement the `status` command
diff: export diffstat interface
Start to implement a built-in version of `git add --interactive`
Make the diffstat interface (namely, the diffstat_t struct and
compute_diffstat) no longer be internal to diff.c and allow it to be used
by other parts of git.
This is helpful for code that may want to easily extract information
from files using the diff machinery, while flushing it differently from
how the show_* functions used by diff_flush() do it. One example is the
builtin implementation of git-add--interactive's status.
Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
Signed-off-by: Slavica Đukić <slawica92@hotmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Preparation for SHA-256 upgrade continues.
* bc/object-id-part17: (26 commits)
midx: switch to using the_hash_algo
builtin/show-index: replace sha1_to_hex
rerere: replace sha1_to_hex
builtin/receive-pack: replace sha1_to_hex
builtin/index-pack: replace sha1_to_hex
packfile: replace sha1_to_hex
wt-status: convert struct wt_status to object_id
cache: remove null_sha1
builtin/worktree: switch null_sha1 to null_oid
builtin/repack: write object IDs of the proper length
pack-write: use hash_to_hex when writing checksums
sequencer: convert to use the_hash_algo
bisect: switch to using the_hash_algo
sha1-lookup: switch hard-coded constants to the_hash_algo
config: use the_hash_algo in abbrev comparison
combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo
bundle: switch to use the_hash_algo
connected: switch GIT_SHA1_HEXSZ to the_hash_algo
show-index: switch hard-coded constants to the_hash_algo
blame: remove needless comparison with GIT_SHA1_HEXSZ
...
Since these macros already take a `keyvar' pointer of a known type,
we can rely on OFFSETOF_VAR to get the correct offset without
relying on non-portable `__typeof__' and `offsetof'.
Argument order is also rearranged, so `keyvar' and `member' are
sequential as they are used as: `keyvar->member'
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we cannot rely on a `__typeof__' operator being portable
to use with `offsetof'; we can calculate the pointer offset
using an existing pointer and the address of a member using
pointer arithmetic for compilers without `__typeof__'.
This allows us to simplify usage of hashmap iterator macros
by not having to specify a type when a pointer of that type
is already given.
In the future, list iterator macros (e.g. list_for_each_entry)
may also be implemented using OFFSETOF_VAR to save hackers the
trouble of using container_of/list_entry macros and without
relying on non-portable `__typeof__'.
v3: use `__typeof__' to avoid clang warnings
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`hashmap_free_entries' behaves like `container_of' and passes
the offset of the hashmap_entry struct to the internal
`hashmap_free_' function, allowing the function to free any
struct pointer regardless of where the hashmap_entry field
is located.
`hashmap_free' no longer takes any arguments aside from
the hashmap itself.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Another step in eliminating the requirement of hashmap_entry
being the first member of a struct.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using `container_of' can be verbose and choosing names for
intermediate "struct hashmap_entry" pointers is a hard problem.
So introduce "*_entry" APIs inspired by similar linked-list
APIs in the Linux kernel.
Unfortunately, `__typeof__' is not portable C, so we need an
extra parameter to specify the type.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a step towards removing the requirement for
hashmap_entry being the first field of a struct.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is less error-prone than "void *" as the compiler now
detects invalid types being passed.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Otherwise, the hashmap_entry.next field appears to remain
uninitialized, which can lead to problems when
add_lines_to_move_detection calls hashmap_add.
I found this through manual inspection when converting
hashmap_add callers to take "struct hashmap_entry *".
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the lazy clone machinery that there can be more than one
promisor remote and consult them in order when downloading missing
objects on demand.
* cc/multi-promisor:
Move core_partial_clone_filter_default to promisor-remote.c
Move repository_format_partial_clone to promisor-remote.c
Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
remote: add promisor and partial clone config to the doc
partial-clone: add multiple remotes in the doc
t0410: test fetching from many promisor remotes
builtin/fetch: remove unique promisor remote limitation
promisor-remote: parse remote.*.partialclonefilter
Use promisor_remote_get_direct() and has_promisor_remote()
promisor-remote: use repository_format_partial_clone
promisor-remote: add promisor_remote_reinit()
promisor-remote: implement promisor_remote_get_direct()
Add initial support for many promisor remotes
fetch-object: make functions return an error code
t0410: remove pipes after git commands
On-demand object fetching in lazy clone incorrectly tried to fetch
commits from submodule projects, while still working in the
superproject, which has been corrected.
* jt/diff-lazy-fetch-submodule-fix:
diff: skip GITLINK when lazy fetching missing objs
In 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-08),
diff was taught to batch the fetching of missing objects when operating
on a partial clone, but was not taught to refrain from fetching
GITLINKs. Teach diff to check if an object is a GITLINK before including
it in the set to be fetched.
(As stated in the commit message of that commit, unpack-trees was also
taught a similar thing prior, but unpack-trees correctly checks for
GITLINK before including objects in the set to be fetched.)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the two separate patch-id implementations to use the_hash_algo
in their implementation.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The line count in the outer diff's hunk headers of a range diff is not
all that interesting. It merely shows how far along the inner diff
are on both sides. That number is of no use for human readers, and
range-diffs are not meant to be machine readable.
In a subsequent commit we're going to add some more contextual
information such as the filename corresponding to the diff to the hunk
headers. Remove the unnecessary information, and just keep the "@@"
to indicate that a new hunk of the outer diff is starting.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running an external diff from, say, a diff tool, it is safe to
assume that we want to write the files in question. On Windows, that
means that there cannot be any other process holding an open handle to
said files, or even just a mapped region.
So let's make sure that `git diff` itself is not holding any open handle
to the files in question.
In fact, we will just release the file pair right away, as the external
diff uses the files we just wrote, so we do not need to hold the file
contents in memory anymore.
This fixes https://github.com/git-for-windows/git/issues/1315
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using the repository_format_partial_clone global
and fetch_objects() directly, let's use has_promisor_remote()
and promisor_remote_get_direct().
This way all the configured promisor remotes will be taken
into account, not only the one specified by
extensions.partialClone.
Also when cloning or fetching using a partial clone filter,
remote.origin.promisor will be set to "true" instead of
setting extensions.partialClone to "origin". This makes it
possible to use many promisor remote just by fetching from
them.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--base" option of "format-patch" computed the patch-ids for
prerequisite patches in an unstable way, which has been updated to
compute in a way that is compatible with "git patch-id --stable".
* sb/format-patch-base-patch-id-fix:
format-patch: make --base patch-id output stable
format-patch: inform user that patch-id generation is unstable
Fix two typos introduced by the following commits:
+ 31fba9d3b4 (diff-parseopt: convert --[src|dst]-prefix, 2019-03-24)
+ ed8b4132c8 (remote-curl: mark all error messages for translation,
2019-03-05)
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A brown-paper-bag bugfix to a change already in 'master'.
* nd/diff-parseopt:
parse-options: check empty value in OPT_INTEGER and OPT_ABBREV
diff-parseopt: restore -U (no argument) behavior
diff-parseopt: correct variable types that are used by parseopt
Before d473e2e0e8 (diff.c: convert -U|--unified, 2019-01-27), -U and
--unified are implemented with a custom parser opt_arg() in diff.c. I
didn't check this code carefully and not realize that it's the
equivalent of PARSE_OPT_NONEG | PARSE_OPT_OPTARG.
In other words, if -U is specified without any argument, the option
should be accepted, and the default value should be used. Without
PARSE_OPT_OPTARG, parse_options() will reject this case and cause a
regression.
Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We weren't flushing the context each time we processed a hunk in the
patch-id generation code in diff.c, but we were doing that when we
generated "stable" patch-ids with the 'patch-id' tool. Let's port that
similar logic over from patch-id.c into diff.c so we can get the same
hash when we're generating patch-ids for 'format-patch --base=' types of
command invocations.
Cc: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While running "git diff" in a lazy clone, we can upfront know which
missing blobs we will need, instead of waiting for the on-demand
machinery to discover them one by one. Aim to achieve better
performance by batching the request for these promised blobs.
* jt/batch-fetch-blobs-in-diff:
diff: batch fetching of missing blobs
sha1-file: support OBJECT_INFO_FOR_PREFETCH
When running a command like "git show" or "git diff" in a partial clone,
batch all missing blobs to be fetched as one request.
This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
blobs", 2017-12-08), but for another command.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff_opt_parse() is a heavy hammer to just set diff filter. But it's
the only way because of the diff_status_letters[] mapping. Add a new
API to set diff filter and use it in git-am. diff_opt_parse()'s only
remaining call site in revision.c will be gone soon and having it here
just because of git-am does not make sense.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option is added in commit b73bcbac4a (diff: allow
--no-color-moved-ws - 2018-11-23) in pw/diff-color-moved-ws-fix. To ease
merge conflict resolution, re-implement the option handling here so that
the conflict could be resolved by taking this side of change.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark one more string for translation while at there
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
OPT__ABBREV() has the same behavior as the deleted code with one
difference: it does check for valid number and error out if not. And the
'40' change is self explanatory.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at it, mark one more string for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at it, mark one more string for translation.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark one more string for translation while at there.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command line completion (in contrib/) has been taught to
complete more subcommand parameters.
* nd/completion-more-parameters:
completion: add more parameter value completion
Code clean-up.
* jk/unused-params:
ref-filter: drop unused "sz" parameters
ref-filter: drop unused "obj" parameters
ref-filter: drop unused buf/sz pairs
files-backend: drop refs parameter from split_symref_update()
pack-objects: drop unused parameter from oe_map_new_pack()
merge-recursive: drop several unused parameters
diff: drop complete_rewrite parameter from run_external_diff()
diff: drop unused emit data parameter from sane_truncate_line()
diff: drop unused color reset parameters
diff: drop options parameter from diffcore_fix_diff_index()
The diff machinery, one of the oldest parts of the system, which
long predates the parse-options API, uses fairly long and complex
handcrafted option parser. This is being rewritten to use the
parse-options API.
* nd/diff-parseopt:
diff.c: convert --raw
diff.c: convert -W|--[no-]function-context
diff.c: convert -U|--unified
diff.c: convert -u|-p|--patch
diff.c: prepare to use parse_options() for parsing
diff.h: avoid bit fields in struct diff_flags
diff.h: keep forward struct declarations sorted
parse-options: allow ll_callback with OPTION_CALLBACK
parse-options: avoid magic return codes
parse-options: stop abusing 'callback' for lowlevel callbacks
parse-options: add OPT_BITOP()
parse-options: disable option abbreviation with PARSE_OPT_KEEP_UNKNOWN
parse-options: add one-shot mode
parse-options.h: remove extern on function prototypes
For --rename-empty, see 90d43b0768 (teach diffcore-rename to
optionally ignore empty content - 2012-03-22) for more information.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
--no-find-copies-harder is also added on purpose (because I don't see
why we should not have the --no- version for this)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This also validates that the user specifies a single character in
--output-indicator-*, not a string.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds value completion for a couple more paramters. To make it
easier to maintain these hard coded lists, add a comment at the original
list/code to remind people to update git-completion.bash too.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our builtin_diff() wants to know whether break-detection found a
complete rewrite, because it changes how the diff is shown. However,
when calling out to an external diff, we don't pass this information
along (and doing so would require designing a new interface to the
user-provided program).
Let's drop the unused parameter to make this fact more clear.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We pass the "struct emit_callback" (which contains all of the context
for our diff) into sane_truncate_line(), but that function doesn't
actually use it. In theory we might eventually develop a diff option
that impacts this, but in the meantime let's not mislead anybody reading
the code. Since the function is static, it would be easy to pass it
again if it should ever become useful.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several of the emit_* functions take a "reset" color parameter, but
never actually look at it (instead, they call into emit_diff_symbol,
which handles the colors itself). Let's drop these unused parameters.
Note that emit_line() does still take a color/reset pair, and actually
uses it. It cannot be refactored to match these other functions because
it's the thing that emit_diff_symbol eventually calls into (i.e., it
does not by itself know which colors to use, and must be told by the
caller).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sole purpose of this function is to fix the sorting order of the
queued diff entries. It doesn't need to know about any diff options, so
we can drop the unused parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --color-moved --cc --stat -p" did not work well due to
funny interaction between a bug in color-moved and the rest, which
has been fixed.
* jk/diff-cc-stat-fixes:
combine-diff: treat --dirstat like --stat
combine-diff: treat --summary like --stat
combine-diff: treat --shortstat like --stat
combine-diff: factor out stat-format mask
diff: clear emitted_symbols flag after use
t4006: resurrect commented-out tests
There were many places the code relied on the string returned from
getenv() to be non-volatile, which is not true, that have been
corrected.
* jk/save-getenv-result:
builtin_diff(): read $GIT_DIFF_OPTS closer to use
merge-recursive: copy $GITHEAD strings
init: make a copy of $GIT_DIR string
config: make a copy of $GIT_CONFIG string
commit: copy saved getenv() result
get_super_prefix(): copy getenv() result
The code to drive GIT_EXTERNAL_DIFF command relied on the string
returned from getenv() to be non-volatile, which is not true, that
has been corrected.
* kg/external-diff-save-env:
diff: ensure correct lifetime of external_diff_cmd
This is a preparation step to start using parse_options() to parse
diff/revision options instead of what we have now. There are a couple
of good things from using parse_options():
- better help usage
- easier to add new options
- better completion support
- help usage generation
- better integration with main command option parser. We can just
concat the main command's option array and diffopt's together and
parse all in one go.
- detect colidding options (e.g. --reverse is used by revision code,
so diff code can't use it as long name for -R)
- consistent syntax, e.g. option that takes mandatory argument will
now accept both "--option=value" and "--option value".
The plan is migrate all diff/rev options to parse_options(). Then we
could get rid of diff_opt_parse() and expose parseopts[] directly to
the caller.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's an odd bug when "log --color-moved" is used with the combination
of "--cc --stat -p": the stat for merge commits is erroneously shown
with the diff of the _next_ commit.
The included test demonstrates the issue. Our history looks something
like this:
A-B-M--D
\ /
C
When we run "git log --cc --stat -p --color-moved" starting at D, we get
this sequence of events:
1. The diff for D is using -p, so diff_flush() calls into
diff_flush_patch_all_file_pairs(). There we see that o->color_moved
is in effect, so we point o->emitted_symbols to a static local
struct, causing diff_flush_patch() to queue the symbols instead of
actually writing them out.
We then do our move detection, emit the symbols, and clear the
struct. But we leave o->emitted_symbols pointing to our struct.
2. Next we compute the diff for M. This is a merge, so we use the
combined diff code. In find_paths_generic(), we compute the
pairwise diff between each commit and its parent. Normally this is
done with DIFF_FORMAT_NO_OUTPUT, since we're just looking for
intersecting paths. But since "--stat --cc" shows the first-parent
stat, and since we're computing that diff anyway, we enable
DIFF_FORMAT_DIFFSTAT for the first parent. This outputs the stat
information immediately, saving us from running a separate
first-parent diff later.
But where does that output go? Normally it goes directly to stdout,
but because o->emitted_symbols is set, we queue it. As a result, we
don't actually print the diffstat for the merge commit (yet), which
is wrong.
3. Next we compute the diff for C. We're actually showing a patch
again, so we end up in diff_flush_patch_all_file_pairs(), but this
time we have the queued stat from step 2 waiting in our struct.
We add new elements to it for C's diff, and then flush the whole
thing. And we see the diffstat from M as part of C's diff, which is
wrong.
So triggering the bug really does require the combination of all of
those options.
To fix it, we can simply restore o->emitted_symbols to NULL after
flushing it, so that it does not affect anything outside of
diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
nobody outside of that function is going to bother flushing it, so we
would not want them to write to it either.
In fact, we could take this a step further and turn the local "esm"
struct into a non-static variable that goes away after the function
ends. However, since it contains a dynamically sized array, we benefit
from amortizing the cost of allocations over many calls. So we'll leave
it as static to retain that benefit.
But let's push the zero-ing of esm.nr into the conditional for "if
(o->emitted_symbols)" to make it clear that we do not expect esm to hold
any values if we did not just try to use it. With the code as it is
written now, if we did encounter such a case (which I think would be a
bug), we'd silently leak those values without even bothering to display
them. With this change, we'd at least eventually show them, and somebody
would notice.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The value returned by getenv() is not guaranteed to remain valid across
other environment function calls. But in between our call and using the
value, we run fill_textconv(), which may do quite a bit of work,
including spawning sub-processes.
We can make this safer by calling getenv() right before we actually look
at its value.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to getenv(3)'s notes:
The implementation of getenv() is not required to be reentrant. The
string pointed to by the return value of getenv() may be statically
allocated, and can be modified by a subsequent call to getenv(),
putenv(3), setenv(3), or unsetenv(3).
Since strings returned by getenv() are allowed to change on subsequent
calls to getenv(), make sure to duplicate when caching external_diff_cmd
from environment.
This problem becomes apparent on Git for Windows since fe21c6b285
(mingw: reencode environment variables on the fly (UTF-16 <-> UTF-8)),
when the getenv() implementation provided in compat/mingw.c was changed
to keep a certain amount of alloc'ed strings and freeing them on
subsequent calls.
This fixes https://github.com/git-for-windows/git/issues/2007:
$ yes n | git -c difftool.prompt=yes difftool fe21c6b285 fe21c6b285df~100
Viewing (1/404): '.gitignore'
Launch 'bc3' [Y/n]?
Viewing (2/404): 'Documentation/.gitignore'
Launch 'bc3' [Y/n]?
Viewing (3/404): 'Documentation/Makefile'
Launch 'bc3' [Y/n]?
Viewing (4/404): 'Documentation/RelNotes/2.14.5.txt'
Launch 'bc3' [Y/n]?
Viewing (5/404): 'Documentation/RelNotes/2.15.3.txt'
Launch 'bc3' [Y/n]?
Viewing (6/404): 'Documentation/RelNotes/2.16.5.txt'
Launch 'bc3' [Y/n]?
Viewing (7/404): 'Documentation/RelNotes/2.17.2.txt'
Launch 'bc3' [Y/n]?
Viewing (8/404): 'Documentation/RelNotes/2.18.1.txt'
Launch 'bc3' [Y/n]?
Viewing (9/404): 'Documentation/RelNotes/2.19.0.txt'
Launch 'bc3' [Y/n]? error: cannot spawn ¦?: No such file or directory
fatal: external diff died, stopping at Documentation/RelNotes/2.19.1.txt
Signed-off-by: Kim Gybels <kgybels@infogroep.be>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using --color-moved-ws=allow-indentation-change allow lines with
the same indentation change to be grouped across blank lines. For now
this only works if the blank lines have been moved as well, not for
blocks that have just had their indentation changed.
This completes the changes to the implementation of
--color-moved=allow-indentation-change. Running
git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0
now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
version of the previous implementation and 66% from the original which
took 14.6s.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].
-extern int stream_filter(struct stream_filter *,
- const char *input, size_t *isize_p,
- char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+ const char *input, size_t *isize_p,
+ char *output, size_t *osize_p);
This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has the benefit that any whitespace errors do not
interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.
[1] Note that before the commit to fix the erroneous coloring of moved
lines each line was colored as a different block, since that commit
they are uncolored.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>