Commit graph

514 commits

Author SHA1 Message Date
Junio C Hamano a74352867e revision: propagate flag bits from tags to pointees
With the previous fix 895c5ba3 (revision: do not peel tags used in
range notation, 2013-09-19), handle_revision_arg() that processes
command line arguments for the "git log" family of commands no
longer directly places the object pointed by the tag in the pending
object array when it sees a tag object.  We used to place pointee
there after copying the flag bits like UNINTERESTING and
SYMMETRIC_LEFT.

This change meant that any flag that is relevant to later history
traversal must now be propagated to the pointed objects (most often
these are commits) while starting the traversal, which is partly
done by handle_commit() that is called from prepare_revision_walk().
We did propagate UNINTERESTING, but did not do so for others, most
notably SYMMETRIC_LEFT.  This caused "git log --left-right v1.0..."
(where "v1.0" is a tag) to start losing the "leftness" from the
commit the tag points at.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-15 15:53:51 -08:00
Junio C Hamano 2ac5e4470b revision: mark contents of an uninteresting tree uninteresting
"git rev-list --objects ^A^{tree} B^{tree}" ought to mean "I want a
list of objects inside B's tree, but please exclude the objects that
appear inside A's tree".

we see the top-level tree marked as uninteresting (i.e. ^A^{tree} in
the above example) and call mark_tree_uninteresting() on it; this
unfortunately prevents us from recursing into the tree and marking
the objects in the tree as uninteresting.

The reason why "git log ^A A" yields an empty set of commits,
i.e. we do not have a similar issue for commits, is because we call
mark_parents_uninteresting() after seeing an uninteresting commit.
The uninteresting-ness of the commit itself does not prevent its
parents from being marked as uninteresting.

Introduce mark_tree_contents_uninteresting() and structure the code
in handle_commit() in such a way that it makes it the responsibility
of the callchain leading to this function to mark commits, trees and
blobs as uninteresting, and also make it the responsibility of the
helpers called from this function to mark objects that are reachable
from them.

Note that this is a very old bug that probably dates back to the day
when "rev-list --objects" was introduced.  The line to clear
tree->object.parsed at the end of mark_tree_contents_uninteresting()
can be removed when this fix is merged to the codebase after
6e454b9a (clear parsed flag when we free tree buffers, 2013-06-05).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-15 15:48:58 -08:00
Junio C Hamano ad70448576 Merge branch 'cc/starts-n-ends-with'
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.

* cc/starts-n-ends-with:
  replace {pre,suf}fixcmp() with {starts,ends}_with()
  strbuf: introduce starts_with() and ends_with()
  builtin/remote: remove postfixcmp() and use suffixcmp() instead
  environment: normalize use of prefixcmp() by removing " != 0"
2013-12-17 12:02:44 -08:00
Christian Couder 5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Junio C Hamano 10167eb251 Merge branch 'jc/ref-excludes'
People often wished a way to tell "git log --branches" (and "git
log --remotes --not --branches") to exclude some local branches
from the expansion of "--branches" (similarly for "--tags", "--all"
and "--glob=<pattern>").  Now they have one.

* jc/ref-excludes:
  rev-parse: introduce --exclude=<glob> to tame wildcards
  rev-list --exclude: export add/clear-ref-exclusion and ref-excluded API
  rev-list --exclude: tests
  document --exclude option
  revision: introduce --exclude=<glob> to tame wildcards
2013-12-05 12:59:09 -08:00
Junio C Hamano c6f1b920ac Merge branch 'nd/literal-pathspecs'
Fixes a regression on 'master' since v1.8.4.

* nd/literal-pathspecs:
  pathspec: stop --*-pathspecs impact on internal parse_pathspec() uses
2013-11-18 14:31:29 -08:00
Junio C Hamano ff32d3420a rev-list --exclude: export add/clear-ref-exclusion and ref-excluded API
... while updating their function signature.  To be squashed into
the initial patch to rev-list.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-01 13:09:24 -07:00
Felipe Contreras 9e57ac55ce revision: trivial style fixes
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-31 13:48:05 -07:00
Junio C Hamano 4cebbe6f55 Merge branch 'nd/magic-pathspec'
All callers to parse_pathspec() must choose between getting no
pathspec or one path that is limited to the current directory
when there is no paths given on the command line, but there were
two callers that violated this rule, triggering a BUG().

* nd/magic-pathspec:
  Fix calling parse_pathspec with no paths nor PATHSPEC_PREFER_* flags
2013-10-30 12:10:33 -07:00
Junio C Hamano 2d99baab2f Merge branch 'jc/revision-range-unpeel'
"git rev-list --objects ^v1.0^ v1.0" gave v1.0 tag itself in the
output, but "git rev-list --objects v1.0^..v1.0" did not.

* jc/revision-range-unpeel:
  revision: do not peel tags used in range notation
2013-10-28 10:43:16 -07:00
Nguyễn Thái Ngọc Duy 4a2d5ae262 pathspec: stop --*-pathspecs impact on internal parse_pathspec() uses
Normally parse_pathspec() is used on command line arguments where it
can do fancy thing like parsing magic on each argument or adding magic
for all pathspecs based on --*-pathspecs options.

There's another use of parse_pathspec(), where pathspec is needed, but
the input is known to be pure paths. In this case we usually don't
want --*-pathspecs to interfere. And we definitely do not want to
parse magic in these paths, regardless of --literal-pathspecs.

Add new flag PATHSPEC_LITERAL_PATH for this purpose. When it's set,
--*-pathspecs are ignored, no magic is parsed. And if the caller
allows PATHSPEC_LITERAL (i.e. the next calls can take literal magic),
then PATHSPEC_LITERAL will be set.

This fixes cases where git chokes when GIT_*_PATHSPECS are set because
parse_pathspec() indicates it won't take any magic. But
GIT_*_PATHSPECS add them anyway. These are

   export GIT_LITERAL_PATHSPECS=1
   git blame -- something
   git log --follow something
   git log --merge

"git ls-files --with-tree=path" (aka parse_pathspec() in
overlay_tree_on_cache()) is safe because the input is empty, and
producing one pathspec due to PATHSPEC_PREFER_CWD does not take any
magic into account.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-28 09:57:36 -07:00
Vicent Marti a330de31d1 revision: allow setting custom limiter function
This commit enables users of `struct rev_info` to peform custom limiting
during a revision walk (i.e. `get_revision`).

If the field `include_check` has been set to a callback, this callback
will be issued once for each commit before it is added to the "pending"
list of the revwalk. If the include check returns 0, the commit will be
marked as added but won't be pushed to the pending list, effectively
limiting the walk.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 15:44:52 -07:00
Nguyễn Thái Ngọc Duy c8556c6213 Fix calling parse_pathspec with no paths nor PATHSPEC_PREFER_* flags
When parse_pathspec() is called with no paths, the behavior could be
either return no paths, or return one path that is cwd. Some commands
do the former, some the latter. parse_pathspec() itself does not make
either the default and requires the caller to specify either flag if
it may run into this situation.

I've grep'd through all parse_pathspec() call sites. Some pass
neither, but those are guaranteed never pass empty path to
parse_pathspec(). There are two call sites that may pass empty path
and are fixed with this patch.

[jc: added a test from Antoine's bug report]

Reported-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-22 10:49:43 -07:00
Junio C Hamano 895c5ba3c1 revision: do not peel tags used in range notation
A range notation "A..B" means exactly the same thing as what "^A B"
means, i.e. the set of commits that are reachable from B but not
from A.  But the internal representation after the revision parser
parsed these two notations are subtly different.

 - "rev-list ^A B" leaves A and B in the revs->pending.objects[]
   array, with the former marked as UNINTERESTING and the revision
   traversal machinery propagates the mark to underlying commit
   objects A^0 and B^0.

 - "rev-list A..B" peels tags and leaves A^0 (marked as
   UNINTERESTING) and B^0 in revs->pending.objects[] array before
   the traversal machinery kicks in.

This difference usually does not matter, but starts to matter when
the --objects option is used.  For example, we see this:

    $ git rev-list --objects v1.8.4^1..v1.8.4 | grep $(git rev-parse v1.8.4)
    $ git rev-list --objects v1.8.4 ^v1.8.4^1 | grep $(git rev-parse v1.8.4)
    04f013dc38 v1.8.4

With the former invocation, the revision traversal machinery never
hears about the tag v1.8.4 (it only sees the result of peeling it,
i.e. the commit v1.8.4^0), and the tag itself does not appear in the
output.  The latter does send the tag object itself to the output.

Make the range notation keep the unpeeled objects and feed them to
the traversal machinery to fix this inconsistency.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-15 16:17:09 -07:00
Junio C Hamano f406140baa Merge branch 'fc/at-head'
Instead of typing four capital letters "HEAD", you can say "@" now,
e.g. "git log @".

* fc/at-head:
  Add new @ shortcut for HEAD
  sha1-name: pass len argument to interpret_branch_name()
2013-09-20 12:38:10 -07:00
Junio C Hamano b8f23112f0 Merge branch 'jk/free-tree-buffer'
* jk/free-tree-buffer:
  clear parsed flag when we free tree buffers
2013-09-17 11:37:33 -07:00
Junio C Hamano b02f5aeda6 Merge branch 'jl/submodule-mv'
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-09-09 14:36:15 -07:00
Felipe Contreras cf99a761d3 sha1-name: pass len argument to interpret_branch_name()
This is useful to make sure we don't step outside the boundaries of what
we are interpreting at the moment. For example while interpreting
foobar@{u}~1, the job of interpret_branch_name() ends right before ~1,
but there's no way to figure that out inside the function, unless the
len argument is passed.

So let's do that.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-03 11:33:00 -07:00
Junio C Hamano e7b432c521 revision: introduce --exclude=<glob> to tame wildcards
People often find "git log --branches" etc. that includes _all_
branches is cumbersome to use when they want to grab most but except
some.  The same applies to --tags, --all and --glob.

Teach the revision machinery to remember patterns, and then upon the
next such a globbing option, exclude those that match the pattern.

With this, I can view only my integration branches (e.g. maint,
master, etc.) without topic branches, which are named after two
letters from primary authors' names, slash and topic name.

    git rev-list --no-walk --exclude=??/* --branches |
    git name-rev --refs refs/heads/* --stdin

This one shows things reachable from local and remote branches that
have not been merged to the integration branches.

    git log --remotes --branches --not --exclude=??/* --branches

It may be a bit rough around the edges, in that the pattern to give
the exclude option depends on what globbing option follows.  In
these examples, the pattern "??/*" is used, not "refs/heads/??/*",
because the globbing option that follows the -"-exclude=<pattern>"
is "--branches".  As each use of globbing option resets previously
set "--exclude", this may not be such a bad thing, though.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-30 16:37:55 -07:00
Thomas Rast 838f9a1566 log: use true parents for diff when walking reflogs
The reflog walking logic (git log -g) replaces the true parent list
with the preceding commit in the reflog.  This results in bogus commit
diffs when combined with options such as -p; the diff is against the
reflog predecessor, not the parent of the commit.

Save the true parents on the side, extending the functions from the
previous commit.  The diff logic picks them up and uses them to show
the correct diffs.

We do have to be somewhat careful about repeated calling of
save_parents(), since the reflog may list a commit more than once.  We
now store (commit_list*)-1 to distinguish the "not saved yet" and
"root commit" cases.  This lets us preserve an empty parent list even
if save_parents() is repeatedly called.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05 08:27:00 -07:00
Thomas Rast 53d00b39ce log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.

This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it.  So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.

However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of

  git log --graph --stat ...
  git log --graph --full-diff --stat ...

(--graph internally kicks in parent simplification, much like
--parents).

To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect.  Then use the stored parents
instead of the simplified ones in the commit display code paths.  The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.

For ordinary commits it should be obvious that this is the right thing
to do.

Merge commits are a bit subtle.  Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.

So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent.  Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place).  This triggers --cc showing
these hunks spuriously.

Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 10:25:48 -07:00
Nguyễn Thái Ngọc Duy 9a08727443 remove init_pathspec() in favor of parse_pathspec()
While at there, move free_pathspec() to pathspec.c

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00
Nguyễn Thái Ngọc Duy bd1928df1d remove diff_tree_{setup,release}_paths
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00
Nguyễn Thái Ngọc Duy 0fdc2ae512 convert some get_pathspec() calls to parse_pathspec()
These call sites follow the pattern:

   paths = get_pathspec(prefix, argv);
   init_pathspec(&pathspec, paths);

which can be converted into a single parse_pathspec() call.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:06 -07:00
Nguyễn Thái Ngọc Duy 9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Junio C Hamano 534f0e0996 Merge branch 'jc/topo-author-date-sort'
"git log" learned the "--author-date-order" option, with which the
output is topologically sorted and commits in parallel histories
are shown intermixed together based on the author timestamp.

* jc/topo-author-date-sort:
  t6003: add --author-date-order test
  topology tests: teach a helper to set author dates as well
  t6003: add --date-order test
  topology tests: teach a helper to take abbreviated timestamps
  t/lib-t6000: style fixes
  log: --author-date-order
  sort-in-topological-order: use prio-queue
  prio-queue: priority queue of pointers to structs
  toposort: rename "lifo" field
2013-07-01 12:41:23 -07:00
Junio C Hamano ede63a195c Merge branch 'mh/reflife'
Define memory ownership and lifetime rules for what for-each-ref
feeds to its callbacks (in short, "you do not own it, so make a
copy if you want to keep it").

* mh/reflife: (25 commits)
  refs: document the lifetime of the args passed to each_ref_fn
  register_ref(): make a copy of the bad reference SHA-1
  exclude_existing(): set existing_refs.strdup_strings
  string_list_add_refs_by_glob(): add a comment about memory management
  string_list_add_one_ref(): rename first parameter to "refname"
  show_head_ref(): rename first parameter to "refname"
  show_head_ref(): do not shadow name of argument
  add_existing(): do not retain a reference to sha1
  do_fetch(): clean up existing_refs before exiting
  do_fetch(): reduce scope of peer_item
  object_array_entry: fix memory handling of the name field
  find_first_merges(): remove unnecessary code
  find_first_merges(): initialize merges variable using initializer
  fsck: don't put a void*-shaped peg in a char*-shaped hole
  object_array_remove_duplicates(): rewrite to reduce copying
  revision: use object_array_filter() in implementation of gc_boundary()
  object_array: add function object_array_filter()
  revision: split some overly-long lines
  cmd_diff(): make it obvious which cases are exclusive of each other
  cmd_diff(): rename local variable "list" -> "entry"
  ...
2013-06-14 08:46:14 -07:00
Junio C Hamano b27a79d16b Merge branch 'kb/full-history-compute-treesame-carefully-2'
Major update to the revision traversal logic to improve culling of
irrelevant parents while traversing a mergy history.

* kb/full-history-compute-treesame-carefully-2:
  revision.c: make default history consider bottom commits
  revision.c: don't show all merges for --parents
  revision.c: discount side branches when computing TREESAME
  revision.c: add BOTTOM flag for commits
  simplify-merges: drop merge from irrelevant side branch
  simplify-merges: never remove all TREESAME parents
  t6012: update test for tweaked full-history traversal
  revision.c: Make --full-history consider more merges
  Documentation: avoid "uninteresting"
  rev-list-options.txt: correct TREESAME for P
  t6111: add parents to tests
  t6111: allow checking the parents as well
  t6111: new TREESAME test set
  t6019: test file dropped in -s ours merge
  decorate.c: compact table when growing
2013-06-14 08:45:59 -07:00
Junio C Hamano 81c6b38b67 log: --author-date-order
Sometimes people would want to view the commits in parallel
histories in the order of author dates, not committer dates.

Teach "topo-order" sort machinery to do so, using a commit-info slab
to record the author dates of each commit, and prio-queue to sort
them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-11 15:15:21 -07:00
Junio C Hamano 08f704f294 toposort: rename "lifo" field
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are.  When
traversing a forked history like this with "git log C E":

    A----B----C
     \
      D----E

we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.

In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.

Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output).  The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour.  We start the traversal by knowing two
commits, C and E.  While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected.  By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.

When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.

The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.

Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code.  The mechanical replacement rule is:

  "lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
  "lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-11 15:15:21 -07:00
Jeff King 6e454b9a31 clear parsed flag when we free tree buffers
Many code paths will free a tree object's buffer and set it
to NULL after finishing with it in order to keep memory
usage down during a traversal. However, out of 8 sites that
do this, only one actually unsets the "parsed" flag back.
Those sites that don't are setting a trap for later users of
the tree object; even after calling parse_tree, the buffer
will remain NULL, causing potential segfaults.

It is not known whether this is triggerable in the current
code. Most commands do not do an in-memory traversal
followed by actually using the objects again. However, it
does not hurt to be safe for future callers.

In most cases, we can abstract this out to a
"free_tree_buffer" helper. However, there are two
exceptions:

  1. The fsck code relies on the parsed flag to know that we
     were able to parse the object at one point. We can
     switch this to using a flag in the "flags" field.

  2. The index-pack code sets the buffer to NULL but does
     not free it (it is freed by a caller). We should still
     unset the parsed flag here, but we cannot use our
     helper, as we do not want to free the buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-06 10:29:12 -07:00
Junio C Hamano ed73fe5642 Merge branch 'tr/line-log'
* tr/line-log:
  git-log(1): remove --full-line-diff description
  line-log: fix documentation formatting
  log -L: improve comments in process_all_files()
  log -L: store the path instead of a diff_filespec
  log -L: test merge of parallel modify/rename
  t4211: pass -M to 'git log -M -L...' test
  log -L: fix overlapping input ranges
  log -L: check range set invariants when we look it up
  Speed up log -L... -M
  log -L: :pattern:file syntax to find by funcname
  Implement line-history search (git log -L)
  Export rewrite_parents() for 'log -L'
  Refactor parse_loc
2013-06-02 16:00:44 -07:00
Michael Haggerty 31faeb2088 object_array_entry: fix memory handling of the name field
Previously, the memory management of the object_array_entry::name
field was inconsistent and undocumented.  object_array_entries are
ultimately created by a single function, add_object_array_with_mode(),
which has an argument "const char *name".  This function used to
simply set the name field to reference the string pointed to by the
name parameter, and nobody on the object_array side ever freed the
memory.  Thus, it assumed that the memory for the name field would be
managed by the caller, and that the lifetime of that string would be
at least as long as the lifetime of the object_array_entry.  But
callers were inconsistent:

* Some passed pointers to constant strings or argv entries, which was
  OK.

* Some passed pointers to newly-allocated memory, but didn't arrange
  for the memory ever to be freed.

* Some passed the return value of sha1_to_hex(), which is a pointer to
  a statically-allocated buffer that can be overwritten at any time.

* Some passed pointers to refnames that they received from a
  for_each_ref()-type iteration, but the lifetimes of such refnames is
  not guaranteed by the refs API.

Bring consistency to this mess by changing object_array to make its
own copy for the object_array_entry::name field and free this memory
when an object_array_entry is deleted from the array.

Many callers were passing the empty string as the name parameter, so
as a performance optimization, treat the empty string specially.
Instead of making a copy, store a pointer to a statically-allocated
empty string to object_array_entry::name.  When deleting such an
entry, skip the free().

Change the callers that were already passing copies to
add_object_array_with_mode() to either skip the copy, or (if the
memory needed to be allocated anyway) freeing the memory itself.

A part of this commit effectively reverts

    70d26c6e76 read_revisions_from_stdin: make copies for handle_revision_arg

because the copying introduced by that commit (which is still
necessary) is now done at a deeper level.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02 15:28:46 -07:00
Michael Haggerty be6754c67f revision: use object_array_filter() in implementation of gc_boundary()
Use object_array_filter(), which will soon be made smarter about
cleaning up discarded entries properly.  Also add a function comment.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 09:25:01 -07:00
Michael Haggerty ff5f5f268f revision: split some overly-long lines
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 09:25:01 -07:00
Michael Haggerty df835d3a0c add_rev_cmdline(): make a copy of the name argument
Instead of assuming that the memory pointed to by the name argument
will live forever, make a local copy of it before storing it in the
ref_cmdline_info.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 09:25:00 -07:00
Kevin Bracey 141efdba57 revision.c: make default history consider bottom commits
Previously, the default history treated bottom commits the same as any
other UNINTERESTING commit, which could force it down side branches.

Consider the following history:

   *A--*B---D--*F         * marks !TREESAME parent paths
     \     /*
      `-C-'

When requesting "B..F", B is UNINTERESTING but TREESAME to D. C is
!UNINTERESTING.

So default following would go from D into the irrelevant side branch C
to A, rather than to B.  Note also that if there had been an extra
!UNINTERESTING commit B1 between B and D, it wouldn't have gone down C.

Change the default following to test relevant_commit() instead of
!UNINTERESTING, so it can proceed straight from D to B, thus finishing
the traversal of that path.

Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 11:51:10 -07:00
Kevin Bracey bf3418b08b revision.c: don't show all merges for --parents
When using --parents or --children, get_commit_action() previously showed
all merges, even if TREESAME to both parents.

This was intended to tie together the topology of the rewritten parents,
but it was excessive - in fact we only need to show merges that have two
or more relevant parents. Merges at the boundary do not necessarily need
to be shown.

Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 11:51:10 -07:00
Kevin Bracey 4d826608e9 revision.c: discount side branches when computing TREESAME
Use the BOTTOM flag to define relevance for pruning. Relevant commits
are those that are !UNINTERESTING or BOTTOM, and this allows us to
identify irrelevant side branches (UNINTERESTING && !BOTTOM).

If a merge has relevant parents, and it is TREESAME to them, then do not
let irrelevant parents cause the merge to be treated as !TREESAME.

When considering simplification, don't always include all merges -
merges with exactly one relevant parent can be simplified, if TREESAME
according to the above rule.

These two changes greatly increase simplification in limited, pruned
revision lists.

Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 11:51:10 -07:00
Kevin Bracey 7f34a46ff5 revision.c: add BOTTOM flag for commits
When performing edge-based operations on the revision graph, it can be
useful to be able to identify the INTERESTING graph's connection(s) to
the bottom commit(s) specified by the user.

Conceptually when the user specifies "A..B" (== B ^A), they are asking
for the history from A to B. The first connection from A onto the
INTERESTING graph is part of that history, and should be considered. If
we consider only INTERESTING nodes and their connections, then we're
really only considering the history from A's immediate descendants to B.

This patch does not change behaviour, but adds a new BOTTOM flag to
indicate the bottom commits specified by the user, ready to be used by
following patches.

We immediately use the BOTTOM flag to return collect_bottom_commits() to
its original approach of examining the pending commit list rather than
the command line. This will ensure alignment of the definition of
"bottom" with future patches.

Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 11:51:10 -07:00
Kevin Bracey 143f1eafdb simplify-merges: drop merge from irrelevant side branch
Reimplement commit 4b7f53da on top of the new simplify-merges
infrastructure, tightening the condition to only consider root parents;
the original version incorrectly dropped parents that were TREESAME to
anything.

Original log message follows.

The merge simplification rule stated in 6546b59 (revision traversal:
show full history with merge simplification, 2008-07-31) still
treated merge commits too specially.  Namely, in a history with this
shape:

	---o---o---M
	          /
         x---x---x

where three 'x' were on a history completely unrelated to the main
history 'o' and do not touch any of the paths we are following, we
still said that after simplifying all of the parents of M, 'x'
(which is the leftmost 'x' that rightmost 'x simplifies down to) and
'o' (which would be the last commit on the main history that touches
the paths we are following) are independent from each other, and
both need to be kept.

That is incorrect; when the side branch 'x' never touches the paths,
it should be removed to allow M to simplify down to the last commit
on the main history that touches the paths.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 11:51:09 -07:00
Kevin Bracey 9c129eab99 simplify-merges: never remove all TREESAME parents
When simplifying an odd merge, such as one that used "-s ours", we may
find ourselves TREESAME to apparently redundant parents. Prevent
simplify_merges() from removing every TREESAME parent; if this would
happen reinstate the first TREESAME parent - the one that the default
log would have followed.

This avoids producing a totally disjoint history from the default log
when the default log is a better explanation of the end result, and aids
visualisation of odd merges.

Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 11:51:09 -07:00
Kevin Bracey d0af663e42 revision.c: Make --full-history consider more merges
History simplification previously always treated merges as TREESAME
if they were TREESAME to any parent.

While this was consistent with the default behaviour, this could be
extremely unhelpful when searching detailed history, and could not be
overridden. For example, if a merge had ignored a change, as if by "-s
ours", then:

  git log -m -p --full-history -Schange file

would successfully locate "change"'s addition but would not locate the
merge that resolved against it.

Futher, simplify_merges could drop the actual parent that a commit
was TREESAME to, leaving it as a normal commit marked TREESAME that
isn't actually TREESAME to its remaining parent.

Now redefine a commit's TREESAME flag to be true only if a commit is
TREESAME to _all_ of its parents. This doesn't affect either the default
simplify_history behaviour (because partially TREESAME merges are turned
into normal commits), or full-history with parent rewriting (because all
merges are output). But it does affect other modes. The clearest
difference is that --full-history will show more merges - sufficient to
ensure that -m -p --full-history log searches can really explain every
change to the file, including those changes' ultimate fate in merges.

Also modify simplify_merges to recalculate TREESAME after removing
a parent. This is achieved by storing per-parent TREESAME flags on the
initial scan, so the combined flag can be easily recomputed.

This fixes some t6111 failures, but creates a couple of new ones -
we are now showing some merges that don't need to be shown.

Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 11:51:09 -07:00
Kevin Bracey a765499a08 revision.c: treat A...B merge bases as if manually specified
The documentation assures users that "A...B" is defined as "A B --not
$(git merge-base --all A B)". This wasn't in fact quite true, because
the calculated merge bases were not sent to add_rev_cmdline().

The main effect of this was that although

  git rev-list --ancestry-path A B --not $(git merge-base --all A B)

worked, the simpler form

  git rev-list --ancestry-path A...B

failed with a "no bottom commits" error.

Other potential users of bottom commits could also be affected by this
problem, if they examine revs->cmdline_info; I came across the issue in
my proposed history traversal refinements series.

So ensure that the calculated merge bases are sent to add_rev_cmdline(),
flagged with new 'whence' enum value REV_CMD_MERGE_BASE.

Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 11:45:34 -07:00
Junio C Hamano e52e6f79cc Merge branch 'nd/pretty-formats'
pretty-printing body of the commit that is stored in non UTF-8
encoding did not work well.  The early part of this series fixes
it.  And then it adds %C(auto) specifier that turns the coloring on
when we are emitting to the terminal, and adds column-aligning
format directives.

* nd/pretty-formats:
  pretty: support %>> that steal trailing spaces
  pretty: support truncating in %>, %< and %><
  pretty: support padding placeholders, %< %> and %><
  pretty: add %C(auto) for auto-coloring
  pretty: split color parsing into a separate function
  pretty: two phase conversion for non utf-8 commits
  utf8.c: add reencode_string_len() that can handle NULs in string
  utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences
  utf8.c: move display_mode_esc_sequence_len() for use by other functions
  pretty: share code between format_decoration and show_decorations
  pretty-formats.txt: wrap long lines
  pretty: get the correct encoding for --pretty:format=%e
  pretty: save commit encoding from logmsg_reencode if the caller needs it
2013-04-23 11:22:48 -07:00
Junio C Hamano 8d41addacb Merge branch 'tr/copy-revisions-from-stdin'
A fix to a long-standing issue in the command line parser for
revisions, which was triggered by mv/sequence-pick-error-diag topic.

* tr/copy-revisions-from-stdin:
  read_revisions_from_stdin: make copies for handle_revision_arg
2013-04-19 13:40:13 -07:00
Nguyễn Thái Ngọc Duy 5a10d23658 pretty: save commit encoding from logmsg_reencode if the caller needs it
The commit encoding is parsed by logmsg_reencode, there's no need for
the caller to re-parse it again. The reencoded message now has the new
encoding, not the original one. The caller would need to read commit
object again before parsing.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:27 -07:00
Thomas Rast 70d26c6e76 read_revisions_from_stdin: make copies for handle_revision_arg
read_revisions_from_stdin() has passed pointers to its read buffer
down to handle_revision_arg() since its inception way back in 42cabc3
(Teach rev-list an option to read revs from the standard input.,
2006-09-05).  Even back then, this was a bug: through
add_pending_object, the argument was recorded in the object_array's
'name' field.

Fix it by making a copy whenever read_revisions_from_stdin() passes an
argument down the callchain.  The other caller runs handle_revision_arg()
on argv[], where it would be redundant to make a copy.

Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 11:17:48 -07:00
Junio C Hamano 0290bf1250 Revert 4b7f53da76 (simplify-merges: drop merge from irrelevant side branch, 2013-01-17)
Kevin Bracey reports that the change regresses a case shown in the
user manual.

Trading one fix with another breakage is not worth it.  Just keep
the test to document the existing breakage, and revert the change
for now.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-08 13:10:27 -07:00
Junio C Hamano 92e0d91632 Sync with 1.8.1 maintenance track
* maint-1.8.1:
  Start preparing for 1.8.1.6
  git-tag(1): we tag HEAD by default
  Fix revision walk for commits with the same dates
  t2003: work around path mangling issue on Windows
  pack-refs: add fully-peeled trait
  pack-refs: write peeled entry for non-tags
  use parse_object_or_die instead of die("bad object")
  avoid segfaults on parse_object failure
  entry: fix filter lookup
  t2003: modernize style
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-03 09:18:01 -07:00
Junio C Hamano 64379806a9 Merge branch 'kk/revwalk-slop-too-many-commit-within-a-second' into maint-1.8.1
* kk/revwalk-slop-too-many-commit-within-a-second:
  Fix revision walk for commits with the same dates
2013-04-03 08:44:02 -07:00
Junio C Hamano 74bd52681d Merge branch 'kk/revwalk-slop-too-many-commit-within-a-second'
Allow the revision "slop" code to look deeper while commits with
exactly the same timestamps come next to each other (which can
often happen after a large "am" and "rebase" session).

* kk/revwalk-slop-too-many-commit-within-a-second:
  Fix revision walk for commits with the same dates
2013-03-28 14:38:25 -07:00
Junio C Hamano 436b60ce7a Merge branch 'jc/remove-treesame-parent-in-simplify-merges'
The --simplify-merges logic did not cull irrelevant parents from a
merge that is otherwise not interesting with respect to the paths
we are following.

This touches a fairly core part of the revision traversal
infrastructure; even though I think this change is correct, please
report immediately if you find any unintended side effect.

* jc/remove-treesame-parent-in-simplify-merges:
  simplify-merges: drop merge from irrelevant side branch
2013-03-28 14:37:53 -07:00
Thomas Rast 12da1d1f6f Implement line-history search (git log -L)
This is a rewrite of much of Bo's work, mainly in an effort to split
it into smaller, easier to understand routines.

The algorithm is built around the struct range_set, which encodes a
series of line ranges as intervals [a,b).  This is used in two
contexts:

* A set of lines we are tracking (which will change as we dig through
  history).
* To encode diffs, as pairs of ranges.

The main routine is range_set_map_across_diff().  It processes the
diff between a commit C and some parent P.  It determines which diff
hunks are relevant to the ranges tracked in C, and computes the new
ranges for P.

The algorithm is then simply to process history in topological order
from newest to oldest, computing ranges and (partial) diffs.  At
branch points, we need to merge the ranges we are watching.  We will
find that many commits do not affect the chosen ranges, and mark them
TREESAME (in addition to those already filtered by pathspec limiting).
Another pass of history simplification then gets rid of such commits.

This is wired as an extra filtering pass in the log machinery.  This
currently only reduces code duplication, but should allow for other
simplifications and options to be used.

Finally, we hook a diff printer into the output chain.  Ideally we
would wire directly into the diff logic, to optionally use features
like word diff.  However, that will require some major reworking of
the diff chain, so we completely replace the output with our own diff
for now.

As this was a GSoC project, and has quite some history by now, many
people have helped.  In no particular order, thanks go to

  Jakub Narebski <jnareb@gmail.com>
  Jens Lehmann <Jens.Lehmann@web.de>
  Jonathan Nieder <jrnieder@gmail.com>
  Junio C Hamano <gitster@pobox.com>
  Ramsay Jones <ramsay@ramsay1.demon.co.uk>
  Will Palmer <wmpalmer@gmail.com>

Apologies to everyone I forgot.

Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 10:29:22 -07:00
Bo Yang c7edcae06e Export rewrite_parents() for 'log -L'
The function rewrite_one is used to rewrite a single
parent of the current commit, and is used by rewrite_parents
to rewrite all the parents.

Decouple the dependence between them by making rewrite_one
a callback function that is passed to rewrite_parents. Then
export rewrite_parents for reuse by the line history browser.

We will use this function in line-log.c.

Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 10:29:10 -07:00
Kacper Kornet c19d1b4e84 Fix revision walk for commits with the same dates
Logic in still_interesting function allows to stop the commits
traversing if the oldest processed commit is not older then the
youngest commit on the list to process and the list contains only
commits marked as not interesting ones. It can be premature when dealing
with a set of coequal commits. For example git rev-list A^! --not B
provides wrong answer if all commits in the range A..B had the same
commit time and there are more then 7 of them.

To fix this problem the relevant part of the logic in still_interesting
is changed to: the walk can be stopped if the oldest processed commit is
younger then the youngest commit on the list to processed.

Signed-off-by: Kacper Kornet <draenog@pld-linux.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-22 16:15:48 -07:00
Jeff King 04deccda11 log: re-encode commit messages before grepping
If you run "git log --grep=foo", we will run your regex on
the literal bytes of the commit message. This can provide
confusing results if the commit message is not in the same
encoding as your grep expression (or worse, you have commits
in multiple encodings, in which case your regex would need
to be written to match either encoding). On top of this, we
might also be grepping in the commit's notes, which are
already re-encoded, potentially leading to grepping in a
buffer with mixed encodings concatenated. This is insanity,
but most people never noticed, because their terminal and
their commit encodings all match.

Instead, let's massage the to-be-grepped commit into a
standardized encoding. There is not much point in adding a
flag for "this is the encoding I expect my grep pattern to
match"; the only sane choice is for it to use the log output
encoding. That is presumably what the user's terminal is
using, and it means that the patterns found by the grep will
match the output produced by git.

As a bonus, this fixes a potential segfault in commit_match
when commit->buffer is NULL, as we now build on logmsg_reencode,
which handles reading the commit buffer from disk if
necessary. The segfault can be triggered with:

        git commit -m 'text1' --allow-empty
        git commit -m 'text2' --allow-empty
        git log --graph --no-walk --grep 'text2'

which arguably does not make any sense (--graph inherently
wants a connected history, and by --no-walk the command line
is telling us to show discrete points in history without
connectivity), and we probably should forbid the
combination, but that is a separate issue.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-11 13:11:45 -08:00
Junio C Hamano 4b7f53da76 simplify-merges: drop merge from irrelevant side branch
The merge simplification rule stated in 6546b59 (revision traversal:
show full history with merge simplification, 2008-07-31) still
treated merge commits too specially.  Namely, in a history with this
shape:

	---o---o---M
	          /
         x---x---x

where three 'x' were on a history completely unrelated to the main
history 'o' and do not touch any of the paths we are following, we
still said that after simplifying all of the parents of M, 'x'
(which is the leftmost 'x' that rightmost 'x simplifies down to) and
'o' (which would be the last commit on the main history that touches
the paths we are following) are independent from each other, and
both need to be kept.

That is incorrect; when the side branch 'x' never touches the paths,
it should be removed to allow M to simplify down to the last commit
on the main history that touches the paths.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-17 15:22:48 -08:00
Junio C Hamano df874fa82e log --use-mailmap: optimize for cases without --author/--committer search
When we taught the commit_match() mechanism to pay attention to the
new --use-mailmap option, we started to unconditionally copy the
commit object to a temporary buffer, just in case we need the author
and committer lines updated via the mailmap mechanism, and rewrite
author and committer using the mailmap.

It turns out that this has a rather unpleasant performance
implications.  In the linux kernel repository, running

  $ git log --author='Junio C Hamano' --pretty=short >/dev/null

under /usr/bin/time, with and without --use-mailmap (the .mailmap
file is 118 entries long, the particular author does not appear in
it), cost (with warm cache):

  [without --use-mailmap]
  5.42user 0.26system 0:05.70elapsed 99%CPU (0avgtext+0avgdata 2005936maxresident)k
  0inputs+0outputs (0major+137669minor)pagefaults 0swaps

  [with --use-mailmap]
  6.47user 0.30system 0:06.78elapsed 99%CPU (0avgtext+0avgdata 2006288maxresident)k
  0inputs+0outputs (0major+137692minor)pagefaults 0swaps

which incurs about 20% overhead.  The command is doing extra work,
so the extra cost may be justified.

But it is inexcusable to pay the cost when we do not need
author/committer match.  In the same repository,

  $ git log --grep='fix menuconfig on debian lenny' --pretty=short >/dev/null

shows very similar numbers as the above:

  [without --use-mailmap]
  5.32user 0.30system 0:05.63elapsed 99%CPU (0avgtext+0avgdata 2005984maxresident)k
  0inputs+0outputs (0major+137672minor)pagefaults 0swaps

  [with --use-mailmap]
  6.64user 0.24system 0:06.89elapsed 99%CPU (0avgtext+0avgdata 2006320maxresident)k
  0inputs+0outputs (0major+137694minor)pagefaults 0swaps

The latter case is an unnecessary performance regression.  We may
want to _show_ the result with mailmap applied, but we do not have
to copy and rewrite the author/committer of all commits we try to
match if we do not query for these fields.

Trivially optimize this performace regression by limiting the
rewrites for only when we are matching with author/committer fields.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-10 12:33:09 -08:00
Antoine Pelisse d72fbe8111 log: grep author/committer using mailmap
Currently you can use mailmap to display log authors and committers
but you can't use the mailmap to find commits with mapped values.

This commit allows you to run:

    git log --use-mailmap --author mapped_name_or_email
    git log --use-mailmap --committer mapped_name_or_email

Of course it only works if the --use-mailmap option is used.

The new name and email are copied only when necessary.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-10 12:33:08 -08:00
Junio C Hamano 4ad4fce63a Merge branch 'jc/prettier-pretty-note'
Emit the notes attached to the commit in "format-patch --notes"
output after three-dashes.

* jc/prettier-pretty-note:
  format-patch: add a blank line between notes and diffstat
  Doc User-Manual: Patch cover letter, three dashes, and --notes
  Doc format-patch: clarify --notes use case
  Doc notes: Include the format-patch --notes option
  Doc SubmittingPatches: Mention --notes option after "cover letter"
  Documentation: decribe format-patch --notes
  format-patch --notes: show notes after three-dashes
  format-patch: append --signature after notes
  pretty_print_commit(): do not append notes message
  pretty: prepare notes message at a centralized place
  format_note(): simplify API
  pretty: remove reencode_commit_message()
2012-11-15 10:25:05 -08:00
Junio C Hamano 76141e2e62 format_note(): simplify API
We either stuff the notes message without modification for %N
userformat, or format it for human consumption.  Using two bits
is an overkill that does not benefit anybody.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-17 22:42:40 -07:00
Junio C Hamano 727b6fc3ed log --grep: accept --basic-regexp and --perl-regexp
When we added the "--perl-regexp" option (or "-P") to "git grep", we
should have done the same for the commands in the "git log" family,
but somehow we forgot to do so.  This corrects it, but we will
reserve the short-and-sweet "-P" option for something else for now.

Also introduce the "--basic-regexp" option for completeness, so that
the "last one wins" principle can be used to defeat an earlier -E
option, e.g. "git log -E --basic-regexp --grep='<bre>'".  Note that
it cannot have the short "-G" option as the option is to grep in the
patch text in the context of "log" family.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09 23:21:30 -07:00
Junio C Hamano 34a4ae55b2 log --grep: use the same helper to set -E/-F options as "git grep"
The command line option parser for "git log -F -E --grep='<ere>'"
did not flip the "fixed" bit, violating the general "last option
wins" principle among conflicting options.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09 23:21:29 -07:00
Junio C Hamano 918d4e1c90 revisions: initialize revs->grep_filter using grep_init()
Instead of using the hand-rolled initialization sequence,
use grep_init() to populate the necessary bits.  This opens
the door to allow the calling commands to optionally read
grep.* configuration variables via git_config() if they
want to.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09 23:21:29 -07:00
Junio C Hamano 31d69db340 Merge branch 'jc/maint-log-grep-all-match-1' into maint
* jc/maint-log-grep-all-match-1:
  grep.c: make two symbols really file-scope static this time
  t7810-grep: test --all-match with multiple --grep and --author options
  t7810-grep: test interaction of multiple --grep and --author options
  t7810-grep: test multiple --author with --all-match
  t7810-grep: test multiple --grep with and without --all-match
  t7810-grep: bring log --grep tests in common form
  grep.c: mark private file-scope symbols as static
  log: document use of multiple commit limiting options
  log --grep/--author: honor --all-match honored for multiple --grep patterns
  grep: show --debug output only once
  grep: teach --debug option to dump the parse tree
2012-09-29 22:30:56 -07:00
Nguyễn Thái Ngọc Duy 38cfe915bf revision: make --grep search in notes too if shown
Notes are shown after commit body. From user perspective it looks
pretty much like commit body and they may assume --grep would search
in that part too.

Make it so.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29 12:15:05 -07:00
Junio C Hamano baa6378ff2 log --grep-reflog: reject the option without -g
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29 12:07:04 -07:00
Nguyễn Thái Ngọc Duy 72fd13f71c revision: add --grep-reflog to filter commits by reflog messages
Similar to --author/--committer which filters commits by author and
committer header fields. --grep-reflog adds a fake "reflog" header to
commit and a grep filter to search on that line.

All rules to --author/--committer apply except no timestamp stripping.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29 11:41:14 -07:00
Junio C Hamano 3d7535e424 Merge branch 'jc/maint-log-grep-all-match'
Fix a long-standing bug in "git log --grep" when multiple "--grep"
are used together with "--all-match" and "--author" or "--committer".

* jc/maint-log-grep-all-match:
  t7810-grep: test --all-match with multiple --grep and --author options
  t7810-grep: test interaction of multiple --grep and --author options
  t7810-grep: test multiple --author with --all-match
  t7810-grep: test multiple --grep with and without --all-match
  t7810-grep: bring log --grep tests in common form
  grep.c: mark private file-scope symbols as static
  log: document use of multiple commit limiting options
  log --grep/--author: honor --all-match honored for multiple --grep patterns
  grep: show --debug output only once
  grep: teach --debug option to dump the parse tree
2012-09-18 14:37:54 -07:00
Junio C Hamano 78ed88d80a Merge branch 'mz/cherry-pick-cmdline-order' into maint
* mz/cherry-pick-cmdline-order:
  cherry-pick/revert: respect order of revisions to pick
  demonstrate broken 'git cherry-pick three one two'
  teach log --no-walk=unsorted, which avoids sorting
2012-09-14 21:24:18 -07:00
Junio C Hamano 17bf35a3c7 grep: teach --debug option to dump the parse tree
Our "grep" allows complex boolean expressions to be formed to match
each individual line with operators like --and, '(', ')' and --not.
Introduce the "--debug" option to show the parse tree to help people
who want to debug and enhance it.

Also "log" learns "--grep-debug" option to do the same.  The command
line parser to the log family is a lot more limited than the general
"git grep" parser, but it has special handling for header matching
(e.g. "--author"), and a parse tree is valuable when working on it.

Note that "--all-match" is *not* any individual node in the parse
tree.  It is an instruction to the evaluator to check all the nodes
in the top-level backbone have matched and reject a document as
non-matching otherwise.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-14 10:10:35 -07:00
Junio C Hamano 3503e9ab32 Merge branch 'maint-1.7.11' into maint 2012-09-12 14:08:05 -07:00
Junio C Hamano eaff724bbc Merge branch 'jc/dotdot-is-parent-directory' into maint-1.7.11
"git log .." errored out saying it is both rev range and a path when
there is no disambiguating "--" is on the command line.  Update the
command line parser to interpret ".." as a path in such a case.

* jc/dotdot-is-parent-directory:
  specifying ranges: we did not mean to make ".." an empty set
2012-09-12 14:00:34 -07:00
Junio C Hamano 1c88a6d174 Sync with 1.7.11.6
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-11 11:23:54 -07:00
Junio C Hamano 738c218760 Merge branch 'tr/void-diff-setup-done' into maint-1.7.11
* tr/void-diff-setup-done:
  diff_setup_done(): return void
2012-09-11 10:53:40 -07:00
Junio C Hamano c2b927932d Merge branch 'mz/cherry-pick-cmdline-order'
"git cherry-pick A C B" used to replay changes in A and then B and
then C if these three commits had committer timestamps in that
order, which is not what the user who said "A C B" naturally expects.

* mz/cherry-pick-cmdline-order:
  cherry-pick/revert: respect order of revisions to pick
  demonstrate broken 'git cherry-pick three one two'
  teach log --no-walk=unsorted, which avoids sorting
2012-09-10 15:42:55 -07:00
Junio C Hamano e3f26752b5 Merge branch 'maint-1.7.11' into maint
* maint-1.7.11:
  Almost 1.7.11.6
  gitweb: URL-decode $my_url/$my_uri when stripping PATH_INFO
  rebase -i: use full onto sha1 in reflog
  sh-setup: protect from exported IFS
  receive-pack: do not leak output from auto-gc to standard output
  t/t5400: demonstrate breakage caused by informational message from prune
  setup: clarify error messages for file/revisions ambiguity
  send-email: improve RFC2047 quote parsing
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
2012-09-10 15:31:06 -07:00
Junio C Hamano 03adeeaad6 Merge branch 'jk/maint-null-in-trees' into maint-1.7.11
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.

* jk/maint-null-in-trees:
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
2012-09-10 15:24:54 -07:00
Junio C Hamano 7764a3b35c Merge branch 'jc/dotdot-is-parent-directory'
"git log .." errored out saying it is both rev range and a path when
there is no disambiguating "--" is on the command line.  Update the
command line parser to interpret ".." as a path in such a case.

* jc/dotdot-is-parent-directory:
  specifying ranges: we did not mean to make ".." an empty set
2012-09-07 11:09:18 -07:00
Martin von Zweigbergk ca92e59e30 teach log --no-walk=unsorted, which avoids sorting
When 'git log' is passed the --no-walk option, no revision walk takes
place, naturally. Perhaps somewhat surprisingly, however, the provided
revisions still get sorted by commit date. So e.g 'git log --no-walk
HEAD HEAD~1' and 'git log --no-walk HEAD~1 HEAD' give the same result
(unless the two revisions share the commit date, in which case they
will retain the order given on the command line). As the commit that
introduced --no-walk (8e64006 (Teach revision machinery about
--no-walk, 2007-07-24)) points out, the sorting is intentional, to
allow things like

 git log --abbrev-commit --pretty=oneline --decorate --all --no-walk

to show all refs in order by commit date.

But there are also other cases where the sorting is not wanted, such
as

 <command producing revisions in order> |
       git log --oneline --no-walk --stdin

To accomodate both cases, leave the decision of whether or not to sort
up to the caller, by allowing --no-walk={sorted,unsorted}, defaulting
to 'sorted' for backward-compatibility reasons.

Signed-off-by: Martin von Zweigbergk <martinvonz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-30 12:26:50 -07:00
Junio C Hamano 3b753148b6 Merge branch 'jk/maint-null-in-trees'
We do not want a link to 0{40} object stored anywhere in our objects.

* jk/maint-null-in-trees:
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
2012-08-27 11:54:28 -07:00
Junio C Hamano 003c84f6d2 specifying ranges: we did not mean to make ".." an empty set
Either end of revision range operator can be omitted to default to HEAD,
as in "origin.." (what did I do since I forked) or "..origin" (what did
they do since I forked).  But the current parser interprets ".."  as an
empty range "HEAD..HEAD", and worse yet, because ".." does exist on the
filesystem, we get this annoying output:

  $ cd Documentation/howto
  $ git log .. ;# give me recent commits that touch Documentation/ area.
  fatal: ambiguous argument '..': both revision and filename
  Use '--' to separate filenames from revisions

Surely we could say "git log ../" or even "git log -- .." to disambiguate,
but we shouldn't have to.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-23 14:37:49 -07:00
Junio C Hamano 9cd33bbc52 Merge branch 'tr/void-diff-setup-done'
Remove unnecessary code.

* tr/void-diff-setup-done:
  diff_setup_done(): return void
2012-08-22 11:52:27 -07:00
Thomas Rast 28452655af diff_setup_done(): return void
diff_setup_done() has historically returned an error code, but lost
the last nonzero return in 943d5b7 (allow diff.renamelimit to be set
regardless of -M/-C, 2006-08-09).  The callers were in a pretty
confused state: some actually checked for the return code, and some
did not.

Let it return void, and patch all callers to take this into account.
This conveniently also gets rid of a handful of different(!) error
messages that could never be triggered anyway.

Note that the function can still die().

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-03 12:11:07 -07:00
Jeff King e54501004a diff: do not use null sha1 as a sentinel value
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).

The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.

We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.

This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.

One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-29 15:04:32 -07:00
Junio C Hamano d05e56ea67 Merge branch 'jk/revision-walk-stop-at-max-count'
"git log -n 1 -- rarely-touched-path" was spending unnecessary
cycles after showing the first change to find the next one, only to
discard it.

* jk/revision-walk-stop-at-max-count:
  revision: avoid work after --max-count is reached
2012-07-22 12:56:30 -07:00
Junio C Hamano 0958a24d73 Merge branch 'jc/sha1-name-more'
Teaches the object name parser things like a "git describe" output
is always a commit object, "A" in "git log A" must be a committish,
and "A" and "B" in "git log A...B" both must be committish, etc., to
prolong the lifetime of abbreviated object names.

* jc/sha1-name-more: (27 commits)
  t1512: match the "other" object names
  t1512: ignore whitespaces in wc -l output
  rev-parse --disambiguate=<prefix>
  rev-parse: A and B in "rev-parse A..B" refer to committish
  reset: the command takes committish
  commit-tree: the command wants a tree and commits
  apply: --build-fake-ancestor expects blobs
  sha1_name.c: add support for disambiguating other types
  revision.c: the "log" family, except for "show", takes committish
  revision.c: allow handle_revision_arg() to take other flags
  sha1_name.c: introduce get_sha1_committish()
  sha1_name.c: teach lookup context to get_sha1_with_context()
  sha1_name.c: many short names can only be committish
  sha1_name.c: get_sha1_1() takes lookup flags
  sha1_name.c: get_describe_name() by definition groks only commits
  sha1_name.c: teach get_short_sha1() a commit-only option
  sha1_name.c: allow get_short_sha1() to take other flags
  get_sha1(): fix error status regression
  sha1_name.c: restructure disambiguation of short names
  sha1_name.c: correct misnamed "canonical" and "res"
  ...
2012-07-22 12:55:07 -07:00
Jeff King b72a1904ae revision: avoid work after --max-count is reached
During a revision traversal in which --max-count has been
specified, we decrement a counter for each revision returned
by get_revision. When it hits 0, we typically return NULL
(the exception being if we still have boundary commits to
show).

However, before we check the counter, we call get_revision_1
to get the next commit. This might involve looking at a
large number of commits if we have restricted the traversal
(e.g., we might traverse until we find the next commit whose
diff actually matches a pathspec).

There's no need to make this get_revision_1 call when our
counter runs out. If we are not in --boundary mode, we will
just throw away the result and immediately return NULL. If
we are in --boundary mode, then we will still throw away the
result, and then start showing the boundary commits.
However, as git_revision_1 does not impact the boundary
list, it should not have an impact.

In most cases, avoiding this work will not be especially
noticeable. However, in some cases, it can make a big
difference:

  [before]
  $ time git rev-list -1 origin Documentation/RelNotes/1.7.11.2.txt
  8d141a1d56

  real    0m0.301s
  user    0m0.280s
  sys     0m0.016s

  [after]
  $ time git rev-list -1 origin Documentation/RelNotes/1.7.11.2.txt
  8d141a1d56

  real    0m0.010s
  user    0m0.008s
  sys     0m0.000s

Note that the output is produced almost instantaneously in
the first case, and then git uselessly spends a long time
looking for the next commit to touch that file (but there
isn't one, and we traverse all the way down to the roots).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-13 13:51:29 -07:00
Junio C Hamano c8382c1500 Merge branch 'jc/rev-list-simplify-merges-first-parent' into maint
When "git log" gets "--simplify-merges/by-decoration" together with
"--first-parent", the combination of these options makes the
simplification logic to use in-core commit objects that haven't been
examined for relevance, either producing incorrect result or taking
too long to produce any output.  Teach the simplification logic to
ignore commits that the first-parent traversal logic ignored when
both are in effect to work around the issue.

* jc/rev-list-simplify-merges-first-parent:
  revision: ignore side parents while running simplify-merges
  revision: note the lack of free() in simplify_merges()
  revision: "simplify" options imply topo-order sort
2012-07-11 12:46:57 -07:00
Junio C Hamano 9ca724933a Merge branch 'mm/verify-filename-fix' into maint
"git diff COPYING HEAD:COPYING" gave a nonsense error message that
claimed that the treeish HEAD did not have COPYING in it.

* mm/verify-filename-fix:
  verify_filename(): ask the caller to chose the kind of diagnosis
  sha1_name: do not trigger detailed diagnosis for file arguments
2012-07-11 12:45:49 -07:00
Junio C Hamano d5f6b1d756 revision.c: the "log" family, except for "show", takes committish
Add a field to setup_revision_opt structure and allow these callers
to tell the setup_revisions command parsing machinery that short SHA1
it encounters are meant to name committish.

This step does not go all the way to connect the setup_revisions()
to sha1_name.c yet.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-09 16:42:22 -07:00
Junio C Hamano 8e676e8ba5 revision.c: allow handle_revision_arg() to take other flags
The existing "cant_be_filename" that tells the function that the
caller knows the arg is not a path (hence it does not have to be
checked for absense of the file whose name matches it) is made into
a bit in the flag word.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-09 16:42:22 -07:00
Junio C Hamano cd74e4733d sha1_name.c: introduce get_sha1_committish()
Many callers know that the user meant to name a committish by
syntactical positions where the object name appears.  Calling this
function allows the machinery to disambiguate shorter-than-unique
abbreviated object names between committish and others.

Note that this does NOT error out when the named object is not a
committish. It is merely to give a hint to the disambiguation
machinery.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-09 16:42:22 -07:00
Junio C Hamano 33bd598c39 sha1_name.c: teach lookup context to get_sha1_with_context()
The function takes user input string and returns the object name
(binary SHA-1) with mode bits and path when the object was looked
up in a tree.

Additionally give hints to help disambiguation of abbreviated object
names when the caller knows what it is looking for.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-09 16:42:22 -07:00
Junio C Hamano 249c8f4a16 sha1_name.c: get rid of get_sha1_with_mode()
There are only two callers, and they will benefit from being able to
pass disambiguation hints to underlying get_sha1_with_context() API
once it happens.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-03 10:24:11 -07:00
Junio C Hamano 331512f988 Merge branch 'jc/rev-list-simplify-merges-first-parent'
When "--simplify-merges/by-decoration" is given together with
"--first-parent" to "git log", the combination of these options
makes the simplification logic to use in-core commit objects that
haven't been examined for relevance, either producing incorrect
result or taking too long to produce any output.  Teach the
simplification logic to ignore commits that the first-parent
traversal logic ignored when both are in effect to work around the
issue.
2012-06-28 15:20:16 -07:00
Junio C Hamano 08080894b7 Merge branch 'mm/verify-filename-fix'
"git diff COPYING HEAD:COPYING" gave a nonsense error message that
claimed that the treeish HEAD did not have COPYING in it.
2012-06-28 15:19:32 -07:00
Matthieu Moy 023e37c377 verify_filename(): ask the caller to chose the kind of diagnosis
verify_filename() can be called in two different contexts. Either we
just tried to interpret a string as an object name, and it fails, so
we try looking for a working tree file (i.e. we finished looking at
revs that come earlier on the command line, and the next argument
must be a pathname), or we _know_ that we are looking for a
pathname, and shouldn't even try interpreting the string as an
object name.

For example, with this change, we get:

  $ git log COPYING HEAD:inexistant
  fatal: HEAD:inexistant: no such path in the working tree.
  Use '-- <path>...' to specify paths that do not exist locally.
  $ git log HEAD:inexistant
  fatal: Path 'inexistant' does not exist in 'HEAD'

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-06-18 15:21:42 -07:00
Junio C Hamano 6e513ba3a6 revision: ignore side parents while running simplify-merges
The simplify_merges() function needs to look at all history chain to
find the closest ancestor that is relevant after the simplification,
but after --first-parent traversal, side parents haven't been marked
for relevance (they are irrelevant by definition due to the nature
of first-parent-only traversal) nor culled from the parents list of
resulting commits.

We cannot simply remove these side parents from the parents list, as
the output phase still wants to see the parents.  Instead, teach
simplify_one() and its callees to ignore the later parents.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-06-13 14:04:33 -07:00