Commit graph

536 commits

Author SHA1 Message Date
Junio C Hamano 71957339da Merge branch 'jk/pending-keep-tag-name'
History traversal with "git log --source" that starts with an
annotated tag failed to report the tag as "source", due to an
old regression in the command line parser back in v2.2 days.

* jk/pending-keep-tag-name:
  revision.c: propagate tag names from pending array
2015-12-28 13:58:04 -08:00
Jeff King 728350b76a revision.c: propagate tag names from pending array
When we unwrap a tag to find its commit for a traversal, we
do not propagate the "name" field of the tag in the pending
array (i.e., the ref name the user gave us in the first
place) to the commit (instead, we use an empty string). This
means that "git log --source" will never show the tag-name
for commits we reach through it.

This was broken in 2073949 (traverse_commit_list: support
pending blobs/trees with paths, 2014-10-15). That commit
tried to be careful and avoid propagating the path
information for a tag (which would be nonsensical) to trees
and blobs. But it should not have cut off the "name" field,
which should carry forward to children.

Note that this does mean that the "name" field will carry
forward to blobs and trees, too. Whereas prior to 2073949,
we always gave them an empty string. This is the right thing
to do, but in practice no callers probably use it (since now
we have an explicit separate "path" field, which was the
point of 2073949).

We add tests here not only for the broken case, but also a
basic sanity test of "log --source" in general, which did
not have any coverage in the test suite.

Reported-by: Raymundo <gypark@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-17 10:47:56 -08:00
Junio C Hamano 782ca8c44e Merge branch 'sn/null-pointer-arith-in-mark-tree-uninteresting'
mark_tree_uninteresting() has code to handle the case where it gets
passed a NULL pointer in its 'tree' parameter, but the function had
'object = &tree->object' assignment before checking if tree is
NULL.  This gives a compiler an excuse to declare that tree will
never be NULL and apply a wrong optimization.  Avoid it.

* sn/null-pointer-arith-in-mark-tree-uninteresting:
  revision.c: fix possible null pointer arithmetic
2015-12-11 10:41:01 -08:00
Stefan Naewe a2678df335 revision.c: fix possible null pointer arithmetic
mark_tree_uninteresting() dereferences a tree pointer before
checking if the pointer is valid. Fix that by doing the check first.

Signed-off-by: Stefan Naewe <stefan.naewe@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-07 12:32:02 -08:00
brian m. carlson ed1c9977cb Remove get_object_hash.
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson f2fd0760f6 Convert struct object to object_id
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson 7999b2cf77 Add several uses of get_object_hash.
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Junio C Hamano 0692a6c22c Merge branch 'rs/pop-commit'
Code simplification.

* rs/pop-commit:
  use pop_commit() for consuming the first entry of a struct commit_list
2015-10-30 13:07:03 -07:00
René Scharfe e510ab8988 use pop_commit() for consuming the first entry of a struct commit_list
Instead of open-coding the function pop_commit() just call it.  This
makes the intent clearer and reduces code size.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-26 14:06:46 -07:00
Jeff King 34fa79a6cd prefer memcpy to strcpy
When we already know the length of a string (e.g., because
we just malloc'd to fit it), it's nicer to use memcpy than
strcpy, as it makes it more obvious that we are not going to
overflow the buffer (because the size we pass matches the
size in the allocation).

This also eliminates calls to strcpy, which make auditing
the code base harder.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Junio C Hamano 5af77d1352 Merge branch 'jk/log-missing-default-HEAD' into maint
"git init empty && git -C empty log" said "bad default revision 'HEAD'",
which was found to be a bit confusing to new users.

* jk/log-missing-default-HEAD:
  log: diagnose empty HEAD more clearly
2015-09-03 19:18:04 -07:00
Junio C Hamano 699a0f3748 Merge branch 'jk/log-missing-default-HEAD'
"git init empty && git -C empty log" said "bad default revision 'HEAD'",
which was found to be a bit confusing to new users.

* jk/log-missing-default-HEAD:
  log: diagnose empty HEAD more clearly
2015-09-02 12:50:10 -07:00
Jeff King ce11360467 log: diagnose empty HEAD more clearly
If you init or clone an empty repository, the initial
message from running "git log" is not very friendly:

  $ git init
  Initialized empty Git repository in /home/peff/foo/.git/
  $ git log
  fatal: bad default revision 'HEAD'

Let's detect this situation and write a more friendly
message:

  $ git log
  fatal: your current branch 'master' does not have any commits yet

We also detect the case that 'HEAD' points to a broken ref;
this should be even less common, but is easy to see. Note
that we do not diagnose all possible cases. We rely on
resolve_ref, which means we do not get information about
complex cases. E.g., "--default master" would use dwim_ref
to find "refs/heads/master", but we notice only that
"master" does not exist. Similarly, a complex sha1
expression like "--default HEAD^2" will not resolve as a
ref.

But that's OK. We fall back to a generic error message in
those cases, and they are unlikely to be used anyway.
Catching an empty or broken "HEAD" improves the common case,
and the other cases are not regressed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-31 09:34:20 -07:00
Junio C Hamano 71cc60070f Merge branch 'ad/bisect-cleanup'
Code and documentation clean-up to "git bisect".

* ad/bisect-cleanup:
  bisect: don't mix option parsing and non-trivial code
  bisect: simplify the addition of new bisect terms
  bisect: replace hardcoded "bad|good" by variables
  Documentation/bisect: revise overall content
  Documentation/bisect: move getting help section to the end
  bisect: correction of typo
2015-08-12 14:09:53 -07:00
Antoine Delaite cb46d630ba bisect: simplify the addition of new bisect terms
We create a file BISECT_TERMS in the repository .git to be read during a
bisection. There's no user-interface yet, but "git bisect" works if terms
other than old/new or bad/good are set in .git/BISECT_TERMS. The
fonctions to be changed if we add new terms are quite few.

In git-bisect.sh:
	check_and_set_terms
	bisect_voc

Co-authored-by: Louis Stuber <stuberl@ensimag.grenoble-inp.fr>
Tweaked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Antoine Delaite <antoine.delaite@ensimag.grenoble-inp.fr>
Signed-off-by: Louis Stuber <stuberl@ensimag.grenoble-inp.fr>
Signed-off-by: Valentin Duperray <Valentin.Duperray@ensimag.imag.fr>
Signed-off-by: Franck Jonas <Franck.Jonas@ensimag.imag.fr>
Signed-off-by: Lucien Kong <Lucien.Kong@ensimag.imag.fr>
Signed-off-by: Thomas Nguy <Thomas.Nguy@ensimag.imag.fr>
Signed-off-by: Huynh Khoi Nguyen Nguyen <Huynh-Khoi-Nguyen.Nguyen@ensimag.imag.fr>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-03 11:42:41 -07:00
Junio C Hamano d939af12bd Merge branch 'jk/date-mode-format'
Teach "git log" and friends a new "--date=format:..." option to
format timestamps using system's strftime(3).

* jk/date-mode-format:
  strbuf: make strbuf_addftime more robust
  introduce "format" date-mode
  convert "enum date_mode" into a struct
  show-branch: use DATE_RELATIVE instead of magic number
2015-08-03 11:01:27 -07:00
Junio C Hamano fbdeabf1f0 Merge branch 'jk/still-interesting'
Code clean-up.

* jk/still-interesting:
  revision.c: remove unneeded check for NULL
2015-07-17 10:44:56 -07:00
Jeff King a5481a6c94 convert "enum date_mode" into a struct
In preparation for adding date modes that may carry extra
information beyond the mode itself, this patch converts the
date_mode enum into a struct.

Most of the conversion is fairly straightforward; we pass
the struct as a pointer and dereference the type field where
necessary. Locations that declare a date_mode can use a "{}"
constructor.  However, the tricky case is where we use the
enum labels as constants, like:

  show_date(t, tz, DATE_NORMAL);

Ideally we could say:

  show_date(t, tz, &{ DATE_NORMAL });

but of course C does not allow that. Likewise, we cannot
cast the constant to a struct, because we need to pass an
actual address. Our options are basically:

  1. Manually add a "struct date_mode d = { DATE_NORMAL }"
     definition to each caller, and pass "&d". This makes
     the callers uglier, because they sometimes do not even
     have their own scope (e.g., they are inside a switch
     statement).

  2. Provide a pre-made global "date_normal" struct that can
     be passed by address. We'd also need "date_rfc2822",
     "date_iso8601", and so forth. But at least the ugliness
     is defined in one place.

  3. Provide a wrapper that generates the correct struct on
     the fly. The big downside is that we end up pointing to
     a single global, which makes our wrapper non-reentrant.
     But show_date is already not reentrant, so it does not
     matter.

This patch implements 3, along with a minor macro to keep
the size of the callers sane.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-29 11:39:07 -07:00
Stefan Beller ae40ebda9b revision.c: remove unneeded check for NULL
The function is called only from one place, which makes sure to have
`interesting_cache` not NULL.  Additionally the variable is a
dereferenced a few lines before unconditionally, which would have
resulted in a segmentation fault before hitting this check.

Signed-off-by: Stefan Beller <sbeller@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-29 09:54:18 -07:00
Junio C Hamano 43262d8d65 Merge branch 'jk/squelch-missing-link-warning-for-unreachable'
Recent "git prune" traverses young unreachable objects to safekeep
old objects in the reachability chain from them, which sometimes
caused error messages that are unnecessarily alarming.

* jk/squelch-missing-link-warning-for-unreachable:
  suppress errors on missing UNINTERESTING links
  silence broken link warnings with revs->ignore_missing_links
  add quieter versions of parse_{tree,commit}
2015-06-11 09:29:59 -07:00
Jeff King ce4e7b2ac3 suppress errors on missing UNINTERESTING links
When we are traversing commit parents along the
UNINTERESTING side of a revision walk, we do not care if
the parent turns out to be missing. That lets us limit
traversals using unreachable and possibly incomplete
sections of history. However, we do still print error
messages about the missing commits; this patch suppresses
the error, as well.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-01 09:29:51 -07:00
Jeff King daf7d86783 silence broken link warnings with revs->ignore_missing_links
We set revs->ignore_missing_links to instruct the
revision-walking machinery that we know the history graph
may be incomplete. For example, we use it when walking
unreachable but recent objects; we want to add what we can,
but it's OK if the history is incomplete.

However, we still print error messages for the missing
objects, which can be confusing. This is not an error, but
just a normal situation when transitioning from a repository
last pruned by an older git (which can leave broken segments
of history) to a more recent one (where we try to preserve
whole reachable segments).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-01 09:29:50 -07:00
Michael Haggerty a89caf4bd4 handle_one_reflog(): rewrite to take an object_id argument
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 12:19:35 -07:00
Michael Haggerty a217dcbd1e handle_one_ref(): rewrite to take an object_id argument
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 12:19:28 -07:00
Michael Haggerty 2b2a5be394 each_ref_fn: change to take an object_id parameter
Change typedef each_ref_fn to take a "const struct object_id *oid"
parameter instead of "const unsigned char *sha1".

To aid this transition, implement an adapter that can be used to wrap
old-style functions matching the old typedef, which is now called
"each_ref_sha1_fn"), and make such functions callable via the new
interface. This requires the old function and its cb_data to be
wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be
used as the cb_data for an adapter function, each_ref_fn_adapter().

This is an enormous diff, but most of it consists of simple,
mechanical changes to the sites that call any of the "for_each_ref"
family of functions. Subsequent to this change, the call sites can be
rewritten one by one to use the new interface.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 12:19:27 -07:00
Junio C Hamano 84e55dcb34 Merge branch 'jk/still-interesting'
"git rev-list --objects $old --not --all" to see if everything that
is reachable from $old is already connected to the existing refs
was very inefficient.

* jk/still-interesting:
  limit_list: avoid quadratic behavior from still_interesting
2015-05-11 14:23:43 -07:00
Jeff King b6e8a3b540 limit_list: avoid quadratic behavior from still_interesting
When we are limiting a rev-list traversal due to
UNINTERESTING refs, we have to walk down the tips (both
interesting and uninteresting) to find where they intersect.
We keep a queue of commits to examine, pop commits off
the queue one by one, and potentially add their parents.  The
size of the queue will naturally fluctuate based on the
"width" of the history graph; i.e., the number of
simultaneous lines of development. But for the most part it
will stay in the same ballpark as the initial number of tips
we fed, shrinking over time (as we hit common ancestors of
the tips). So roughly speaking, if we start with `N` tips,
we'll spend much of the time with a queue around `N` items.

For each UNINTERESTING commit we pop, we call
still_interesting to check whether marking its parents as
UNINTERESTING has made the whole queue uninteresting (in
which case we can quit early).  Because the queue is stored
as a linked list, this is `O(N)`, where `N` is the number of
items in the queue. So processing a queue with `N` commits
marked UNINTERESTING (and one or more interesting commits)
will take `O(N^2)`.

If you feed a lot of positive tips, this isn't a problem.
They aren't UNINTERESTING, so they don't incur the
still_interesting check.  It also isn't a problem if you
traverse from an interesting tip to some UNINTERESTING
bases. We order the queue by recency, so the interesting
commits stay at the front of the queue as we walk down them.
The linear check can exit early as soon as it sees one
interesting commit left in the queue.

But if you want to know whether an older commit is reachable
from a set of newer tips, we end up processing in the
opposite direction: from the UNINTERESTING ones down to the
interesting one. This may happen when we call:

  git rev-list $commits --not --all

in check_everything_connected after a fetch. If we fetched
something much older than most of our refs, and if we have a
large number of refs, the traversal cost is dominated by the
quadratic behavior.

These commands simulate the connectivity check of such a
fetch, when you have `$n` distinct refs in the receiver:

    # positive ref is 100,000 commits deep
    git rev-list --all | head -100000 | tail -1 >input

    # huge number of more recent negative refs
    git rev-list --all | head -$n | sed s/^/^/ >>input

    time git rev-list --stdin <input

Here are timings for various `n` on the linux.git
repository. The `n=1` case provides a baseline for just
walking the commits, which lets us see the still_interesting
overhead. The times marked with `+` subtract that baseline
to show just the extra time growth due to the large number
of refs. The `x` numbers show the slowdown of the adjusted
time versus the prior trial.

       n  | before                 | after
    --------------------------------------------------------
        1 | 0.991s                 | 0.848s
    10000 | 1.120s (+0.129s)       | 0.885s (+0.037s)
    20000 | 1.451s (+0.460s, 3.5x) | 0.923s (+0.075s, 2.0x)
    40000 | 2.731s (+1.740s, 3.8x) | 0.994s (+0.146s, 1.9x)
    80000 | 8.235s (+7.244s, 4.2x) | 1.123s (+0.275s, 1.9x)

Each trial doubles `n`, so you can see the quadratic (`4x`)
behavior before this patch. Afterwards, we have a roughly
linear relationship.

The implementation is fairly straightforward. Whenever we do
the linear search, we cache the interesting commit we find,
and next time check it before doing another linear search.
If that commit is removed from the list or becomes
UNINTERESTING itself, then we fall back to the linear
search. This is very similar to the trick used by fce87ae
(Fix quadratic performance in rewrite_one., 2008-07-12).

I considered and rejected several possible alternatives:

  1. Keep a count of UNINTERESTING commits in the queue.
     This requires managing the count not only when removing
     an item from the queue, but also when marking an item
     as UNINTERESTING. That requires touching the other
     functions which mark commits, and would require knowing
     quickly which commits are in the queue (lookup in the
     queue is linear, so we would need an auxiliary
     structure or to also maintain an IN_QUEUE flag in each
     commit object).

  2. Keep a separate list of interesting commits. Drop items
     from it when they are dropped from the queue, or if
     they become UNINTERESTING. This again suffers from
     extra complexity to maintain the list, not to mention
     CPU and memory.

  3. Use a better data structure for the queue. This is
     something that could help the fix in fce87ae, because
     we order the queue by recency, and it is about
     inserting quickly in recency order. So a normal
     priority queue would help there. But here, we cannot
     disturb the order of the queue, which makes things
     harder. We really do need an auxiliary index to track
     the flag we care about, which is basically option (2)
     above.

The "cache" trick is simple, and the numbers above show that
it works well in practice. This is because the length of
time it takes to find an interesting commit is proportional
to the length of time it will remain cached (i.e., if we
have to walk a long way to find it, it also means we have to
pop a lot of elements in the queue until we get rid of it
and have to find another interesting commit).

The worst case is still quadratic, though. We could have `N`
uninteresting commits at the front of the queue, followed by
`N` interesting commits, where commit `i` has parent `i+N`.
When we pop commit `i`, we will notice that the parent of
the next commit, `i+1+N` is still interesting and cache it.
But then handling commit `i+1`, we will mark its parent
`i+1+N` uninteresting, and immediately invalidate our cache.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-17 15:22:05 -07:00
Junio C Hamano dbd04eba01 Merge branch 'dj/log-graph-with-no-walk'
"git log --graph --no-walk A B..." is a otcnflicting request that
asks nonsense; no-walk tells us show discrete points in the
history, while graph asks to draw connections between these
discrete points. Forbid the combination.

* dj/log-graph-with-no-walk:
  revision: forbid combining --graph and --no-walk
2015-03-25 12:54:22 -07:00
Junio C Hamano 257b204f25 Merge branch 'kd/rev-list-bisect-first-parent'
"git rev-list --bisect --first-parent" does not work (yet) and can
even cause SEGV; forbid it.  "git log --bisect --first-parent"
would not be useful until "git bisect --first-parent" materializes,
so it is also forbidden for now.

* kd/rev-list-bisect-first-parent:
  rev-list: refuse --first-parent combined with --bisect
2015-03-25 12:54:21 -07:00
Kevin Daudt f88851c637 rev-list: refuse --first-parent combined with --bisect
rev-list --bisect is used by git bisect, but never together with
--first-parent. Because rev-list --bisect together with --first-parent
is not handled currently, and even leads to segfaults, refuse to use
both options together.

Because this is not supported, it makes little sense to use git log
--bisect --first parent either, because refs/heads/bad is not limited to
the first parent chain.

Helped-by: Junio C. Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-19 15:26:21 -07:00
Dongcan Jiang 695985f483 revision: forbid combining --graph and --no-walk
Because "--graph" is about connected history while --no-walk is
about discrete points, it does not make sense to allow these two
options at the same time. [1]

This change makes a few calls to "show --graph" fail in t4052, but
asking to show one commit with graph is a nonsensical thing to do.
Thus, tests on "show --graph" in t4052 have been removed [2,3].
Same tests on "show" without --graph option have already been tested
in 4052.

3 testcases have been added to test this patch.

[1]: http://article.gmane.org/gmane.comp.version-control.git/216083
[2]: http://article.gmane.org/gmane.comp.version-control.git/264950
[3]: http://article.gmane.org/gmane.comp.version-control.git/265107

Helped-By: Eric Sunshine <sunshine@sunshineco.com>
Helped-By: René Scharfe <l.s.r@web.de>
Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-19 11:07:51 -07:00
Junio C Hamano c985aaf879 Merge branch 'jc/unused-symbols'
Mark file-local symbols as "static", and drop functions that nobody
uses.

* jc/unused-symbols:
  shallow.c: make check_shallow_file_for_update() static
  remote.c: make clear_cas_option() static
  urlmatch.c: make match_urls() static
  revision.c: make save_parents() and free_saved_parents() static
  line-log.c: make line_log_data_init() static
  pack-bitmap.c: make pack_bitmap_filename() static
  prompt.c: remove git_getpass() nobody uses
  http.c: make finish_active_slot() and handle_curl_result() static
2015-02-11 13:44:07 -08:00
Junio C Hamano 1ba6e860b9 Merge branch 'cj/log-invert-grep'
"git log --invert-grep --grep=WIP" will show only commits that do
not have the string "WIP" in their messages.

* cj/log-invert-grep:
  log: teach --invert-grep option
2015-02-11 13:42:39 -08:00
Junio C Hamano 0131c49096 revision.c: make save_parents() and free_saved_parents() static
No external callers exist.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-15 11:05:47 -08:00
Christoph Junghans 22dfa8a23d log: teach --invert-grep option
"git log --grep=<string>" shows only commits with messages that
match the given string, but sometimes it is useful to be able to
show only commits that do *not* have certain messages (e.g. "show
me ones that are not FIXUP commits").

Originally, we had the invert-grep flag in grep_opt, but because
"git grep --invert-grep" does not make sense except in conjunction
with "--files-with-matches", which is already covered by
"--files-without-matches", it was moved it to revisions structure.
To have the flag there expresses the function to the feature better.

When the newly inserted two tests run, the history would have commits
with messages "initial", "second", "third", "fourth", "fifth", "sixth"
and "Second", committed in this order.  The commits that does not match
either "th" or "Sec" is "second" and "initial". For the case insensitive
case only "initial" matches.

Signed-off-by: Christoph Junghans <ottxor@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-13 10:20:32 -08:00
Junio C Hamano 832258da96 Merge branch 'bc/fetch-thin-less-aggressive-in-normal-repository'
Earlier we made "rev-list --object-edge" more aggressively list the
objects at the edge commits, in order to reduce number of objects
fetched into a shallow repository, but the change affected cases
other than "fetching into a shallow repository" and made it
unusably slow (e.g. fetching into a normal repository should not
have to suffer the overhead from extra processing).  Limit it to a
more specific case by introducing --objects-edge-aggressive, a new
option to rev-list.

* bc/fetch-thin-less-aggressive-in-normal-repository:
  pack-objects: use --objects-edge-aggressive for shallow repos
  rev-list: add an option to mark fewer edges as uninteresting
  Documentation: add missing article in rev-list-options.txt
2015-01-12 11:38:57 -08:00
Junio C Hamano 098501527f Merge branch 'jc/merge-bases'
The get_merge_bases*() API was easy to misuse by careless
copy&paste coders, leaving object flags tainted in the commits that
needed to be traversed.

* jc/merge-bases:
  get_merge_bases(): always clean-up object flags
  bisect: clean flags after checking merge bases
2015-01-07 12:55:05 -08:00
brian m. carlson 1684c1b219 rev-list: add an option to mark fewer edges as uninteresting
In commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting - 2013-08-16), we marked an increasing number
of edges uninteresting.  This change, and the subsequent change to make
this conditional on --objects-edge, are used by --thin to make much
smaller packs for shallow clones.

Unfortunately, they cause a significant performance regression when
pushing non-shallow clones with lots of refs (23.322 seconds vs.
4.785 seconds with 22400 refs).  Add an option to git rev-list,
--objects-edge-aggressive, that preserves this more aggressive behavior,
while leaving --objects-edge to provide more performant behavior.
Preserve the current behavior for the moment by using the aggressive
option.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-29 09:57:55 -08:00
Junio C Hamano 2ce406ccb8 get_merge_bases(): always clean-up object flags
The callers of get_merge_bases() can choose to leave object flags
used during the merge-base traversal by passing cleanup=0 as a
parameter, but in practice a very few callers can afford to do so
(namely, "git merge-base"), as they need to compute merge base in
preparation for other processing of their own and they need to see
the object without contaminate flags.

Change the function signature of get_merge_bases_many() and
get_merge_bases() to drop the cleanup parameter, so that the
majority of the callers do not have to say ", 1" at the end.

Give a new get_merge_bases_many_dirty() API to support only a few
callers that know they do not need to spend cycles cleaning up the
object flags.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-30 12:51:10 -07:00
Ramsay Jones d7702be1e1 revision: remove definition of unused 'add_object' function
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:27:24 -07:00
Jeff King 4fe10219bc rev-list: add --indexed-objects option
There is currently no easy way to ask the revision traversal
machinery to include objects reachable from the index (e.g.,
blobs and trees that have not yet been committed). This
patch adds an option to do so.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King 207394908e traverse_commit_list: support pending blobs/trees with paths
When we call traverse_commit_list, we may have trees and
blobs in the pending array. As we process these, we pass the
"name" field from the pending entry as the path of the
object within the tree (which then becomes the root path if
we recurse into subtrees).

When we set up the traversal in prepare_revision_walk,
though, the "name" field of any pending trees and blobs is
likely to be the ref at which we found the object. We would
not want to make this part of the path (e.g., doing so would
make "git rev-list --objects v2.6.11-tree" in linux.git show
paths like "v2.6.11-tree/Makefile", which is nonsensical).
Therefore prepare_revision_walk sets the name field of each
pending tree and blobs to the empty string.

However, this leaves no room for a caller who does know the
correct path of a pending object to propagate that
information to the revision walker. We can fix this by
making two related changes:

  1. Use the "path" field as the path instead of the "name"
     field in traverse_commit_list. If the path is not set,
     default to "" (which is what we always ended up with in
     the current code, because of prepare_revision_walk).

  2. In prepare_revision_walk, make a complete copy of the
     entry. This makes the path field available to the
     walker (if there is one), solving our problem.
     Leaving the name field intact is now OK, as we do not
     use it as a path due to point (1) above (and we can use
     it to make more meaningful error messages if we want).
     We also make the original "mode" field available to the
     walker, though it does not actually use it.

Note that we still re-add the pending objects and free the
old ones (so we may strdup the path and name only to free
the old ones). This could be made more efficient by simply
copying the object_array entries that we are keeping.
However, that would require more restructuring of the code,
and is not done here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:06:31 -07:00
Jeff King 718ccc9731 reachable: reuse revision.c "add all reflogs" code
We want to add all reflog entries as tips for finding
reachable objects. The revision machinery can already do
this (to support "rev-list --reflog"); we can reuse that
code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:38 -07:00
Jeff King 1da1e07c83 clean up name allocation in prepare_revision_walk
When we enter prepare_revision_walk, we have zero or more
entries in our "pending" array. We disconnect that array
from the rev_info, and then process each entry:

  1. If the entry is a commit and the --source option is in
     effect, we keep a pointer to the object name.

  2. Otherwise, we re-add the item to the pending list with
     a blank name.

We then throw away the old array by freeing the array
itself, but do not touch the "name" field of each entry. For
any items of type (2), we leak the memory associated with
the name. This commit fixes that by calling object_array_clear,
which handles the cleanup for us.

That breaks (1), though, because it depends on the memory
pointed to by the name to last forever. We can solve that by
making a copy of the name. This is slightly less efficient,
but it shouldn't matter in practice, as we do it only for
the tip commits of the traversal.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:37 -07:00
René Scharfe 2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18 09:13:42 -07:00
Junio C Hamano 1ebe6a825a Merge branch 'jk/name-decoration-alloc'
The API to allocate the structure to keep track of commit
decoration was cumbersome to use, inviting lazy code to
overallocate memory.

* jk/name-decoration-alloc:
  log-tree: use FLEX_ARRAY in name_decoration
  log-tree: make name_decoration hash static
  log-tree: make add_name_decoration a public function
2014-09-11 10:33:36 -07:00
Jeff King 2608c24940 log-tree: make name_decoration hash static
In the previous commit, we made add_name_decoration global
so that adders would not have to access the hash directly.
We now make the hash itself static so that callers _have_ to
add through our function, making sure that all additions go
through a single point.  To do this, we have to add one more
accessor function: a way to lookup entries in the hash.

Since the only caller doesn't actually look at the returned
value, but rather only asks whether there is a decoration or
not, we could provide only a boolean "has_name_decoration".
That would allow us to make "struct name_decoration" local
to log-tree, as well.

However, it's unlikely to cause any maintainability harm
making the actual data public, and this interface is more
flexible if we need to look at decorations from other parts
of the code in the future.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-26 10:34:26 -07:00
Jeff King ae18165fbb revision: drop useless string offset when parsing "--pretty"
Once upon a time, we parsed pretty options by looking for
"--pretty" at the start of the string, and then feeding the
rest (including an "=") to get_commit_format. Later, commit
48ded91 (log --pretty: do not accept bogus "--prettyshort",
2008-05-25) split this into a separate check for "--pretty"
versus "--pretty=".

However, when parsing "--pretty", we still passed "arg+8" to
get_commit_format. This is useless, since it will always
point to the NUL terminator at the end of the string. We can
simply pass NULL instead; both parameters are treated the
same by get_commit_format.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-30 12:30:02 -07:00
Junio C Hamano 5c18fde0d9 Merge branch 'jk/commit-buffer-length' into maint
A handful of code paths had to read the commit object more than
once when showing header fields that are usually not parsed.  The
internal data structure to keep track of the contents of the commit
object has been updated to reduce the need for this double-reading,
and to allow the caller find the length of the object.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-16 11:16:38 -07:00
Junio C Hamano 8061ae8b46 Merge branch 'jk/commit-buffer-length'
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths.  Use this to optimize the code paths to
validate GPG signatures in commit objects.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-02 12:53:02 -07:00