Commit graph

358 commits

Author SHA1 Message Date
Junio C Hamano 1fc83452c7 Merge branch 'rs/code-cleaning'
* rs/code-cleaning:
  fsck: simplify fsck_commit_buffer() by using commit_list_count()
  commit: use commit_list_append() instead of duplicating its code
  merge: simplify merge_trivial() by using commit_list_append()
  use strbuf_addch for adding single characters
  use strbuf_addbuf for adding strbufs
2014-07-16 11:33:09 -07:00
Jeff King 73f43f220f paint_down_to_common: use prio_queue
When we are traversing to find merge bases, we keep our
usual commit_list of commits to process, sorted by their
commit timestamp. As we add each parent to the list, we have
to spend "O(width of history)" to do the insertion, where
the width of history is the number of simultaneous lines of
development.

If we instead use a heap-based priority queue, we can do
these insertions in "O(log width)" time. This provides minor
speedups to merge-base calculations (timings in linux.git,
warm cache, best-of-five):

  [before]
  $ git merge-base HEAD v2.6.12
  real    0m3.251s
  user    0m3.148s
  sys     0m0.104s

  [after]
  $ git merge-base HEAD v2.6.12
  real    0m3.234s
  user    0m3.108s
  sys     0m0.128s

That's only an 0.5% speedup, but it does help protect us
against pathological cases.

While we are munging the "interesting" function, we also
take the opportunity to give it a more descriptive name, and
convert the return value to an int (we returned the first
interesting commit, but nobody ever looked at it).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-15 11:02:56 -07:00
Jeff King 8ff226a9d5 add object_as_type helper for casting objects
When we call lookup_commit, lookup_tree, etc, the logic goes
something like:

  1. Look for an existing object struct. If we don't have
     one, allocate and return a new one.

  2. Double check that any object we have is the expected
     type (and complain and return NULL otherwise).

  3. Convert an object with type OBJ_NONE (from a prior
     call to lookup_unknown_object) to the expected type.

We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.

Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.

Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:

  - for lookup_{commit,tree,etc} functions, we are just
    pulling steps 2 and 3 into a function that does the same
    thing.

  - for the call in peel_object, we currently only do step 3
    (but we want to consolidate it with the others, as
    mentioned above). However, step 2 is a noop here, as the
    surrounding conditional makes sure we have OBJ_NONE
    (which we want to keep to avoid an extraneous call to
    sha1_object_info).

  - for the call in lookup_commit_reference_gently, we are
    currently doing step 2 but not step 3. However, step 3
    is a noop here. The object we got will have just come
    from deref_tag, which must have figured out the type for
    each object in order to know when to stop peeling.
    Therefore the type will never be OBJ_NONE.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 18:59:05 -07:00
Jeff King d36f51c13b move setting of object->type to alloc_* functions
The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 18:59:05 -07:00
René Scharfe cb979dbd8f commit: use commit_list_append() instead of duplicating its code
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-10 14:07:22 -07:00
Junio C Hamano 39177c7f18 Merge branch 'mg/verify-commit'
Add 'verify-commit' to be used in a way similar to 'verify-tag' is
used.  Further work on verifying the mergetags might be needed.

* mg/verify-commit:
  t7510: test verify-commit
  t7510: exit for loop with test result
  verify-commit: scriptable commit signature verification
  gpg-interface: provide access to the payload
  gpg-interface: provide clear helper for struct signature_check
2014-07-10 11:27:34 -07:00
Junio C Hamano e91ae32a01 Merge branch 'jk/skip-prefix'
* jk/skip-prefix:
  http-push: refactor parsing of remote object names
  imap-send: use skip_prefix instead of using magic numbers
  use skip_prefix to avoid repeated calculations
  git: avoid magic number with skip_prefix
  fetch-pack: refactor parsing in get_ack
  fast-import: refactor parsing of spaces
  stat_opt: check extra strlen call
  daemon: use skip_prefix to avoid magic numbers
  fast-import: use skip_prefix for parsing input
  use skip_prefix to avoid repeating strings
  use skip_prefix to avoid magic numbers
  transport-helper: avoid reading past end-of-string
  fast-import: fix read of uninitialized argv memory
  apply: use skip_prefix instead of raw addition
  refactor skip_prefix to return a boolean
  avoid using skip_prefix as a boolean
  daemon: mark some strings as const
  parse_diff_color_slot: drop ofs parameter
2014-07-09 11:33:28 -07:00
Christian Couder 063da62b02 commit: add for_each_mergetag()
In the same way as there is for_each_ref() to iterate on refs,
for_each_mergetag() allows the caller to iterate on the mergetags of
a given commit.  Use it to rewrite show_mergetag() used in "git log".

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 15:32:21 -07:00
Junio C Hamano 8061ae8b46 Merge branch 'jk/commit-buffer-length'
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths.  Use this to optimize the code paths to
validate GPG signatures in commit objects.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-02 12:53:02 -07:00
Michael J Gruber 71c214c840 gpg-interface: provide access to the payload
In contrast to tag signatures, commit signatures are put into the
header, that is between the other header parts and commit messages.

Provide access to the commit content sans the signature, which is the
payload that is actually signed. Commit signature verification does the
parsing anyways, and callers may wish to act on or display the commit
object sans the signature.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-23 15:50:30 -07:00
Jeff King cf4fff579e refactor skip_prefix to return a boolean
The skip_prefix() function returns a pointer to the content
past the prefix, or NULL if the prefix was not found. While
this is nice and simple, in practice it makes it hard to use
for two reasons:

  1. When you want to conditionally skip or keep the string
     as-is, you have to introduce a temporary variable.
     For example:

       tmp = skip_prefix(buf, "foo");
       if (tmp)
	       buf = tmp;

  2. It is verbose to check the outcome in a conditional, as
     you need extra parentheses to silence compiler
     warnings. For example:

       if ((cp = skip_prefix(buf, "foo"))
	       /* do something with cp */

Both of these make it harder to use for long if-chains, and
we tend to use starts_with() instead. However, the first line
of "do something" is often to then skip forward in buf past
the prefix, either using a magic constant or with an extra
strlen(3) (which is generally computed at compile time, but
means we are repeating ourselves).

This patch refactors skip_prefix() to return a simple boolean,
and to provide the pointer value as an out-parameter. If the
prefix is not found, the out-parameter is untouched. This
lets you write:

  if (skip_prefix(arg, "foo ", &arg))
	  do_foo(arg);
  else if (skip_prefix(arg, "bar ", &arg))
	  do_bar(arg);

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:44:43 -07:00
Jeff King 218aa3a616 reuse cached commit buffer when parsing signatures
When we call show_signature or show_mergetag, we read the
commit object fresh via read_sha1_file and reparse its
headers. However, in most cases we already have the object
data available, attached to the "struct commit". This is
partially laziness in dealing with the memory allocation
issues, but partially defensive programming, in that we
would always want to verify a clean version of the buffer
(not one that might have been munged by other users of the
commit).

However, we do not currently ever munge the commit buffer,
and not using the already-available buffer carries a fairly
big performance penalty when we are looking at a large
number of commits. Here are timings on linux.git:

  [baseline, no signatures]
  $ time git log >/dev/null
  real    0m4.902s
  user    0m4.784s
  sys     0m0.120s

  [before]
  $ time git log --show-signature >/dev/null
  real    0m14.735s
  user    0m9.964s
  sys     0m0.944s

  [after]
  $ time git log --show-signature >/dev/null
  real    0m9.981s
  user    0m5.260s
  sys     0m0.936s

Note that our user CPU time drops almost in half, close to
the non-signature case, but we do still spend more
wall-clock and system time, presumably from dealing with
gpg.

An alternative to this is to note that most commits do not
have signatures (less than 1% in this repo), yet we pay the
re-parsing cost for every commit just to find out if it has
a mergetag or signature. If we checked that when parsing the
commit initially, we could avoid re-examining most commits
later on. Even if we did pursue that direction, however,
this would still speed up the cases where we _do_ have
signatures. So it's probably worth doing either way.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:10:13 -07:00
Jeff King 8597ea3afe commit: record buffer length in cache
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:09:38 -07:00
Jeff King c1b3c71f4b commit: convert commit->buffer to a slab
This will make it easier to manage the buffer cache
independently of the "struct commit" objects. It also
shrinks "struct commit" by one pointer, which may be
helpful.

Unfortunately it does not reduce the max memory size of
something like "rev-list", because rev-list uses
get_cached_commit_buffer() to decide not to show each
commit's output (and due to the design of slab_at, accessing
the slab requires us to extend it, allocating exactly the
same number of buffer pointers we dropped from the commit
structs).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:08:17 -07:00
Jeff King ba41c1c93f use get_commit_buffer to avoid duplicate code
For both of these sites, we already do the "fallback to
read_sha1_file" trick. But we can shorten the code by just
using get_commit_buffer.

Note that the error cases are slightly different when
read_sha1_file fails. get_commit_buffer will die() if the
object cannot be loaded, or is a non-commit.

For get_sha1_oneline, this will almost certainly never
happen, as we will have just called parse_object (and if it
does, it's probably worth complaining about).

For record_author_date, the new behavior is probably better;
we notify the user of the error instead of silently ignoring
it. And because it's used only for sorting by author-date,
somebody examining a corrupt repo can fallback to the
regular traversal order.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:08:17 -07:00
Jeff King 152ff1cceb provide helpers to access the commit buffer
Many sites look at commit->buffer to get more detailed
information than what is in the parsed commit struct.
However, we sometimes drop commit->buffer to save memory,
in which case the caller would need to read the object
afresh. Some callers do this (leading to duplicated code),
and others do not (which opens the possibility of a segfault
if somebody else frees the buffer).

Let's provide a pair of helpers, "get" and "unuse", that let
callers easily get the buffer. They will use the cached
buffer when possible, and otherwise load from disk using
read_sha1_file.

Note that we also need to add a "get_cached" variant which
returns NULL when we do not have a cached buffer. At first
glance this seems to defeat the purpose of "get", which is
to always provide a return value. However, some log code
paths actually use the NULL-ness of commit->buffer as a
boolean flag to decide whether to try printing the
commit. At least for now, we want to continue supporting
that use.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:08:17 -07:00
Jeff King 66c2827ea4 provide a helper to set the commit buffer
Right now this is just a one-liner, but abstracting it will
make it easier to change later.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:08:17 -07:00
Jeff King 0fb370da9c provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.

Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it.  But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual).  In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.

Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:07:47 -07:00
Jeff King 969eba6341 commit: push commit_index update into alloc_commit_node
Whenever we create a commit object via lookup_commit, we
give it a unique index to be used with the commit-slab API.
The theory is that any "struct commit" we create would
follow this code path, so any such struct would get an
index. However, callers could use alloc_commit_node()
directly (and get multiple commits with index 0).

Let's push the indexing into alloc_commit_node so that it's
hard for callers to get it wrong.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-12 10:29:42 -07:00
Jeff King 3ffefb54c0 commit_tree: take a pointer/len pair rather than a const strbuf
While strbufs are pretty common throughout our code, it is
more flexible for functions to take a pointer/len pair than
a strbuf. It's easy to turn a strbuf into such a pair (by
dereferencing its members), but less easy to go the other
way (you can strbuf_attach, but that has implications about
memory ownership).

This patch teaches commit_tree (and its associated callers
and sub-functions) to take such a pair for the commit
message rather than a strbuf.  This makes passing the buffer
around slightly more verbose, but means we can get rid of
some dangerous strbuf_attach calls in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-12 10:29:41 -07:00
Brian Gesiak c4a7b0092b commit.c: rearrange xcalloc arguments
xcalloc() takes two arguments: the number of elements and their size.
reduce_heads() passes the arguments in reverse order, passing the
size of a commit*, followed by the number of commit* to be allocated.

Rearrange them so they are in the correct order.

Signed-off-by: Brian Gesiak <modocache@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-27 14:00:43 -07:00
Junio C Hamano b407d40933 Merge branch 'nd/log-show-linear-break'
Attempts to show where a single-strand-of-pearls break in "git log"
output.

* nd/log-show-linear-break:
  log: add --show-linear-break to help see non-linear history
  object.h: centralize object flag allocation
2014-04-03 12:38:11 -07:00
Nguyễn Thái Ngọc Duy 208acbfb82 object.h: centralize object flag allocation
While the field "flags" is mainly used by the revision walker, it is
also used in many other places. Centralize the whole flag allocation to
one place for a better overview (and easier to move flags if we have
too).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-25 15:09:24 -07:00
Junio C Hamano fe9122a352 Merge branch 'dd/use-alloc-grow'
Replace open-coded reallocation with ALLOC_GROW() macro.

* dd/use-alloc-grow:
  sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
  read-cache.c: use ALLOC_GROW() in add_index_entry()
  builtin/mktree.c: use ALLOC_GROW() in append_to_tree()
  attr.c: use ALLOC_GROW() in handle_attr_line()
  dir.c: use ALLOC_GROW() in create_simplify()
  reflog-walk.c: use ALLOC_GROW()
  replace_object.c: use ALLOC_GROW() in register_replace_object()
  patch-ids.c: use ALLOC_GROW() in add_commit()
  diffcore-rename.c: use ALLOC_GROW()
  diff.c: use ALLOC_GROW()
  commit.c: use ALLOC_GROW() in register_commit_graft()
  cache-tree.c: use ALLOC_GROW() in find_subtree()
  bundle.c: use ALLOC_GROW() in add_to_ref_list()
  builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
2014-03-18 13:50:21 -07:00
Junio C Hamano a8e1d711cc Merge branch 'dd/find-graft-with-sha1-pos'
Replace a hand-rolled binary search with a call to our generic
binary search helper function.

* dd/find-graft-with-sha1-pos:
  commit.c: use the generic "sha1_pos" function for lookup
2014-03-18 13:50:11 -07:00
Tanay Abhra 147972b1a6 commit.c: use skip_prefix() instead of starts_with()
In record_author_date() & parse_gpg_output(), the callers of
starts_with() not just want to know if the string starts with the
prefix, but also can benefit from knowing the string that follows
the prefix.

By using skip_prefix(), we can do both at the same time.

Helped-by: Max Horn <max@quendi.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-04 13:26:42 -08:00
Dmitry S. Dolzhenko d6e82b575a commit.c: use ALLOC_GROW() in register_commit_graft()
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03 14:47:07 -08:00
Dmitry S. Dolzhenko 0a9136f327 commit.c: use the generic "sha1_pos" function for lookup
Refactor binary search in "commit_graft_pos" function: use
generic "sha1_pos" function.

Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-27 10:55:27 -08:00
Junio C Hamano 4243b2d1e4 Merge branch 'vm/octopus-merge-bases-simplify'
* vm/octopus-merge-bases-simplify:
  get_octopus_merge_bases(): cleanup redundant variable
2014-01-10 10:33:45 -08:00
Junio C Hamano 3b9d69ec22 Merge branch 'js/lift-parent-count-limit'
There is no reason to have a hardcoded upper limit of the number of
parents for an octopus merge, created via the graft mechanism.

* js/lift-parent-count-limit:
  Remove the line length limit for graft files
2014-01-10 10:33:36 -08:00
Junio C Hamano 34aacf30a3 Merge branch 'nd/commit-tree-constness'
Code clean-up.

* nd/commit-tree-constness:
  commit.c: make "tree" a const pointer in commit_tree*()
2014-01-10 10:33:13 -08:00
Vasily Makarov 6bc76725ea get_octopus_merge_bases(): cleanup redundant variable
pptr is needless. Some related code got cleaned as well.

Signed-off-by: Vasily Makarov <einmalfel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-03 10:29:03 -08:00
Johannes Schindelin e228c1736f Remove the line length limit for graft files
Support for grafts predates Git's strbuf, and hence it is understandable
that there was a hard-coded line length limit of 1023 characters (which
was chosen a bit awkwardly, given that it is *exactly* one byte short of
aligning with the 41 bytes occupied by a commit name and the following
space or new-line character).

While regular commit histories hardly win comprehensibility in general
if they merge more than twenty-two branches in one go, it is not Git's
business to limit grafts in such a way.

In this particular developer's case, the use case that requires
substantially longer graft lines to be supported is the visualization of
the commits' order implied by their changes: commits are considered to
have an implicit relationship iff exchanging them in an interactive
rebase would result in merge conflicts.

Thusly implied branches tend to be very shallow in general, and the
resulting thicket of implied branches is usually very wide; It is
actually quite common that *most* of the commits in a topic branch have
not even one implied parent, so that a final merge commit has about as
many implied parents as there are commits in said branch.

[jc: squashed in tests by Jonathan]

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-27 16:46:25 -08:00
Nguyễn Thái Ngọc Duy 8785b7654b commit.c: make "tree" a const pointer in commit_tree*()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 11:55:18 -08:00
Junio C Hamano ad70448576 Merge branch 'cc/starts-n-ends-with'
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.

* cc/starts-n-ends-with:
  replace {pre,suf}fixcmp() with {starts,ends}_with()
  strbuf: introduce starts_with() and ends_with()
  builtin/remote: remove postfixcmp() and use suffixcmp() instead
  environment: normalize use of prefixcmp() by removing " != 0"
2013-12-17 12:02:44 -08:00
Christian Couder 5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Junio C Hamano 5bb62059f2 Merge branch 'jk/robustify-parse-commit'
* jk/robustify-parse-commit:
  checkout: do not die when leaving broken detached HEAD
  use parse_commit_or_die instead of custom message
  use parse_commit_or_die instead of segfaulting
  assume parse_commit checks for NULL commit
  assume parse_commit checks commit->object.parsed
  log_tree_diff: die when we fail to parse a commit
2013-12-05 12:54:01 -08:00
Jeff King 5e7d4d3e93 assume parse_commit checks for NULL commit
The parse_commit function will check whether it was passed a
NULL commit pointer, and if so, return an error. There is no
need for callers to check this separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 15:43:50 -07:00
Jeff King 7059dccc6c log_tree_diff: die when we fail to parse a commit
We currently call parse_commit and then assume we can
dereference the resulting "tree" struct field. If parsing
failed, however, that field is NULL and we end up
segfaulting.

Instead of a segfault, let's print an error message and die
a little more gracefully.

Note that this should never happen in practice, but may
happen in a corrupt repository.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 15:43:50 -07:00
Junio C Hamano 4ab4a6dfb4 Merge branch 'tr/log-full-diff-keep-true-parents'
Output from "git log --full-diff -- <pathspec>" looked strange,
because comparison was done with the previous ancestor that touched
the specified <pathspec>, causing the patches for paths outside the
pathspec to show more than the single commit has changed.

Tweak "git reflog -p" for the same reason using the same mechanism.

* tr/log-full-diff-keep-true-parents:
  log: use true parents for diff when walking reflogs
  log: use true parents for diff even when rewriting
2013-09-09 14:33:16 -07:00
Junio C Hamano c8abf659f7 Merge branch 'bc/commit-invalid-utf8'
* bc/commit-invalid-utf8:
  commit: typofix for xxFFF[EF] check
2013-08-05 10:11:04 -07:00
Junio C Hamano dc773a67e1 commit: typofix for xxFFF[EF] check
We wanted to catch all codepoints that ends with FFFE and FFFF,
not with 0FFFE and 0FFFF.

Noticed and corrected by Peter Krefting.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05 09:53:38 -07:00
Thomas Rast 53d00b39ce log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.

This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it.  So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.

However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of

  git log --graph --stat ...
  git log --graph --full-diff --stat ...

(--graph internally kicks in parent simplification, much like
--parents).

To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect.  Then use the stored parents
instead of the simplified ones in the commit display code paths.  The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.

For ordinary commits it should be obvious that this is the right thing
to do.

Merge commits are a bit subtle.  Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.

So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent.  Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place).  This triggers --cc showing
these hunks spuriously.

Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 10:25:48 -07:00
Junio C Hamano 73f4c9a104 Merge branch 'bc/commit-invalid-utf8'
Logic to auto-detect character encodings in the commit log message
did not reject overlong and invalid UTF-8 characters.

* bc/commit-invalid-utf8:
  commit: reject non-characters
  commit: reject overlong UTF-8 sequences
  commit: reject invalid UTF-8 codepoints
2013-07-18 12:58:19 -07:00
Peter Krefting 81050ac604 commit: reject non-characters
Unicode clause D14 defines all characters U+nFFFE and U+nFFFF (where
0 <= n <= 10h) as well as the range U+FDD0..U+FDEF as non-characters,
reserved for internal use only.  Disallow these characters in commit
messages as they are normally not recommended for interchange.

Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:01:24 -07:00
brian m. carlson e82bd6cc70 commit: reject overlong UTF-8 sequences
The commit code accepts pseudo-UTF-8 sequences that encode a character with more
bytes than necessary.  Reject such sequences, since they are not valid UTF-8.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-04 21:48:45 -07:00
brian m. carlson 28110d4bfc commit: reject invalid UTF-8 codepoints
The commit code already contains code for validating UTF-8, but it does not
check for invalid values, such as guaranteed non-characters and surrogates.  Fix
this by explicitly checking for and rejecting such characters.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-04 21:45:18 -07:00
Jeff King 727377ff65 commit.c: make compare_commits_by_commit_date global
This helper function was introduced as a prio_queue
comparator to help topological sorting. However, other users
of prio_queue who want to replace commit_list_insert_by_date
will want to use it, too. So let's make it public.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-02 12:03:50 -07:00
Junio C Hamano 534f0e0996 Merge branch 'jc/topo-author-date-sort'
"git log" learned the "--author-date-order" option, with which the
output is topologically sorted and commits in parallel histories
are shown intermixed together based on the author timestamp.

* jc/topo-author-date-sort:
  t6003: add --author-date-order test
  topology tests: teach a helper to set author dates as well
  t6003: add --date-order test
  topology tests: teach a helper to take abbreviated timestamps
  t/lib-t6000: style fixes
  log: --author-date-order
  sort-in-topological-order: use prio-queue
  prio-queue: priority queue of pointers to structs
  toposort: rename "lifo" field
2013-07-01 12:41:23 -07:00
Junio C Hamano 55f34c8d39 Merge branch 'jk/commit-info-slab'
Allow adding custom information to commit objects in order to
represent unbound number of flag bits etc.

* jk/commit-info-slab:
  commit-slab: introduce a macro to define a slab for new type
  commit-slab: avoid large realloc
  commit: allow associating auxiliary info on-demand
2013-07-01 12:41:19 -07:00
Junio C Hamano 81c6b38b67 log: --author-date-order
Sometimes people would want to view the commits in parallel
histories in the order of author dates, not committer dates.

Teach "topo-order" sort machinery to do so, using a commit-info slab
to record the author dates of each commit, and prio-queue to sort
them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-11 15:15:21 -07:00
Junio C Hamano da24b1044f sort-in-topological-order: use prio-queue
Use the prio-queue data structure to implement a priority queue of
commits sorted by committer date, when handling --date-order.  The
structure can also be used as a simple LIFO stack, which is a good
match for --topo-order processing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-11 15:15:21 -07:00
Junio C Hamano 08f704f294 toposort: rename "lifo" field
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are.  When
traversing a forked history like this with "git log C E":

    A----B----C
     \
      D----E

we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.

In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.

Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output).  The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour.  We start the traversal by knowing two
commits, C and E.  While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected.  By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.

When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.

The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.

Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code.  The mechanical replacement rule is:

  "lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
  "lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-11 15:15:21 -07:00
Junio C Hamano a84b794ad0 commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab.  Update the "indegree" topological sort facility using it.

To associate 32 flag bits with each commit, you can write:

	define_commit_slab(flag32, uint32);

to declare "struct flag32" type, define an instance of it with

	struct flag32 flags;

and initialize it by calling

	init_flag32(&flags);

After that, a call to flag32_at() function

	uint32 *fp = flag32_at(&flags, commit);

will return a pointer pointing at a uint32 for that commit.  Once
you are done with these flags, clean them up with

	clear_flag32(&flags);

Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.

Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit.  Saying

	typedef uint32 bits320[5];
	define_commit_slab(flagbits, bits320);

at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.

The code can declare a commit slab "struct flagbits" like this
instead:

	define_commit_slab(flagbits, unsigned char);
	struct flagbits flags;

and initialize it by:

	nrefs = ... count number of refs ...
	init_flagbits_with_stride(&flags, (nrefs + 7) / 8);

so that

	unsigned char *fp = flagbits_at(&flags, commit);

will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07 10:02:12 -07:00
Junio C Hamano 66eb375d3d commit-slab: avoid large realloc
Instead of using a single "slab" and keep reallocating it as we find
that we need to deal with commits with larger values of commit->index,
make a "slab" an array of many "slab_piece"s. Each access may need
two levels of indirections, but we only need to reallocate the first
level array of pointers when we have to grow the table this way.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-13 22:15:42 -07:00
Jeff King 96c4f4a370 commit: allow associating auxiliary info on-demand
The "indegree" field in the commit object is only used while sorting
a list of commits in topological order, and wasting memory otherwise.

We would prefer to shrink the size of individual commit objects,
which we may have to hold thousands of in-core. We could eject
"indegree" field out from the commit object and represent it as a
dynamic table based on the decoration infrastructure, but the
decoration is meant for sparse annotation and is not a good match.

Instead, let's try a different approach.

 - Assign an integer (commit->index) to each commit we keep in-core
   (reuse the space of "indegree" field for it);

 - When running the topological sort, allocate an array of integers
   in bulk (called "slab"), use the commit->index as an index into
   this array, and store the "indegree" information there.

This does _not_ reduce the memory footprint of a commit object, but
the commit->index can be used as the index to dynamically associate
commits with other kinds of information as needed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-13 21:50:54 -07:00
Junio C Hamano caa7d79f1f Sync with 'maint'
* maint:
  Correct common spelling mistakes in comments and tests
  kwset: fix spelling in comments
  precompose-utf8: fix spelling of "want" in error message
  compat/nedmalloc: fix spelling in comments
  compat/regex: fix spelling and grammar in comments
  obstack: fix spelling of similar
  contrib/subtree: fix spelling of accidentally
  git-remote-mediawiki: spelling fixes
  doc: various spelling fixes
  fast-export: fix argument name in error messages
  Documentation: distinguish between ref and offset deltas in pack-format
  i18n: make the translation of -u advice in one go
2013-04-12 13:54:01 -07:00
Stefano Lattarini 41ccfdd9c9 Correct common spelling mistakes in comments and tests
Most of these were found using Lucas De Marchi's codespell tool.

Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-12 13:38:40 -07:00
Sebastian Götte eb307ae7bb merge/pull Check for untrusted good GPG signatures
When --verify-signatures is specified, abort the merge in case a good
GPG signature from an untrusted key is encountered.

Signed-off-by: Sebastian Götte <jaseg@physik-pool.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-31 22:38:49 -07:00
Sebastian Götte f8aae8d0ef commit.c/GPG signature verification: Also look at the first GPG status line
Signed-off-by: Sebastian Götte <jaseg@physik-pool.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-31 19:16:15 -07:00
Sebastian Götte ffb6d7d5c9 Move commit GPG signature verification to commit.c
Signed-off-by: Sebastian Götte <jaseg@physik-pool.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-31 19:15:11 -07:00
Junio C Hamano 557899ff6b commit.c: use clear_commit_marks_many() in in_merge_bases_many()
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-05 13:39:46 -08:00
Junio C Hamano 4c4b27e8ce commit.c: add in_merge_bases_many()
Similar to in_merge_bases(commit, other) that returns true when
commit is an ancestor (i.e. in the merge bases between the two) of
the other commit, in_merge_bases_many(commit, n_other, other[])
checks if commit is an ancestor of any of the other[] commits.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-05 13:39:46 -08:00
Junio C Hamano e895cb5135 commit.c: add clear_commit_marks_many()
clear_commit_marks(struct commit *, unsigned) only can clear flag
bits starting from a single commit; introduce an API to allow
feeding an array of commits, so that flag bits can be cleared from
commits reachable from any of them with a single traversal.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-05 13:39:45 -08:00
Nguyễn Thái Ngọc Duy efc7df454e Move print_commit_list to libgit.a
This is used by bisect.c, part of libgit.a while it stays in
builtin/rev-list.c. Move it to commit.c so that we won't get undefined
reference if a program that uses libgit.a happens to pull it in.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
2012-10-29 03:08:30 -04:00
Junio C Hamano 683a820d51 Merge branch 'jc/merge-bases-paint-fix'
"git fmt-merge-msg" (an internal helper reduce_heads() it uses) had
a severe performance regression; an empty "git pull" took forever to
finish as the result.

* jc/merge-bases-paint-fix:
  paint_down_to_common(): parse commit before relying on its timestamp
2012-10-08 11:42:15 -07:00
Junio C Hamano d866924a08 paint_down_to_common(): parse commit before relying on its timestamp
When refactoring the merge-base computation to reduce the pairwise
O(n*(n-1)) traversals to parallel O(n) traversals, the code forgot
that timestamp based heuristics needs each commit to have been
parsed.  This caused an empty "git pull" to spend cycles, traversing
the history all the way down to 0 (because an unparsed commit object
has 0 timestamp, and any other commit object with positive timestamp
will be processed for its parents, all getting parsed), only to come
up with a merge message to be used.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-04 15:49:39 -07:00
Junio C Hamano 82a75299fa commit.c: mark a file-scope private symbol as static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-15 22:58:20 -07:00
Junio C Hamano 34f5130af8 Merge branch 'jc/merge-bases'
Optimise the "merge-base" computation a bit, and also update its
users that do not need the full merge-base information to call a
cheaper subset.

* jc/merge-bases:
  reduce_heads(): reimplement on top of remove_redundant()
  merge-base: "--is-ancestor A B"
  get_merge_bases_many(): walk from many tips in parallel
  in_merge_bases(): use paint_down_to_common()
  merge_bases_many(): split out the logic to paint history
  in_merge_bases(): omit unnecessary redundant common ancestor reduction
  http-push: use in_merge_bases() for fast-forward check
  receive-pack: use in_merge_bases() for fast-forward check
  in_merge_bases(): support only one "other" commit
2012-09-11 11:36:05 -07:00
Junio C Hamano ae80b5a892 Merge branch 'lt/commit-tree-guess-utf-8'
Teach "git commit" and "git commit-tree" the "we are told to use
utf-8 in log message, but this does not look like utf-8---attempt to
pass it through convert-from-latin1-to-utf8 and see if it makes
sense" heuristics "git mailinfo" already uses.

* lt/commit-tree-guess-utf-8:
  commit/commit-tree: correct latin1 to utf-8
2012-09-07 11:08:38 -07:00
Junio C Hamano f37d3c7552 reduce_heads(): reimplement on top of remove_redundant()
This is used by "git merge" and "git merge-base --independent" but
used to use a similar N*(N-1) traversals to reject commits that are
ancestors of other commits.

Reimplement it on top of remove_redundant().  Note that the callers
of this function are allowed to pass the same commit more than once,
but remove_redundant() is designed to be fed each commit only once.
The function removes duplicates before calling remove_redundant().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-31 11:45:36 -07:00
Junio C Hamano 94f0ced0d0 get_merge_bases_many(): walk from many tips in parallel
The get_merge_bases_many() function reduces the result returned by
the merge_bases_many() function, which is a set of possible merge
bases, by excluding commits that can be reached from other commits.
We used to do N*(N-1) traversals for this, but we can check if one
commit reaches which other (N-1) commits by a single traversal, and
repeat it for all the candidates to find the answer.

Introduce remove_redundant() helper function to do this painting; we
should be able to use it to reimplement reduce_heads() as well.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-30 17:25:57 -07:00
Junio C Hamano 6440fdbab4 in_merge_bases(): use paint_down_to_common()
With paint_down_to_common(), we can tell if "commit" is reachable
from "reference" by simply looking at its object flag, instead of
iterating over the merge bases.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-30 17:25:57 -07:00
Junio C Hamano da1f515641 merge_bases_many(): split out the logic to paint history
Introduce a new helper function paint_down_to_common() that takes
the same parameters as merge_bases_many(), but without the first
optimization of not painting anything when "one" is one of the
"twos" (or vice versa), and the last clean-up of removing the common
ancestor that is known to be an ancestor of another common one.

This way, the caller of the new function could tell if "one" is
reachable from any of the "twos" by simply looking at the flag bits
of "one".  If (and only if) it is painted in PARENT2, it is
reachable from one of the "twos".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-30 17:25:57 -07:00
Thomas Rast b0f9e9eeef in_merge_bases(): omit unnecessary redundant common ancestor reduction
The function get_merge_bases() needs to postprocess the result from
merge_bases_many() in order to make sure none of the commit is a
true ancestor of another commit, which is expensive.  However, when
checking if a commit is an ancestor of another commit, we only need
to see if the commit is a common ancestor between the two, and do
not have to care if other common ancestors merge_bases_many() finds
are true merge bases or an ancestor of another merge base.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-28 08:37:07 -07:00
Junio C Hamano a20efee9cf in_merge_bases(): support only one "other" commit
In early days of its life, I planned to make it possible to compute
"is a commit contained in all of these other commits?" with this
function, but it turned out that no caller needed it.

Just make it take two commit objects and add a comment to say what
these two functions do.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27 18:36:39 -07:00
Linus Torvalds 08a94a145c commit/commit-tree: correct latin1 to utf-8
When a line in the message is not a valid utf-8, "git mailinfo"
attempts to convert it to utf-8 assuming the input is latin1 (and
punt if it does not convert cleanly).  Using the same heuristics in
"git commit" and "git commit-tree" lets the editor output be in
latin1 to make the overall system more consistent.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-21 16:10:53 -07:00
Junio C Hamano 0958a24d73 Merge branch 'jc/sha1-name-more'
Teaches the object name parser things like a "git describe" output
is always a commit object, "A" in "git log A" must be a committish,
and "A" and "B" in "git log A...B" both must be committish, etc., to
prolong the lifetime of abbreviated object names.

* jc/sha1-name-more: (27 commits)
  t1512: match the "other" object names
  t1512: ignore whitespaces in wc -l output
  rev-parse --disambiguate=<prefix>
  rev-parse: A and B in "rev-parse A..B" refer to committish
  reset: the command takes committish
  commit-tree: the command wants a tree and commits
  apply: --build-fake-ancestor expects blobs
  sha1_name.c: add support for disambiguating other types
  revision.c: the "log" family, except for "show", takes committish
  revision.c: allow handle_revision_arg() to take other flags
  sha1_name.c: introduce get_sha1_committish()
  sha1_name.c: teach lookup context to get_sha1_with_context()
  sha1_name.c: many short names can only be committish
  sha1_name.c: get_sha1_1() takes lookup flags
  sha1_name.c: get_describe_name() by definition groks only commits
  sha1_name.c: teach get_short_sha1() a commit-only option
  sha1_name.c: allow get_short_sha1() to take other flags
  get_sha1(): fix error status regression
  sha1_name.c: restructure disambiguation of short names
  sha1_name.c: correct misnamed "canonical" and "res"
  ...
2012-07-22 12:55:07 -07:00
Junio C Hamano cd74e4733d sha1_name.c: introduce get_sha1_committish()
Many callers know that the user meant to name a committish by
syntactical positions where the object name appears.  Calling this
function allows the machinery to disambiguate shorter-than-unique
abbreviated object names between committish and others.

Note that this does NOT error out when the named object is not a
committish. It is merely to give a hint to the disambiguation
machinery.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-09 16:42:22 -07:00
Jeff King f9bc573fda ident: rename IDENT_ERROR_ON_NO_NAME to IDENT_STRICT
Callers who ask for ERROR_ON_NO_NAME are not so much
concerned that the name will be blank (because, after all,
we will fall back to using the username), but rather it is a
check to make sure that low-quality identities do not end up
in things like commit messages or emails (whereas it is OK
for them to end up in things like reflogs).

When future commits add more quality checks on the identity,
each of these callers would want to use those checks, too.
Rather than modify each of them later to add a new flag,
let's refactor the flag.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-24 17:16:41 -07:00
René Scharfe a81a7fbc1a commit: remove commit_list_reverse()
The function commit_list_reverse() is not used anymore; delete it.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-25 14:51:21 -07:00
René Scharfe 89b5f1d9c5 sequencer: export commit_list_append()
This function can be used in other parts of git.  Give it a new home
in commit.c.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-25 14:51:17 -07:00
Junio C Hamano ba8e6326f1 Merge branch 'rs/commit-list-sort-in-batch'
Setting up a revision traversal with many starting points was inefficient
as these were placed in a date-order priority queue one-by-one.

By René Scharfe (3) and Junio C Hamano (1)
* rs/commit-list-sort-in-batch:
  mergesort: rename it to llist_mergesort()
  revision: insert unsorted, then sort in prepare_revision_walk()
  commit: use mergesort() in commit_list_sort_by_date()
  add mergesort() for linked lists
2012-04-23 12:52:55 -07:00
Junio C Hamano 7365c95d2d mergesort: rename it to llist_mergesort()
Even though the function is generic enough, <anything>sort() inherits
connotations from the standard function qsort() that sorts an array.
Rename it to llist_mergesort() and describe the external interface in
its header file.

This incidentally avoids name clashes with mergesort() some platforms
declare in, and contaminate user namespace with, their <stdlib.h>.

Reported-by: Brian Gernhardt
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-17 11:07:01 -07:00
René Scharfe fbc08ea177 revision: insert unsorted, then sort in prepare_revision_walk()
Speed up prepare_revision_walk() by adding commits without sorting
to the commit_list and at the end sort the list in one go.  Thanks
to mergesort() working behind the scenes, this is a lot faster for
large numbers of commits than the current insert sort.

Also introduce and use commit_list_reverse(), to keep the ordering
of commits sharing the same commit date unchanged.  That's because
commit_list_insert_by_date() sorts commits with descending date,
but adds later entries with the same date entries last, while
commit_list_insert() always inserts entries at the top.  The
following commit_list_sort_by_date() keeps the order of entries
sharing the same date.

Jeff's test case, in a repo with lots of refs, was to run:

  # make a new commit on top of HEAD, but not yet referenced
  sha1=`git commit-tree HEAD^{tree} -p HEAD </dev/null`

  # now do the same "connected" test that receive-pack would do
  git rev-list --objects $sha1 --not --all

With a git.git with a ref for each revision, master needs (best of
five):

	real	0m2.210s
	user	0m2.188s
	sys	0m0.016s

And with this patch:

	real	0m0.480s
	user	0m0.456s
	sys	0m0.020s

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-11 08:50:54 -07:00
René Scharfe 46905893b2 commit: use mergesort() in commit_list_sort_by_date()
Replace the insertion sort in commit_list_sort_by_date() with a
call to the generic mergesort function.  This sets the stage for
using commit_list_sort_by_date() for larger lists, as shown in
the next patch.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-11 08:50:54 -07:00
Junio C Hamano 5a304dd303 Merge branch 'nd/index-pack-no-recurse'
* nd/index-pack-no-recurse:
  index-pack: eliminate unlimited recursion in get_base_data()
  index-pack: eliminate recursion in find_unresolved_deltas
  Eliminate recursion in setting/clearing marks in commit list
2012-01-29 13:18:56 -08:00
Nguyễn Thái Ngọc Duy 941ba8db57 Eliminate recursion in setting/clearing marks in commit list
Recursion in a DAG is generally a bad idea because it could be very
deep. Be defensive and avoid recursion in mark_parents_uninteresting()
and clear_commit_marks().

mark_parents_uninteresting() learns a trick from clear_commit_marks()
to avoid malloc() in (dominant) single-parent case.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-16 14:27:24 -08:00
Junio C Hamano 5de89d3abf Merge branch 'jc/show-sig'
* jc/show-sig:
  log --show-signature: reword the common two-head merge case
  log-tree: show mergetag in log --show-signature output
  log-tree.c: small refactor in show_signature()
  commit --amend -S: strip existing gpgsig headers
  verify_signed_buffer: fix stale comment
  gpg-interface: allow use of a custom GPG binary
  pretty: %G[?GS] placeholders
  test "commit -S" and "log --show-signature"
  log: --show-signature
  commit: teach --gpg-sign option

Conflicts:
	builtin/commit-tree.c
	builtin/commit.c
	builtin/merge.c
	notes-cache.c
	pretty.c
2012-01-06 12:44:07 -08:00
Junio C Hamano c871a1d17b commit --amend -S: strip existing gpgsig headers
Any existing commit signature was made against the contents of the old
commit, including its committer date that is about to change, and will
become invalid by amending it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-05 13:02:26 -08:00
Junio C Hamano f35ccd9be2 Merge branch 'nd/war-on-nul-in-commit'
* nd/war-on-nul-in-commit:
  commit_tree(): refuse commit messages that contain NULs
  Convert commit_tree() to take strbuf as message
  merge: abort if fails to commit

Conflicts:
	builtin/commit.c
	commit.c
	commit.h
2011-12-22 11:27:26 -08:00
Nguyễn Thái Ngọc Duy 37576c1443 commit_tree(): refuse commit messages that contain NULs
Current implementation sees NUL as terminator. If users give a message
with NUL byte in it (e.g. editor set to save as UTF-16), the new commit
message will have NULs. However following operations (displaying or
amending a commit for example) will not keep anything after the first NUL.

Stop user right when they do this. If NUL is added by mistake, they have
their chance to fix. Otherwise, log messages will no longer be text "git
log" and friends would grok.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-15 11:35:10 -08:00
Nguyễn Thái Ngọc Duy 13f8b72d8c Convert commit_tree() to take strbuf as message
There wan't a way for commit_tree() to notice if the message the caller
prepared contained a NUL byte, as it did not take the length of the
message as a parameter. Use a pointer to a strbuf instead, so that we can
either choose to allow low-level plumbing commands to make commits that
contain NUL byte in its message, or forbid NUL everywhere by adding the
check in commit_tree(), in later patches.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-15 10:46:42 -08:00
Junio C Hamano 0c37f1fce6 log: --show-signature
This teaches the "log" family of commands to pass the GPG signature in the
commit objects to "gpg --verify" via the verify_signed_buffer() interface
used to verify signed tag objects. E.g.

    $ git show --show-signature -s HEAD

shows GPG output in the header part of the output.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-12 22:27:38 -08:00
Junio C Hamano ba3c69a9ee commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.

    $ git commit --gpg-sign -m foo
    You need a passphrase to unlock the secret key for
    user: "Junio C Hamano <gitster@pobox.com>"
    4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)

    [master 8457d13] foo
     1 files changed, 1 insertions(+), 0 deletions(-)

The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:

 - The signature won't clutter output from "git log" and friends if it is
   in the extra header. If we place it at the end of the log message, we
   would need to teach "git log" and friends to strip the signature block
   with an option.

 - Teaching new versions of "git log" and "gitk" to optionally verify and
   show signatures is cleaner if we structurally know where the signature
   block is (instead of scanning in the commit log message).

 - The signature needs to be stripped upon various commit rewriting
   operations, e.g. rebase, filter-branch, etc. They all already ignore
   unknown headers, but if we place signature in the log message, all of
   these tools (and third-party tools) also need to learn how a signature
   block would look like.

 - When we added the optional encoding header, all the tools (both in tree
   and third-party) that acts on the raw commit object should have been
   fixed to ignore headers they do not understand, so it is not like that
   new header would be more likely to break than extra text in the commit.

A commit made with the above sample sequence would look like this:

    $ git cat-file commit HEAD
    tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
    parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
    author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
    committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
    gpgsig -----BEGIN PGP SIGNATURE-----
     Version: GnuPG v1.4.10 (GNU/Linux)

     iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
     ...
     =dt98
     -----END PGP SIGNATURE-----

    foo

but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-12 22:27:37 -08:00
Junio C Hamano ed7a42a075 commit: teach --amend to carry forward extra headers
After running "git pull $there for-linus" to merge a signed tag, the
integrator may need to amend the resulting merge commit to fix typoes
in it. Teach --amend option to read the existing extra headers, and
carry them forward.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-09 22:27:11 -08:00
Junio C Hamano 5231c633f2 commit: copy merged signed tags to headers of merge commit
Now MERGE_HEAD records the tag objects without peeling, we could record
the result of manual conflict resolution via "git commit" without losing
the tag information. Introduce a new "mergetag" multi-line header field to
the commit object, and use it to store the entire contents of each signed
tag merged.

A commit header that has a multi-line payload begins with the header tag
(e.g. "mergetag" in this case), SP, the first line of payload, LF, and all
the remaining lines have a SP inserted at the beginning.

In hindsight, it would have been better to make "merge --continue" as the
way to continue from such an interrupted merge, not "commit", but this is
a backward compatibility baggage we would need to carry around for now.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-09 10:28:04 -08:00
Junio C Hamano ae8e4c9ce1 merge: make usage of commit->util more extensible
The merge-recursive code uses the commit->util field directly to annotate
the commit objects given from the command line, i.e. the remote heads to
be merged, with a single string to be used to describe it in its trace
messages and conflict markers.

Correct this short-signtedness by redefining the field to be a pointer to
a structure "struct merge_remote_desc" that later enhancements can add
more information. Store the original objects we were told to merge in a
field "obj" in this struct, so that we can recover the tag we were told to
merge.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-08 10:36:53 -08:00
Junio C Hamano 0941d60545 Merge branch 'rs/pending'
* rs/pending:
  commit: factor out clear_commit_marks_for_object_array
  checkout: use leak_pending flag
  bundle: use leak_pending flag
  bisect: use leak_pending flag
  revision: add leak_pending flag
  checkout: use add_pending_{object,sha1} in orphan check
  revision: factor out add_pending_sha1
  checkout: check for "Previous HEAD" notice in t2020

Conflicts:
	builtin/checkout.c
	revision.c
2011-10-13 19:03:22 -07:00
Junio C Hamano 0fd8cb3fec Merge branch 'nd/maint-autofix-tag-in-head'
* nd/maint-autofix-tag-in-head:
  Accept tags in HEAD or MERGE_HEAD
  merge: remove global variable head[]
  merge: use return value of resolve_ref() to determine if HEAD is invalid
  merge: keep stash[] a local variable

Conflicts:
	builtin/merge.c
2011-10-13 19:03:19 -07:00