Commit graph

311 commits

Author SHA1 Message Date
Junio C Hamano e828d33316 Merge branch 'jk/no-looking-at-dotgit-outside-repo'
A small code cleanup.

* jk/no-looking-at-dotgit-outside-repo:
  sha1_name: make wraparound of the index into ring-buffer explicit
2016-11-01 12:58:49 -07:00
René Scharfe 3e98919a18 sha1_name: make wraparound of the index into ring-buffer explicit
Overflow is defined for unsigned integers, but not for signed ones.
Wrap around explicitly for the new ring-buffer in find_unique_abbrev()
as we did in bb84735c for the ones in sha1_to_hex() and get_pathname(),
thus avoiding signed overflows and getting rid of the magic number 3.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-01 10:56:39 -07:00
Junio C Hamano 0d9c527d59 Merge branch 'jk/no-looking-at-dotgit-outside-repo'
Update "git diff --no-index" codepath not to try to peek into .git/
directory that happens to be under the current directory, when we
know we are operating outside any repository.

* jk/no-looking-at-dotgit-outside-repo:
  diff: handle sha1 abbreviations outside of repository
  diff_aligned_abbrev: use "struct oid"
  diff_unique_abbrev: rename to diff_aligned_abbrev
  find_unique_abbrev: use 4-buffer ring
  test-*-cache-tree: setup git dir
  read info/{attributes,exclude} only when in repository
2016-10-27 14:58:48 -07:00
Junio C Hamano d7ae013a31 Merge branch 'jk/abbrev-auto'
Updates the way approximate count of total objects is computed
while attempting to come up with a unique abbreviated object name,
which in turn needs to estimate how many hexdigits are necessary to
ensure uniqueness.

* jk/abbrev-auto:
  find_unique_abbrev: move logic out of get_short_sha1()
2016-10-27 14:58:47 -07:00
Junio C Hamano 580d820ece Merge branch 'lt/abbrev-auto'
Allow the default abbreviation length, which has historically been
7, to scale as the repository grows.  The logic suggests to use 12
hexdigits for the Linux kernel, and 9 to 10 for Git itself.

* lt/abbrev-auto:
  abbrev: auto size the default abbreviation
  abbrev: prepare for new world order
  abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
2016-10-27 14:58:47 -07:00
Jeff King ef2ed5013c find_unique_abbrev: use 4-buffer ring
Some code paths want to format multiple abbreviated sha1s in
the same output line. Because we use a single static buffer
for our return value, they have to either break their output
into several calls or allocate their own arrays and use
find_unique_abbrev_r().

Intead, let's mimic sha1_to_hex() and use a ring of several
buffers, so that the return value stays valid through
multiple calls. This shortens some of the callers, and makes
it harder to for them to make a silly mistake.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-26 13:30:51 -07:00
Junio C Hamano dec040192f Merge branch 'jk/alt-odb-cleanup'
Codepaths involved in interacting alternate object store have
been cleaned up.

* jk/alt-odb-cleanup:
  alternates: use fspathcmp to detect duplicates
  sha1_file: always allow relative paths to alternates
  count-objects: report alternates via verbose mode
  fill_sha1_file: write into a strbuf
  alternates: store scratch buffer as strbuf
  fill_sha1_file: write "boring" characters
  alternates: use a separate scratch space
  alternates: encapsulate alt->base munging
  alternates: provide helper for allocating alternate
  alternates: provide helper for adding to alternates list
  link_alt_odb_entry: refactor string handling
  link_alt_odb_entry: handle normalize_path errors
  t5613: clarify "too deep" recursion tests
  t5613: do not chdir in main process
  t5613: whitespace/style cleanups
  t5613: use test_must_fail
  t5613: drop test_valid_repo function
  t5613: drop reachable_via function
2016-10-17 13:25:20 -07:00
Jeff King 38dbe5f078 alternates: store scratch buffer as strbuf
We pre-size the scratch buffer to hold a loose object
filename of the form "xx/yyyy...", which leads to allocation
code that is hard to verify. We have to use some magic
numbers during the initial allocation, and then writers must
blindly assume that the buffer is big enough. Using a strbuf
makes it more clear that we cannot overflow.

Unfortunately, we do still need some magic numbers to grow
our strbuf before calling fill_sha1_path(), but the strbuf
growth is much closer to the point of use. This makes it
easier to see that it's correct, and opens the possibility
of pushing it even further down if fill_sha1_path() learns
to work on strbufs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Jeff King 597f9134de alternates: use a separate scratch space
The alternate_object_database struct uses a single buffer
both for storing the path to the alternate, and as a scratch
buffer for forming object names. This is efficient (since
otherwise we'd end up storing the path twice), but it makes
life hard for callers who just want to know the path to the
alternate. They have to remember to stop reading after
"alt->name - alt->base" bytes, and to subtract one for the
trailing '/'.

It would be much simpler if they could simply access a
NUL-terminated path string. We could encapsulate this in a
function which puts a NUL in the scratch buffer and returns
the string, but that opens up questions about the lifetime
of the result. The first time another caller uses the
alternate, the scratch buffer may get other data tacked onto
it.

Let's instead just store the root path separately from the
scratch buffer. There aren't enough alternates being stored
for the duplicated data to matter for performance, and this
keeps things simple and safe for the callers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Jeff King 7f0fa2c02a alternates: provide helper for allocating alternate
Allocating a struct alternate_object_database is tricky, as
we must over-allocate the buffer to provide scratch space,
and then put in particular '/' and NUL markers.

Let's encapsulate this in a function so that the complexity
doesn't leak into callers (and so that we can modify it
later).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Jeff King 8e3f52d778 find_unique_abbrev: move logic out of get_short_sha1()
The get_short_sha1() is only about reading short sha1s; we
do call it in a loop to check "is this long enough" for each
object, but otherwise it should not need to know about
things like our default_abbrev setting.

So instead of asking it to set default_automatic_abbrev as a
side-effect, let's just have find_unique_abbrev() pick the
right place to start its loop.  This requires a separate
approximate_object_count() function, but that naturally
belongs with the rest of sha1_file.c.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-03 21:03:14 -07:00
Linus Torvalds e6c587c733 abbrev: auto size the default abbreviation
In fairly early days we somehow decided to abbreviate object names
down to 7-hexdigits, but as projects grow, it is becoming more and
more likely to see such a short object names made in earlier days
and recorded in the log messages no longer unique.

Currently the Linux kernel project needs 11 to 12 hexdigits, while
Git itself needs 10 hexdigits to uniquely identify the objects they
have, while many smaller projects may still be fine with the
original 7-hexdigit default.  One-size does not fit all projects.

Introduce a mechanism, where we estimate the number of objects in
the repository upon the first request to abbreviate an object name
with the default setting and come up with a sane default for the
repository.  Based on the expectation that we would see collision in
a repository with 2^(2N) objects when using object names shortened
to first N bits, use sufficient number of hexdigits to cover the
number of objects in the repository.  Each hexdigit (4-bits) we add
to the shortened name allows us to have four times (2-bits) as many
objects in the repository.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-03 12:54:29 -07:00
Jeff King 5b33cb1fd7 get_short_sha1: make default disambiguation configurable
When we find ambiguous short sha1s, we may get a
disambiguation rule from our caller's context. But if we
don't, we fall back to treating all sha1s the same, even
though most projects will tend to refer only to commits by
their short sha1s.

This patch introduces a configuration option that lets the
user pick a different fallback (e.g., only commits). It's
possible that we may want to make this the default, but it's
a good idea to start as a config option for two reasons:

  1. It lets people experiment with this and see if it's a
     good idea (i.e., the "tend to" above is an assumption;
     we don't really know if this will break some obscure
     cases).

  2. Even if we do flip the default, it gives people an
     escape hatch if it causes problems (you can sometimes
     override it by asking for "1234^{tree}", but not all
     combinations are possible).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27 10:29:56 -07:00
Jeff King 1ffa26c461 get_short_sha1: list ambiguous objects on error
When the user gives us an ambiguous short sha1, we print an
error and refuse to resolve it. In some cases, the next step
is for them to feed us more characters (e.g., if they were
retyping or cut-and-pasting from a full sha1). But in other
cases, that might be all they have. For example, an old
commit message may have used a 7-character hex that was
unique at the time, but is now ambiguous.  Git doesn't
provide any information about the ambiguous objects it
found, so it's hard for the user to find out which one they
probably meant.

This patch teaches get_short_sha1() to list the sha1s of the
objects it found, along with a few bits of information that
may help the user decide which one they meant. Here's what
it looks like on git.git:

  $ git rev-parse b2e1
  error: short SHA1 b2e1 is ambiguous
  hint: The candidates are:
  hint:   b2e1196 tag v2.8.0-rc1
  hint:   b2e11d1 tree
  hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
  hint:   b2e1759 blob
  hint:   b2e18954 blob
  hint:   b2e1895c blob
  fatal: ambiguous argument 'b2e1': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'

We show the tagname for tags, and the date and subject for
commits. For trees and blobs, in theory we could dig in the
history to find the paths at which they were present. But
that's very expensive (on the order of 30s for the kernel),
and it's not likely to be all that helpful. Most short
references are to commits, so the useful information is
typically going to be that the object in question _isn't_ a
commit. So it's silly to spend a lot of CPU preemptively
digging up the path; the user can do it themselves if they
really need to.

And of course it's somewhat ironic that we abbreviate the
sha1s in the disambiguation hint. But full sha1s would cause
annoying line wrapping for the commit lines, and presumably
the user is going to just re-issue their command immediately
with the corrected sha1.

We also restrict the list to those that match any
disambiguation hint. E.g.:

  $ git rev-parse b2e1:foo
  error: short SHA1 b2e1 is ambiguous
  hint: The candidates are:
  hint:   b2e1196 tag v2.8.0-rc1
  hint:   b2e11d1 tree
  hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
  fatal: Invalid object name 'b2e1'.

does not bother reporting the blobs, because they cannot
work as a treeish.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:55:31 -07:00
Jeff King fad6b9e590 for_each_abbrev: drop duplicate objects
If an object appears multiple times in the object database
(e.g., in both loose and packed form, or in two separate
packs), the disambiguation machinery may see it more than
once. The get_short_sha1() function handles this already,
but for_each_abbrev() blindly fires the callback for each
instance it finds.

We can fix this by collecting the output in a sha1 array and
de-duplicating it.  As a bonus, the sort done for the
de-duplication means that our output will be stable,
regardless of the order in which the objects are found.

Note that the old code normalized the callback's output to
0/1 to store in the 1-bit ds->ambiguous flag (which both
halted the iteration and was returned from the
for_each_abbrev function). Now that we are using sha1_array,
we can return the real value. In practice, it doesn't matter
as the sole caller only ever returns 0.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:46:41 -07:00
Jeff King 0c99171ad2 get_short_sha1: mark ambiguity error for translation
This is a human-readable message, and there's no reason it
should not be translated. While we're at it, let's drop the
period from the end, which is not our usual style.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:46:41 -07:00
Jeff King 59e4e34f69 get_short_sha1: NUL-terminate hex prefix
We store the hex prefix in a 40-byte buffer with the prefix
itself followed by 40-minus-len "x" characters. These x's
serve no purpose, and the lack of NUL termination makes the
prefix string annoying to use. Let's just terminate it.

Note that this is in contrast to the binary prefix, which
_must_ be zero-padded, because we look at the whole thing
during a binary search to find the first potential match in
each pack index. The loose-object hex search cannot use the
same trick because it has to do a linear walk through the
unsorted results of readdir() (and even if it could, you'd
want zeroes instead of x's).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:46:41 -07:00
Jeff King 0016043bf4 get_short_sha1: refactor init of disambiguation code
The disambiguation machinery has two callers: get_short_sha1
and for_each_abbrev. Both need to repeat much of the same
setup: declaring buffers, sanity-checking lengths, preparing
the prefixes, etc.  Let's pull that into a single init
function so we can avoid repeating ourselves.

Pulling the buffers into the "struct disambiguate_state"
isn't strictly necessary, but it does make things simpler
for the callers, who no longer have to worry about sizing
them correctly (i.e., it's an implicit requirement that
the caller provide 20- and 40-byte buffers).

And while we're touching this code, we can convert any
magic-number sizes to the more modern GIT_SHA1_* constants.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:46:39 -07:00
Jeff King 5d5def2aa5 get_short_sha1: parse tags when looking for treeish
The treeish disambiguation function tries to peel tags, but
it does so by calling:

  deref_tag(lookup_object(sha1), ...);

This will only work if we have previously looked at the tag
and created a "struct tag" for it. Since parsing revision
arguments typically happens before anything else, this is
usually not the case, and we would fail to peel the tag (we
are lucky that deref_tag() gracefully handles the NULL and
does not segfault).

Instead, we can use parse_object(). Note that this is the
same fix done by 94d75d1 (get_short_sha1(): correctly
disambiguate type-limited abbreviation, 2013-07-01), but
that commit fixed only the committish disambiguator, and
left the bug in the treeish one.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:46:30 -07:00
Jeff King 8a10fea49b get_sha1: propagate flags to child functions
The get_sha1() function is actually implementation by many
sub-functions, but we do not always pass our flags around to
all of those functions. As a result, we may forget that our
caller asked us to resolve with GET_SHA1_QUIETLY and output
messages. The two triggerable cases are:

  1. Resolving treeish:path will resolve the "treeish"
     portion using GET_SHA1_TREEISH, dropping all other
     flags.

  2. The peel_onion() function did not take flags at all
     but recurses to get_sha1_1(), which does.

The solution for both is to bitwise-OR their new flags with
the existing ones (after dropping any mutually exclusive
disambiguation flags).

This bug can trigger with "git rev-parse --quiet", which
asks for quiet resolution. But it can also happen in a more
vanilla code path when we do a follow-up ONLY_TO_DIE
invocation of get_sha1(), and that's what the tests check.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:46:30 -07:00
Jeff King 7243ffdd78 get_sha1: avoid repeating ourselves via ONLY_TO_DIE
When the revision code cannot parse an argument like
"HEAD:foo", it will call maybe_die_on_misspelt_object_name(),
which re-runs get_sha1() with an extra ONLY_TO_DIE flag. We
then spend more effort to generate a better error message.

Unfortunately, a side effect is that our second call may
repeat the same error messages from the original get_sha1()
call. You can see this with:

  $ git show 0017
  error: short SHA1 0017 is ambiguous.
  error: short SHA1 0017 is ambiguous.
  fatal: ambiguous argument '0017': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'

where the second "error:" line comes from the ONLY_TO_DIE
call.

To fix this, we can make ONLY_TO_DIE imply QUIETLY. This is
a little odd, because the whole point of ONLY_TO_DIE is to
output error messages. But what we want to do is tell the
rest of the get_sha1() code (particularly get_sha1_1()) that
the _regular_ messages should be quiet, but the only-to-die
ones should not.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:46:30 -07:00
Jeff King 259942f549 get_sha1: detect buggy calls with multiple disambiguators
The get_sha1() family of functions takes a flags field, but
some of the flags are mutually exclusive. In particular, we
can only handle one disambiguating function, and the flags
quietly override each other. Let's instead detect these as
programming bugs.

Technically some of the flags are supersets of the others,
so treating COMMITTISH|TREEISH as just COMMITTISH is not
wrong, but it's a good sign the caller is confused. And
certainly asking for BLOB|TREE does not work.

We can do the check easily with some bit-twiddling, and as a
bonus, the bit-mask of disambiguators will come in handy in
a future patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:21:28 -07:00
brian m. carlson 151b2911c1 sha1_name: convert get_sha1_mb to struct object_id
All of the callers of this function use struct object_id, so rename it
to get_oid_mb and make it take struct object_id instead of
unsigned char *.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:43 -07:00
brian m. carlson 99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Junio C Hamano 8429f2b42d Merge branch 'bc/object-id'
Move from unsigned char[20] to struct object_id continues.

* bc/object-id:
  match-trees: convert several leaf functions to use struct object_id
  tree-walk: convert tree_entry_extract() to use struct object_id
  struct name_entry: use struct object_id instead of unsigned char sha1[20]
  match-trees: convert shift_tree() and shift_tree_by() to use object_id
  test-match-trees: convert to use struct object_id
  sha1-name: introduce a get_oid() function
2016-05-06 14:45:44 -07:00
brian m. carlson 2764fd93ad sha1-name: introduce a get_oid() function
The get_oid() function is equivalent to the get_sha1() function, but
uses a struct object_id instead.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-19 15:28:58 -07:00
Jeff King 46c3cd44d7 setup: make startup_info available everywhere
Commit a60645f (setup: remember whether repository was
found, 2010-08-05) introduced the startup_info structure,
which records some parts of the setup_git_directory()
process (notably, whether we actually found a repository or
not).

One of the uses of this data is for functions to behave
appropriately based on whether we are in a repo. But the
startup_info struct is just a pointer to storage provided by
the main program, and the only program that sets it up is
the git.c wrapper. Thus builtins have access to
startup_info, but externally linked programs do not.

Worse, library code which is accessible from both has to be
careful about accessing startup_info. This can be used to
trigger a die("BUG") via get_sha1():

	$ git fast-import <<-\EOF
	tag foo
	from HEAD:./whatever
	EOF

	fatal: BUG: startup_info struct is not initialized.

Obviously that's fairly nonsensical input to feed to
fast-import, but we should never hit a die("BUG"). And there
may be other ways to trigger it if other non-builtins
resolve sha1s.

So let's point the storage for startup_info to a static
variable in setup.c, making it available to all users of the
library code. We _could_ turn startup_info into a regular
extern struct, but doing so would mean tweaking all of the
existing use sites. So let's leave the pointer indirection
in place.  We can, however, drop any checks for NULL, as
they will always be false (and likewise, we can drop the
test covering this case, which was a rather artificial
situation using one of the test-* programs).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-06 17:17:37 -08:00
Junio C Hamano 11529ecec9 Merge branch 'jk/tighten-alloc'
Update various codepaths to avoid manually-counted malloc().

* jk/tighten-alloc: (22 commits)
  ewah: convert to REALLOC_ARRAY, etc
  convert ewah/bitmap code to use xmalloc
  diff_populate_gitlink: use a strbuf
  transport_anonymize_url: use xstrfmt
  git-compat-util: drop mempcpy compat code
  sequencer: simplify memory allocation of get_message
  test-path-utils: fix normalize_path_copy output buffer size
  fetch-pack: simplify add_sought_entry
  fast-import: simplify allocation in start_packfile
  write_untracked_extension: use FLEX_ALLOC helper
  prepare_{git,shell}_cmd: use argv_array
  use st_add and st_mult for allocation size computation
  convert trivial cases to FLEX_ARRAY macros
  use xmallocz to avoid size arithmetic
  convert trivial cases to ALLOC_ARRAY
  convert manual allocations to argv_array
  argv-array: add detach function
  add helpers for allocating flex-array structs
  harden REALLOC_ARRAY and xcalloc against size_t overflow
  tree-diff: catch integer overflow in combine_diff_path allocation
  ...
2016-02-26 13:37:16 -08:00
Junio C Hamano e6a6a768ca Merge branch 'nd/dwim-wildcards-as-pathspecs'
"git show 'HEAD:Foo[BAR]Baz'" did not interpret the argument as a
rev, i.e. the object named by the the pathname with wildcard
characters in a tree object.

* nd/dwim-wildcards-as-pathspecs:
  get_sha1: don't die() on bogus search strings
  check_filename: tighten dwim-wildcard ambiguity
  checkout: reorder check_filename conditional
2016-02-24 13:25:52 -08:00
Jeff King 50a6c8efa2 use st_add and st_mult for allocation size computation
If our size computation overflows size_t, we may allocate a
much smaller buffer than we expected and overflow it. It's
probably impossible to trigger an overflow in most of these
sites in practice, but it is easy enough convert their
additions and multiplications into overflow-checking
variants. This may be fixing real bugs, and it makes
auditing the code easier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Junio C Hamano fb795323ce Merge branch 'wp/sha1-name-negative-match'
A new "<branch>^{/!-<pattern>}" notation can be used to name a
commit that is reachable from <branch> that does not match the
given <pattern>.

* wp/sha1-name-negative-match:
  object name: introduce '^{/!-<negative pattern>}' notation
  test for '!' handling in rev-parse's named commits
2016-02-10 14:20:10 -08:00
Jeff King aac4fac168 get_sha1: don't die() on bogus search strings
The get_sha1() function generally returns an error code
rather than dying, and we sometimes speculatively call it
with something that may be a revision or a pathspec, in
order to see which one it might be.

If it sees a bogus ":/" search string, though, it complains,
without giving the caller the opportunity to recover. We can
demonstrate this in t6133 by looking for ":/*.t", which
should mean "*.t at the root of the tree", but instead dies
because of the invalid regex (the "*" has nothing to operate
on).

We can fix this by returning an error rather than calling
die(). Unfortunately, the tradeoff is that the error message
is slightly worse in cases where we _do_ know we have a rev.
E.g., running "git log ':/*.t' --" before yielded:

  fatal: Invalid search pattern: *.t

and now we get only:

  fatal: bad revision ':/*.t'

There's not a simple way to fix this short of passing a
"quiet" flag all the way through the get_sha1() stack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-10 13:53:21 -08:00
Will Palmer 0769854f3d object name: introduce '^{/!-<negative pattern>}' notation
To name a commit, you can now use the :/!-<negative pattern> regex
style, and consequentially, say

    $ git rev-parse HEAD^{/!-foo}

and it will return the hash of the first commit reachable from HEAD,
whose commit message does not contain "foo". This is the opposite of the
existing <rev>^{/<pattern>} syntax.

The specific use-case this is intended for is to perform an operation,
excluding the most-recent commits containing a particular marker. For
example, if you tend to make "work in progress" commits, with messages
beginning with "WIP", you work, then it could be useful to diff against
"the most recent commit which was not a WIP commit". That sort of thing
now possible, via commands such as:

    $ git diff @^{/!-^WIP}

The leader '/!-', rather than simply '/!', to denote a negative match,
is chosen to leave room for additional modifiers in the future.

Signed-off-by: Will Palmer <wmpalmer@gmail.com>
Signed-off-by: Stephen P. Smith <ischis2@cox.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-01 13:40:37 -08:00
brian m. carlson ed1c9977cb Remove get_object_hash.
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson f2fd0760f6 Convert struct object to object_id
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson 7999b2cf77 Add several uses of get_object_hash.
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Jeff King 43bb66ae0b diagnose_invalid_index_path: use strbuf to avoid strcpy/strcat
We dynamically allocate a buffer and then strcpy and strcat
into it. This isn't buggy, but we'd prefer to avoid these
suspicious functions.

This would be a good candidate for converstion to xstrfmt,
but we need to record the length for dealing with index
entries. A strbuf handles that for us.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Jeff King c3bb0ac796 find_short_object_filename: convert sprintf to xsnprintf
We use sprintf() to format some hex data into a buffer. The
buffer is clearly long enough, and using snprintf here is
not necessary. And in fact, it does not really make anything
easier to audit, as the size we feed to snprintf accounts
for the magic extra 42 bytes found in each alt->name field
of struct alternate_object_database (which is there exactly
to do this formatting).

Still, it is nice to remove an sprintf call and replace it
with an xsnprintf and explanatory comment, which makes it
easier to audit the code base for overflows.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Jeff King af49c6d091 add reentrant variants of sha1_to_hex and find_unique_abbrev
The sha1_to_hex and find_unique_abbrev functions always
write into reusable static buffers. There are a few problems
with this:

  - future calls overwrite our result. This is especially
    annoying with find_unique_abbrev, which does not have a
    ring of buffers, so you cannot even printf() a result
    that has two abbreviated sha1s.

  - if you want to put the result into another buffer, we
    often strcpy, which looks suspicious when auditing for
    overflows.

This patch introduces sha1_to_hex_r and find_unique_abbrev_r,
which write into a user-provided buffer. Of course this is
just punting on the overflow-auditing, as the buffer
obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is
much easier to audit, since that is a well-known size.

We retain the non-reentrant forms, which just become thin
wrappers around the reentrant ones. This patch also adds a
strbuf variant of find_unique_abbrev, which will be handy in
later patches.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Jeff King a5481a6c94 convert "enum date_mode" into a struct
In preparation for adding date modes that may carry extra
information beyond the mode itself, this patch converts the
date_mode enum into a struct.

Most of the conversion is fairly straightforward; we pass
the struct as a pointer and dereference the type field where
necessary. Locations that declare a date_mode can use a "{}"
constructor.  However, the tricky case is where we use the
enum labels as constants, like:

  show_date(t, tz, DATE_NORMAL);

Ideally we could say:

  show_date(t, tz, &{ DATE_NORMAL });

but of course C does not allow that. Likewise, we cannot
cast the constant to a struct, because we need to pass an
actual address. Our options are basically:

  1. Manually add a "struct date_mode d = { DATE_NORMAL }"
     definition to each caller, and pass "&d". This makes
     the callers uglier, because they sometimes do not even
     have their own scope (e.g., they are inside a switch
     statement).

  2. Provide a pre-made global "date_normal" struct that can
     be passed by address. We'd also need "date_rfc2822",
     "date_iso8601", and so forth. But at least the ugliness
     is defined in one place.

  3. Provide a wrapper that generates the correct struct on
     the fly. The big downside is that we end up pointing to
     a single global, which makes our wrapper non-reentrant.
     But show_date is already not reentrant, so it does not
     matter.

This patch implements 3, along with a minor macro to keep
the size of the callers sane.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-29 11:39:07 -07:00
Junio C Hamano 5455ee0573 Merge branch 'bc/object-id'
for_each_ref() callback functions were taught to name the objects
not with "unsigned char sha1[20]" but with "struct object_id".

* bc/object-id: (56 commits)
  struct ref_lock: convert old_sha1 member to object_id
  warn_if_dangling_symref(): convert local variable "junk" to object_id
  each_ref_fn_adapter(): remove adapter
  rev_list_insert_ref(): remove unneeded arguments
  rev_list_insert_ref_oid(): new function, taking an object_oid
  mark_complete(): remove unneeded arguments
  mark_complete_oid(): new function, taking an object_oid
  clear_marks(): rewrite to take an object_id argument
  mark_complete(): rewrite to take an object_id argument
  send_ref(): convert local variable "peeled" to object_id
  upload-pack: rewrite functions to take object_id arguments
  find_symref(): convert local variable "unused" to object_id
  find_symref(): rewrite to take an object_id argument
  write_one_ref(): rewrite to take an object_id argument
  write_refs_to_temp_dir(): convert local variable sha1 to object_id
  submodule: rewrite to take an object_id argument
  shallow: rewrite functions to take object_id arguments
  handle_one_ref(): rewrite to take an object_id argument
  add_info_ref(): rewrite to take an object_id argument
  handle_one_reflog(): rewrite to take an object_id argument
  ...
2015-06-05 12:17:37 -07:00
Junio C Hamano c4a8354bc1 Merge branch 'jk/at-push-sha1'
Introduce <branch>@{push} short-hand to denote the remote-tracking
branch that tracks the branch at the remote the <branch> would be
pushed to.

* jk/at-push-sha1:
  for-each-ref: accept "%(push)" format
  for-each-ref: use skip_prefix instead of starts_with
  sha1_name: implement @{push} shorthand
  sha1_name: refactor interpret_upstream_mark
  sha1_name: refactor upstream_mark
  remote.c: add branch_get_push
  remote.c: return upstream name from stat_tracking_info
  remote.c: untangle error logic in branch_get_upstream
  remote.c: report specific errors from branch_get_upstream
  remote.c: introduce branch_get_upstream helper
  remote.c: hoist read_config into remote_get_1
  remote.c: provide per-branch pushremote name
  remote.c: hoist branch.*.remote lookup out of remote_get_1
  remote.c: drop "remote" pointer from "struct branch"
  remote.c: refactor setup of branch->merge list
  remote.c: drop default_remote_name variable
2015-06-05 12:17:36 -07:00
Junio C Hamano 67f0b6f3b2 Merge branch 'dt/cat-file-follow-symlinks'
"git cat-file --batch(-check)" learned the "--follow-symlinks"
option that follows an in-tree symbolic link when asked about an
object via extended SHA-1 syntax, e.g. HEAD:RelNotes that points at
Documentation/RelNotes/2.5.0.txt.  With the new option, the command
behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as
input instead.

* dt/cat-file-follow-symlinks:
  cat-file: add --follow-symlinks to --batch
  sha1_name: get_sha1_with_context learns to follow symlinks
  tree-walk: learn get_tree_entry_follow_symlinks
2015-06-01 12:45:16 -07:00
Michael Haggerty 9c5fe0b846 handle_one_ref(): rewrite to take an object_id argument
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 12:19:35 -07:00
Michael Haggerty 2b2a5be394 each_ref_fn: change to take an object_id parameter
Change typedef each_ref_fn to take a "const struct object_id *oid"
parameter instead of "const unsigned char *sha1".

To aid this transition, implement an adapter that can be used to wrap
old-style functions matching the old typedef, which is now called
"each_ref_sha1_fn"), and make such functions callable via the new
interface. This requires the old function and its cb_data to be
wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be
used as the cb_data for an adapter function, each_ref_fn_adapter().

This is an enormous diff, but most of it consists of simple,
mechanical changes to the sites that call any of the "for_each_ref"
family of functions. Subsequent to this change, the call sites can be
rewritten one by one to use the new interface.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 12:19:27 -07:00
Jeff King adfe5d0434 sha1_name: implement @{push} shorthand
In a triangular workflow, each branch may have two distinct
points of interest: the @{upstream} that you normally pull
from, and the destination that you normally push to. There
isn't a shorthand for the latter, but it's useful to have.

For instance, you may want to know which commits you haven't
pushed yet:

  git log @{push}..

Or as a more complicated example, imagine that you normally
pull changes from origin/master (which you set as your
@{upstream}), and push changes to your own personal fork
(e.g., as myfork/topic). You may push to your fork from
multiple machines, requiring you to integrate the changes
from the push destination, rather than upstream. With this
patch, you can just do:

  git rebase @{push}

rather than typing out the full name.

The heavy lifting is all done by branch_get_push; here we
just wire it up to the "@{push}" syntax.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-22 09:33:08 -07:00
Jeff King 48c58471c2 sha1_name: refactor interpret_upstream_mark
Now that most of the logic for our local get_upstream_branch
has been pushed into the generic branch_get_upstream, we can
fold the remainder into interpret_upstream_mark.

Furthermore, what remains is generic to any branch-related
"@{foo}" we might add in the future, and there's enough
boilerplate that we'd like to reuse it. Let's parameterize
the two operations (parsing the mark and computing its
value) so that we can reuse this for "@{push}" in the near
future.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-22 09:33:08 -07:00
Jeff King a1ad0eb0cb sha1_name: refactor upstream_mark
We will be adding new mark types in the future, so separate
the suffix data from the logic.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-22 09:33:08 -07:00
Jeff King 3a429d0af3 remote.c: report specific errors from branch_get_upstream
When the previous commit introduced the branch_get_upstream
helper, there was one call-site that could not be converted:
the one in sha1_name.c, which gives detailed error messages
for each possible failure.

Let's teach the helper to optionally report these specific
errors. This lets us convert another callsite, and means we
can use the helper in other locations that want to give the
same error messages.

The logic and error messages come straight from sha1_name.c,
with the exception that we start each error with a lowercase
letter, as is our usual style (note that a few tests need
updated as a result).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-21 11:07:46 -07:00
René Scharfe dbe44faadb use file_exists() to check if a file exists in the worktree
Call file_exists() instead of open-coding it.  That's shorter, simpler
and the intent becomes clearer.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 13:49:10 -07:00