Commit graph

1491 commits

Author SHA1 Message Date
Junio C Hamano 906d6906fb Merge branch 'ls/git-open-cloexec'
Git generally does not explicitly close file descriptors that were
open in the parent process when spawning a child process, but most
of the time the child does not want to access them. As Windows does
not allow removing or renaming a file that has a file descriptor
open, a slow-to-exit child can even break the parent process by
holding onto them.  Use O_CLOEXEC flag to open files in various
codepaths.

* ls/git-open-cloexec:
  read-cache: make sure file handles are not inherited by child processes
  sha1_file: open window into packfiles with O_CLOEXEC
  sha1_file: rename git_open_noatime() to git_open()
2016-10-31 13:15:21 -07:00
Junio C Hamano 0d9c527d59 Merge branch 'jk/no-looking-at-dotgit-outside-repo'
Update "git diff --no-index" codepath not to try to peek into .git/
directory that happens to be under the current directory, when we
know we are operating outside any repository.

* jk/no-looking-at-dotgit-outside-repo:
  diff: handle sha1 abbreviations outside of repository
  diff_aligned_abbrev: use "struct oid"
  diff_unique_abbrev: rename to diff_aligned_abbrev
  find_unique_abbrev: use 4-buffer ring
  test-*-cache-tree: setup git dir
  read info/{attributes,exclude} only when in repository
2016-10-27 14:58:48 -07:00
Junio C Hamano d7ae013a31 Merge branch 'jk/abbrev-auto'
Updates the way approximate count of total objects is computed
while attempting to come up with a unique abbreviated object name,
which in turn needs to estimate how many hexdigits are necessary to
ensure uniqueness.

* jk/abbrev-auto:
  find_unique_abbrev: move logic out of get_short_sha1()
2016-10-27 14:58:47 -07:00
Junio C Hamano 580d820ece Merge branch 'lt/abbrev-auto'
Allow the default abbreviation length, which has historically been
7, to scale as the repository grows.  The logic suggests to use 12
hexdigits for the Linux kernel, and 9 to 10 for Git itself.

* lt/abbrev-auto:
  abbrev: auto size the default abbreviation
  abbrev: prepare for new world order
  abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
2016-10-27 14:58:47 -07:00
Jeff King ef2ed5013c find_unique_abbrev: use 4-buffer ring
Some code paths want to format multiple abbreviated sha1s in
the same output line. Because we use a single static buffer
for our return value, they have to either break their output
into several calls or allocate their own arrays and use
find_unique_abbrev_r().

Intead, let's mimic sha1_to_hex() and use a ring of several
buffers, so that the return value stays valid through
multiple calls. This shortens some of the callers, and makes
it harder to for them to make a silly mistake.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-26 13:30:51 -07:00
Junio C Hamano 9fcd14491d Merge branch 'jk/fetch-quick-tag-following'
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.

* jk/fetch-quick-tag-following:
  fetch: use "quick" has_sha1_file for tag following
2016-10-26 13:14:47 -07:00
Junio C Hamano 1c2b1f7018 Merge branch 'bw/ls-files-recurse-submodules'
"git ls-files" learned "--recurse-submodules" option that can be
used to get a listing of tracked files across submodules (i.e. this
only works with "--cached" option, not for listing untracked or
ignored files).  This would be a useful tool to sit on the upstream
side of a pipe that is read with xargs to work on all working tree
files from the top-level superproject.

* bw/ls-files-recurse-submodules:
  ls-files: add pathspec matching for submodules
  ls-files: pass through safe options for --recurse-submodules
  ls-files: optionally recurse into submodules
  git: make super-prefix option
2016-10-26 13:14:44 -07:00
Lars Schneider a5436b5794 sha1_file: rename git_open_noatime() to git_open()
This function is meant to be used when reading from files in the
object store, and the original objective was to avoid smudging atime
of loose object files too often, hence its name.  Because we'll be
extending its role in the next commit to also arrange the file
descriptors they return auto-closed in the child processes, rename
it to lose "noatime" part that is too specific.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-25 10:59:13 -07:00
Junio C Hamano dec040192f Merge branch 'jk/alt-odb-cleanup'
Codepaths involved in interacting alternate object store have
been cleaned up.

* jk/alt-odb-cleanup:
  alternates: use fspathcmp to detect duplicates
  sha1_file: always allow relative paths to alternates
  count-objects: report alternates via verbose mode
  fill_sha1_file: write into a strbuf
  alternates: store scratch buffer as strbuf
  fill_sha1_file: write "boring" characters
  alternates: use a separate scratch space
  alternates: encapsulate alt->base munging
  alternates: provide helper for allocating alternate
  alternates: provide helper for adding to alternates list
  link_alt_odb_entry: refactor string handling
  link_alt_odb_entry: handle normalize_path errors
  t5613: clarify "too deep" recursion tests
  t5613: do not chdir in main process
  t5613: whitespace/style cleanups
  t5613: use test_must_fail
  t5613: drop test_valid_repo function
  t5613: drop reachable_via function
2016-10-17 13:25:20 -07:00
Junio C Hamano 25ab004c53 Merge branch 'jk/quarantine-received-objects'
In order for the receiving end of "git push" to inspect the
received history and decide to reject the push, the objects sent
from the sending end need to be made available to the hook and
the mechanism for the connectivity check, and this was done
traditionally by storing the objects in the receiving repository
and letting "git gc" to expire it.  Instead, store the newly
received objects in a temporary area, and make them available by
reusing the alternate object store mechanism to them only while we
decide if we accept the check, and once we decide, either migrate
them to the repository or purge them immediately.

* jk/quarantine-received-objects:
  tmp-objdir: do not migrate files starting with '.'
  tmp-objdir: put quarantine information in the environment
  receive-pack: quarantine objects until pre-receive accepts
  tmp-objdir: introduce API for temporary object directories
  check_connected: accept an env argument
2016-10-17 13:25:20 -07:00
Jeff King 5827a03545 fetch: use "quick" has_sha1_file for tag following
When we auto-follow tags in a fetch, we look at all of the
tags advertised by the remote and fetch ones where we don't
already have the tag, but we do have the object it peels to.
This involves a lot of calls to has_sha1_file(), some of
which we can reasonably expect to fail. Since 45e8a74
(has_sha1_file: re-check pack directory before giving up,
2013-08-30), this may cause many calls to
reprepare_packed_git(), which is potentially expensive.

This has gone unnoticed for several years because it
requires a fairly unique setup to matter:

  1. You need to have a lot of packs on the client side to
     make reprepare_packed_git() expensive (the most
     expensive part is finding duplicates in an unsorted
     list, which is currently quadratic).

  2. You need a large number of tag refs on the server side
     that are candidates for auto-following (i.e., that the
     client doesn't have). Each one triggers a re-read of
     the pack directory.

  3. Under normal circumstances, the client would
     auto-follow those tags and after one large fetch, (2)
     would no longer be true. But if those tags point to
     history which is disconnected from what the client
     otherwise fetches, then it will never auto-follow, and
     those candidates will impact it on every fetch.

So when all three are true, each fetch pays an extra
O(nr_tags * nr_packs^2) cost, mostly in string comparisons
on the pack names. This was exacerbated by 47bf4b0
(prepare_packed_git_one: refactor duplicate-pack check,
2014-06-30) which uses a slightly more expensive string
check, under the assumption that the duplicate check doesn't
happen very often (and it shouldn't; the real problem here
is how often we are calling reprepare_packed_git()).

This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
accuracy for speed, in cases where we might be racy with a
simultaneous repack. This is similar to the fix in 0eeb077
(index-pack: avoid excessive re-reading of pack directory,
2015-06-09). As with that case, it's OK for has_sha1_file()
occasionally say "no I don't have it" when we do, because
the worst case is not a corruption, but simply that we may
fail to auto-follow a tag that points to it.

Here are results from the included perf script, which sets
up a situation similar to the one described above:

Test            HEAD^               HEAD
----------------------------------------------------------
5550.4: fetch   11.21(10.42+0.78)   0.08(0.04+0.02) -99.3%

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-14 11:31:32 -07:00
Junio C Hamano e6e24c94df Merge branch 'jk/pack-objects-optim-mru'
"git pack-objects" in a repository with many packfiles used to
spend a lot of time looking for/at objects in them; the accesses to
the packfiles are now optimized by checking the most-recently-used
packfile first.

* jk/pack-objects-optim-mru:
  pack-objects: use mru list when iterating over packs
  pack-objects: break delta cycles before delta-search phase
  sha1_file: make packed_object_info public
  provide an initializer for "struct object_info"
2016-10-10 14:03:47 -07:00
Jeff King e34c2e010f tmp-objdir: put quarantine information in the environment
The presence of the GIT_QUARANTINE_PATH variable lets any
called programs know that they're operating in a temporary
object directory (and where that directory is).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:54:02 -07:00
Jeff King 38dbe5f078 alternates: store scratch buffer as strbuf
We pre-size the scratch buffer to hold a loose object
filename of the form "xx/yyyy...", which leads to allocation
code that is hard to verify. We have to use some magic
numbers during the initial allocation, and then writers must
blindly assume that the buffer is big enough. Using a strbuf
makes it more clear that we cannot overflow.

Unfortunately, we do still need some magic numbers to grow
our strbuf before calling fill_sha1_path(), but the strbuf
growth is much closer to the point of use. This makes it
easier to see that it's correct, and opens the possibility
of pushing it even further down if fill_sha1_path() learns
to work on strbufs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Jeff King 597f9134de alternates: use a separate scratch space
The alternate_object_database struct uses a single buffer
both for storing the path to the alternate, and as a scratch
buffer for forming object names. This is efficient (since
otherwise we'd end up storing the path twice), but it makes
life hard for callers who just want to know the path to the
alternate. They have to remember to stop reading after
"alt->name - alt->base" bytes, and to subtract one for the
trailing '/'.

It would be much simpler if they could simply access a
NUL-terminated path string. We could encapsulate this in a
function which puts a NUL in the scratch buffer and returns
the string, but that opens up questions about the lifetime
of the result. The first time another caller uses the
alternate, the scratch buffer may get other data tacked onto
it.

Let's instead just store the root path separately from the
scratch buffer. There aren't enough alternates being stored
for the duplicated data to matter for performance, and this
keeps things simple and safe for the callers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Jeff King 7f0fa2c02a alternates: provide helper for allocating alternate
Allocating a struct alternate_object_database is tricky, as
we must over-allocate the buffer to provide scratch space,
and then put in particular '/' and NUL markers.

Let's encapsulate this in a function so that the complexity
doesn't leak into callers (and so that we can modify it
later).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Jeff King a5b34d2152 alternates: provide helper for adding to alternates list
The submodule code wants to temporarily add an alternate
object store to our in-memory alt_odb list, but does it
manually. Let's provide a helper so it can reuse the code in
link_alt_odb_entry().

While we're adding our new add_to_alternates_memory(), let's
document add_to_alternates_file(), as the two are related.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Brandon Williams 74866d7579 git: make super-prefix option
Add a super-prefix environment variable 'GIT_INTERNAL_SUPER_PREFIX'
which can be used to specify a path from above a repository down to its
root.  When such a super-prefix is specified, the paths reported by Git
are prefixed with it to make them relative to that directory "above".
The paths given by the user on the command line
(e.g. "git subcmd --output-file=path/to/a/file" and pathspecs) are taken
relative to the directory "above" to match.

The immediate use of this option is by commands which have a
--recurse-submodule option in order to give context to submodules about
how they were invoked.  This option is currently only allowed for
builtins which support a super-prefix.

Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 12:14:58 -07:00
Junio C Hamano 66c22ba6fb Merge branch 'jk/ambiguous-short-object-names'
When given an abbreviated object name that is not (or more
realistically, "no longer") unique, we gave a fatal error
"ambiguous argument".  This error is now accompanied by hints that
lists the objects that begins with the given prefix.  During the
course of development of this new feature, numerous minor bugs were
uncovered and corrected, the most notable one of which is that we
gave "short SHA1 xxxx is ambiguous." twice without good reason.

* jk/ambiguous-short-object-names:
  get_short_sha1: make default disambiguation configurable
  get_short_sha1: list ambiguous objects on error
  for_each_abbrev: drop duplicate objects
  sha1_array: let callbacks interrupt iteration
  get_short_sha1: mark ambiguity error for translation
  get_short_sha1: NUL-terminate hex prefix
  get_short_sha1: refactor init of disambiguation code
  get_short_sha1: parse tags when looking for treeish
  get_sha1: propagate flags to child functions
  get_sha1: avoid repeating ourselves via ONLY_TO_DIE
  get_sha1: detect buggy calls with multiple disambiguators
2016-10-06 14:53:10 -07:00
Jeff King 8e3f52d778 find_unique_abbrev: move logic out of get_short_sha1()
The get_short_sha1() is only about reading short sha1s; we
do call it in a loop to check "is this long enough" for each
object, but otherwise it should not need to know about
things like our default_abbrev setting.

So instead of asking it to set default_automatic_abbrev as a
side-effect, let's just have find_unique_abbrev() pick the
right place to start its loop.  This requires a separate
approximate_object_count() function, but that naturally
belongs with the rest of sha1_file.c.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-03 21:03:14 -07:00
Junio C Hamano 53eb85e623 Merge branch 'nd/init-core-worktree-in-multi-worktree-world'
"git init" tried to record core.worktree in the repository's
'config' file when GIT_WORK_TREE environment variable was set and
it was different from where GIT_DIR appears as ".git" at its top,
but the logic was faulty when .git is a "gitdir:" file that points
at the real place, causing trouble in working trees that are
managed by "git worktree".  This has been corrected.

* nd/init-core-worktree-in-multi-worktree-world:
  init: kill git_link variable
  init: do not set unnecessary core.worktree
  init: kill set_git_dir_init()
  init: call set_git_dir_init() from within init_db()
  init: correct re-initialization from a linked worktree
2016-10-03 13:30:35 -07:00
Linus Torvalds e6c587c733 abbrev: auto size the default abbreviation
In fairly early days we somehow decided to abbreviate object names
down to 7-hexdigits, but as projects grow, it is becoming more and
more likely to see such a short object names made in earlier days
and recorded in the log messages no longer unique.

Currently the Linux kernel project needs 11 to 12 hexdigits, while
Git itself needs 10 hexdigits to uniquely identify the objects they
have, while many smaller projects may still be fine with the
original 7-hexdigit default.  One-size does not fit all projects.

Introduce a mechanism, where we estimate the number of objects in
the repository upon the first request to abbreviate an object name
with the default setting and come up with a sane default for the
repository.  Based on the expectation that we would see collision in
a repository with 2^(2N) objects when using object names shortened
to first N bits, use sufficient number of hexdigits to cover the
number of objects in the repository.  Each hexdigit (4-bits) we add
to the shortened name allows us to have four times (2-bits) as many
objects in the repository.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-03 12:54:29 -07:00
Junio C Hamano 65acfeacaa abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
We'll be introducing a new way to decide the default abbreviation
length by initialising DEFAULT_ABBREV to -1 to signal the first call
to "find unique abbreviation" codepath to compute a reasonable value
based on the number of objects we have to avoid collisions.

We have long relied on DEFAULT_ABBREV being a positive concrete
value that is used as the abbreviation length when no extra
configuration or command line option has overridden it.  Some
codepaths wants to use such a positive concrete default value
even before making their first request to actually trigger the
computation for the auto sized default.

Introduce FALLBACK_DEFAULT_ABBREV and use it to the code that
attempts to align the report from "git fetch".  For now, this
macro is also used to initialize the default_abbrev variable,
but the auto-sizing code will use -1 and then use the value of
FALLBACK_DEFAULT_ABBREV as the starting point of auto-sizing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-03 12:54:21 -07:00
Jeff King 5b33cb1fd7 get_short_sha1: make default disambiguation configurable
When we find ambiguous short sha1s, we may get a
disambiguation rule from our caller's context. But if we
don't, we fall back to treating all sha1s the same, even
though most projects will tend to refer only to commits by
their short sha1s.

This patch introduces a configuration option that lets the
user pick a different fallback (e.g., only commits). It's
possible that we may want to make this the default, but it's
a good idea to start as a config option for two reasons:

  1. It lets people experiment with this and see if it's a
     good idea (i.e., the "tend to" above is an assumption;
     we don't really know if this will break some obscure
     cases).

  2. Even if we do flip the default, it gives people an
     escape hatch if it causes problems (you can sometimes
     override it by asking for "1234^{tree}", but not all
     combinations are possible).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27 10:29:56 -07:00
Junio C Hamano e683f17e63 Merge branch 'rs/checkout-init-macro'
Code cleanup.

* rs/checkout-init-macro:
  introduce CHECKOUT_INIT
2016-09-26 16:09:21 -07:00
Junio C Hamano ebc63580a1 Merge branch 'tg/add-chmod+x-fix'
"git add --chmod=+x <pathspec>" added recently only toggled the
executable bit for paths that are either new or modified. This has
been corrected to flip the executable bit for all paths that match
the given pathspec.

* tg/add-chmod+x-fix:
  t3700-add: do not check working tree file mode without POSIXPERM
  t3700-add: create subdirectory gently
  add: modify already added files when --chmod is given
  read-cache: introduce chmod_index_entry
  update-index: add test for chmod flags
2016-09-26 16:09:20 -07:00
Jeff King 259942f549 get_sha1: detect buggy calls with multiple disambiguators
The get_sha1() family of functions takes a flags field, but
some of the flags are mutually exclusive. In particular, we
can only handle one disambiguating function, and the flags
quietly override each other. Let's instead detect these as
programming bugs.

Technically some of the flags are supersets of the others,
so treating COMMITTISH|TREEISH as just COMMITTISH is not
wrong, but it's a good sign the caller is confused. And
certainly asking for BLOB|TREE does not work.

We can do the check easily with some bit-twiddling, and as a
bonus, the bit-mask of disambiguators will come in handy in
a future patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:21:28 -07:00
Nguyễn Thái Ngọc Duy 33158701e2 init: call set_git_dir_init() from within init_db()
The next commit requires that set_git_dir_init() must be called before
init_db(). Let's make sure nobody can do otherwise.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-25 16:32:35 -07:00
René Scharfe 68e3d6292f introduce CHECKOUT_INIT
Add a static initializer for struct checkout and use it throughout the
code base.  It's shorter, avoids a memset(3) call and makes sure the
base_dir member is initialized to a valid (empty) string.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-22 13:42:18 -07:00
Junio C Hamano d845d727cb Merge branch 'jk/setup-sequence-update'
There were numerous corner cases in which the configuration files
are read and used or not read at all depending on the directory a
Git command was run, leading to inconsistent behaviour.  The code
to set-up repository access at the beginning of a Git process has
been updated to fix them.

* jk/setup-sequence-update:
  t1007: factor out repeated setup
  init: reset cached config when entering new repo
  init: expand comments explaining config trickery
  config: only read .git/config from configured repos
  test-config: setup git directory
  t1302: use "git -C"
  pager: handle early config
  pager: use callbacks instead of configset
  pager: make pager_program a file-local static
  pager: stop loading git_default_config()
  pager: remove obsolete comment
  diff: always try to set up the repository
  diff: handle --no-index prefixes consistently
  diff: skip implicit no-index check when given --no-index
  patch-id: use RUN_SETUP_GENTLY
  hash-object: always try to set up the git repository
2016-09-21 15:15:24 -07:00
Junio C Hamano 4af9a7d344 Merge branch 'bc/object-id'
The "unsigned char sha1[20]" to "struct object_id" conversion
continues.  Notable changes in this round includes that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.

It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2").  Extra sets of eyes double-checking for
mismerges are highly appreciated.

* bc/object-id:
  builtin/reset: convert to use struct object_id
  builtin/commit-tree: convert to struct object_id
  builtin/am: convert to struct object_id
  refs: add an update_ref_oid function.
  sha1_name: convert get_sha1_mb to struct object_id
  builtin/update-index: convert file to struct object_id
  notes: convert init_notes to use struct object_id
  builtin/rm: convert to use struct object_id
  builtin/blame: convert file to use struct object_id
  Convert read_mmblob to take struct object_id.
  notes-merge: convert struct notes_merge_pair to struct object_id
  builtin/checkout: convert some static functions to struct object_id
  streaming: make stream_blob_to_fd take struct object_id
  builtin: convert textconv_object to use struct object_id
  builtin/cat-file: convert some static functions to struct object_id
  builtin/cat-file: convert struct expand_data to use struct object_id
  builtin/log: convert some static functions to use struct object_id
  builtin/blame: convert struct origin to use struct object_id
  builtin/apply: convert static functions to struct object_id
  cache: convert struct cache_entry to use struct object_id
2016-09-19 13:47:19 -07:00
Thomas Gummerer 610d55af0f add: modify already added files when --chmod is given
When the chmod option was added to git add, it was hooked up to the diff
machinery, meaning that it only works when the version in the index
differs from the version on disk.

As the option was supposed to mirror the chmod option in update-index,
which always changes the mode in the index, regardless of the status of
the file, make sure the option behaves the same way in git add.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
Thomas Gummerer d9d7096662 read-cache: introduce chmod_index_entry
As there are chmod options for both add and update-index, introduce a
new chmod_index_entry function to do the work.  Use it in update-index,
while it will be used in add in the next patch.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
Jeff King 4543926ba8 init: reset cached config when entering new repo
After we copy the templates into place, we re-read the
config in case we copied in a default config file. But since
git_config() is backed by a cache these days, it's possible
that the call will not actually touch the filesystem at all;
we need to tell it that something has changed behind the
scenes.

Note that we also need to reset the shared_repository
config. At first glance, it seems like this should probably
just be folded into git_config_clear(). But unfortunately
that is not quite right. The shared repository value may
come from config, _or_ it may have been set manually. So
only the caller who knows whether or not they set it is the
one who can clear it (and indeed, if you _do_ put it into
git_config_clear(), then many tests fail, as we have to
clear the config cache any time we set a new config
variable).

There are three tests here. The first two actually pass
already, though it's largely luck: they just don't happen to
actually read any config before we enter the new repo.

But the third one does fail without this patch; we look at
core.sharedrepository while creating the directory, but need
to make sure the value from the template config overrides
it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King b9605bc4f2 config: only read .git/config from configured repos
When git_config() runs, it looks in the system, user-wide,
and repo-level config files. It gets the latter by calling
git_pathdup(), which in turn calls get_git_dir(). If we
haven't set up the git repository yet, this may simply
return ".git", and we will look at ".git/config".  This
seems like it would be helpful (presumably we haven't set up
the repository yet, so it tries to find it), but it turns
out to be a bad idea for a few reasons:

  - it's not sufficient, and therefore hides bugs in a
    confusing way. Config will be respected if commands are
    run from the top-level of the working tree, but not from
    a subdirectory.

  - it's not always true that we haven't set up the
    repository _yet_; we may not want to do it at all. For
    instance, if you run "git init /some/path" from inside
    another repository, it should not load config from the
    existing repository.

  - there might be a path ".git/config", but it is not the
    actual repository we would find via setup_git_directory().
    This may happen, e.g., if you are storing a git
    repository inside another git repository, but have
    munged one of the files in such a way that the
    inner repository is not valid (e.g., by removing HEAD).

We have at least two bugs of the second type in git-init,
introduced by ae5f677 (lazily load core.sharedrepository,
2016-03-11). It causes init to use git_configset(), which
loads all of the config, including values from the current
repo (if any).  This shows up in two ways:

  1. If we happen to be in an existing repository directory,
     we'll read and respect core.sharedrepository from it,
     even though it should have no bearing on the new
     repository. A new test in t1301 covers this.

  2. Similarly, if we're in an existing repo that sets
     core.logallrefupdates, that will cause init to fail to
     set it in a newly created repository (because it thinks
     that the user's templates already did so). A new test
     in t0001 covers this.

We also need to adjust an existing test in t1302, which
gives another example of why this patch is an improvement.

That test creates an embedded repository with a bogus
core.repositoryformatversion of "99". It wants to make sure
that we actually stop at the bogus repo rather than
continuing upward to find the outer repo. So it checks that
"git config core.repositoryformatversion" returns 99. But
that only works because we blindly read ".git/config", even
though we _know_ we're in a repository whose vintage we do
not understand.

After this patch, we avoid reading config from the unknown
vintage repository at all, which is a safer choice.  But we
need to tweak the test, since core.repositoryformatversion
will not return 99; it will claim that it could not find the
variable at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King c0c08897c4 pager: make pager_program a file-local static
This variable is only ever used by the routines in pager.c,
and other parts of the code should always use those routines
(like git_pager()) to make decisions about which pager to
use. Let's reduce its scope to prevent accidents.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Junio C Hamano e4ec05ed93 Merge branch 'rs/hex2chr'
* rs/hex2chr:
  introduce hex2chr() for converting two hexadecimal digits to a character
2016-09-12 15:34:36 -07:00
Junio C Hamano 305d7f1339 Merge branch 'jk/diff-submodule-diff-inline'
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.

* jk/diff-submodule-diff-inline:
  diff: teach diff to display submodule difference with an inline diff
  submodule: refactor show_submodule_summary with helper function
  submodule: convert show_submodule_summary to use struct object_id *
  allow do_submodule_path to work even if submodule isn't checked out
  diff: prepare for additional submodule formats
  graph: add support for --line-prefix on all graph-aware output
  diff.c: remove output_prefix_length field
  cache: add empty_tree_oid object and helper function
2016-09-12 15:34:31 -07:00
Junio C Hamano 02c6c14d6c Merge branch 'sb/submodule-clone-rr'
"git clone --resurse-submodules --reference $path $URL" is a way to
reduce network transfer cost by borrowing objects in an existing
$path repository when cloning the superproject from $URL; it
learned to also peek into $path for presense of corresponding
repositories of submodules and borrow objects from there when able.

* sb/submodule-clone-rr:
  clone: recursive and reference option triggers submodule alternates
  clone: implement optional references
  clone: clarify option_reference as required
  clone: factor out checking for an alternate path
  submodule--helper update-clone: allow multiple references
  submodule--helper module-clone: allow multiple references
  t7408: merge short tests, factor out testing method
  t7408: modernize style
2016-09-08 21:49:50 -07:00
Junio C Hamano f59c6e6ccb Merge branch 'jk/reflog-date' into maint
The reflog output format is documented better, and a new format
--date=unix to report the seconds-since-epoch (without timezone)
has been added.

* jk/reflog-date:
  date: clarify --date=raw description
  date: add "unix" format
  date: document and test "raw-local" mode
  doc/pretty-formats: explain shortening of %gd
  doc/pretty-formats: describe index/time formats for %gd
  doc/rev-list-options: explain "-g" output formats
  doc/rev-list-options: clarify "commit@{Nth}" for "-g" option
2016-09-08 21:35:52 -07:00
Junio C Hamano 7f5885ad2a Merge branch 'jc/renormalize-merge-kill-safer-crlf' into maint
"git merge" with renormalization did not work well with
merge-recursive, due to "safer crlf" conversion kicking in when it
shouldn't.

* jc/renormalize-merge-kill-safer-crlf:
  merge: avoid "safer crlf" during recording of merge results
  convert: unify the "auto" handling of CRLF
2016-09-08 21:35:52 -07:00
brian m. carlson 151b2911c1 sha1_name: convert get_sha1_mb to struct object_id
All of the callers of this function use struct object_id, so rename it
to get_oid_mb and make it take struct object_id instead of
unsigned char *.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:43 -07:00
brian m. carlson 99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
René Scharfe d23309733a introduce hex2chr() for converting two hexadecimal digits to a character
Add and use a helper function that decodes the char value of two
hexadecimal digits.  It returns a negative number on error, avoids
running over the end of the given string and doesn't shift negative
values.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 10:42:46 -07:00
Jacob Keller 99b43a61f2 allow do_submodule_path to work even if submodule isn't checked out
Currently, do_submodule_path will attempt locating the .git directory by
using read_gitfile on <path>/.git. If this fails it just assumes the
<path>/.git is actually a git directory.

This is good because it allows for handling submodules which were cloned
in a regular manner first before being added to the superproject.

Unfortunately this fails if the <path> is not actually checked out any
longer, such as by removing the directory.

Fix this by checking if the directory we found is actually a gitdir. In
the case it is not, attempt to lookup the submodule configuration and
find the name of where it is stored in the .git/modules/ directory of
the superproject.

If we can't locate the submodule configuration, this might occur because
for example a submodule gitlink was added but the corresponding
.gitmodules file was not properly updated.  A die() here would not be
pleasant to the users of submodule diff formats, so instead, modify
do_submodule_path() to return an error code:

 - git_pathdup_submodule() returns NULL when we fail to find a path.
 - strbuf_git_path_submodule() propagates the error code to the caller.

Modify the callers of these functions to check the error code and fail
properly. This ensures we don't attempt to use a bad path that doesn't
match the corresponding submodule.

Because this change fixes add_submodule_odb() to work even if the
submodule is not checked out, update the wording of the submodule log
diff format to correctly display that the submodule is "not initialized"
instead of "not checked out"

Add tests to ensure this change works as expected.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:10 -07:00
Jacob Keller 8576fde6cb cache: add empty_tree_oid object and helper function
Similar to is_null_oid(), and is_empty_blob_sha1() add an
empty_tree_oid along with helper function is_empty_tree_oid(). For
completeness, also add an "is_empty_tree_sha1()",
"is_empty_blob_sha1()", "is_empty_tree_oid()" and "is_empty_blob_oid()"
helpers.

To ensure we only get one singleton, implement EMPTY_BLOB_SHA1_BIN as
simply getting the hash of empty_blob_oid structure.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:08 -07:00
Stefan Beller 9eeea7d2bc clone: factor out checking for an alternate path
In a later patch we want to determine if a path is suitable as an
alternate from other commands than builtin/clone. Move the checking
functionality of `add_one_reference` to `compute_alternate_path` that is
defined in cache.h.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:28:01 -07:00
Junio C Hamano 6d4960ac7d Merge branch 'jk/trace-fixup'
Various small fixups to the "GIT_TRACE" facility.

* jk/trace-fixup:
  trace: do not fall back to stderr
  write_or_die: drop write_or_whine_pipe()
  trace: disable key after write error
  trace: correct variable name in write() error message
  trace: cosmetic fixes for error messages
  trace: use warning() for printing trace errors
  trace: stop using write_or_whine_pipe()
  trace: handle NULL argument in trace_disable()
2016-08-12 09:47:36 -07:00
Junio C Hamano f4fd627661 Merge branch 'jk/reset-ident-time-per-commit' into maint
Not-so-recent rewrite of "git am" that started making internal
calls into the commit machinery had an unintended regression, in
that no matter how many seconds it took to apply many patches, the
resulting committer timestamp for the resulting commits were all
the same.

* jk/reset-ident-time-per-commit:
  am: reset cached ident date for each patch
2016-08-12 09:16:56 -07:00
Jeff King ca79c98572 sha1_file: make packed_object_info public
Some code may have a pack/offset pair for an object, but
would like to look up more information. Using
sha1_object_info() is too heavy-weight; it starts from the
sha1 and has to find the pack again (so not only does it waste
time, it might not even find the same instance).

In some cases, this problem is solved by helpers like
get_size_from_delta(), which is used by pack-objects to take
a shortcut for objects whose packed representation has
already been found. But there's no similar function for
getting the object type, for instance. Rather than introduce
one, let's just make the whole packed_object_info() available.
It is smart enough to spend effort only on the items the
caller wants.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 10:43:24 -07:00