Commit graph

820 commits

Author SHA1 Message Date
Junio C Hamano d7f8a37852 Merge branch 'jk/loose-object-fsck'
Code cleanup.

* jk/loose-object-fsck:
  sha1_file: remove an used fd variable
2017-04-23 22:07:50 -07:00
Junio C Hamano eb3af74e93 Merge branch 'jk/no-looking-at-dotgit-outside-repo'
Clean up fallouts from recent tightening of the set-up sequence,
where Git barfs when repository information is accessed without
first ensuring that it was started in a repository.

* jk/no-looking-at-dotgit-outside-repo:
  test-read-cache: setup git dir
  has_sha1_file: don't bother if we are not in a repository
2017-04-19 21:37:20 -07:00
Junio C Hamano b1081e4004 Merge branch 'bc/object-id'
Conversion from unsigned char [40] to struct object_id continues.

* bc/object-id:
  Documentation: update and rename api-sha1-array.txt
  Rename sha1_array to oid_array
  Convert sha1_array_for_each_unique and for_each_abbrev to object_id
  Convert sha1_array_lookup to take struct object_id
  Convert remaining callers of sha1_array_lookup to object_id
  Make sha1_array_append take a struct object_id *
  sha1-array: convert internal storage for struct sha1_array to object_id
  builtin/pull: convert to struct object_id
  submodule: convert check_for_new_submodule_commits to object_id
  sha1_name: convert disambiguate_hint_fn to take object_id
  sha1_name: convert struct disambiguate_state to object_id
  test-sha1-array: convert most code to struct object_id
  parse-options-cb: convert sha1_array_append caller to struct object_id
  fsck: convert init_skiplist to struct object_id
  builtin/receive-pack: convert portions to struct object_id
  builtin/pull: convert portions to struct object_id
  builtin/diff: convert to struct object_id
  Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
  Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
  Define new hash-size constants for allocating memory
2017-04-19 21:37:13 -07:00
Junio C Hamano dfe46c5ce6 Merge branch 'jk/loose-object-info-report-error'
Update error handling for codepath that deals with corrupt loose
objects.

* jk/loose-object-info-report-error:
  index-pack: detect local corruption in collision check
  sha1_loose_object_info: return error for corrupted objects
2017-04-16 23:29:30 -07:00
Sebastian Schuberth 0747fb49fd sha1_file: remove an used fd variable
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:29:18 -07:00
Jonathan Nieder 3e8b7d3c77 has_sha1_file: don't bother if we are not in a repository
Most callers to this function already require that they are in a
git repository, but there is an exception: "git apply" uses
has_sha1_file to avoid work if the result of applying a binary
patch is already present in the repository. When run outside any
repository, this produces an error:

 fatal: BUG: setup_git_env called without repository

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-13 18:33:11 -07:00
Jeff King 93cff9a978 sha1_loose_object_info: return error for corrupted objects
When sha1_loose_object_info() finds that a loose object file
cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an
error to the caller.  However, if it found that the loose
object file is corrupt and the object data cannot be used
from it, it stuffs OBJ_BAD into "type" field of the
object_info, but returns zero (i.e., success), which can
confuse callers.

This is due to 052fe5eac (sha1_loose_object_info: make type
lookup optional, 2013-07-12), which switched the return to a
strict success/error, rather than returning the type (but
botched the return).

Callers of regular sha1_object_info() don't notice the
difference, as that function returns the type (which is
OBJ_BAD in this case). However, direct callers of
sha1_object_info_extended() see the function return success,
but without setting any meaningful values in the object_info
struct, leading them to access potentially uninitialized
memory.

The easiest way to see the bug is via "cat-file -s", which
will happily ignore the corruption and report whatever
value happened to be in the "size" variable.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01 10:45:16 -07:00
Jeff King 1a168e5c86 convert unchecked snprintf into xsnprintf
These calls to snprintf should always succeed, because their
input is small and fixed. Let's use xsnprintf to make sure
this is the case (and to make auditing for actual truncation
easier).

These could be candidates for turning into heap buffers, but
they fall into a few broad categories that make it not worth
doing:

  - formatting single numbers is simple enough that we can
    see the result should fit

  - the size of a sha1 is likewise well-known, and I didn't
    want to cause unnecessary conflicts with the ongoing
    process to convert these constants to GIT_MAX_HEXSZ

  - the interface for curl_errorstr is dictated by curl

Signed-off-by: Jeff King <peff@peff.net>
2017-03-30 14:59:50 -07:00
Junio C Hamano ba5e05ffef Merge branch 'jk/pack-name-cleanups' into maint
Code clean-up.

* jk/pack-name-cleanups:
  index-pack: make pointer-alias fallbacks safer
  replace snprintf with odb_pack_name()
  odb_pack_keep(): stop generating keepfile name
  sha1_file.c: make pack-name helper globally accessible
  move odb_* declarations out of git-compat-util.h
2017-03-28 13:52:25 -07:00
brian m. carlson cd02599c48 Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
Since we will likely be introducing a new hash function at some point,
and that hash function might be longer than 20 bytes, use the constant
GIT_MAX_RAWSZ, which is designed to be suitable for allocations, instead
of GIT_SHA1_RAWSZ.  This will ease the transition down the line by
distinguishing between places where we need to allocate memory suitable
for the largest hash from those where we need to handle the current
hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-26 22:08:21 -07:00
brian m. carlson dc01505f7f Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
Since we will likely be introducing a new hash function at some point,
and that hash function might be longer than 40 hex characters, use the
constant GIT_MAX_HEXSZ, which is designed to be suitable for
allocations, instead of GIT_SHA1_HEXSZ.  This will ease the transition
down the line by distinguishing between places where we need to allocate
memory suitable for the largest hash from those where we need to handle
the current hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-26 22:08:21 -07:00
Junio C Hamano 45cbc37c5f Merge branch 'jk/pack-name-cleanups'
Code clean-up.

* jk/pack-name-cleanups:
  index-pack: make pointer-alias fallbacks safer
  replace snprintf with odb_pack_name()
  odb_pack_keep(): stop generating keepfile name
  sha1_file.c: make pack-name helper globally accessible
  move odb_* declarations out of git-compat-util.h
2017-03-21 15:07:17 -07:00
Junio C Hamano e36e28e697 Merge branch 'rs/sha1-file-plug-fallback-base-leak' into maint
A leak in a codepath to read from a packed object in (rare) cases
has been plugged.

* rs/sha1-file-plug-fallback-base-leak:
  sha1_file: release fallback base's memory in unpack_entry()
2017-03-21 15:03:27 -07:00
Junio C Hamano e1fae93019 Merge branch 'bc/object-id'
"uchar [40]" to "struct object_id" conversion continues.

* bc/object-id:
  wt-status: convert to struct object_id
  builtin/merge-base: convert to struct object_id
  Convert object iteration callbacks to struct object_id
  sha1_file: introduce an nth_packed_object_oid function
  refs: simplify parsing of reflog entries
  refs: convert each_reflog_ent_fn to struct object_id
  reflog-walk: convert struct reflog_info to struct object_id
  builtin/replace: convert to struct object_id
  Convert remaining callers of resolve_refdup to object_id
  builtin/merge: convert to struct object_id
  builtin/clone: convert to struct object_id
  builtin/branch: convert to struct object_id
  builtin/grep: convert to struct object_id
  builtin/fmt-merge-message: convert to struct object_id
  builtin/fast-export: convert to struct object_id
  builtin/describe: convert to struct object_id
  builtin/diff-tree: convert to struct object_id
  builtin/commit: convert to struct object_id
  hex: introduce parse_oid_hex
2017-03-17 13:50:25 -07:00
Junio C Hamano 94c9b5af70 Merge branch 'cc/split-index-config'
The experimental "split index" feature has gained a few
configuration variables to make it easier to use.

* cc/split-index-config: (22 commits)
  Documentation/git-update-index: explain splitIndex.*
  Documentation/config: add splitIndex.sharedIndexExpire
  read-cache: use freshen_shared_index() in read_index_from()
  read-cache: refactor read_index_from()
  t1700: test shared index file expiration
  read-cache: unlink old sharedindex files
  config: add git_config_get_expiry() from gc.c
  read-cache: touch shared index files when used
  sha1_file: make check_and_freshen_file() non static
  Documentation/config: add splitIndex.maxPercentChange
  t1700: add tests for splitIndex.maxPercentChange
  read-cache: regenerate shared index if necessary
  config: add git_config_get_max_percent_split_change()
  Documentation/git-update-index: talk about core.splitIndex config var
  Documentation/config: add information for core.splitIndex
  t1700: add tests for core.splitIndex
  update-index: warn in case of split-index incoherency
  read-cache: add and then use tweak_split_index()
  split-index: add {add,remove}_split_index() functions
  config: add git_config_get_split_index()
  ...
2017-03-17 13:50:23 -07:00
Jeff King 1cec8c634f sha1_file.c: make pack-name helper globally accessible
We provide sha1_pack_name() and sha1_pack_index_name(), but
the more generic form (which takes its own strbuf and an
arbitrary extension) is only used to implement the other
two.  Let's make it available, but clean up a few things:

  1. Name it odb_pack_name(), as the original
     sha1_get_pack_name() is long but not all that
     descriptive.

  2. Switch the strbuf argument to the beginning, so that it
     matches similar path-building functions like
     git_path_buf().

  3. Clean up the out-dated docstring and move it to the
     public declaration.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:05:17 -07:00
Junio C Hamano 82682e218a Merge branch 'rs/sha1-file-plug-fallback-base-leak'
A leak in a codepath to read from a packed object in (rare) cases
has been plugged.

* rs/sha1-file-plug-fallback-base-leak:
  sha1_file: release fallback base's memory in unpack_entry()
2017-03-10 13:24:23 -08:00
Christian Couder 6a5e6f5e44 sha1_file: make check_and_freshen_file() non static
This function will be used in a commit soon, so let's make
it available globally.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:24:22 -08:00
Junio C Hamano 3ad8b5bf26 Merge branch 'mh/ref-remove-empty-directory'
Deletion of a branch "foo/bar" could remove .git/refs/heads/foo
once there no longer is any other branch whose name begins with
"foo/", but we didn't do so so far.  Now we do.

* mh/ref-remove-empty-directory: (23 commits)
  files_transaction_commit(): clean up empty directories
  try_remove_empty_parents(): teach to remove parents of reflogs, too
  try_remove_empty_parents(): don't trash argument contents
  try_remove_empty_parents(): rename parameter "name" -> "refname"
  delete_ref_loose(): inline function
  delete_ref_loose(): derive loose reference path from lock
  log_ref_write_1(): inline function
  log_ref_setup(): manage the name of the reflog file internally
  log_ref_write_1(): don't depend on logfile argument
  log_ref_setup(): pass the open file descriptor back to the caller
  log_ref_setup(): improve robustness against races
  log_ref_setup(): separate code for create vs non-create
  log_ref_write(): inline function
  rename_tmp_log(): improve error reporting
  rename_tmp_log(): use raceproof_create_file()
  lock_ref_sha1_basic(): use raceproof_create_file()
  lock_ref_sha1_basic(): inline constant
  raceproof_create_file(): new function
  safe_create_leading_directories(): set errno on SCLD_EXISTS
  safe_create_leading_directories_const(): preserve errno
  ...
2017-02-27 13:57:12 -08:00
René Scharfe 886ddf4777 sha1_file: release fallback base's memory in unpack_entry()
If a pack entry that's used as a delta base is corrupt, unpack_entry()
marks it as unusable and then searches the object again in the hope that
it can be found in another pack or in a loose file.  The memory for this
external base object is never released.  Free it after use.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-27 11:00:30 -08:00
brian m. carlson 76c1d9a096 Convert object iteration callbacks to struct object_id
Convert each_loose_object_fn and each_packed_object_fn to take a pointer
to struct object_id.  Update the various callbacks.  Convert several
40-based constants to use GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-22 10:12:15 -08:00
brian m. carlson 068f85e313 sha1_file: introduce an nth_packed_object_oid function
There are places in the code where we would like to provide a struct
object_id *, yet read the hash directly from the pack.  Provide an
nth_packed_object_oid function that is similar to the
nth_packed_object_sha1 function.

In order to avoid a potentially invalid cast, nth_packed_object_oid
provides a variable into which to store the value, which it returns on
success; on error, it returns NULL, as nth_packed_object_sha1 does.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-22 10:12:15 -08:00
Junio C Hamano c54ba283fa Merge branch 'jk/clear-delta-base-cache-fix'
A crashing bug introduced in v2.11 timeframe has been found (it is
triggerable only in fast-import) and fixed.

* jk/clear-delta-base-cache-fix:
  clear_delta_base_cache(): don't modify hashmap while iterating
2017-01-31 13:14:59 -08:00
Junio C Hamano 42ace93e41 Merge branch 'jk/loose-object-fsck'
"git fsck" inspects loose objects more carefully now.

* jk/loose-object-fsck:
  fsck: detect trailing garbage in all object types
  fsck: parse loose object paths directly
  sha1_file: add read_loose_object() function
  t1450: test fsck of packed objects
  sha1_file: fix error message for alternate objects
  t1450: refactor loose-object removal
2017-01-31 13:14:57 -08:00
Jeff King abd5a00268 clear_delta_base_cache(): don't modify hashmap while iterating
On Thu, Jan 19, 2017 at 03:03:46PM +0100, Ulrich Spörlein wrote:

> > I suspect the patch below may fix things for you. It works around it by
> > walking over the lru list (either is fine, as they both contain all
> > entries, and since we're clearing everything, we don't care about the
> > order).
>
> Confirmed. With the patch applied, I can import the whole 55G in one go
> without any crashes or aborts. Thanks much!

Thanks. Here it is rolled up with a commit message.

-- >8 --
Subject: clear_delta_base_cache(): don't modify hashmap while iterating

Removing entries while iterating causes fast-import to
access an already-freed `struct packed_git`, leading to
various confusing errors.

What happens is that clear_delta_base_cache() drops the
whole contents of the cache by iterating over the hashmap,
calling release_delta_base_cache() on each entry. That
function removes the item from the hashmap. The hashmap code
may then shrink the table, but the hashmap_iter struct
retains an offset from the old table.

As a result, the next call to hashmap_iter_next() may claim
that the iteration is done, even though some items haven't
been visited.

The only caller of clear_delta_base_cache() is fast-import,
which wants to clear the cache because it is discarding the
packed_git struct for its temporary pack. So by failing to
remove all of the entries, we still have references to the
freed packed_git.

To make things even more confusing, this doesn't seem to
trigger with the test suite, because it depends on
complexities like the size of the hash table, which entries
got cleared, whether we try to access them before they're
evicted from the cache, etc.

So I've been able to identify the problem with large
imports like freebsd's svn import, or a fast-export of
linux.git. But nothing that would be reasonable to run as
part of the normal test suite.

We can fix this easily by iterating over the lru linked list
instead of the hashmap. They both contain the same entries,
and we can use the "safe" variant of the list iterator,
which exists for exactly this case.

Let's also add a warning to the hashmap API documentation to
reduce the chances of getting bit by this again.

Reported-by: Ulrich Spörlein <uqs@freebsd.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-19 11:17:20 -08:00
Junio C Hamano 55d128ae06 Merge branch 'bw/grep-recurse-submodules'
"git grep" has been taught to optionally recurse into submodules.

* bw/grep-recurse-submodules:
  grep: search history of moved submodules
  grep: enable recurse-submodules to work on <tree> objects
  grep: optionally recurse into submodules
  grep: add submodules as a grep source type
  submodules: load gitmodules file from commit sha1
  submodules: add helper to determine if a submodule is initialized
  submodules: add helper to determine if a submodule is populated
  real_path: canonicalize directory separators in root parts
  real_path: have callers use real_pathdup and strbuf_realpath
  real_path: create real_pathdup
  real_path: convert real_path_internal to strbuf_realpath
  real_path: resolve symlinks by hand
2017-01-18 15:12:11 -08:00
Junio C Hamano bcaf277b4a Merge branch 'jk/quote-env-path-list-component' into maint
A recent update to receive-pack to make it easier to drop garbage
objects made it clear that GIT_ALTERNATE_OBJECT_DIRECTORIES cannot
have a pathname with a colon in it (no surprise!), and this in turn
made it impossible to push into a repository at such a path.  This
has been fixed by introducing a quoting mechanism used when
appending such a path to the colon-separated list.

* jk/quote-env-path-list-component:
  t5615-alternate-env: double-quotes in file names do not work on Windows
  t5547-push-quarantine: run the path separator test on Windows, too
  tmp-objdir: quote paths we add to alternates
  alternates: accept double-quoted paths
2017-01-17 15:11:06 -08:00
Jeff King cce044df7f fsck: detect trailing garbage in all object types
When a loose tree or commit is read by fsck (or any git
program), unpack_sha1_rest() checks whether there is extra
cruft at the end of the object file, after the zlib data.
Blobs that are streamed, however, do not have this check.

For normal git operations, it's not a big deal. We know the
sha1 and size checked out, so we have the object bytes we
wanted.  The trailing garbage doesn't affect what we're
trying to do.

But since the point of fsck is to find corruption or other
problems, it should be more thorough. This patch teaches its
loose-sha1 reader to detect extra bytes after the zlib
stream and complain.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-15 15:59:03 -08:00
Jeff King f6371f9210 sha1_file: add read_loose_object() function
It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.

However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.

The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.

We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().

Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-15 15:59:03 -08:00
Jeff King 771e7d578e sha1_file: fix error message for alternate objects
When we fail to open a corrupt loose object, we report an
error and mention the filename via sha1_file_name().
However, that function will always give us a path in the
local repository, whereas the corrupt object may have come
from an alternate. The result is a very misleading error
message.

Teach the open_sha1_file() and stat_sha1_file() helpers to
pass back the path they found, so that we can report it
correctly.

Note that the pointers we return go to static storage (e.g.,
from sha1_file_name()), which is slightly dangerous.
However, these helpers are static local helpers, and the
names are used for immediately generating error messages.
The simplicity is an acceptable tradeoff for the danger.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-15 15:59:03 -08:00
Junio C Hamano 02d0457eb4 Merge branch 'jc/git-open-cloexec'
The codeflow of setting NOATIME and CLOEXEC on file descriptors Git
opens has been simplified.
We may want to drop the tip one, but we'll see.

* jc/git-open-cloexec:
  sha1_file: stop opening files with O_NOATIME
  git_open_cloexec(): use fcntl(2) w/ FD_CLOEXEC fallback
  git_open(): untangle possible NOATIME and CLOEXEC interactions
2017-01-10 15:24:26 -08:00
Michael Haggerty 177978f56a raceproof_create_file(): new function
Add a function that tries to create a file and any containing
directories in a way that is robust against races with other processes
that might be cleaning up empty directories at the same time.

The actual file creation is done by a callback function, which, if it
fails, should set errno to EISDIR or ENOENT according to the convention
of open(). raceproof_create_file() detects such failures, and
respectively either tries to delete empty directories that might be in
the way of the file or tries to create the containing directories. Then
it retries the callback function.

This function is not yet used.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-07 19:30:09 -08:00
Michael Haggerty 204a047f23 safe_create_leading_directories(): set errno on SCLD_EXISTS
The exit path for SCLD_EXISTS wasn't setting errno, which some callers
use to generate error messages for the user. Fix the problem and
document that the function sets errno correctly to help avoid similar
regressions in the future.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-07 19:30:08 -08:00
Michael Haggerty 029443070a safe_create_leading_directories_const(): preserve errno
Some implementations of free() change errno (even thought they
shouldn't):

  https://sourceware.org/bugzilla/show_bug.cgi?id=17924

So preserve the errno from safe_create_leading_directories() across the
call to free().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-07 19:30:08 -08:00
Junio C Hamano fe05033407 Merge branch 'jk/quote-env-path-list-component'
A recent update to receive-pack to make it easier to drop garbage
objects made it clear that GIT_ALTERNATE_OBJECT_DIRECTORIES cannot
have a pathname with a colon in it (no surprise!), and this in turn
made it impossible to push into a repository at such a path.  This
has been fixed by introducing a quoting mechanism used when
appending such a path to the colon-separated list.

* jk/quote-env-path-list-component:
  t5615-alternate-env: double-quotes in file names do not work on Windows
  t5547-push-quarantine: run the path separator test on Windows, too
  tmp-objdir: quote paths we add to alternates
  alternates: accept double-quoted paths
2016-12-21 14:55:02 -08:00
Brandon Williams 4ac9006f83 real_path: have callers use real_pathdup and strbuf_realpath
Migrate callers of real_path() who duplicate the retern value to use
real_pathdup or strbuf_realpath.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-12 15:22:32 -08:00
Jeff King cf3c635210 alternates: accept double-quoted paths
We read lists of alternates from objects/info/alternates
files (delimited by newline), as well as from the
GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable
(delimited by colon or semi-colon, depending on the
platform).

There's no mechanism for quoting the delimiters, so it's
impossible to specify an alternate path that contains a
colon in the environment, or one that contains a newline in
a file. We've lived with that restriction for ages because
both alternates and filenames with colons are relatively
rare, and it's only a problem when the two meet. But since
722ff7f87 (receive-pack: quarantine objects until
pre-receive accepts, 2016-10-03), which builds on the
alternates system, every push causes the receiver to set
GIT_ALTERNATE_OBJECT_DIRECTORIES internally.

It would be convenient to have some way to quote the
delimiter so that we can represent arbitrary paths.

The simplest thing would be an escape character before a
quoted delimiter (e.g., "\:" as a literal colon). But that
creates a backwards compatibility problem: any path which
uses that escape character is now broken, and we've just
shifted the problem. We could choose an unlikely escape
character (e.g., something from the non-printable ASCII
range), but that's awkward to use.

Instead, let's treat names as unquoted unless they begin
with a double-quote, in which case they are interpreted via
our usual C-stylke quoting rules. This also breaks
backwards-compatibility, but in a smaller way: it only
matters if your file has a double-quote as the very _first_
character in the path (whereas an escape character is a
problem anywhere in the path).  It's also consistent with
many other parts of git, which accept either a bare pathname
or a double-quoted one, and the sender can choose to quote
or not as required.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-12 15:10:43 -08:00
Junio C Hamano 0538b84027 Merge branch 'jk/alt-odb-cleanup'
Fix a corner-case regression in a topic that graduated during the
v2.11 cycle.

* jk/alt-odb-cleanup:
  alternates: re-allow relative paths from environment
2016-11-10 13:17:30 -08:00
Jeff King 37a95862c6 alternates: re-allow relative paths from environment
Commit 670c359da (link_alt_odb_entry: handle normalize_path
errors, 2016-10-03) regressed the handling of relative paths
in the GIT_ALTERNATE_OBJECT_DIRECTORIES variable. It's not
entirely clear this was ever meant to work, but it _has_
worked for several years, so this commit restores the
original behavior.

When we get a path in GIT_ALTERNATE_OBJECT_DIRECTORIES, we
add it the path to the list of alternate object directories
as if it were found in objects/info/alternates, but with one
difference: we do not provide the link_alt_odb_entry()
function with a base for relative paths. That function
doesn't turn it into an absolute path, and we end up feeding
the relative path to the strbuf_normalize_path() function.

Most relative paths break out of the top-level directory
(e.g., "../foo.git/objects"), and thus normalizing fails.
Prior to 670c359da, we simply ignored the error, and due to
the way normalize_path_copy() was implemented it happened to
return the original path in this case. We then accessed the
alternate objects using this relative path.

By storing the relative path in the alt_odb list, the path
is relative to wherever we happen to be at the time we do an
object lookup. That means we look from $GIT_DIR in a bare
repository, and from the top of the worktree in a non-bare
repository.

If this were being designed from scratch, it would make
sense to pick a stable location (probably $GIT_DIR, or even
the object directory) and use that as the relative base,
turning the result into an absolute path.  However, given
the history, at this point the minimal fix is to match the
pre-670c359da behavior.

We can do this simply by ignoring the error when we have no
relative base and using the original value (which we now
reliably have, thanks to strbuf_normalize_path()).

That still leaves us with a relative path that foils our
duplicate detection, and may act strangely if we ever
chdir() later in the process. We could solve that by storing
an absolute path based on getcwd(). That may be a good
future direction; for now we'll do just the minimum to fix
the regression.

The new t5615 script demonstrates the fix in its final three
tests. Since we didn't have any tests of the alternates
environment variable at all, it also adds some tests of
absolute paths.

Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Jeff King <peff@peff.net>
2016-11-08 15:28:22 -05:00
Junio C Hamano b4d065df03 sha1_file: stop opening files with O_NOATIME
When we open object files, we try to do so with O_NOATIME.
This dates back to 144bde78e9 (Use O_NOATIME when opening
the sha1 files., 2005-04-23), which is an optimization to
avoid creating a bunch of dirty inodes when we're accessing
many objects.  But a few things have changed since then:

  1. In June 2005, git learned about packfiles, which means
     we would do a lot fewer atime updates (rather than one
     per object access, we'd generally get one per packfile).

  2. In late 2006, Linux learned about "relatime", which is
     generally the default on modern installs. So
     performance around atimes updates is a non-issue there
     these days.

     All the world isn't Linux, but as it turns out, Linux
     is the only platform to implement O_NOATIME in the
     first place.

So it's very unlikely that this code is helping anybody
these days.

Helped-by: Jeff King <peff@peff.net>
[jc: took idea and log message from peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-02 19:34:41 -07:00
Junio C Hamano 1e3001a8e2 git_open_cloexec(): use fcntl(2) w/ FD_CLOEXEC fallback
A platform might not support open(2) with O_CLOEXEC but may support
telling the same with fcntl(2) to flip FD_CLOEXEC bit on on an open
file descriptor.  It is a fallback that is inherently racy and this
may not be worth doing, though.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-02 19:34:16 -07:00
Junio C Hamano 906d6906fb Merge branch 'ls/git-open-cloexec'
Git generally does not explicitly close file descriptors that were
open in the parent process when spawning a child process, but most
of the time the child does not want to access them. As Windows does
not allow removing or renaming a file that has a file descriptor
open, a slow-to-exit child can even break the parent process by
holding onto them.  Use O_CLOEXEC flag to open files in various
codepaths.

* ls/git-open-cloexec:
  read-cache: make sure file handles are not inherited by child processes
  sha1_file: open window into packfiles with O_CLOEXEC
  sha1_file: rename git_open_noatime() to git_open()
2016-10-31 13:15:21 -07:00
Junio C Hamano 39000e8499 Merge branch 'jk/fetch-quick-tag-following' into maint
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.

* jk/fetch-quick-tag-following:
  fetch: use "quick" has_sha1_file for tag following
2016-10-28 09:01:17 -07:00
Junio C Hamano 1b8ac5ead5 git_open(): untangle possible NOATIME and CLOEXEC interactions
The way we structured the fallback/retry mechanism for opening with
O_NOATIME and O_CLOEXEC meant that if we failed due to lack of
support to open the file with O_NOATIME option (i.e. EINVAL), we
would still try to drop O_CLOEXEC first and retry, and then drop
O_NOATIME.  A platform on which O_NOATIME is defined in the header
without support from the kernel wouldn't have a chance to open with
O_CLOEXEC option due to this code structure.

Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter
is mostly about performance, while the former can affect correctness.

Instead use O_CLOEXEC to open the file, and then use fcntl(2) to set
O_NOATIME on the resulting file descriptor.  open(2) itself does not
cause atime to be updated according to Linus [*1*].

The helper to do the former can be usable in the codepath in
ce_compare_data() that was recently added to open a file descriptor
with O_CLOEXEC; use it while we are at it.

*1* <CA+55aFw83E+zOd+z5h-CA-3NhrLjVr-anL6pubrSWttYx3zu8g@mail.gmail.com>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-28 06:23:07 -07:00
Junio C Hamano d7ae013a31 Merge branch 'jk/abbrev-auto'
Updates the way approximate count of total objects is computed
while attempting to come up with a unique abbreviated object name,
which in turn needs to estimate how many hexdigits are necessary to
ensure uniqueness.

* jk/abbrev-auto:
  find_unique_abbrev: move logic out of get_short_sha1()
2016-10-27 14:58:47 -07:00
Junio C Hamano 9fcd14491d Merge branch 'jk/fetch-quick-tag-following'
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.

* jk/fetch-quick-tag-following:
  fetch: use "quick" has_sha1_file for tag following
2016-10-26 13:14:47 -07:00
Lars Schneider cd66ada065 sha1_file: open window into packfiles with O_CLOEXEC
All processes that the Git main process spawns inherit the open file
descriptors of the main process. These leaked file descriptors can
cause problems.

Use the O_CLOEXEC flag similar to 05d1ed61 to fix the leaked file
descriptors.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-25 11:09:54 -07:00
Lars Schneider a5436b5794 sha1_file: rename git_open_noatime() to git_open()
This function is meant to be used when reading from files in the
object store, and the original objective was to avoid smudging atime
of loose object files too often, hence its name.  Because we'll be
extending its role in the next commit to also arrange the file
descriptors they return auto-closed in the child processes, rename
it to lose "noatime" part that is too specific.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-25 10:59:13 -07:00
Junio C Hamano dec040192f Merge branch 'jk/alt-odb-cleanup'
Codepaths involved in interacting alternate object store have
been cleaned up.

* jk/alt-odb-cleanup:
  alternates: use fspathcmp to detect duplicates
  sha1_file: always allow relative paths to alternates
  count-objects: report alternates via verbose mode
  fill_sha1_file: write into a strbuf
  alternates: store scratch buffer as strbuf
  fill_sha1_file: write "boring" characters
  alternates: use a separate scratch space
  alternates: encapsulate alt->base munging
  alternates: provide helper for allocating alternate
  alternates: provide helper for adding to alternates list
  link_alt_odb_entry: refactor string handling
  link_alt_odb_entry: handle normalize_path errors
  t5613: clarify "too deep" recursion tests
  t5613: do not chdir in main process
  t5613: whitespace/style cleanups
  t5613: use test_must_fail
  t5613: drop test_valid_repo function
  t5613: drop reachable_via function
2016-10-17 13:25:20 -07:00
Jeff King 5827a03545 fetch: use "quick" has_sha1_file for tag following
When we auto-follow tags in a fetch, we look at all of the
tags advertised by the remote and fetch ones where we don't
already have the tag, but we do have the object it peels to.
This involves a lot of calls to has_sha1_file(), some of
which we can reasonably expect to fail. Since 45e8a74
(has_sha1_file: re-check pack directory before giving up,
2013-08-30), this may cause many calls to
reprepare_packed_git(), which is potentially expensive.

This has gone unnoticed for several years because it
requires a fairly unique setup to matter:

  1. You need to have a lot of packs on the client side to
     make reprepare_packed_git() expensive (the most
     expensive part is finding duplicates in an unsorted
     list, which is currently quadratic).

  2. You need a large number of tag refs on the server side
     that are candidates for auto-following (i.e., that the
     client doesn't have). Each one triggers a re-read of
     the pack directory.

  3. Under normal circumstances, the client would
     auto-follow those tags and after one large fetch, (2)
     would no longer be true. But if those tags point to
     history which is disconnected from what the client
     otherwise fetches, then it will never auto-follow, and
     those candidates will impact it on every fetch.

So when all three are true, each fetch pays an extra
O(nr_tags * nr_packs^2) cost, mostly in string comparisons
on the pack names. This was exacerbated by 47bf4b0
(prepare_packed_git_one: refactor duplicate-pack check,
2014-06-30) which uses a slightly more expensive string
check, under the assumption that the duplicate check doesn't
happen very often (and it shouldn't; the real problem here
is how often we are calling reprepare_packed_git()).

This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
accuracy for speed, in cases where we might be racy with a
simultaneous repack. This is similar to the fix in 0eeb077
(index-pack: avoid excessive re-reading of pack directory,
2015-06-09). As with that case, it's OK for has_sha1_file()
occasionally say "no I don't have it" when we do, because
the worst case is not a corruption, but simply that we may
fail to auto-follow a tag that points to it.

Here are results from the included perf script, which sets
up a situation similar to the one described above:

Test            HEAD^               HEAD
----------------------------------------------------------
5550.4: fetch   11.21(10.42+0.78)   0.08(0.04+0.02) -99.3%

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-14 11:31:32 -07:00