Commit graph

5183 commits

Author SHA1 Message Date
Jeff King c3271a0e47 fsck: do not fallback "git fsck <bogus>" to "git fsck"
Since fsck tries to continue as much as it can after seeing
an error, we still do the reachability check even if some
heads we were given on the command-line are bogus. But if
_none_ of the heads is is valid, we fallback to checking all
refs and the index, which is not what the user asked for at
all.

Instead of checking "heads", the number of successful heads
we got, check "argc" (which we know only has non-options in
it, because parse_options removed the others).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-17 14:24:33 -08:00
Jeff King c6c7b16d23 fsck: tighten error-checks of "git fsck <head>"
Instead of checking reachability from the refs, you can ask
fsck to check from a particular set of heads. However, the
error checking here is quite lax. In particular:

  1. It claims lookup_object() will report an error, which
     is not true. It only does a hash lookup, and the user
     has no clue that their argument was skipped.

  2. When either the name or sha1 cannot be resolved, we
     continue to exit with a successful error code, even
     though we didn't check what the user asked us to.

This patch fixes both of these cases.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-17 14:24:33 -08:00
Jeff King 3e3f8bd608 fsck: prepare dummy objects for --connectivity-check
Normally fsck makes a pass over all objects to check their
integrity, and then follows up with a reachability check to
make sure we have all of the referenced objects (and to know
which ones are dangling). The latter checks for the HAS_OBJ
flag in obj->flags to see if we found the object in the
first pass.

Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) taught fsck to skip the initial pass, and to
fallback to has_sha1_file() instead of the HAS_OBJ check.

However, it converted only one HAS_OBJ check to use
has_sha1_file(). But there are many other places in
builtin/fsck.c that assume that the flag is set (or that
lookup_object() will return an object at all). This leads to
several bugs with --connectivity-only:

  1. mark_object() will not queue objects for examination,
     so recursively following links from commits to trees,
     etc, did nothing. I.e., we were checking the
     reachability of hardly anything at all.

  2. When a set of heads is given on the command-line, we
     use lookup_object() to see if they exist. But without
     the initial pass, we assume nothing exists.

  3. When loading reflog entries, we do a similar
     lookup_object() check, and complain that the reflog is
     broken if the object doesn't exist in our hash.

So in short, --connectivity-only is broken pretty badly, and
will claim that your repository is fine when it's not.
Presumably nobody noticed for a few reasons.

One is that the embedded test does not actually test the
recursive nature of the reachability check. All of the
missing objects are still in the index, and we directly
check items from the index. This patch modifies the test to
delete the index, which shows off breakage (1).

Another is that --connectivity-only just skips the initial
pass for loose objects. So on a real repository, the packed
objects were still checked correctly. But on the flipside,
it means that "git fsck --connectivity-only" still checks
the sha1 of all of the packed objects, nullifying its
original purpose of being a faster git-fsck.

And of course the final problem is that the bug only shows
up when there _is_ corruption, which is rare. So anybody
running "git fsck --connectivity-only" proactively would
assume it was being thorough, when it was not.

One possibility for fixing this is to find all of the spots
that rely on HAS_OBJ and tweak them for the connectivity-only
case. But besides the risk that we might miss a spot (and I
found three already, corresponding to the three bugs above),
there are other parts of fsck that _can't_ work without a
full list of objects. E.g., the list of dangling objects.

Instead, let's make the connectivity-only case look more
like the normal case. Rather than skip the initial pass
completely, we'll do an abbreviated one that sets up the
HAS_OBJ flag for each object, without actually loading the
object data.

That's simple and fast, and we don't have to care about the
connectivity_only flag in the rest of the code at all.
While we're at it, let's make sure we treat loose and packed
objects the same (i.e., setting up dummy objects for both
and skipping the actual sha1 check). That makes the
connectivity-only check actually fast on a real repo (40
seconds versus 180 seconds on my copy of linux.git).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-17 14:23:20 -08:00
Jeff King b4584e4f66 fsck: report trees as dangling
After checking connectivity, fsck looks through the list of
any objects we've seen mentioned, and reports unreachable
and un-"used" ones as dangling. However, it skips any object
which is not marked as "parsed", as that is an object that
we _don't_ have (but that somebody mentioned).

Since 6e454b9a3 (clear parsed flag when we free tree
buffers, 2013-06-05), that flag can't be relied on, and the
correct method is to check the HAS_OBJ flag. The cleanup in
that commit missed this callsite, though. As a result, we
would generally fail to report dangling trees.

We never noticed because there were no tests in this area
(for trees or otherwise). Let's add some.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-17 12:49:41 -08:00
Junio C Hamano c34a7daad7 Merge branch 'jc/setup-cleanup-fix'
"git archive" and "git mailinfo" stopped reading from local
configuration file with a recent update.

* jc/setup-cleanup-fix:
  archive: read local configuration
  mailinfo: read local configuration
2016-11-23 11:23:17 -08:00
Junio C Hamano bd53f38d52 Merge branch 'js/rebase-i-commentchar-fix'
"git rebase -i" did not work well with core.commentchar
configuration variable for two reasons, both of which have been
fixed.

* js/rebase-i-commentchar-fix:
  rebase -i: handle core.commentChar=auto
  stripspace: respect repository config
  rebase -i: highlight problems with core.commentchar
2016-11-23 11:23:17 -08:00
Junio C Hamano eb0224c617 archive: read local configuration
Since b9605bc4f2 ("config: only read .git/config from configured
repos", 2016-09-12), we do not read from ".git/config" unless we
know we are in a repository.  "git archive" however didn't do the
repository discovery and instead relied on the old behaviour.

Teach the command to run a "gentle" version of repository discovery
so that local configuration variables are honoured.

[jc: stole tests from peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-22 13:55:20 -08:00
Junio C Hamano 3f0ec0687d mailinfo: read local configuration
Since b9605bc4f2 ("config: only read .git/config from configured
repos", 2016-09-12), we do not read from ".git/config" unless we
know we are in a repository.  "git mailinfo" however didn't do the
repository discovery and instead relied on the old behaviour.  This
was mostly OK because it was merely run as a helper program by other
porcelain scripts that first chdir's up to the root of the working
tree.

Teach the command to run a "gentle" version of repository discovery
so that local configuration variables like mailinfo.scissors are
honoured.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-22 13:13:16 -08:00
Johannes Schindelin 92068ae8bf stripspace: respect repository config
The way "git stripspace" reads the configuration was not quite
kosher, in that the code forgot to probe for a possibly existing
repository (note: stripspace is designed to be usable outside the
repository as well).  It read .git/config only when it was run from
the top-level of the working tree by accident.  A recent change
b9605bc4f2 ("config: only read .git/config from configured repos",
2016-09-12) stopped reading the repository-local configuration file
".git/config" unless the repository discovery process is done, so
that .git/config is never read even when run from the top-level,
exposing the old bug more.

When rebasing interactively with a commentChar defined in the
current repository's config, the help text at the bottom of the edit
script potentially used an incorrect comment character. This was not
only funny-looking, but also resulted in tons of warnings like this
one:

	Warning: the command isn't recognized in the following line
	 - #

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-21 11:00:38 -08:00
Junio C Hamano 6846e8734d Merge branch 'jk/create-branch-remove-unused-param'
Code clean-up.

* jk/create-branch-remove-unused-param:
  create_branch: drop unused "head" parameter
2016-11-17 13:45:21 -08:00
Jeff King 4bd488ea7c create_branch: drop unused "head" parameter
This function used to have the caller pass in the current
value of HEAD, in order to make sure we didn't clobber HEAD.
In 55c4a6730, that logic moved to validate_new_branchname(),
which just resolves HEAD itself. The parameter to
create_branch is now unused.

Since we have to update and re-wrap the docstring describing
the parameters anyway, let's take this opportunity to break
it out into a list, which makes it easier to find the
parameters.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-09 14:56:21 -08:00
Junio C Hamano 2f445c17e5 Merge branch 'rs/commit-pptr-simplify'
Code simplification.

* rs/commit-pptr-simplify:
  commit: simplify building parents list
2016-10-31 13:15:25 -07:00
Junio C Hamano dbaa6bdce2 Merge branch 'ls/filter-process'
The smudge/clean filter API expect an external process is spawned
to filter the contents for each path that has a filter defined.  A
new type of "process" filter API has been added to allow the first
request to run the filter for a path to spawn a single process, and
all filtering need is served by this single process for multiple
paths, reducing the process creation overhead.

* ls/filter-process:
  contrib/long-running-filter: add long running filter example
  convert: add filter.<driver>.process option
  convert: prepare filter.<driver>.process option
  convert: make apply_filter() adhere to standard Git error handling
  pkt-line: add functions to read/write flush terminated packet streams
  pkt-line: add packet_write_gently()
  pkt-line: add packet_flush_gently()
  pkt-line: add packet_write_fmt_gently()
  pkt-line: extract set_packet_header()
  pkt-line: rename packet_write() to packet_write_fmt()
  run-command: add clean_on_exit_handler
  run-command: move check_pipe() from write_or_die to run_command
  convert: modernize tests
  convert: quote filter names in error messages
2016-10-31 13:15:21 -07:00
Junio C Hamano 906d6906fb Merge branch 'ls/git-open-cloexec'
Git generally does not explicitly close file descriptors that were
open in the parent process when spawning a child process, but most
of the time the child does not want to access them. As Windows does
not allow removing or renaming a file that has a file descriptor
open, a slow-to-exit child can even break the parent process by
holding onto them.  Use O_CLOEXEC flag to open files in various
codepaths.

* ls/git-open-cloexec:
  read-cache: make sure file handles are not inherited by child processes
  sha1_file: open window into packfiles with O_CLOEXEC
  sha1_file: rename git_open_noatime() to git_open()
2016-10-31 13:15:21 -07:00
René Scharfe de9f7fa3b0 commit: simplify building parents list
Push pptr down into the FROM_MERGE branch of the if/else statement,
where it's actually used, and call commit_list_append() for appending
elements instead of playing tricks with commit_list_insert().  Call
copy_commit_list() in the amend branch instead of open-coding it.  Don't
bother setting pptr in the final branch as it's not used thereafter.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-30 16:03:25 -07:00
Junio C Hamano 650360210a Merge branch 'nd/ita-empty-commit'
When new paths were added by "git add -N" to the index, it was
enough to circumvent the check by "git commit" to refrain from
making an empty commit without "--allow-empty".  The same logic
prevented "git status" to show such a path as "new file" in the
"Changes not staged for commit" section.

* nd/ita-empty-commit:
  commit: don't be fooled by ita entries when creating initial commit
  commit: fix empty commit creation when there's no changes but ita entries
  diff: add --ita-[in]visible-in-index
  diff-lib: allow ita entries treated as "not yet exist in index"
2016-10-27 14:58:50 -07:00
Junio C Hamano cca3be6ea1 Merge branch 'js/prepare-sequencer'
Update of the sequencer codebase to make it reusable to reimplement
"rebase -i" continues.

* js/prepare-sequencer: (27 commits)
  sequencer: mark all error messages for translation
  sequencer: start error messages consistently with lower case
  sequencer: quote filenames in error messages
  sequencer: mark action_name() for translation
  sequencer: remove overzealous assumption in rebase -i mode
  sequencer: teach write_message() to append an optional LF
  sequencer: refactor write_message() to take a pointer/length
  sequencer: roll back lock file if write_message() failed
  sequencer: stop releasing the strbuf in write_message()
  sequencer: left-trim lines read from the script
  sequencer: support cleaning up commit messages
  sequencer: support amending commits
  sequencer: allow editing the commit message on a case-by-case basis
  sequencer: introduce a helper to read files written by scripts
  sequencer: prepare for rebase -i's commit functionality
  sequencer: remember the onelines when parsing the todo file
  sequencer: get rid of the subcommand field
  sequencer: avoid completely different messages for different actions
  sequencer: strip CR from the todo script
  sequencer: completely revamp the "todo" script parsing
  ...
2016-10-27 14:58:50 -07:00
Junio C Hamano 65aaa29f7d Merge branch 'sb/submodule-ignore-trailing-slash'
A minor regression fix for "git submodule".

* sb/submodule-ignore-trailing-slash:
  t0060: sidestep surprising path mangling results on Windows
  submodule: ignore trailing slash in relative url
  submodule: ignore trailing slash on superproject URL
2016-10-27 14:58:49 -07:00
Junio C Hamano 0d9c527d59 Merge branch 'jk/no-looking-at-dotgit-outside-repo'
Update "git diff --no-index" codepath not to try to peek into .git/
directory that happens to be under the current directory, when we
know we are operating outside any repository.

* jk/no-looking-at-dotgit-outside-repo:
  diff: handle sha1 abbreviations outside of repository
  diff_aligned_abbrev: use "struct oid"
  diff_unique_abbrev: rename to diff_aligned_abbrev
  find_unique_abbrev: use 4-buffer ring
  test-*-cache-tree: setup git dir
  read info/{attributes,exclude} only when in repository
2016-10-27 14:58:48 -07:00
Junio C Hamano f9db0c055c Merge branch 'jc/abbrev-auto'
"git push" and "git fetch" reports from what old object to what new
object each ref was updated, using abbreviated refnames, and they
attempt to align the columns for this and other pieces of
information.  The way these codepaths compute how many display
columns to allocate for the object names portion of this output has
been updated to match the recent "auto scale the default
abbreviation length" change.

* jc/abbrev-auto:
  transport: compute summary-width dynamically
  transport: allow summary-width to be computed dynamically
  fetch: pass summary_width down the callchain
  transport: pass summary_width down the callchain
2016-10-27 14:58:48 -07:00
Junio C Hamano 580d820ece Merge branch 'lt/abbrev-auto'
Allow the default abbreviation length, which has historically been
7, to scale as the repository grows.  The logic suggests to use 12
hexdigits for the Linux kernel, and 9 to 10 for Git itself.

* lt/abbrev-auto:
  abbrev: auto size the default abbreviation
  abbrev: prepare for new world order
  abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
2016-10-27 14:58:47 -07:00
Jeff King ef2ed5013c find_unique_abbrev: use 4-buffer ring
Some code paths want to format multiple abbreviated sha1s in
the same output line. Because we use a single static buffer
for our return value, they have to either break their output
into several calls or allocate their own arrays and use
find_unique_abbrev_r().

Intead, let's mimic sha1_to_hex() and use a ring of several
buffers, so that the return value stays valid through
multiple calls. This shortens some of the callers, and makes
it harder to for them to make a silly mistake.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-26 13:30:51 -07:00
Junio C Hamano d347fb6596 Merge branch 'jk/diff-submodule-diff-inline'
A recently graduated topic regressed "git rev-list --header"
output, breaking "gitweb".  This has been fixed.

* jk/diff-submodule-diff-inline:
  rev-list: use hdr_termination instead of a always using a newline
2016-10-26 13:14:51 -07:00
Junio C Hamano 9fcd14491d Merge branch 'jk/fetch-quick-tag-following'
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.

* jk/fetch-quick-tag-following:
  fetch: use "quick" has_sha1_file for tag following
2016-10-26 13:14:47 -07:00
Junio C Hamano 92657ea597 Merge branch 'jk/merge-base-fork-point-without-reflog'
"git rebase" immediately after "git clone" failed to find the fork
point from the upstream.

* jk/merge-base-fork-point-without-reflog:
  merge-base: handle --fork-point without reflog
2016-10-26 13:14:47 -07:00
Junio C Hamano 1c2b1f7018 Merge branch 'bw/ls-files-recurse-submodules'
"git ls-files" learned "--recurse-submodules" option that can be
used to get a listing of tracked files across submodules (i.e. this
only works with "--cached" option, not for listing untracked or
ignored files).  This would be a useful tool to sit on the upstream
side of a pipe that is read with xargs to work on all working tree
files from the top-level superproject.

* bw/ls-files-recurse-submodules:
  ls-files: add pathspec matching for submodules
  ls-files: pass through safe options for --recurse-submodules
  ls-files: optionally recurse into submodules
  git: make super-prefix option
2016-10-26 13:14:44 -07:00
Junio C Hamano 2bee56be7e Merge branch 'js/libify-require-clean-work-tree'
The require_clean_work_tree() helper was recreated in C when "git
pull" was rewritten from shell; the helper is now made available to
other callers in preparation for upcoming "rebase -i" work.

* js/libify-require-clean-work-tree:
  wt-status: begin error messages with lower-case
  wt-status: teach has_{unstaged,uncommitted}_changes() about submodules
  wt-status: export also the has_un{staged,committed}_changes() functions
  wt-status: make the require_clean_work_tree() function reusable
  pull: make code more similar to the shell script again
  pull: drop confusing prefix parameter of die_on_unclean_work_tree()
2016-10-26 13:14:44 -07:00
Lars Schneider a5436b5794 sha1_file: rename git_open_noatime() to git_open()
This function is meant to be used when reading from files in the
object store, and the original objective was to avoid smudging atime
of loose object files too often, hence its name.  Because we'll be
extending its role in the next commit to also arrange the file
descriptors they return auto-closed in the child processes, rename
it to lose "noatime" part that is too specific.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-25 10:59:13 -07:00
Nguyễn Thái Ngọc Duy 2c49f7ffb3 commit: don't be fooled by ita entries when creating initial commit
ita entries are dropped at tree generation phase. If the entire index
consists of just ita entries, the result would be a a commit with no
entries, which should be caught unless --allow-empty is specified. The
test "!!active_nr" is not sufficient to catch this.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-24 10:54:11 -07:00
Nguyễn Thái Ngọc Duy 018ec3c820 commit: fix empty commit creation when there's no changes but ita entries
If i-t-a entries are present and there is no change between the index
and HEAD i-t-a entries, index_differs_from() still returns "dirty, new
entries" (aka, the resulting commit is not empty), but cache-tree will
skip i-t-a entries and produce the exact same tree of current
commit.

index_differs_from() is supposed to catch this so we can abort
git-commit (unless --no-empty is specified). Update it to optionally
ignore i-t-a entries when doing a diff between the index and HEAD so
that it would return "no change" in this case and abort commit.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-24 10:48:23 -07:00
Junio C Hamano 11fd66de9b transport: allow summary-width to be computed dynamically
Now we have identified three callchains that have a set of refs that
they want to show their <old, new> object names in an aligned output,
we can replace their reference to the constant TRANSPORT_SUMMARY_WIDTH
with a helper function call to transport_summary_width() that takes
the set of ref as a parameter.  This step does not yet iterate over
the refs and compute, which is left as an exercise to the readers.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-21 15:28:07 -07:00
Junio C Hamano 901f3d403e fetch: pass summary_width down the callchain
The leaf function on the "fetch" side that uses TRANSPORT_SUMMARY_WIDTH
constant is builtin/fetch.c::format_display() and it has two distinct
callchains.  The one that reports the primary result of fetch originates
at store_updated_refs(); the other one that reports the pruning of
the remote-tracking refs originates at prune_refs().

Teach these two places to pass summary_width down the callchain,
just like we did for the "push" side in the previous commit.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-21 15:22:55 -07:00
Johannes Schindelin 2863584f5c sequencer: get rid of the subcommand field
The subcommands are used exactly once, at the very beginning of
sequencer_pick_revisions(), to determine what to do. This is an
unnecessary level of indirection: we can simply call the correct
function to begin with. So let's do that.

While at it, ensure that the subcommands return an error code so that
they do not have to die() all over the place (bad practice for library
functions...).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-21 09:32:34 -07:00
Johannes Schindelin 03a4e260e2 sequencer: plug memory leaks for the option values
The sequencer is our attempt to lib-ify cherry-pick. Yet it behaves
like a one-shot command when it reads its configuration: memory is
allocated and released only when the command exits.

This is kind of okay for git-cherry-pick, which *is* a one-shot
command. All the work to make the sequencer its work horse was
done to allow using the functionality as a library function, though,
including proper clean-up after use.

To remedy that, take custody of the option values in question,
allocating and duping literal constants as needed and freeing them
at end.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-21 09:31:53 -07:00
Jacob Keller 98985c6911 rev-list: use hdr_termination instead of a always using a newline
When adding support for prefixing output of log and other commands using
--line-prefix, commit 660e113ce1 ("graph: add support for
--line-prefix on all graph-aware output", 2016-08-31) accidentally
broke rev-list --header output.

In order to make the output appear with a line-prefix, the flow was
changed to always use the graph subsystem for display. Unfortunately
the graph flow in rev-list did not use info->hdr_termination as it was
assumed that graph output would never need to putput NULs.

Since we now always use the graph code in order to handle the case of
line-prefix, simply replace putchar('\n') with
putchar(info->hdr_termination) which will correct this issue.

Add a test for the --header case to make sure we don't break it in the
future.

Reported-by: Dennis Kaarsemaker <dennis@kaarsemaker.net>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-20 14:44:37 -07:00
Junio C Hamano ad0e95980c Merge branch 'js/reset-usage'
* js/reset-usage:
  reset: fix usage
2016-10-17 13:25:22 -07:00
Junio C Hamano dec040192f Merge branch 'jk/alt-odb-cleanup'
Codepaths involved in interacting alternate object store have
been cleaned up.

* jk/alt-odb-cleanup:
  alternates: use fspathcmp to detect duplicates
  sha1_file: always allow relative paths to alternates
  count-objects: report alternates via verbose mode
  fill_sha1_file: write into a strbuf
  alternates: store scratch buffer as strbuf
  fill_sha1_file: write "boring" characters
  alternates: use a separate scratch space
  alternates: encapsulate alt->base munging
  alternates: provide helper for allocating alternate
  alternates: provide helper for adding to alternates list
  link_alt_odb_entry: refactor string handling
  link_alt_odb_entry: handle normalize_path errors
  t5613: clarify "too deep" recursion tests
  t5613: do not chdir in main process
  t5613: whitespace/style cleanups
  t5613: use test_must_fail
  t5613: drop test_valid_repo function
  t5613: drop reachable_via function
2016-10-17 13:25:20 -07:00
Junio C Hamano 25ab004c53 Merge branch 'jk/quarantine-received-objects'
In order for the receiving end of "git push" to inspect the
received history and decide to reject the push, the objects sent
from the sending end need to be made available to the hook and
the mechanism for the connectivity check, and this was done
traditionally by storing the objects in the receiving repository
and letting "git gc" to expire it.  Instead, store the newly
received objects in a temporary area, and make them available by
reusing the alternate object store mechanism to them only while we
decide if we accept the check, and once we decide, either migrate
them to the repository or purge them immediately.

* jk/quarantine-received-objects:
  tmp-objdir: do not migrate files starting with '.'
  tmp-objdir: put quarantine information in the environment
  receive-pack: quarantine objects until pre-receive accepts
  tmp-objdir: introduce API for temporary object directories
  check_connected: accept an env argument
2016-10-17 13:25:20 -07:00
Junio C Hamano 630e05c4f3 Merge branch 'jk/clone-copy-alternates-fix'
"git clone" of a local repository can be done at the filesystem
level, but the codepath did not check errors while copying and
adjusting the file that lists alternate object stores.

* jk/clone-copy-alternates-fix:
  clone: detect errors in normalize_path_copy
2016-10-17 13:25:18 -07:00
Johannes Schindelin 8a2a0f5341 sequencer: use memoized sequencer directory path
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:52:23 -07:00
Johannes Schindelin ee624c0d3f sequencer: use static initializers for replay_opts
This change is not completely faithful: instead of initializing all fields
to 0, we choose to initialize command and subcommand to -1 (instead of
defaulting to REPLAY_REVERT and REPLAY_NONE, respectively). Practically,
it makes no difference at all, but future-proofs the code to require
explicit assignments for both fields.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:52:23 -07:00
Lars Schneider 81c634e94f pkt-line: rename packet_write() to packet_write_fmt()
packet_write() should be called packet_write_fmt() because it is a
printf-like function that takes a format string as first parameter.

packet_write_fmt() should be used for text strings only. Arbitrary
binary data should use a new packet_write() function that is introduced
in a subsequent patch.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:36:50 -07:00
Jeff King 5827a03545 fetch: use "quick" has_sha1_file for tag following
When we auto-follow tags in a fetch, we look at all of the
tags advertised by the remote and fetch ones where we don't
already have the tag, but we do have the object it peels to.
This involves a lot of calls to has_sha1_file(), some of
which we can reasonably expect to fail. Since 45e8a74
(has_sha1_file: re-check pack directory before giving up,
2013-08-30), this may cause many calls to
reprepare_packed_git(), which is potentially expensive.

This has gone unnoticed for several years because it
requires a fairly unique setup to matter:

  1. You need to have a lot of packs on the client side to
     make reprepare_packed_git() expensive (the most
     expensive part is finding duplicates in an unsorted
     list, which is currently quadratic).

  2. You need a large number of tag refs on the server side
     that are candidates for auto-following (i.e., that the
     client doesn't have). Each one triggers a re-read of
     the pack directory.

  3. Under normal circumstances, the client would
     auto-follow those tags and after one large fetch, (2)
     would no longer be true. But if those tags point to
     history which is disconnected from what the client
     otherwise fetches, then it will never auto-follow, and
     those candidates will impact it on every fetch.

So when all three are true, each fetch pays an extra
O(nr_tags * nr_packs^2) cost, mostly in string comparisons
on the pack names. This was exacerbated by 47bf4b0
(prepare_packed_git_one: refactor duplicate-pack check,
2014-06-30) which uses a slightly more expensive string
check, under the assumption that the duplicate check doesn't
happen very often (and it shouldn't; the real problem here
is how often we are calling reprepare_packed_git()).

This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
accuracy for speed, in cases where we might be racy with a
simultaneous repack. This is similar to the fix in 0eeb077
(index-pack: avoid excessive re-reading of pack directory,
2015-06-09). As with that case, it's OK for has_sha1_file()
occasionally say "no I don't have it" when we do, because
the worst case is not a corruption, but simply that we may
fail to auto-follow a tag that points to it.

Here are results from the included perf script, which sets
up a situation similar to the one described above:

Test            HEAD^               HEAD
----------------------------------------------------------
5550.4: fetch   11.21(10.42+0.78)   0.08(0.04+0.02) -99.3%

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-14 11:31:32 -07:00
Jeff King 4f21454b55 merge-base: handle --fork-point without reflog
The --fork-point option looks in the reflog to try to find
where a derived branch forked from a base branch. However,
if the reflog for the base branch is totally empty (as it
commonly is right after cloning, which does not write a
reflog entry), then our for_each_reflog call will not find
any entries, and we will come up with no merge base, even
though there may be one with the current tip of the base.

We can fix this by just adding the current tip to
our list of collected entries.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-12 14:30:16 -07:00
Johannes Schindelin 641c900b2c reset: fix usage
The <tree-ish> parameter is actually optional (see man page).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-11 12:27:39 -07:00
Junio C Hamano 1172e16af0 Merge branch 'jc/blame-reverse'
It is a common mistake to say "git blame --reverse OLD path",
expecting that the command line is dwimmed as if asking how lines
in path in an old revision OLD have survived up to the current
commit.

* jc/blame-reverse:
  blame: dwim "blame --reverse OLD" as "blame --reverse OLD.."
  blame: improve diagnosis for "--reverse NEW"
2016-10-10 14:03:51 -07:00
Junio C Hamano a460ea4a3c Merge branch 'nd/shallow-deepen'
The existing "git fetch --depth=<n>" option was hard to use
correctly when making the history of an existing shallow clone
deeper.  A new option, "--deepen=<n>", has been added to make this
easier to use.  "git clone" also learned "--shallow-since=<date>"
and "--shallow-exclude=<tag>" options to make it easier to specify
"I am interested only in the recent N months worth of history" and
"Give me only the history since that version".

* nd/shallow-deepen: (27 commits)
  fetch, upload-pack: --deepen=N extends shallow boundary by N commits
  upload-pack: add get_reachable_list()
  upload-pack: split check_unreachable() in two, prep for get_reachable_list()
  t5500, t5539: tests for shallow depth excluding a ref
  clone: define shallow clone boundary with --shallow-exclude
  fetch: define shallow boundary with --shallow-exclude
  upload-pack: support define shallow boundary by excluding revisions
  refs: add expand_ref()
  t5500, t5539: tests for shallow depth since a specific date
  clone: define shallow clone boundary based on time with --shallow-since
  fetch: define shallow boundary with --shallow-since
  upload-pack: add deepen-since to cut shallow repos based on time
  shallow.c: implement a generic shallow boundary finder based on rev-list
  fetch-pack: use a separate flag for fetch in deepening mode
  fetch-pack.c: mark strings for translating
  fetch-pack: use a common function for verbose printing
  fetch-pack: use skip_prefix() instead of starts_with()
  upload-pack: move rev-list code out of check_non_tip()
  upload-pack: make check_non_tip() clean things up on error
  upload-pack: tighten number parsing at "deepen" lines
  ...
2016-10-10 14:03:50 -07:00
Junio C Hamano e6e24c94df Merge branch 'jk/pack-objects-optim-mru'
"git pack-objects" in a repository with many packfiles used to
spend a lot of time looking for/at objects in them; the accesses to
the packfiles are now optimized by checking the most-recently-used
packfile first.

* jk/pack-objects-optim-mru:
  pack-objects: use mru list when iterating over packs
  pack-objects: break delta cycles before delta-search phase
  sha1_file: make packed_object_info public
  provide an initializer for "struct object_info"
2016-10-10 14:03:47 -07:00
Junio C Hamano b8688adb12 Merge branch 'rs/qsort'
We call "qsort(array, nelem, sizeof(array[0]), fn)", and most of
the time third parameter is redundant.  A new QSORT() macro lets us
omit it.

* rs/qsort:
  show-branch: use QSORT
  use QSORT, part 2
  coccicheck: use --all-includes by default
  remove unnecessary check before QSORT
  use QSORT
  add QSORT
2016-10-10 14:03:46 -07:00
Jeff King 722ff7f876 receive-pack: quarantine objects until pre-receive accepts
When a client pushes objects to us, index-pack checks the
objects themselves and then installs them into place. If we
then reject the push due to a pre-receive hook, we cannot
just delete the packfile; other processes may be depending
on it. We have to do a normal reachability check at this
point via `git gc`.

But such objects may hang around for weeks due to the
gc.pruneExpire grace period. And worse, during that time
they may be exploded from the pack into inefficient loose
objects.

Instead, this patch teaches receive-pack to put the new
objects into a "quarantine" temporary directory. We make
these objects available to the connectivity check and to the
pre-receive hook, and then install them into place only if
it is successful (and otherwise remove them as tempfiles).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:54:02 -07:00