Commit graph

5183 commits

Author SHA1 Message Date
Junio C Hamano d845d727cb Merge branch 'jk/setup-sequence-update'
There were numerous corner cases in which the configuration files
are read and used or not read at all depending on the directory a
Git command was run, leading to inconsistent behaviour.  The code
to set-up repository access at the beginning of a Git process has
been updated to fix them.

* jk/setup-sequence-update:
  t1007: factor out repeated setup
  init: reset cached config when entering new repo
  init: expand comments explaining config trickery
  config: only read .git/config from configured repos
  test-config: setup git directory
  t1302: use "git -C"
  pager: handle early config
  pager: use callbacks instead of configset
  pager: make pager_program a file-local static
  pager: stop loading git_default_config()
  pager: remove obsolete comment
  diff: always try to set up the repository
  diff: handle --no-index prefixes consistently
  diff: skip implicit no-index check when given --no-index
  patch-id: use RUN_SETUP_GENTLY
  hash-object: always try to set up the git repository
2016-09-21 15:15:24 -07:00
Junio C Hamano 7f109ef54e Merge branch 'ks/pack-objects-bitmap'
Some codepaths in "git pack-objects" were not ready to use an
existing pack bitmap; now they are and as the result they have
become faster.

* ks/pack-objects-bitmap:
  pack-objects: use reachability bitmap index when generating non-stdout pack
  pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
2016-09-21 15:15:21 -07:00
Junio C Hamano 7889ed25ac Merge branch 'js/cat-file-filters'
Even though "git hash-objects", which is a tool to take an
on-filesystem data stream and put it into the Git object store,
allowed to perform the "outside-world-to-Git" conversions (e.g.
end-of-line conversions and application of the clean-filter), and
it had the feature on by default from very early days, its reverse
operation "git cat-file", which takes an object from the Git object
store and externalize for the consumption by the outside world,
lacked an equivalent mechanism to run the "Git-to-outside-world"
conversion.  The command learned the "--filters" option to do so.

* js/cat-file-filters:
  cat-file: support --textconv/--filters in batch mode
  cat-file --textconv/--filters: allow specifying the path separately
  cat-file: introduce the --filters option
  cat-file: fix a grammo in the man page
2016-09-21 15:15:19 -07:00
Josh Triplett 68e83a5b82 format-patch: add "--rfc" for the common case of [RFC PATCH]
Add an alias for --subject-prefix='RFC PATCH', which is used
commonly in some development communities to deserve such a
short-hand.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 08:58:10 -07:00
Nguyễn Thái Ngọc Duy b829b9439a checkout: fix ambiguity check in subdir
The two functions in parse_branchname_arg(), verify_non_filename and
check_filename, need correct prefix in order to reconstruct the paths
and check for their existence. With NULL prefix, they just check paths
at top dir instead.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 08:44:41 -07:00
Junio C Hamano 9f3d73e8cb Merge branch 'ah/misc-message-fixes' into maint
Message cleanup.

* ah/misc-message-fixes:
  unpack-trees: do not capitalize "working"
  git-merge-octopus: do not capitalize "octopus"
  git-rebase--interactive: fix English grammar
  cat-file: put spaces around pipes in usage string
  am: put spaces around pipe in usage string
2016-09-19 13:51:45 -07:00
Junio C Hamano d6645312ff Merge branch 'jc/forbid-symbolic-ref-d-HEAD' into maint
"git symbolic-ref -d HEAD" happily removes the symbolic ref, but
the resulting repository becomes an invalid one.  Teach the command
to forbid removal of HEAD.

* jc/forbid-symbolic-ref-d-HEAD:
  symbolic-ref -d: do not allow removal of HEAD
2016-09-19 13:51:41 -07:00
Junio C Hamano 4af9a7d344 Merge branch 'bc/object-id'
The "unsigned char sha1[20]" to "struct object_id" conversion
continues.  Notable changes in this round includes that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.

It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2").  Extra sets of eyes double-checking for
mismerges are highly appreciated.

* bc/object-id:
  builtin/reset: convert to use struct object_id
  builtin/commit-tree: convert to struct object_id
  builtin/am: convert to struct object_id
  refs: add an update_ref_oid function.
  sha1_name: convert get_sha1_mb to struct object_id
  builtin/update-index: convert file to struct object_id
  notes: convert init_notes to use struct object_id
  builtin/rm: convert to use struct object_id
  builtin/blame: convert file to use struct object_id
  Convert read_mmblob to take struct object_id.
  notes-merge: convert struct notes_merge_pair to struct object_id
  builtin/checkout: convert some static functions to struct object_id
  streaming: make stream_blob_to_fd take struct object_id
  builtin: convert textconv_object to use struct object_id
  builtin/cat-file: convert some static functions to struct object_id
  builtin/cat-file: convert struct expand_data to use struct object_id
  builtin/log: convert some static functions to use struct object_id
  builtin/blame: convert struct origin to use struct object_id
  builtin/apply: convert static functions to struct object_id
  cache: convert struct cache_entry to use struct object_id
2016-09-19 13:47:19 -07:00
Junio C Hamano 4322f3848a Merge branch 'mh/ref-store'
The ref-store abstraction was introduced to the refs API so that we
can plug in different backends to store references.

* mh/ref-store: (38 commits)
  refs: implement iteration over only per-worktree refs
  refs: make lock generic
  refs: add method to rename refs
  refs: add methods to init refs db
  refs: make delete_refs() virtual
  refs: add method for initial ref transaction commit
  refs: add methods for reflog
  refs: add method iterator_begin
  files_ref_iterator_begin(): take a ref_store argument
  split_symref_update(): add a files_ref_store argument
  lock_ref_sha1_basic(): add a files_ref_store argument
  lock_ref_for_update(): add a files_ref_store argument
  commit_ref_update(): add a files_ref_store argument
  lock_raw_ref(): add a files_ref_store argument
  repack_without_refs(): add a files_ref_store argument
  refs: make peel_ref() virtual
  refs: make create_symref() virtual
  refs: make pack_refs() virtual
  refs: make verify_refname_available() virtual
  refs: make read_raw_ref() virtual
  ...
2016-09-19 13:47:19 -07:00
Junio C Hamano 81358dc238 Merge branch 'cc/apply-am'
"git am" has been taught to make an internal call to "git apply"'s
innards without spawning the latter as a separate process.

* cc/apply-am: (41 commits)
  builtin/am: use apply API in run_apply()
  apply: learn to use a different index file
  apply: pass apply state to build_fake_ancestor()
  apply: refactor `git apply` option parsing
  apply: change error_routine when silent
  usage: add get_error_routine() and get_warn_routine()
  usage: add set_warn_routine()
  apply: don't print on stdout in verbosity_silent mode
  apply: make it possible to silently apply
  apply: use error_errno() where possible
  apply: make some parsing functions static again
  apply: move libified code from builtin/apply.c to apply.{c,h}
  apply: rename and move opt constants to apply.h
  builtin/apply: rename option parsing functions
  builtin/apply: make create_one_file() return -1 on error
  builtin/apply: make try_create_file() return -1 on error
  builtin/apply: make write_out_results() return -1 on error
  builtin/apply: make write_out_one_result() return -1 on error
  builtin/apply: make create_file() return -1 on error
  builtin/apply: make add_index_file() return -1 on error
  ...
2016-09-19 13:47:18 -07:00
Michael Haggerty 5b162879e9 blame: honor the diff heuristic options and config
Teach "git blame" and "git annotate" the --compaction-heuristic and
--indent-heuristic options that are now supported by "git diff".

Also teach them to honor the `diff.compactionHeuristic` and
`diff.indentHeuristic` configuration options.

It would be conceivable to introduce separate configuration options for
"blame" and "annotate"; for example `blame.compactionHeuristic` and
`blame.indentHeuristic`. But it would be confusing to users if blame
output is inconsistent with diff output, so it makes more sense for them
to respect the same configuration.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19 10:25:11 -07:00
Junio C Hamano 4fa1251bc2 Merge branch 'ah/misc-message-fixes'
Message cleanup.

* ah/misc-message-fixes:
  unpack-trees: do not capitalize "working"
  git-merge-octopus: do not capitalize "octopus"
  git-rebase--interactive: fix English grammar
  cat-file: put spaces around pipes in usage string
  am: put spaces around pipe in usage string
2016-09-15 14:11:15 -07:00
Junio C Hamano 9883ec2c73 Merge branch 'jk/pack-tag-of-tag'
"git pack-objects --include-tag" was taught that when we know that
we are sending an object C, we want a tag B that directly points at
C but also a tag A that points at the tag B.  We used to miss the
intermediate tag B in some cases.

* jk/pack-tag-of-tag:
  pack-objects: walk tag chains for --include-tag
  t5305: simplify packname handling
  t5305: use "git -C"
  t5305: drop "dry-run" of unpack-objects
  t5305: move cleanup into test block
2016-09-15 14:11:14 -07:00
Vasco Almeida 43073f8984 i18n: update-index: mark warnings for translation
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida 205d13451d i18n: show-branch: mark plural strings for translation
Mark plural string for translation using Q_().

Although we already know that the plural sentence is always used in the
English source, other languages have complex plural rules they must
comply according to the value of MAX_REVS.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida 8a78d462c9 i18n: show-branch: mark error messages for translation
Spell the first word of messages in lowercase, following the usual
style.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida 8ba35a2dc6 i18n: receive-pack: mark messages for translation
Mark messages refuse_unconfigured_deny_msg and
refuse_unconfigured_deny_delete_current_msg for translation.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida 8d79589ad6 notes: spell first word of error messages in lowercase
That's the usual style.

Update one test to reflect these changes.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida 2d1252dade i18n: notes: mark error messages for translation
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida 765773c839 i18n: merge-recursive: mark verbose message for translation
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida ccf7813139 i18n: merge-recursive: mark error messages for translation
Spell the first word of such error messages in lowercase,
following the usual style.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida ccf6380154 i18n: config: mark error message for translation
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida fb4cc6a9cb i18n: branch: mark option description for translation
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
Vasco Almeida e3f54bff43 i18n: blame: mark error messages for translation
Mark error messages for translation passed to die() function.
Change "Cannot" to lowercase following the usual style.

Reflect changes to test by using test_i18ngrep.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 13:17:32 -07:00
René Scharfe a22ae753cb use strbuf_addstr() for adding constant strings to a strbuf, part 2
Replace uses of strbuf_addf() for adding strings with more lightweight
strbuf_addstr() calls.  This makes the intent clearer and avoids
potential issues with printf format specifiers.

02962d3684 already converted six cases,
this patch covers eleven more.

A semantic patch for Coccinelle is included for easier checking for
new cases that might be introduced in the future.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:23:38 -07:00
Thomas Gummerer 610d55af0f add: modify already added files when --chmod is given
When the chmod option was added to git add, it was hooked up to the diff
machinery, meaning that it only works when the version in the index
differs from the version on disk.

As the option was supposed to mirror the chmod option in update-index,
which always changes the mode in the index, regardless of the status of
the file, make sure the option behaves the same way in git add.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
Thomas Gummerer d9d7096662 read-cache: introduce chmod_index_entry
As there are chmod options for both add and update-index, introduce a
new chmod_index_entry function to do the work.  Use it in update-index,
while it will be used in add in the next patch.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
Josh Triplett 480871e09e format-patch: show base info before email signature
Any text below the "-- " for the email signature gets treated as part of
the signature, and many mail clients will trim it from the quoted text
for a reply.  Move it above the signature, so people can reply to it
more easily.

Similarly, when producing the patch as a MIME attachment, the
original code placed the base info after the attached part, which
would be discarded.  Move the base info to the end of the part,
still inside the part boundary.

Add tests for the exact format of the email signature, and add tests
to ensure that the base info appears before the email signature when
producing a plain-text output, and that it appears before the part
boundary when producing a MIME attachment.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 10:07:10 -07:00
Thomas Gummerer 22433ce461 update-index: add test for chmod flags
Currently there is no test checking the expected behaviour when multiple
chmod flags with different arguments are passed.  As argument handling
is not in line with other git commands it's easy to miss and
accidentally change the current behaviour.

While there, fix the argument type of chmod_path, which takes an int,
but had a char passed in.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-14 15:03:49 -07:00
René Scharfe ce25e4c78d checkout: constify parameters of checkout_stage() and checkout_merged()
Document the fact that checkout_stage() and checkout_merged() don't
change the objects passed to them by adding the modifier const.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 16:12:28 -07:00
Jeff King 4543926ba8 init: reset cached config when entering new repo
After we copy the templates into place, we re-read the
config in case we copied in a default config file. But since
git_config() is backed by a cache these days, it's possible
that the call will not actually touch the filesystem at all;
we need to tell it that something has changed behind the
scenes.

Note that we also need to reset the shared_repository
config. At first glance, it seems like this should probably
just be folded into git_config_clear(). But unfortunately
that is not quite right. The shared repository value may
come from config, _or_ it may have been set manually. So
only the caller who knows whether or not they set it is the
one who can clear it (and indeed, if you _do_ put it into
git_config_clear(), then many tests fail, as we have to
clear the config cache any time we set a new config
variable).

There are three tests here. The first two actually pass
already, though it's largely luck: they just don't happen to
actually read any config before we enter the new repo.

But the third one does fail without this patch; we look at
core.sharedrepository while creating the directory, but need
to make sure the value from the template config overrides
it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 7c0a842b46 init: expand comments explaining config trickery
git-init may copy "config" from the templates directory and
then re-read it. There are some comments explaining what's
going on here, but they are not grouped very well with the
matching code. Let's rearrange and expand them.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 28a4e58021 diff: always try to set up the repository
If we see an explicit "--no-index", we do not bother calling
setup_git_directory_gently() at all. This means that we may
miss out on reading repo-specific config.

It's arguable whether this is correct or not. If we were
designing from scratch, making "git diff --no-index"
completely ignore the repository makes some sense. But we
are nowhere near scratch, so let's look at the existing
behavior:

  1. If you're in the top-level of a repository and run an
     explicit "diff --no-index", the config subsystem falls
     back to reading ".git/config", and we will respect repo
     config.

  2. If you're in a subdirectory of a repository, then we
     still try to read ".git/config", but it generally
     doesn't exist. So "diff --no-index" there does not
     respect repo config.

  3. If you have $GIT_DIR set in the environment, we read
     and respect $GIT_DIR/config,

  4. If you run "git diff /tmp/foo /tmp/bar" to get an
     implicit no-index, we _do_ run the repository setup,
     and set $GIT_DIR (or respect an existing $GIT_DIR
     variable). We find the repo config no matter where we
     started, and respect it.

So we already respect the repository config in a number of
common cases, and case (2) is the only one that does not.
And at least one of our tests, t4034, depends on case (1)
behaving as it does now (though it is just incidental, not
an explicit test for this behavior).

So let's bring case (2) in line with the others by always
running the repository setup, even with an explicit
"--no-index". We shouldn't need to change anything else, as the
implicit case already handles the prefix.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 475b362c2a diff: skip implicit no-index check when given --no-index
We can invoke no-index mode in two ways: by an explicit
request from the user, or implicitly by noticing that we
have two paths, and at least one is outside the repository.

If the user already told us --no-index, there is no need for
us to do the implicit test at all.  However, we currently
do, and downgrade our "explicit" to DIFF_NO_INDEX_IMPLICIT.

This doesn't have any user-visible behavior, though it's not
immediately obvious why. We only trigger the implicit check
when we have exactly two non-option arguments. And the only
code that cares about implicit versus explicit is an error
message that we show when we _don't_ have two non-option
arguments.

However, it's worth fixing anyway. Besides being slightly
more efficient, it makes the code easier to follow, which
will help when we modify it in future patches.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 0e94ee9415 hash-object: always try to set up the git repository
When "hash-object" is run without "-w", we don't need to be
in a git repository at all; we can just hash the object and
write its sha1 to stdout. However, if we _are_ in a git
repository, we would want to know that so we can follow the
normal rules for respecting config, .gitattributes, etc.

This happens to work at the top-level of a git repository
because we blindly read ".git/config", but as the included
test shows, it does not work when you are in a subdirectory.

The solution is to just do a "gentle" setup in this case. We
already take care to use prefix_filename() on any filename
arguments we get (to handle the "-w" case), so we don't need
to do anything extra to handle the side effects of repo
setup.

An alternative would be to specify RUN_SETUP_GENTLY for this
command in git.c, and then die if "-w" is set but we are not
in a repository. However, the error messages generated at
the time of setup_git_directory() are more detailed, so it's
better to find out which mode we are in, and then call the
appropriate function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Junio C Hamano d1de693d0d Merge branch 'jc/forbid-symbolic-ref-d-HEAD'
"git symbolic-ref -d HEAD" happily removes the symbolic ref, but
the resulting repository becomes an invalid one.  Teach the command
to forbid removal of HEAD.

* jc/forbid-symbolic-ref-d-HEAD:
  symbolic-ref -d: do not allow removal of HEAD
2016-09-12 15:34:35 -07:00
Junio C Hamano 87f5de387c Merge branch 'jc/am-read-author-file'
Extract a small helper out of the function that reads the authors
script file "git am" internally uses.

* jc/am-read-author-file:
  am: refactor read_author_script()
2016-09-12 15:34:32 -07:00
Junio C Hamano 305d7f1339 Merge branch 'jk/diff-submodule-diff-inline'
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.

* jk/diff-submodule-diff-inline:
  diff: teach diff to display submodule difference with an inline diff
  submodule: refactor show_submodule_summary with helper function
  submodule: convert show_submodule_summary to use struct object_id *
  allow do_submodule_path to work even if submodule isn't checked out
  diff: prepare for additional submodule formats
  graph: add support for --line-prefix on all graph-aware output
  diff.c: remove output_prefix_length field
  cache: add empty_tree_oid object and helper function
2016-09-12 15:34:31 -07:00
Kirill Smelkov 645c432d61 pack-objects: use reachability bitmap index when generating non-stdout pack
Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
if a repository has bitmap index, pack-objects can nicely speedup
"Counting objects" graph traversal phase. That however was done only for
case when resultant pack is sent to stdout, not written into a file.

The reason here is for on-disk repack by default we want:

- to produce good pack (with bitmap index not-yet-packed objects are
  emitted to pack in suboptimal order).

- to use more robust pack-generation codepath (avoiding possible
  bugs in bitmap code and possible bitmap index corruption).

Jeff King further explains:

    The reason for this split is that pack-objects tries to determine how
    "careful" it should be based on whether we are packing to disk or to
    stdout. Packing to disk implies "git repack", and that we will likely
    delete the old packs after finishing. We want to be more careful (so
    as not to carry forward a corruption, and to generate a more optimal
    pack), and we presumably run less frequently and can afford extra CPU.
    Whereas packing to stdout implies serving a remote via "git fetch" or
    "git push". This happens more frequently (e.g., a server handling many
    fetching clients), and we assume the receiving end takes more
    responsibility for verifying the data.

    But this isn't always the case. One might want to generate on-disk
    packfiles for a specialized object transfer. Just using "--stdout" and
    writing to a file is not optimal, as it will not generate the matching
    pack index.

    So it would be useful to have some way of overriding this heuristic:
    to tell pack-objects that even though it should generate on-disk
    files, it is still OK to use the reachability bitmaps to do the
    traversal.

So we can teach pack-objects to use bitmap index for initial object
counting phase when generating resultant pack file too:

- if we take care to not let it be activated under git-repack:

  See above about repack robustness and not forward-carrying corruption.

- if we know bitmap index generation is not enabled for resultant pack:

  The current code has singleton bitmap_git, so it cannot work
  simultaneously with two bitmap indices.

  We also want to avoid (at least with current implementation)
  generating bitmaps off of bitmaps. The reason here is: when generating
  a pack, not-yet-packed objects will be emitted into pack in
  suboptimal order and added to tail of the bitmap as "extended entries".
  When the resultant pack + some new objects in associated repository
  are in turn used to generate another pack with bitmap, the situation
  repeats: new objects are again not emitted optimally and just added to
  bitmap tail - not in recency order.

  So the pack badness can grow over time when at each step we have
  bitmapped pack + some other objects. That's why we want to avoid
  generating bitmaps off of bitmaps, not to let pack badness grow.

- if we keep pack reuse enabled still only for "send-to-stdout" case:

  Because pack-to-file needs to generate index for destination pack, and
  currently on pack reuse raw entries are directly written out to the
  destination pack by write_reused_pack(), bypassing needed for pack index
  generation bookkeeping done by regular codepath in write_one() and
  friends.

  ( In the future we might teach pack-reuse code about cases when index
    also needs to be generated for resultant pack and remove
    pack-reuse-only-for-stdout limitation )

This way for pack-objects -> file we get nice speedup:

    erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
    repository managed by git-backup[2] via

    time echo 0186ac99 | git pack-objects --revs erp5pack

before:  37.2s
after:   26.2s

And for `git repack -adb` packed git.git

    time echo 5c589a73 | git pack-objects --revs gitpack

before:   7.1s
after:    3.6s

i.e. it can be 30% - 50% speedup for pack extraction.

git-backup extracts many packs on repositories restoration. That was my
initial motivation for the patch.

[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup

NOTE

Jeff also suggests that pack.useBitmaps was probably a mistake to
introduce originally. This way we are not adding another config point,
but instead just always default to-file pack-objects not to use bitmap
index: Tools which need to generate on-disk packs with using bitmap, can
pass --use-bitmap-index explicitly. And git-repack does never pass
--use-bitmap-index, so this way we can be sure regular on-disk repacking
remains robust.

NOTE2

`git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower
than `git pack-objects file.pack`. Extracting erp5.git pack from
lab.nexedi.com backup repository:

    $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack

    real    0m22.309s
    user    0m21.148s
    sys     0m0.932s

    $ time git index-pack erp5pack-stdout.pack

    real    0m50.873s   <-- more than 2 times slower than time to generate pack itself!
    user    0m49.300s
    sys     0m1.360s

So the time for

    `pack-object --stdout >file.pack` + `index-pack file.pack`  is  72s,

while

    `pack-objects file.pack` which does both pack and index     is  27s.

And even

    `pack-objects --no-use-bitmap-index file.pack`              is  37s.

Jeff explains:

    The packfile does not carry the sha1 of the objects. A receiving
    index-pack has to compute them itself, including inflating and applying
    all of the deltas.

that's why for `git-backup restore` we want to teach `git pack-objects
file.pack` to use bitmaps instead of using `git pack-objects --stdout
>file.pack` + `git index-pack file.pack`.

NOTE3

The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh

    Test                                    56dfeb62          this tree
    --------------------------------------------------------------------------------
    5310.2: repack to disk                  8.98(8.05+0.29)   9.05(8.08+0.33) +0.8%
    5310.3: simulated clone                 2.02(2.27+0.09)   2.01(2.25+0.08) -0.5%
    5310.4: simulated fetch                 0.81(1.07+0.02)   0.81(1.05+0.04) +0.0%
    5310.5: pack to file                    7.58(7.04+0.28)   7.60(7.04+0.30) +0.3%
    5310.6: pack to file (bitmap)           7.55(7.02+0.28)   3.25(2.82+0.18) -57.0%
    5310.8: clone (partial bitmap)          1.83(2.26+0.12)   1.82(2.22+0.14) -0.5%
    5310.9: pack to file (partial bitmap)   6.86(6.58+0.30)   2.87(2.74+0.20) -58.2%

More context:

    http://marc.info/?t=146792101400001&r=1&w=2
    http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t

Cc: Vicent Marti <tanoku@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-12 13:47:41 -07:00
Kirill Smelkov 702d1b9583 pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
are two codepaths in pack-objects: with & without using bitmap
reachability index.

However add_object_entry_from_bitmap(), despite its non-bitmapped
counterpart add_object_entry(), in no way does check for whether --local
or --honor-pack-keep or --incremental should be respected. In
non-bitmapped codepath this is handled in want_object_in_pack(), but
bitmapped codepath has simply no such checking at all.

The bitmapped codepath however was allowing to pass in all those options
and with bitmap indices still being used under such conditions -
potentially giving wrong output (e.g. including objects from non-local or
.keep'ed pack).

We can easily fix this by noting the following: when an object comes to
add_object_entry_from_bitmap() it can come for two reasons:

    1. entries coming from main pack covered by bitmap index, and
    2. object coming from, possibly alternate, loose or other packs.

"2" can be already handled by want_object_in_pack() and to cover
"1" we can teach want_object_in_pack() to expect that *found_pack can be
non-NULL, meaning calling client already found object's pack entry.

In want_object_in_pack() we care to start the checks from already found
pack, if we have one, this way determining the answer right away
in case neither --local nor --honour-pack-keep are active. In
particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do
not do harm to served-with-bitmap clones performance-wise:

    Test                      56dfeb62          this tree
    -----------------------------------------------------------------
    5310.2: repack to disk    9.08(8.20+0.25)   9.09(8.14+0.32) +0.1%
    5310.3: simulated clone   1.92(2.12+0.08)   1.93(2.12+0.09) +0.5%
    5310.4: simulated fetch   0.82(1.07+0.04)   0.82(1.06+0.04) +0.0%
    5310.6: partial bitmap    1.96(2.42+0.13)   1.95(2.40+0.15) -0.5%

    Test                      56dfeb62          this tree
    -----------------------------------------------------------------
    5310.2: repack to disk    9.11(8.16+0.32)   9.11(8.19+0.28) +0.0%
    5310.3: simulated clone   1.93(2.14+0.07)   1.92(2.11+0.10) -0.5%
    5310.4: simulated fetch   0.82(1.06+0.04)   0.82(1.04+0.05) +0.0%
    5310.6: partial bitmap    1.95(2.38+0.16)   1.94(2.39+0.14) -0.5%

    Test                      56dfeb62          this tree
    -----------------------------------------------------------------
    5310.2: repack to disk    9.13(8.17+0.31)   9.07(8.13+0.28) -0.7%
    5310.3: simulated clone   1.92(2.13+0.07)   1.91(2.12+0.06) -0.5%
    5310.4: simulated fetch   0.82(1.08+0.03)   0.82(1.08+0.03) +0.0%
    5310.6: partial bitmap    1.96(2.43+0.14)   1.96(2.42+0.14) +0.0%

with delta timings showing they are all within noise from run to run.

In the general case we do not want to call find_pack_entry_one() more than
once, because it is expensive. This patch splits the loop in
want_object_in_pack() into two parts: finding the object and seeing if it
impacts our choice to include it in the pack. We may call the inexpensive
want_found_object() twice, but we will never call find_pack_entry_one() if we
do not need to.

I appreciate help and discussing this change with Junio C Hamano and
Jeff King.

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-12 13:47:41 -07:00
Johannes Schindelin 321459439e cat-file: support --textconv/--filters in batch mode
With this patch, --batch can be combined with --textconv or --filters.
For this to work, the input needs to have the form

	<object name><single white space><path>

so that the filters can be chosen appropriately.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-11 14:48:15 -07:00
Johannes Schindelin 7bcf341453 cat-file --textconv/--filters: allow specifying the path separately
There are circumstances when it is relatively easy to figure out the
object name for a given path, but not the name of the containing tree.
For example, when looking at a diff generated by Git, the object names
are recorded, but not the revision. As a matter of fact, the revisions
from which the diff was generated may not even exist locally.

In such a case, the user would have to generate a fake revision just to
be able to use --textconv or --filters.

Let's simplify this dramatically, because we do not really need that
revision at all: all we care about is that we know the path. In the
scenario described above, we do know the path, and we just want to
specify it separately from the object name.

Example usage:

	git cat-file --textconv --path=main.c 0f1937fd

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-11 14:48:15 -07:00
Johannes Schindelin b9e62f6011 cat-file: introduce the --filters option
The --filters option applies the convert_to_working_tree() filter for
the path when showing the contents of a regular file blob object;
the contents are written out as-is for other types of objects.

This feature comes in handy when a 3rd-party tool wants to work with
the contents of files from past revisions as if they had been checked
out, but without detouring via temporary files.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-11 14:47:46 -07:00
David Turner 6fb5acfd8f refs: add methods to init refs db
Alternate refs backends might not need the refs/heads directory and so
on, so we make ref db initialization part of the backend.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-09 15:28:14 -07:00
Junio C Hamano 02c6c14d6c Merge branch 'sb/submodule-clone-rr'
"git clone --resurse-submodules --reference $path $URL" is a way to
reduce network transfer cost by borrowing objects in an existing
$path repository when cloning the superproject from $URL; it
learned to also peek into $path for presense of corresponding
repositories of submodules and borrow objects from there when able.

* sb/submodule-clone-rr:
  clone: recursive and reference option triggers submodule alternates
  clone: implement optional references
  clone: clarify option_reference as required
  clone: factor out checking for an alternate path
  submodule--helper update-clone: allow multiple references
  submodule--helper module-clone: allow multiple references
  t7408: merge short tests, factor out testing method
  t7408: modernize style
2016-09-08 21:49:50 -07:00
Junio C Hamano 00d27937bf Merge branch 'jh/status-v2-porcelain'
Enhance "git status --porcelain" output by collecting more data on
the state of the index and the working tree files, which may
further be used to teach git-prompt (in contrib/) to make fewer
calls to git.

* jh/status-v2-porcelain:
  status: unit tests for --porcelain=v2
  test-lib-functions.sh: add lf_to_nul helper
  git-status.txt: describe --porcelain=v2 format
  status: print branch info with --porcelain=v2 --branch
  status: print per-file porcelain v2 status data
  status: collect per-file data for --porcelain=v2
  status: support --porcelain[=<version>]
  status: cleanup API to wt_status_print
  status: rename long-format print routines
2016-09-08 21:49:50 -07:00
Junio C Hamano d7ed183a91 Merge branch 'rt/help-unknown'
"git nosuchcommand --help" said "No manual entry for gitnosuchcommand",
which was not intuitive, given that "git nosuchcommand" said "git:
'nosuchcommand' is not a git command".

* rt/help-unknown:
  help: make option --help open man pages only for Git commands
  help: introduce option --exclude-guides
2016-09-08 21:49:48 -07:00
Junio C Hamano da3b6f06e1 Merge branch 'cc/receive-pack-limit'
An incoming "git push" that attempts to push too many bytes can now
be rejected by setting a new configuration variable at the receiving
end.

* cc/receive-pack-limit:
  receive-pack: allow a maximum input size to be specified
  unpack-objects: add --max-input-size=<size> option
  index-pack: add --max-input-size=<size> option
2016-09-08 21:49:47 -07:00
Junio C Hamano 452a9073ba Merge branch 'jk/format-patch-number-singleton-patch-with-cover'
"git format-patch --cover-letter HEAD^" to format a single patch
with a separate cover letter now numbers the output as [PATCH 0/1]
and [PATCH 1/1] by default.

* jk/format-patch-number-singleton-patch-with-cover:
  format-patch: show 0/1 and 1/1 for singleton patch with cover letter
2016-09-08 21:49:47 -07:00
Junio C Hamano 9bef642236 Merge branch 'jk/tighten-alloc' into maint
Small code and comment clean-up.

* jk/tighten-alloc:
  receive-pack: use FLEX_ALLOC_MEM in queue_command()
  correct FLEXPTR_* example in comment
2016-09-08 21:36:00 -07:00
Junio C Hamano 5e469ab66c Merge branch 'rs/use-strbuf-add-unique-abbrev' into maint
A small code clean-up.

* rs/use-strbuf-add-unique-abbrev:
  use strbuf_add_unique_abbrev() for adding short hashes
2016-09-08 21:36:00 -07:00
Junio C Hamano 24c88ad8d1 Merge branch 'rs/merge-add-strategies-simplification' into maint
A small code clean-up.

* rs/merge-add-strategies-simplification:
  merge: use string_list_split() in add_strategies()
2016-09-08 21:35:58 -07:00
Junio C Hamano ba22efd8f5 Merge branch 'js/no-html-bypass-on-windows' into maint
On Windows, help.browser configuration variable used to be ignored,
which has been corrected.

* js/no-html-bypass-on-windows:
  Revert "display HTML in default browser using Windows' shell API"
2016-09-08 21:35:55 -07:00
Junio C Hamano 7c96471947 Merge branch 'sb/checkout-explit-detach-no-advice' into maint
"git checkout --detach <branch>" used to give the same advice
message as that is issued when "git checkout <tag>" (or anything
that is not a branch name) is given, but asking with "--detach" is
an explicit enough sign that the user knows what is going on.  The
advice message has been squelched in this case.

* sb/checkout-explit-detach-no-advice:
  checkout: do not mention detach advice for explicit --detach option
2016-09-08 21:35:54 -07:00
Junio C Hamano 8e7c580e34 Merge branch 'js/mv-dir-to-new-directory' into maint
"git mv dir non-existing-dir/" did not work in some environments
the same way as existing mainstream platforms.  The code now moves
"dir" to "non-existing-dir", without relying on rename("A", "B/")
that strips the trailing slash of '/'.

* js/mv-dir-to-new-directory:
  git mv: do not keep slash in `git mv dir non-existing-dir/`
2016-09-08 21:35:54 -07:00
Junio C Hamano f59c6e6ccb Merge branch 'jk/reflog-date' into maint
The reflog output format is documented better, and a new format
--date=unix to report the seconds-since-epoch (without timezone)
has been added.

* jk/reflog-date:
  date: clarify --date=raw description
  date: add "unix" format
  date: document and test "raw-local" mode
  doc/pretty-formats: explain shortening of %gd
  doc/pretty-formats: describe index/time formats for %gd
  doc/rev-list-options: explain "-g" output formats
  doc/rev-list-options: clarify "commit@{Nth}" for "-g" option
2016-09-08 21:35:52 -07:00
Nguyễn Thái Ngọc Duy 7f82b24e30 checkout: add some spaces between code and comment
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-08 12:55:51 -07:00
Alex Henrie 88c782942c cat-file: put spaces around pipes in usage string
This makes the style a little more consistent with other usage strings,
and will resolve a warning at
https://www.softcatala.org/recursos/quality/git.html

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-08 12:16:38 -07:00
Alex Henrie d65fdc9c5d am: put spaces around pipe in usage string
This makes the style a little more consistent with other usage strings,
and will resolve a warning at
https://www.softcatala.org/recursos/quality/git.html

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-08 12:13:28 -07:00
brian m. carlson 3a5d7c55f7 builtin/reset: convert to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:43 -07:00
brian m. carlson 031cee5b73 builtin/commit-tree: convert to struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:43 -07:00
brian m. carlson 8c88769ba4 builtin/am: convert to struct object_id
Convert uses of unsigned char [20] to struct object_id.  Rename the
generically-named "ptr" to "old_oid" and make it const.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:43 -07:00
brian m. carlson 151b2911c1 sha1_name: convert get_sha1_mb to struct object_id
All of the callers of this function use struct object_id, so rename it
to get_oid_mb and make it take struct object_id instead of
unsigned char *.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:43 -07:00
brian m. carlson 71445a0fef builtin/update-index: convert file to struct object_id
Convert all functions to use struct object_id, and replace instances of
hardcoded 40, 41, and 42 with appropriate references to GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:43 -07:00
brian m. carlson 8ec46d7e3e builtin/rm: convert to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson 110d26fce8 builtin/blame: convert file to use struct object_id
Convert this file to use struct object_id, and additionally convert some
uses of the constant 40 to GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson d449347d08 Convert read_mmblob to take struct object_id.
Since all of its callers have been updated, convert read_mmblob to take
a pointer to struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson 60af7691db builtin/checkout: convert some static functions to struct object_id
Convert all the static functions that are not callbacks to struct
object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson 7eda0e4fbb streaming: make stream_blob_to_fd take struct object_id
Since all of its callers have been updated, modify stream_blob_to_fd to
take a struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson acad70d106 builtin: convert textconv_object to use struct object_id
Since all of its callers have been updated, make textconv_object take a
struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson 63ecb99e0d builtin/cat-file: convert some static functions to struct object_id
Convert all of the static functions that are not callbacks to use struct
object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson cd4f77beb7 builtin/cat-file: convert struct expand_data to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib,
plus the actual change to the struct:

@@
struct expand_data E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct expand_data *E1;
@@
- E1->sha1
+ E1->oid.hash

@@
struct expand_data E1;
@@
- E1.delta_base_sha1
+ E1.delta_base_oid.hash

@@
struct expand_data *E1;
@@
- E1->delta_base_sha1
+ E1->delta_base_oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson d801627b0c builtin/log: convert some static functions to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson a7bcfa126b builtin/blame: convert struct origin to use struct object_id
Convert struct origin to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib,
plus the actual change to the struct:

@@
struct origin E1;
@@
- E1.blob_sha1
+ E1.blob_oid.hash

@@
struct origin *E1;
@@
- E1->blob_sha1
+ E1->blob_oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson eb1c9c7328 builtin/apply: convert static functions to struct object_id
There were several static functions using unsigned char arrays for SHA-1
values.  Convert them to use struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson 99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Christian Couder edfac5ebff builtin/am: use apply API in run_apply()
This replaces run_apply() implementation with a new one that
uses the apply API that has been previously prepared in
apply.c and apply.h.

This shoud improve performance a lot in certain cases.

As the previous implementation was creating a new `git apply`
process to apply each patch, it could be slow on systems like
Windows where it is costly to create new processes.

Also the new `git apply` process had to read the index from
disk, and when the process was done the calling process
discarded its own index and read back from disk the new
index that had been created by the `git apply` process.

This could be very inefficient with big repositories that
have big index files, especially when the system decided
that it was a good idea to run the `git apply` processes on
a different processor core.

Also eliminating index reads enables further performance
improvements by using:

`git update-index --split-index`

For example here is a benchmark of a multi hundred commit
rebase on the Linux kernel on a Debian laptop with SSD:

command: git rebase --onto 1993b17 52bef0c 29dde7c

Vanilla "next" without split index:                1m54.953s
Vanilla "next" with split index:                   1m22.476s
This series on top of "next" without split index:  1m12.034s
This series on top of "next" with split index:     0m15.678s

(using branch "next" from mid April 2016.)

Benchmarked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:54 -07:00
Christian Couder 7e1bad24e3 apply: refactor git apply option parsing
Parsing `git apply` options can be useful to other commands that
want to call the libified apply functionality, because this way
they can easily pass some options from their own command line to
the libified apply functionality.

This will be used by `git am` in a following patch.

To make this possible, let's refactor the `git apply` option
parsing code into a new libified apply_parse_options() function.

Doing that makes it possible to remove some functions definitions
from "apply.h" and make them static in "apply.c".

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder a46160d27e apply: make it possible to silently apply
This changes 'int apply_verbosely' into 'enum apply_verbosity', and
changes the possible values of the variable from a bool to
a tristate.

The previous 'false' state is changed into 'verbosity_normal'.
The previous 'true' state is changed into 'verbosity_verbose'.

The new added state is 'verbosity_silent'. It should prevent
anything to be printed on both stderr and stdout.

This is needed because `git am` wants to first call apply
functionality silently, if it can then fall back on 3-way merge
in case of error.

Printing on stdout, and calls to warning() or error() are not
taken care of in this patch, as that will be done in following
patches.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder 13b5af22f3 apply: move libified code from builtin/apply.c to apply.{c,h}
As most of the apply code in builtin/apply.c has been libified by a number of
previous commits, it can now be moved to apply.{c,h}, so that more code can
use it.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder 803bf4e012 apply: rename and move opt constants to apply.h
The constants for the "inaccurate-eof" and the "recount" options will
be used in both "apply.c" and "builtin/apply.c", so they need to go
into "apply.h", and therefore they need a name that is more specific
to the API they belong to.

Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder da8e30dcd9 builtin/apply: rename option parsing functions
As these functions are going to be part of the libified
apply API, let's give them a name that is more specific
to the apply API.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder 603752a88d builtin/apply: make create_one_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", create_one_file() should return -1 instead of
calling exit().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder 739d8a16b5 builtin/apply: make try_create_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", try_create_file() should return -1 in case of
error.

Unfortunately try_create_file() currently returns -1 to signal a
recoverable error. To fix that, let's make it return 1 in case of
a recoverable error and -1 in case of an unrecoverable error.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Jeff King b773ddea2c pack-objects: walk tag chains for --include-tag
When pack-objects is given --include-tag, it peels each tag
ref down to a non-tag object, and if that non-tag object is
going to be packed, we include the tag, too. But what
happens if we have a chain of tags (e.g., tag "A" points to
tag "B", which points to commit "C")?

We'll peel down to "C" and realize that we want to include
tag "A", but we do not ever consider tag "B", leading to a
broken pack (assuming "B" was not otherwise selected).
Instead, we have to walk the whole chain, adding any tags we
find to the pack.

Interestingly, it doesn't seem possible to trigger this
problem with "git fetch", but you can with "git clone
--single-branch". The reason is that we generate the correct
pack when the client explicitly asks for "A" (because we do
a real reachability analysis there), and "fetch" is more
willing to do so. There are basically two cases:

  1. If "C" is already a ref tip, then the client can deduce
     that it needs "A" itself (via find_non_local_tags), and
     will ask for it explicitly rather than relying on the
     include-tag capability. Everything works.

  2. If "C" is not already a ref tip, then we hope for
     include-tag to send us the correct tag. But it doesn't;
     it generates a broken pack. However, the next step is
     to do a follow-up run of find_non_local_tags(),
     followed by fetch_refs() to backfill any tags we
     learned about.

     In the normal case, fetch_refs() calls quickfetch(),
     which does a connectivity check and sees we have no
     new objects to fetch. We just write the refs.

     But for the broken-pack case, the connectivity check
     fails, and quickfetch will follow-up with the remote,
     asking explicitly for each of the ref tips. This picks
     up the missing object in a new pack.

For a regular "git clone", we are similarly OK, because we
explicitly request all of the tag refs, and get a correct
pack. But with "--single-branch", we kick in tag
auto-following via "include-tag", but do _not_ do a
follow-up backfill. We just take whatever the server sent us
via include-tag and write out tag refs for any tag objects
we were sent. So prior to c6807a4 (clone: open a shortcut
for connectivity check, 2013-05-26), we actually claimed the
clone was a success, but the result was silently
corrupted!  Since c6807a4, index-pack's connectivity
check catches this case, and we correctly complain.

The included test directly checks that pack-objects does not
generate a broken pack, but also confirms that "clone
--single-branch" does not hit the bug.

Note that tag chains introduce another interesting question:
if we are packing the tag "B" but not the commit "C", should
"A" be included?

Both before and after this patch, we do not include "A",
because the initial peel_ref() check only knows about the
bottom-most level, "C". To realize that "B" is involved at
all, we would have to switch to an incremental peel, in
which we examine each tagged object, asking if it is being
packed (and including the outer tag if so).

But that runs contrary to the optimizations in peel_ref(),
which avoid accessing the objects at all, in favor of using
the value we pull from packed-refs. It's OK to walk the
whole chain once we know we're going to include the tag (we
have to access it anyway, so the effort is proportional to
the pack we're generating). But for the initial selection,
we have to look at every ref. If we're only packing a few
objects, we'd still have to parse every single referenced
tag object just to confirm that it isn't part of a tag
chain.

This could be addressed if packed-refs stored the complete
tag chain for each peeled ref (in most cases, this would be
the same cost as now, as each "chain" is only a single
link). But given the size of that project, it's out of scope
for this fix (and probably nobody cares enough anyway, as
it's such an obscure situation). This commit limits itself
to just avoiding the creation of a broken pack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:45:31 -07:00
Junio C Hamano 12cfa792b8 symbolic-ref -d: do not allow removal of HEAD
If you delete the symbolic-ref HEAD from a repository, Git no longer
considers the repository valid, and even "git symbolic-ref HEAD
refs/heads/master" would not be able to recover from that state
(although "git init" can, but that is a sure sign that you are
talking about a "broken" repository).

In the spirit similar to afe5d3d5 ("symbolic ref: refuse non-ref
targets in HEAD", 2009-01-29), forbid removal of HEAD to avoid
corrupting a repository.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-02 09:01:38 -07:00
Jacob Keller 660e113ce1 graph: add support for --line-prefix on all graph-aware output
Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=<string>" will print the
additional line-prefix on every line of output.

To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.

This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.

This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.

Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:09 -07:00
Junio C Hamano 4762bf36d9 Merge branch 'mh/blame-worktree'
* mh/blame-worktree:
  blame: fix segfault on untracked files
2016-08-31 10:03:50 -07:00
Ralf Thielow af74128f4a help: introduce option --exclude-guides
Introduce option --exclude-guides to the help command.  With this option
being passed, "git help" will open man pages only for actual commands.

Since we know it is a command, we can use function help_unknown_command
to give the user advice on typos.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-30 16:09:41 -07:00
Junio C Hamano a77598ef44 am: refactor read_author_script()
By splitting the part that reads from a file and the part that
parses the variable definitions from the contents, make the latter
can be more reusable in the future.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-30 12:36:42 -07:00
Thomas Gummerer bc6b13a7d2 blame: fix segfault on untracked files
Since 3b75ee9 ("blame: allow to blame paths freshly added to the index",
2016-07-16) git blame also looks at the index to determine if there is a
file that was freshly added to the index.

cache_name_pos returns -pos - 1 in case there is no match is found, or
if the name matches, but the entry has a stage other than 0.  As git
blame should work for unmerged files, it uses strcmp to determine
whether the name of the returned position matches, in which case the
file exists, but is merely unmerged, or if the file actually doesn't
exist in the index.

If the repository is empty, or if the file would lexicographically be
sorted as the last file in the repository, -cache_name_pos - 1 is
outside of the length of the active_cache array, causing git blame to
segfault.  Guard against that, and die() normally to restore the old
behaviour.

Reported-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-29 11:57:33 -07:00
Junio C Hamano 3b1c6a9b6e Merge branch 'js/no-html-bypass-on-windows' into rt/help-unknown
* js/no-html-bypass-on-windows:
  Revert "display HTML in default browser using Windows' shell API"
2016-08-26 11:29:07 -07:00
Junio C Hamano 13e11ff707 Merge branch 'js/no-html-bypass-on-windows'
On Windows, help.browser configuration variable used to be ignored,
which has been corrected.

* js/no-html-bypass-on-windows:
  Revert "display HTML in default browser using Windows' shell API"
2016-08-25 13:55:06 -07:00
Jeff King c08db5a2d0 receive-pack: allow a maximum input size to be specified
Receive-pack feeds its input to either index-pack or
unpack-objects, which will happily accept as many bytes as
a sender is willing to provide. Let's allow an arbitrary
cutoff point where we will stop writing bytes to disk.

Cleaning up what has already been written to disk is a
related problem that is not addressed by this patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Christian Couder 5ad2186733 unpack-objects: add --max-input-size=<size> option
When receiving a pack-file, it can be useful to abort the
`git unpack-objects`, if the pack-file is too big.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Jeff King 411481be6f index-pack: add --max-input-size=<size> option
When receiving a pack-file, it can be useful to abort the
`git index-pack`, if the pack-file is too big.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Jacob Keller 957ed3a56c format-patch: show 0/1 and 1/1 for singleton patch with cover letter
Change the default behavior of git-format-patch to generate numbered
sequence of 0/1 and 1/1 when generating both a cover-letter and a single
patch. This standardizes the cover letter to have 0/N which helps
distinguish the cover letter from the patch itself. Since the behavior
is easily changed via configuration as well as the use of -n and -N this
should be acceptable default behavior.

Add tests for the new default behavior.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-23 15:59:11 -07:00
Junio C Hamano e6dab9f62f Merge branch 'sb/checkout-explit-detach-no-advice'
"git checkout --detach <branch>" used to give the same advice
message as that is issued when "git checkout <tag>" (or anything
that is not a branch name) is given, but asking with "--detach" is
an explicit enough sign that the user knows what is going on.  The
advice message has been squelched in this case.

* sb/checkout-explit-detach-no-advice:
  checkout: do not mention detach advice for explicit --detach option
2016-08-19 15:34:15 -07:00
Johannes Schindelin 6db5967d4e Revert "display HTML in default browser using Windows' shell API"
Since 4804aab (help (Windows): Display HTML in default browser using
Windows' shell API, 2008-07-13), Git for Windows used to call
`ShellExecute()` to launch the default Windows handler for `.html`
files.

The idea was to avoid going through a shell script, for performance
reasons.

However, this change ignores the `help.browser` config setting. Together
with browsing help not being a performance-critical operation, let's
just revert that patch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-19 13:47:28 -07:00
Stefan Beller 31224cbdc7 clone: recursive and reference option triggers submodule alternates
When `--recursive` and `--reference` is given, it is reasonable to
expect that the submodules are created with references to the submodules
of the given alternate for the superproject.

  An initial attempt to do this was presented to the mailing list, which
  used flags that are passed around ("--super-reference") that instructed
  the submodule clone to look for a reference in the submodules of the
  referenced superproject. This is not well thought out, as any further
  `submodule update` should also respect the initial setup.

  When a new  submodule is added to the superproject and the alternate
  of the superproject does not know about that submodule yet, we rather
  error out informing the user instead of being unclear if we did or did
  not use a submodules alternate.

To solve this problem introduce new options that store the configuration
for what the user wanted originally.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-17 17:19:11 -07:00
Junio C Hamano 2f664566c5 Merge branch 'jk/tighten-alloc'
Small code and comment clean-up.

* jk/tighten-alloc:
  receive-pack: use FLEX_ALLOC_MEM in queue_command()
  correct FLEXPTR_* example in comment
2016-08-17 14:07:46 -07:00
Stefan Beller f7415b4d71 clone: implement optional references
In a later patch we want to try to create alternates for submodules,
but they might not exist in the referenced superproject. So add a way
to skip the non existing references and report them.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:28:45 -07:00
Stefan Beller 5e40800df2 clone: clarify option_reference as required
In the next patch we introduce optional references; To better distinguish
between optional and required references we rename the variable.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:28:42 -07:00
Stefan Beller 9eeea7d2bc clone: factor out checking for an alternate path
In a later patch we want to determine if a path is suitable as an
alternate from other commands than builtin/clone. Move the checking
functionality of `add_one_reference` to `compute_alternate_path` that is
defined in cache.h.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:28:01 -07:00
Stefan Beller 779b88a91f checkout: do not mention detach advice for explicit --detach option
When a user asked for a detached HEAD specifically with `--detach`,
we do not need to give advice on what a detached HEAD state entails as
we can assume they know what they're getting into as they asked for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:01:45 -07:00
René Scharfe ddd0bfac7c receive-pack: use FLEX_ALLOC_MEM in queue_command()
Use the macro FLEX_ALLOC_MEM instead of open-coding it.  This shortens
and simplifies the code a bit.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-13 19:49:30 -07:00
Stefan Beller 5f50f33e87 submodule--helper update-clone: allow multiple references
Allow the user to pass in multiple references to update_clone.
Currently this is only internal API, but once the shell script is
replaced by a C version, this is needed.

This fixes an API bug between the shell script and the helper.
Currently the helper accepts "--reference" "--reference=foo"
as a OPT_STRING whose value happens to be "--reference=foo", and
then uses

        if (suc->reference)
                argv_array_push(&child->args, suc->reference)

where suc->reference _is_ "--reference=foo" when invoking the
underlying "git clone", it cancels out.

With this change we omit one of the "--reference" arguments when
passing references from the shell script to the helper.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 15:00:17 -07:00
Stefan Beller 965dbea09a submodule--helper module-clone: allow multiple references
Allow users to pass in multiple references, just as clone accepts multiple
references as well.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 15:00:17 -07:00
Junio C Hamano dd610aeda6 Merge branch 'kw/patch-ids-optim'
When "git rebase" tries to compare set of changes on the updated
upstream and our own branch, it computes patch-id for all of these
changes and attempts to find matches. This has been optimized by
lazily computing the full patch-id (which is expensive) to be
compared only for changes that touch the same set of paths.

* kw/patch-ids-optim:
  rebase: avoid computing unnecessary patch IDs
  patch-ids: add flag to create the diff patch id using header only data
  patch-ids: replace the seen indicator with a commit pointer
  patch-ids: stop using a hand-rolled hashmap implementation
2016-08-12 09:47:39 -07:00
Junio C Hamano 2c44b7a53b Merge branch 'js/mv-dir-to-new-directory'
"git mv dir non-existing-dir/" did not work in some environments
the same way as existing mainstream platforms.  The code now moves
"dir" to "non-existing-dir", without relying on rename("A", "B/")
that strips the trailing slash of '/'.

* js/mv-dir-to-new-directory:
  git mv: do not keep slash in `git mv dir non-existing-dir/`
2016-08-12 09:47:37 -07:00
Junio C Hamano 0a315befa7 Merge branch 'rs/use-strbuf-add-unique-abbrev'
A small code clean-up.

* rs/use-strbuf-add-unique-abbrev:
  use strbuf_add_unique_abbrev() for adding short hashes
2016-08-12 09:47:37 -07:00
Junio C Hamano b32d7c524b Merge branch 'rs/merge-add-strategies-simplification'
A small code clean-up.

* rs/merge-add-strategies-simplification:
  merge: use string_list_split() in add_strategies()
2016-08-12 09:47:36 -07:00
Junio C Hamano 18f3ce8841 Merge branch 'rs/child-process-init'
A small code clean-up.

* rs/child-process-init:
  use CHILD_PROCESS_INIT to initialize automatic variables
2016-08-12 09:47:36 -07:00
Junio C Hamano 2f9c615efb Merge branch 'sb/submodule-clone-retry'
Fix-up to an error codepath in a topic already in 'master'.

* sb/submodule-clone-retry:
  submodule--helper: use parallel processor correctly
2016-08-12 09:47:34 -07:00
Junio C Hamano f4fd627661 Merge branch 'jk/reset-ident-time-per-commit' into maint
Not-so-recent rewrite of "git am" that started making internal
calls into the commit machinery had an unintended regression, in
that no matter how many seconds it took to apply many patches, the
resulting committer timestamp for the resulting commits were all
the same.

* jk/reset-ident-time-per-commit:
  am: reset cached ident date for each patch
2016-08-12 09:16:56 -07:00
Kevin Willford b3dfeebb92 rebase: avoid computing unnecessary patch IDs
The `rebase` family of Git commands avoid applying patches that were
already integrated upstream. They do that by using the revision walking
option that computes the patch IDs of the two sides of the rebase
(local-only patches vs upstream-only ones) and skipping those local
patches whose patch ID matches one of the upstream ones.

In many cases, this causes unnecessary churn, as already the set of
paths touched by a given commit would suffice to determine that an
upstream patch has no local equivalent.

This hurts performance in particular when there are a lot of upstream
patches, and/or large ones.

Therefore, let's introduce the concept of a "diff-header-only" patch ID,
compare those first, and only evaluate the "full" patch ID lazily.

Please note that in contrast to the "full" patch IDs, those
"diff-header-only" patch IDs are prone to collide with one another, as
adjacent commits frequently touch the very same files. Hence we now
have to be careful to allow multiple hash entries with the same hash.
We accomplish that by using the hashmap_add() function that does not even
test for hash collisions.  This also allows us to evaluate the full patch ID
lazily, i.e. only when we found commits with matching diff-header-only
patch IDs.

We add a performance test that demonstrates ~1-6% improvement.  In
practice this will depend on various factors such as how many upstream
changes and how big those changes are along with whether file system
caches are cold or warm.  As Git's test suite has no way of catching
performance regressions, we also add a regression test that verifies
that the full patch ID computation is skipped when the diff-header-only
computation suffices.

Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 14:39:16 -07:00
Christian Couder ccceb7bb13 builtin/apply: make write_out_results() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", write_out_results() should return -1 instead of
calling exit().

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder 434389deb1 builtin/apply: make write_out_one_result() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", write_out_one_result() should just return what
remove_file() and create_file() are returning instead of calling
exit().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder 8f5b5445d7 builtin/apply: make create_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", create_file() should just return what
add_conflicted_stages_file() and add_index_file() are returning
instead of calling exit().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder 69e1609f81 builtin/apply: make add_index_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", add_index_file() should return -1 instead of
calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder a902edceeb builtin/apply: make add_conflicted_stages_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", add_conflicted_stages_file() should return -1
instead of calling die().

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder 6e8df31469 builtin/apply: make remove_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", remove_file() should return -1 instead of
calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder fe41b80225 builtin/apply: make build_fake_ancestor() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", build_fake_ancestor() should return -1 instead
of calling die().

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder 119ab159e6 builtin/apply: change die_on_unsafe_path() to check_unsafe_path()
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", die_on_unsafe_path() should return a negative
integer instead of calling die(), so while doing that let's change
its name to check_unsafe_path().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder dbf1b5fb6a builtin/apply: make gitdiff_*() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", gitdiff_*() functions should return -1 instead
of calling die().

A previous patch made it possible for gitdiff_*() functions to
return -1 in case of error. Let's take advantage of that to
make gitdiff_verify_name() return -1 on error, and to have
gitdiff_oldname() and gitdiff_newname() directly return
what gitdiff_verify_name() returns.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder 70af7662d4 builtin/apply: make gitdiff_*() return 1 at end of header
The gitdiff_*() functions that are called as p->fn() in parse_git_header()
should return 1 instead of -1 in case of end of header or unrecognized
input, as these are not real errors. It just instructs the parser to break
out.

This makes it possible for gitdiff_*() functions to return -1 in case of a
real error. This will be done in a following patch.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder 9724e6ff48 builtin/apply: make parse_traditional_patch() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", parse_traditional_patch() should return -1
instead of calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder fef7ba5353 builtin/apply: make apply_all_patches() return 128 or 1 on error
To finish libifying the apply functionality, apply_all_patches() should not
die() or exit() in case of error, but return either 128 or 1, so that it
gives the same exit code as when die() or exit(1) is called. This way
scripts relying on the exit code don't need to be changed.

While doing that we must take care that file descriptors are properly closed
and, if needed, reset to a sensible value.

Also, according to the lockfile API, when finished with a lockfile, one
should either commit it or roll it back.

This is even more important now that the same lockfile can be passed
to init_apply_state() many times to be reused by series of calls to
the apply lib functions.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder b6446d54ec builtin/apply: move check_apply_state() to apply.c
To libify `git apply` functionality we must make check_apply_state()
usable outside "builtin/apply.c".

Let's do that by moving it into "apply.c".

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder f36538d88b builtin/apply: make check_apply_state() return -1 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", check_apply_state() should return -1 instead of
calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder 2f5a6d1218 apply: make init_apply_state() return -1 instead of exit()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", init_apply_state() should return -1 instead of
calling exit().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder bb493a5c14 builtin/apply: move init_apply_state() to apply.c
To libify `git apply` functionality we must make init_apply_state()
usable outside "builtin/apply.c".

Let's do that by moving it into a new "apply.c".

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder f95fdc256b builtin/apply: make parse_ignorewhitespace_option() return -1 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", parse_ignorewhitespace_option() should return
-1 instead of calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder aaf6c447aa builtin/apply: make parse_whitespace_option() return -1 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, parse_whitespace_option() should return -1 instead
of calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder dae197f753 builtin/apply: make parse_single_patch() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, parse_single_patch() should return a negative
integer instead of calling die().

Let's do that by using error() and let's adjust the related test
cases accordingly.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder b654b34c1c builtin/apply: make parse_chunk() return a negative integer on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing or exit()ing.

To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, parse_chunk() should return a negative integer
instead of calling die() or exit().

As parse_chunk() is called only by apply_patch() which already
returns either -1 or -128 when an error happened, let's make it also
return -1 or -128.

This makes it compatible with what find_header() and parse_binary()
already return.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder 5950851e44 builtin/apply: make find_header() return -128 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, let's make find_header() return -128 instead of
calling die().

We could make it return -1, unfortunately find_header() already
returns -1 when no header is found.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder 3bee345d7b builtin/apply: read_patch_file() return -1 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing. Let's do that by returning -1 instead of
die()ing in read_patch_file().

Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder f07a9f7643 builtin/apply: make apply_patch() return -1 or -128 instead of die()ing
To libify `git apply` functionality we have to signal errors
to the caller instead of die()ing.

As a first step in this direction, let's make apply_patch() return
-1 or -128 in case of errors instead of dying. For now its only
caller apply_all_patches() will exit(128) when apply_patch()
return -128 and it will exit(1) when it returns -1.

We exit() with code 128 because that was what die() was doing
and we want to keep the distinction between exiting with code 1
and exiting with code 128.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder 71501a71d0 apply: move 'struct apply_state' to apply.h
To libify `git apply` functionality we must make 'struct apply_state'
usable outside "builtin/apply.c".

Let's do that by creating a new "apply.h" and moving
'struct apply_state' there.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder 4d5acae0ca apply: make some names more specific
To prepare for some structs and constants being moved from
builtin/apply.c to apply.h, we should give them some more
specific names to avoid possible name collisions in the global
namespace.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:09 -07:00
Jeff King 07e7dbf0db gc: default aggressive depth to 50
This commit message is long and has lots of background and
numbers. The summary is: the current default of 250 doesn't
save much space, and costs CPU. It's not a good tradeoff.
Read on for details.

The "--aggressive" flag to git-gc does three things:

  1. use "-f" to throw out existing deltas and recompute from
     scratch

  2. use "--window=250" to look harder for deltas

  3. use "--depth=250" to make longer delta chains

Items (1) and (2) are good matches for an "aggressive"
repack. They ask the repack to do more computation work in
the hopes of getting a better pack. You pay the costs during
the repack, and other operations see only the benefit.

Item (3) is not so clear. Allowing longer chains means fewer
restrictions on the deltas, which means potentially finding
better ones and saving some space. But it also means that
operations which access the deltas have to follow longer
chains, which affects their performance. So it's a tradeoff,
and it's not clear that the tradeoff is even a good one.

The existing "250" numbers for "--aggressive" come
originally from this thread:

  http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/

where Linus says:

  So when I said "--depth=250 --window=250", I chose those
  numbers more as an example of extremely aggressive
  packing, and I'm not at all sure that the end result is
  necessarily wonderfully usable. It's going to save disk
  space (and network bandwidth - the delta's will be re-used
  for the network protocol too!), but there are definitely
  downsides too, and using long delta chains may
  simply not be worth it in practice.

There are some numbers in that thread, but they're mostly
focused on the improved window size, and measure the
improvement from --depth=250 and --window=250 together.
E.g.:

  http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/

talks about the improved run-time of "git-blame", which
comes from the reduced pack size. But most of that reduction
is coming from --window=250, whereas most of the extra costs
come from --depth=250. There's a link in that thread showing
that increasing the depth beyond 50 doesn't seem to help
much with the size:

  https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html

but again, no discussion of the timing impact.

In an earlier thread from Ted Ts'o which discussed setting
the non-aggressive default (from 10 to 50):

  http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/

we have more numbers, with the conclusion that going past 50
does not help size much, and hurts the speed of normal
operations.

So from that, we might guess that 50 is actually a sweet
spot, even for aggressive, if we interpret aggressive to
"spend time now to make a better pack". It is not clear that
"--depth=250" is actually a better pack. It may be slightly
_smaller_, but it carries a run-time penalty.

Here are some more recent timings I did to verify that. They
show three things:

  - the size of the resulting pack (so disk saved to store,
    bandwidth saved on clones/fetches)

  - the cost of "rev-list --objects --all", which shows the
    effect of the delta chains on trees (commits typically
    don't delta, and the command doesn't touch the blobs at
    all)

  - the cost of "log -Sfoo", which will additionally access
    each blob

All cases were repacked with "git repack -adf --depth=$d
--window=250" (so basically, what would happen if we tweaked
the "gc --aggressive" default depth).

The timings are all wall-clock best-of-3. The machine itself
has plenty of RAM compared to the repositories (which is
probably typical of most workstations these days), so we're
really measuring CPU usage, as the whole thing will be in
disk cache after the first run.

The core.deltaBaseCacheLimit is at its default of 96MiB.
It's possible that tweaking it would have some impact on the
tests, as some of them (especially "log -S" on a large repo)
are likely to overflow that. But bumping that carries a
run-time memory cost, so for these tests, I focused on what
we could do just with the on-disk pack tradeoffs.

Each test is done for four depths: 250 (the current value),
50 (the current default that tested well previously), 100
(to show something on the larger side, which previous tests
showed was not a good tradeoff), and 10 (the very old
default, which previous tests showed was worse than 50).

Here are the numbers for linux.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  | 967MB |  n/a  | 48.159s  |   n/a  | 378.088   |   n/a
    100  | 971MB | +0.4% | 41.471s  | -13.9% | 342.060   |  -9.5%
     50  | 979MB | +1.2% | 37.778s  | -21.6% | 311.040s  | -17.7%
     10  | 1.1GB | +6.6% | 32.518s  | -32.5% | 279.890s  | -25.9%

and for git.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  |  48MB |  n/a  |  2.215s  |   n/a  |  20.922s  |   n/a
    100  |  49MB | +0.5% |  2.140s  |  -3.4% |  17.736s  | -15.2%
     50  |  49MB | +1.7% |  2.099s  |  -5.2% |  15.418s  | -26.3%
     10  |  53MB | +9.3% |  2.001s  |  -9.7% |  12.677s  | -39.4%

You can see that that the CPU savings for regular operations improves as we
decrease the depth. The savings are less for "rev-list" on a smaller repository
than they are for blob-accessing operations, or even rev-list on a larger
repository. This may mean that a larger delta cache would help (though setting
core.deltaBaseCacheLimit by itself doesn't).

But we can also see that the space savings are not that great as the depth goes
higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff.
Saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250 is
probably not.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:53:19 -07:00
Jeff Hostetler d9fc746cd7 status: print branch info with --porcelain=v2 --branch
Expand porcelain v2 output to include branch and tracking
branch information. This includes the commit id, the branch,
the upstream branch, and the ahead and behind counts.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:15:40 -07:00
Jeff Hostetler 1ecdecce62 status: collect per-file data for --porcelain=v2
Collect extra per-file data for porcelain V2 format.

The output of `git status --porcelain` leaves out many
details about the current status that clients might like
to have.  This can force them to be less efficient as they
may need to launch secondary commands (and try to match
the logic within git) to accumulate this extra information.
For example, a GUI IDE might want the file mode to display
the correct icon for a changed item (without having to stat
it afterwards).

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:14:43 -07:00
Jeff King c9af708b1a pack-objects: use mru list when iterating over packs
In the original implementation of want_object_in_pack(), we
always looked for the object in every pack, so the order did
not matter for performance.

As of the last few patches, however, we can now often break
out of the loop early after finding the first instance, and
avoid looking in the other packs at all. In this case, pack
order can make a big difference, because we'd like to find
the objects by looking at as few packs as possible.

This patch switches us to the same packed_git_mru list that
is now used by normal object lookups.

Here are timings for p5303 on linux.git:

Test                      HEAD^                HEAD
------------------------------------------------------------------------
5303.3: rev-list (1)      31.31(31.07+0.23)    31.28(31.00+0.27) -0.1%
5303.4: repack (1)        40.35(38.84+2.60)    40.53(39.31+2.32) +0.4%
5303.6: rev-list (50)     31.37(31.15+0.21)    31.41(31.16+0.24) +0.1%
5303.7: repack (50)       58.25(68.54+2.03)    47.28(57.66+1.89) -18.8%
5303.9: rev-list (1000)   31.91(31.57+0.33)    31.93(31.64+0.28) +0.1%
5303.10: repack (1000)    304.80(376.00+3.92)  87.21(159.54+2.84) -71.4%

The rev-list numbers are unchanged, which makes sense (they
are not exercising this code at all). The 50- and 1000-pack
repack cases show considerable improvement.

The single-pack repack case doesn't, of course; there's
nothing to improve. In fact, it gives us a baseline for how
fast we could possibly go. You can see that though rev-list
can approach the single-pack case even with 1000 packs,
repack doesn't. The reason is simple: the loop we are
optimizing is only part of what the repack is doing. After
the "counting" phase, we do delta compression, which is much
more expensive when there are multiple packs, because we
have fewer deltas we can reuse (you can also see that these
numbers come from a multicore machine; the CPU times are
much higher than the wall-clock times due to the delta
phase).

So the good news is that in cases with many packs, we used
to be dominated by the "counting" phase, and now we are
dominated by the delta compression (which is faster, and
which we have already parallelized).

Here are similar numbers for git.git:

Test                      HEAD^               HEAD
---------------------------------------------------------------------
5303.3: rev-list (1)      1.55(1.51+0.02)     1.54(1.53+0.00) -0.6%
5303.4: repack (1)        1.82(1.80+0.08)     1.82(1.78+0.09) +0.0%
5303.6: rev-list (50)     1.58(1.57+0.00)     1.58(1.56+0.01) +0.0%
5303.7: repack (50)       2.50(3.12+0.07)     2.31(2.95+0.06) -7.6%
5303.9: rev-list (1000)   2.22(2.20+0.02)     2.23(2.19+0.03) +0.5%
5303.10: repack (1000)    10.47(16.78+0.22)   7.50(13.76+0.22) -28.4%

Not as impressive in terms of percentage, but still
measurable wins.  If you look at the wall-clock time
improvements in the 1000-pack case, you can see that linux
improved by roughly 10x as many seconds as git. That's
because it has roughly 10x as many objects, and we'd expect
this improvement to scale linearly with the number of
objects (since the number of packs is kept constant). It's
just that the "counting" phase is a smaller percentage of
the total time spent for a git.git repack, and hence the
percentage win is smaller.

The implementation itself is a straightforward use of the
MRU code. We only bother marking a pack as used when we know
that we are able to break early out of the loop, for two
reasons:

  1. If we can't break out early, it does no good; we have
     to visit each pack anyway, so we might as well avoid
     even the minor overhead of managing the cache order.

  2. The mru_mark() function reorders the list, which would
     screw up our traversal. So it is only safe to mark when
     we are about to break out of the loop. We could record
     the found pack and mark it after the loop finishes, of
     course, but that's more complicated and it doesn't buy
     us anything due to (1).

Note that this reordering does have a potential impact on
the final pack, as we store only a single "found" pack for
each object, even if it is present in multiple packs. In
principle, any copy is acceptable, as they all refer to the
same content. But in practice, they may differ in whether
they are stored as deltas, against which base, etc. This may
have an impact on delta reuse, and even the delta search
(since we skip pairs that were already in the same pack).

It's not clear whether this change of order would hurt or
even help average cases, though. The most likely reason to
have duplicate objects is from the completion of thin packs
(e.g., you have some objects in a "base" pack, then receive
several pushes; the packs you receive may be thin on the
wire, with deltas that refer to bases outside the pack, but
we complete them with duplicate base objects when indexing
them).

In such a case the current code would always find the thin
duplicates (because we currently walk the packs in reverse
chronological order). Whereas with this patch, some of those
duplicates would be found in the base pack instead.

In my tests repacking a real-world case of linux.git with
3600 thin-pack pushes (on top of a large "base" pack), the
resulting pack was about 0.04% larger with this patch. On
the other hand, because we were more likely to hit the base
pack, there were more opportunities for delta reuse, and we
had 50,000 fewer objects to examine in the delta search.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 10:44:23 -07:00
Jeff King 4cf2143e02 pack-objects: break delta cycles before delta-search phase
We do not allow cycles in the delta graph of a pack (i.e., A
is a delta of B which is a delta of A) for the obvious
reason that you cannot actually access any of the objects in
such a case.

There's a last-ditch attempt to notice cycles during the
write phase, during which we issue a warning to the user and
write one of the objects out in full. However, this is
"last-ditch" for two reasons:

  1. By this time, it's too late to find another delta for
     the object, so the resulting pack is larger than it
     otherwise could be.

  2. The warning is there because this is something that
     _shouldn't_ ever happen. If it does, then either:

       a. a pack we are reusing deltas from had its own
          cycle

       b. we are reusing deltas from multiple packs, and
          we found a cycle among them (i.e., A is a delta of
          B in one pack, but B is a delta of A in another,
          and we choose to use both deltas).

       c. there is a bug in the delta-search code

     So this code serves as a final check that none of these
     things has happened, warns the user, and prevents us
     from writing a bogus pack.

Right now, (2b) should never happen because of the static
ordering of packs in want_object_in_pack(). If two objects
have a delta relationship, then they must be in the same
pack, and therefore we will find them from that same pack.

However, a future patch would like to change that static
ordering, which will make (2b) a common occurrence. In
preparation, we should be able to handle those kinds of
cycles better. This patch does by introducing a
cycle-breaking step during the get_object_details() phase,
when we are deciding which deltas can be reused. That gives
us the chance to feed the objects into the delta search as
if the cycle did not exist.

We'll leave the detection and warning in the write_object()
phase in place, as it still serves as a check for case (2c).

This does mean we will stop warning for (2a). That case is
caused by bogus input packs, and we ideally would warn the
user about it.  However, since those cycles show up after
picking reusable deltas, they look the same as (2b) to us;
our new code will break the cycles early and the last-ditch
check will never see them.

We could do analysis on any cycles that we find to
distinguish the two cases (i.e., it is a bogus pack if and
only if every delta in the cycle is in the same pack), but
we don't need to. If there is a cycle inside a pack, we'll
run into problems not only reusing the delta, but accessing
the object data at all. So when we try to dig up the actual
size of the object, we'll hit that same cycle and kick in
our usual complain-and-try-another-source code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 10:44:13 -07:00
Jeff King 27b5c1a065 provide an initializer for "struct object_info"
An all-zero initializer is fine for this struct, but because
the first element is a pointer, call sites need to know to
use "NULL" instead of "0". Otherwise some static checkers
like "sparse" will complain; see d099b71 (Fix some sparse
warnings, 2013-07-18) for example.  So let's provide an
initializer to make this easier to get right.

But let's also comment that memset() to zero is explicitly
OK[1]. One of the callers embeds object_info in another
struct which is initialized via memset (expand_data in
builtin/cat-file.c). Since our subset of C doesn't allow
assignment from a compound literal, handling this in any
other way is awkward, so we'd like to keep the ability to
initialize by memset(). By documenting this property, it
should make anybody who wants to change the initializer
think twice before doing so.

There's one other caller of interest. In parse_sha1_header(),
we did not initialize the struct fully in the first place.
This turned out not to be a bug because the sub-function it
calls does not look at any other fields except the ones we
did initialize. But that assumption might not hold in the
future, so it's a dangerous construct. This patch switches
it to initializing the whole struct, which protects us
against unexpected reads of the other fields.

[1] Obviously using memset() to initialize a pointer
    violates the C standard, but we long ago decided that it
    was an acceptable tradeoff in the real world.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 10:42:23 -07:00
Junio C Hamano 11b53957ac Merge branch 'sb/submodule-update-dot-branch'
A few updates to "git submodule update".

Use of "| wc -l" break with BSD variant of 'wc'.

* sb/submodule-update-dot-branch:
  t7406: fix breakage on OSX
  submodule update: allow '.' for branch value
  submodule--helper: add remote-branch helper
  submodule-config: keep configured branch around
  submodule--helper: fix usage string for relative-path
  submodule update: narrow scope of local variable
  submodule update: respect depth in subsequent fetches
  t7406: future proof tests with hard coded depth
2016-08-10 12:33:20 -07:00
Junio C Hamano 1a5f1a3f25 Merge branch 'js/am-3-merge-recursive-direct'
"git am -3" calls "git merge-recursive" when it needs to fall back
to a three-way merge; this call has been turned into an internal
subroutine call instead of spawning a separate subprocess.

* js/am-3-merge-recursive-direct:
  merge-recursive: flush output buffer even when erroring out
  merge_trees(): ensure that the callers release output buffer
  merge-recursive: offer an option to retain the output in 'obuf'
  merge-recursive: write the commit title in one go
  merge-recursive: flush output buffer before printing error messages
  am -3: use merge_recursive() directly again
  merge-recursive: switch to returning errors instead of dying
  merge-recursive: handle return values indicating errors
  merge-recursive: allow write_tree_from_memory() to error out
  merge-recursive: avoid returning a wholesale struct
  merge_recursive: abort properly upon errors
  prepare the builtins for a libified merge_recursive()
  merge-recursive: clarify code in was_tracked()
  die(_("BUG")): avoid translating bug messages
  die("bug"): report bugs consistently
  t5520: verify that `pull --rebase` shows the helpful advice when failing
2016-08-10 12:33:20 -07:00
Junio C Hamano 7a3ea66633 Merge branch 'js/commit-slab-decl-fix'
* js/commit-slab-decl-fix:
  commit-slab.h: avoid duplicated global static variables
  config.c: avoid duplicated global static variables
2016-08-10 12:33:20 -07:00