Commit graph

11182 commits

Author SHA1 Message Date
Junio C Hamano d845d727cb Merge branch 'jk/setup-sequence-update'
There were numerous corner cases in which the configuration files
were read and used, or not read at all, depending on the directory
a Git command was run from, leading to inconsistent behaviour.  The
code to set up repository access at the beginning of a Git process
has been updated to fix them.

* jk/setup-sequence-update:
  t1007: factor out repeated setup
  init: reset cached config when entering new repo
  init: expand comments explaining config trickery
  config: only read .git/config from configured repos
  test-config: setup git directory
  t1302: use "git -C"
  pager: handle early config
  pager: use callbacks instead of configset
  pager: make pager_program a file-local static
  pager: stop loading git_default_config()
  pager: remove obsolete comment
  diff: always try to set up the repository
  diff: handle --no-index prefixes consistently
  diff: skip implicit no-index check when given --no-index
  patch-id: use RUN_SETUP_GENTLY
  hash-object: always try to set up the git repository
2016-09-21 15:15:24 -07:00
Junio C Hamano 7f109ef54e Merge branch 'ks/pack-objects-bitmap'
Some codepaths in "git pack-objects" were not ready to use an
existing pack bitmap; now they are and as the result they have
become faster.

* ks/pack-objects-bitmap:
  pack-objects: use reachability bitmap index when generating non-stdout pack
  pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
2016-09-21 15:15:21 -07:00
Junio C Hamano 7889ed25ac Merge branch 'js/cat-file-filters'
Even though "git hash-objects", which is a tool to take an
on-filesystem data stream and put it into the Git object store,
allowed to perform the "outside-world-to-Git" conversions (e.g.
end-of-line conversions and application of the clean-filter), and
it had the feature on by default from very early days, its reverse
operation "git cat-file", which takes an object from the Git object
store and externalize for the consumption by the outside world,
lacked an equivalent mechanism to run the "Git-to-outside-world"
conversion.  The command learned the "--filters" option to do so.

* js/cat-file-filters:
  cat-file: support --textconv/--filters in batch mode
  cat-file --textconv/--filters: allow specifying the path separately
  cat-file: introduce the --filters option
  cat-file: fix a grammo in the man page
2016-09-21 15:15:19 -07:00
Junio C Hamano 07d872434d Merge branch 'jt/accept-capability-advertisement-when-fetching-from-void'
JGit can show a fake ref "capabilities^{}" to "git fetch" when it
does not advertise any refs, but "git fetch" was not prepared to
see such an advertisement.  When the other side disconnects without
giving any ref advertisement, we used to say "there may not be a
repository at that URL", but we may have seen other advertisements
like "shallow" and ".have", in which case we definitely know that a
repository is there.  The code to detect this case has also been
updated.

* jt/accept-capability-advertisement-when-fetching-from-void:
  connect: advertized capability is not a ref
  connect: tighten check for unexpected early hang up
  tests: move test_lazy_prereq JGIT to test-lib.sh
2016-09-21 15:15:18 -07:00
Junio C Hamano 4af9a7d344 Merge branch 'bc/object-id'
The "unsigned char sha1[20]" to "struct object_id" conversion
continues.  Notable changes in this round include that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.

It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2").  Extra sets of eyes double-checking for
mismerges are highly appreciated.

* bc/object-id:
  builtin/reset: convert to use struct object_id
  builtin/commit-tree: convert to struct object_id
  builtin/am: convert to struct object_id
  refs: add an update_ref_oid function.
  sha1_name: convert get_sha1_mb to struct object_id
  builtin/update-index: convert file to struct object_id
  notes: convert init_notes to use struct object_id
  builtin/rm: convert to use struct object_id
  builtin/blame: convert file to use struct object_id
  Convert read_mmblob to take struct object_id.
  notes-merge: convert struct notes_merge_pair to struct object_id
  builtin/checkout: convert some static functions to struct object_id
  streaming: make stream_blob_to_fd take struct object_id
  builtin: convert textconv_object to use struct object_id
  builtin/cat-file: convert some static functions to struct object_id
  builtin/cat-file: convert struct expand_data to use struct object_id
  builtin/log: convert some static functions to use struct object_id
  builtin/blame: convert struct origin to use struct object_id
  builtin/apply: convert static functions to struct object_id
  cache: convert struct cache_entry to use struct object_id
2016-09-19 13:47:19 -07:00
Junio C Hamano 81358dc238 Merge branch 'cc/apply-am'
"git am" has been taught to make an internal call to "git apply"'s
innards without spawning the latter as a separate process.

* cc/apply-am: (41 commits)
  builtin/am: use apply API in run_apply()
  apply: learn to use a different index file
  apply: pass apply state to build_fake_ancestor()
  apply: refactor `git apply` option parsing
  apply: change error_routine when silent
  usage: add get_error_routine() and get_warn_routine()
  usage: add set_warn_routine()
  apply: don't print on stdout in verbosity_silent mode
  apply: make it possible to silently apply
  apply: use error_errno() where possible
  apply: make some parsing functions static again
  apply: move libified code from builtin/apply.c to apply.{c,h}
  apply: rename and move opt constants to apply.h
  builtin/apply: rename option parsing functions
  builtin/apply: make create_one_file() return -1 on error
  builtin/apply: make try_create_file() return -1 on error
  builtin/apply: make write_out_results() return -1 on error
  builtin/apply: make write_out_one_result() return -1 on error
  builtin/apply: make create_file() return -1 on error
  builtin/apply: make add_index_file() return -1 on error
  ...
2016-09-19 13:47:18 -07:00
Junio C Hamano c13f458d86 Merge branch 'jk/fix-remote-curl-url-wo-proto'
"git fetch http::/site/path" did not die correctly and segfaulted
instead.

* jk/fix-remote-curl-url-wo-proto:
  remote-curl: handle URLs without protocol
2016-09-15 14:11:15 -07:00
Junio C Hamano 9883ec2c73 Merge branch 'jk/pack-tag-of-tag'
"git pack-objects --include-tag" was taught that when we know that
we are sending an object C, we want a tag B that directly points at
C but also a tag A that points at the tag B.  We used to miss the
intermediate tag B in some cases.

* jk/pack-tag-of-tag:
  pack-objects: walk tag chains for --include-tag
  t5305: simplify packname handling
  t5305: use "git -C"
  t5305: drop "dry-run" of unpack-objects
  t5305: move cleanup into test block
2016-09-15 14:11:14 -07:00
Jeff King 4d0efa101b t1007: factor out repeated setup
We have a series of 3 CRLF tests that do exactly the same
(long) setup sequence. Let's pull it out into a common setup
test, which is shorter, more efficient, and will make it
easier to add new tests.

Note that we don't have to worry about cleaning up any of
the setup which was previously per-test; we call pop_repo
after the CRLF tests, which cleans up everything.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 4543926ba8 init: reset cached config when entering new repo
After we copy the templates into place, we re-read the
config in case we copied in a default config file. But since
git_config() is backed by a cache these days, it's possible
that the call will not actually touch the filesystem at all;
we need to tell it that something has changed behind the
scenes.

Note that we also need to reset the shared_repository
config. At first glance, it seems like this should probably
just be folded into git_config_clear(). But unfortunately
that is not quite right. The shared repository value may
come from config, _or_ it may have been set manually. So
only the caller who knows whether or not they set it is the
one who can clear it (and indeed, if you _do_ put it into
git_config_clear(), then many tests fail, as we have to
clear the config cache any time we set a new config
variable).

There are three tests here. The first two actually pass
already, though it's largely luck: they just don't happen to
actually read any config before we enter the new repo.

But the third one does fail without this patch; we look at
core.sharedrepository while creating the directory, but need
to make sure the value from the template config overrides
it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King b9605bc4f2 config: only read .git/config from configured repos
When git_config() runs, it looks in the system, user-wide,
and repo-level config files. It gets the latter by calling
git_pathdup(), which in turn calls get_git_dir(). If we
haven't set up the git repository yet, this may simply
return ".git", and we will look at ".git/config".  This
seems like it would be helpful (presumably we haven't set up
the repository yet, so it tries to find it), but it turns
out to be a bad idea for a few reasons:

  - it's not sufficient, and therefore hides bugs in a
    confusing way. Config will be respected if commands are
    run from the top-level of the working tree, but not from
    a subdirectory.

  - it's not always true that we haven't set up the
    repository _yet_; we may not want to do it at all. For
    instance, if you run "git init /some/path" from inside
    another repository, it should not load config from the
    existing repository.

  - there might be a path ".git/config", but it is not the
    actual repository we would find via setup_git_directory().
    This may happen, e.g., if you are storing a git
    repository inside another git repository, but have
    munged one of the files in such a way that the
    inner repository is not valid (e.g., by removing HEAD).

We have at least two bugs of the second type in git-init,
introduced by ae5f677 (lazily load core.sharedrepository,
2016-03-11). It causes init to use git_configset(), which
loads all of the config, including values from the current
repo (if any).  This shows up in two ways:

  1. If we happen to be in an existing repository directory,
     we'll read and respect core.sharedrepository from it,
     even though it should have no bearing on the new
     repository. A new test in t1301 covers this.

  2. Similarly, if we're in an existing repo that sets
     core.logallrefupdates, that will cause init to fail to
     set it in a newly created repository (because it thinks
     that the user's templates already did so). A new test
     in t0001 covers this.
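
For example, case (1) can be reproduced with something along these
lines (the repository paths and the shared mode are placeholders):

    # inside an existing repository that sets core.sharedrepository...
    git init outer &&
    git -C outer config core.sharedrepository 0660 &&
    cd outer &&
    # ...creating another repository used to pick up that setting
    git init /tmp/new-repo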

We also need to adjust an existing test in t1302, which
gives another example of why this patch is an improvement.

That test creates an embedded repository with a bogus
core.repositoryformatversion of "99". It wants to make sure
that we actually stop at the bogus repo rather than
continuing upward to find the outer repo. So it checks that
"git config core.repositoryformatversion" returns 99. But
that only works because we blindly read ".git/config", even
though we _know_ we're in a repository whose vintage we do
not understand.

After this patch, we avoid reading config from the unknown
vintage repository at all, which is a safer choice.  But we
need to tweak the test, since core.repositoryformatversion
will not return 99; it will claim that it could not find the
variable at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 11ca4bec96 t1302: use "git -C"
This is shorter, and saves a subshell.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 28a4e58021 diff: always try to set up the repository
If we see an explicit "--no-index", we do not bother calling
setup_git_directory_gently() at all. This means that we may
miss out on reading repo-specific config.

It's arguable whether this is correct or not. If we were
designing from scratch, making "git diff --no-index"
completely ignore the repository makes some sense. But we
are nowhere near scratch, so let's look at the existing
behavior:

  1. If you're in the top-level of a repository and run an
     explicit "diff --no-index", the config subsystem falls
     back to reading ".git/config", and we will respect repo
     config.

  2. If you're in a subdirectory of a repository, then we
     still try to read ".git/config", but it generally
     doesn't exist. So "diff --no-index" there does not
     respect repo config.

  3. If you have $GIT_DIR set in the environment, we read
     and respect $GIT_DIR/config.

  4. If you run "git diff /tmp/foo /tmp/bar" to get an
     implicit no-index, we _do_ run the repository setup,
     and set $GIT_DIR (or respect an existing $GIT_DIR
     variable). We find the repo config no matter where we
     started, and respect it.

So we already respect the repository config in a number of
common cases, and case (2) is the only one that does not.
And at least one of our tests, t4034, depends on case (1)
behaving as it does now (though it is just incidental, not
an explicit test for this behavior).

So let's bring case (2) in line with the others by always
running the repository setup, even with an explicit
"--no-index". We shouldn't need to change anything else, as the
implicit case already handles the prefix.
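
Concretely, case (2) under the new behavior looks something like this
(the paths are made up):

    # run from a subdirectory of some repository; repo-level config
    # (e.g. color.diff) is now respected here as well
    cd some/subdir &&
    git diff --no-index /tmp/foo /tmp/bar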

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 7d8930d903 diff: handle --no-index prefixes consistently
If we see an explicit "git diff --no-index ../foo ../bar",
then we do not set up the git repository at all (we already
know we are in --no-index mode, so do not have to check "are
we in a repository?"), and hence have no "prefix" within the
repository. A patch generated by this command will have the
filenames "a/../foo" and "b/../bar", no matter which
directory we are in with respect to any repository.

However, in the implicit case, where we notice that the
files are outside the repository, we will have chdir()'d to
the top-level of the repository. We then feed the prefix
back to the diff machinery. As a result, running the same
diff from a subdirectory will result in paths that look like
"a/subdir/../../foo".

Besides being unnecessarily long, this may also be confusing
to the user: they don't care about the subdir or the
repository at all; it's just where they happened to be when
running the command. We should treat this the same as the
explicit --no-index case.

One way to address this would be to chdir() back to the
original path before running our diff. However, that's a bit
hacky, as we would also need to adjust $GIT_DIR, which could
be a relative path from our top-level.

Instead, we can reuse the diff machinery's RELATIVE_NAME
option, which automatically strips off the prefix. Note that
this _also_ restricts the diff to this relative prefix, but
that's OK for our purposes: we queue our own diff pairs
manually, and do not rely on that part of the diff code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 4a73aaaf18 patch-id: use RUN_SETUP_GENTLY
Patch-id does not require a repository because it is just
processing the incoming diff on stdin, but it may look at
git config for keys like patchid.stable.
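
For instance, the kind of pipeline this affects looks something like
(the revision range is arbitrary):

    # patch-id reads the diff from stdin; patchid.stable comes from config
    git diff HEAD~1 HEAD | git patch-id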

Even though we do not setup_git_directory(), this works from
the top-level of a repository because we blindly look at
".git/config" in this case. But as the included test
demonstrates, it does not work from a subdirectory.

We can fix it by using RUN_SETUP_GENTLY. We do not take any
filenames from the user on the command line, so there's no
need to adjust them via prefix_filename().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King 0e94ee9415 hash-object: always try to set up the git repository
When "hash-object" is run without "-w", we don't need to be
in a git repository at all; we can just hash the object and
write its sha1 to stdout. However, if we _are_ in a git
repository, we would want to know that so we can follow the
normal rules for respecting config, .gitattributes, etc.
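
As an illustration (the filename is just a placeholder):

    git hash-object README       # no -w: works even outside a repository
    git hash-object -w README    # -w: writes the object, needs a repository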

This happens to work at the top-level of a git repository
because we blindly read ".git/config", but as the included
test shows, it does not work when you are in a subdirectory.

The solution is to just do a "gentle" setup in this case. We
already take care to use prefix_filename() on any filename
arguments we get (to handle the "-w" case), so we don't need
to do anything extra to handle the side effects of repo
setup.

An alternative would be to specify RUN_SETUP_GENTLY for this
command in git.c, and then die if "-w" is set but we are not
in a repository. However, the error messages generated at
the time of setup_git_directory() are more detailed, so it's
better to find out which mode we are in, and then call the
appropriate function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Junio C Hamano 930b67ebd7 Merge branch 'ep/use-git-trace-curl-in-tests'
Update a few tests that used to use GIT_CURL_VERBOSE to use the
newer GIT_TRACE_CURL.

* ep/use-git-trace-curl-in-tests:
  t5551-http-fetch-smart.sh: use the GIT_TRACE_CURL environment var
  t5550-http-fetch-dumb.sh: use the GIT_TRACE_CURL environment var
  test-lib.sh: preserve GIT_TRACE_CURL from the environment
  t5541-http-push-smart.sh: use the GIT_TRACE_CURL environment var
2016-09-12 15:34:38 -07:00
Junio C Hamano ba06991e5f Merge branch 'js/t6026-clean-up'
A test spawned a short-lived background process, which sometimes
prevented the test directory from getting removed at the end of the
script on some platforms.

* js/t6026-clean-up:
  t6026-merge-attr: clean up background process at end of test case
2016-09-12 15:34:37 -07:00
Junio C Hamano 038763c71a Merge branch 'js/t9903-chaining'
* js/t9903-chaining:
  t9903: fix broken && chain
2016-09-12 15:34:37 -07:00
Junio C Hamano d1de693d0d Merge branch 'jc/forbid-symbolic-ref-d-HEAD'
"git symbolic-ref -d HEAD" happily removes the symbolic ref, but
the resulting repository becomes an invalid one.  Teach the command
to forbid removal of HEAD.

* jc/forbid-symbolic-ref-d-HEAD:
  symbolic-ref -d: do not allow removal of HEAD
2016-09-12 15:34:35 -07:00
Junio C Hamano 293c232ab1 Merge branch 'jc/submodule-anchor-git-dir'
Having a submodule whose ".git" repository is somehow corrupt
caused a few commands that recurse into submodules to loop forever.

* jc/submodule-anchor-git-dir:
  submodule: avoid auto-discovery in prepare_submodule_repo_env()
2016-09-12 15:34:34 -07:00
Junio C Hamano 8f6fd086e6 Merge branch 'jk/test-lib-drop-pid-from-results'
The test framework left the number of tests and success/failure
count in the t/test-results directory, keyed by the name of the
test script plus the process ID.  The latter however turned out not
to serve any useful purpose.  The process ID part of the filename
has been removed.

* jk/test-lib-drop-pid-from-results:
  test-lib: drop PID from test-results/*.count
2016-09-12 15:34:33 -07:00
Junio C Hamano 305d7f1339 Merge branch 'jk/diff-submodule-diff-inline'
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.

* jk/diff-submodule-diff-inline:
  diff: teach diff to display submodule difference with an inline diff
  submodule: refactor show_submodule_summary with helper function
  submodule: convert show_submodule_summary to use struct object_id *
  allow do_submodule_path to work even if submodule isn't checked out
  diff: prepare for additional submodule formats
  graph: add support for --line-prefix on all graph-aware output
  diff.c: remove output_prefix_length field
  cache: add empty_tree_oid object and helper function
2016-09-12 15:34:31 -07:00
Kirill Smelkov 645c432d61 pack-objects: use reachability bitmap index when generating non-stdout pack
Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects),
if a repository has a bitmap index, pack-objects can nicely speed up the
"Counting objects" graph traversal phase. That, however, was done only
for the case where the resultant pack is sent to stdout, not written
into a file.

The reason is that for an on-disk repack we by default want:

- to produce a good pack (with a bitmap index, not-yet-packed objects
  are emitted to the pack in suboptimal order).

- to use the more robust pack-generation codepath (avoiding possible
  bugs in the bitmap code and possible bitmap index corruption).

Jeff King further explains:

    The reason for this split is that pack-objects tries to determine how
    "careful" it should be based on whether we are packing to disk or to
    stdout. Packing to disk implies "git repack", and that we will likely
    delete the old packs after finishing. We want to be more careful (so
    as not to carry forward a corruption, and to generate a more optimal
    pack), and we presumably run less frequently and can afford extra CPU.
    Whereas packing to stdout implies serving a remote via "git fetch" or
    "git push". This happens more frequently (e.g., a server handling many
    fetching clients), and we assume the receiving end takes more
    responsibility for verifying the data.

    But this isn't always the case. One might want to generate on-disk
    packfiles for a specialized object transfer. Just using "--stdout" and
    writing to a file is not optimal, as it will not generate the matching
    pack index.

    So it would be useful to have some way of overriding this heuristic:
    to tell pack-objects that even though it should generate on-disk
    files, it is still OK to use the reachability bitmaps to do the
    traversal.

So we can teach pack-objects to use the bitmap index for the initial
object counting phase when generating a resultant pack file too:

- if we take care to not let it be activated under git-repack:

  See above about repack robustness and not forward-carrying corruption.

- if we know bitmap index generation is not enabled for the resultant
  pack:

  The current code has a singleton bitmap_git, so it cannot work
  simultaneously with two bitmap indices.

  We also want to avoid (at least with the current implementation)
  generating bitmaps off of bitmaps. The reason is that when generating
  a pack, not-yet-packed objects will be emitted into the pack in
  suboptimal order and added to the tail of the bitmap as "extended
  entries". When the resultant pack plus some new objects in the
  associated repository are in turn used to generate another pack with
  a bitmap, the situation repeats: new objects are again not emitted
  optimally and are just added to the bitmap tail - not in recency
  order.

  So pack badness can grow over time when at each step we have a
  bitmapped pack plus some other objects. That's why we want to avoid
  generating bitmaps off of bitmaps - to keep pack badness from growing.

- if we keep pack reuse enabled only for the "send-to-stdout" case:

  Because pack-to-file needs to generate an index for the destination
  pack, and currently on pack reuse raw entries are written directly to
  the destination pack by write_reused_pack(), bypassing the
  bookkeeping needed for pack index generation that the regular
  codepath does in write_one() and friends.

  ( In the future we might teach the pack-reuse code about cases where
    an index also needs to be generated for the resultant pack, and
    remove the pack-reuse-only-for-stdout limitation. )

This way, for pack-objects -> file we get a nice speedup:

    erp5.git[1] (~230MB), extracted from the ~5GB lab.nexedi.com backup
    repository managed by git-backup[2], via

    time echo 0186ac99 | git pack-objects --revs erp5pack

before:  37.2s
after:   26.2s

And for `git repack -adb` packed git.git

    time echo 5c589a73 | git pack-objects --revs gitpack

before:   7.1s
after:    3.6s

i.e. it can be a 30% - 50% speedup for pack extraction.

git-backup extracts many packs when restoring repositories. That was
my initial motivation for this patch.

[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup

NOTE

Jeff also suggests that pack.useBitmaps was probably a mistake to
introduce originally. So we are not adding another config knob here;
instead, to-file pack-objects simply defaults to not using the bitmap
index: tools which need to generate on-disk packs using bitmaps can
pass --use-bitmap-index explicitly, and git-repack never passes
--use-bitmap-index, so we can be sure regular on-disk repacking
remains robust.
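
Such explicit usage would look something like (the revision and pack
base name are arbitrary):

    # writes mypack-<sha1>.pack/.idx, using bitmaps for counting
    echo HEAD | git pack-objects --revs --use-bitmap-index mypack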

NOTE2

`git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower
than `git pack-objects file.pack`. Extracting erp5.git pack from
lab.nexedi.com backup repository:

    $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack

    real    0m22.309s
    user    0m21.148s
    sys     0m0.932s

    $ time git index-pack erp5pack-stdout.pack

    real    0m50.873s   <-- more than 2 times slower than time to generate pack itself!
    user    0m49.300s
    sys     0m1.360s

So the time for

    `pack-object --stdout >file.pack` + `index-pack file.pack`  is  72s,

while

    `pack-objects file.pack` which does both pack and index     is  27s.

And even

    `pack-objects --no-use-bitmap-index file.pack`              is  37s.

Jeff explains:

    The packfile does not carry the sha1 of the objects. A receiving
    index-pack has to compute them itself, including inflating and applying
    all of the deltas.

That's why, for `git-backup restore`, we want to teach `git pack-objects
file.pack` to use bitmaps instead of using `git pack-objects --stdout
>file.pack` + `git index-pack file.pack`.

NOTE3

The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh

    Test                                    56dfeb62          this tree
    --------------------------------------------------------------------------------
    5310.2: repack to disk                  8.98(8.05+0.29)   9.05(8.08+0.33) +0.8%
    5310.3: simulated clone                 2.02(2.27+0.09)   2.01(2.25+0.08) -0.5%
    5310.4: simulated fetch                 0.81(1.07+0.02)   0.81(1.05+0.04) +0.0%
    5310.5: pack to file                    7.58(7.04+0.28)   7.60(7.04+0.30) +0.3%
    5310.6: pack to file (bitmap)           7.55(7.02+0.28)   3.25(2.82+0.18) -57.0%
    5310.8: clone (partial bitmap)          1.83(2.26+0.12)   1.82(2.22+0.14) -0.5%
    5310.9: pack to file (partial bitmap)   6.86(6.58+0.30)   2.87(2.74+0.20) -58.2%

More context:

    http://marc.info/?t=146792101400001&r=1&w=2
    http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t

Cc: Vicent Marti <tanoku@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-12 13:47:41 -07:00
Kirill Smelkov 702d1b9583 pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
are two codepaths in pack-objects: with and without using the bitmap
reachability index.

However, unlike its non-bitmapped counterpart add_object_entry(),
add_object_entry_from_bitmap() does not check at all whether --local,
--honor-pack-keep or --incremental should be respected. In the
non-bitmapped codepath this is handled in want_object_in_pack(), but
the bitmapped codepath simply has no such checking.

The bitmapped codepath, however, still accepted all those options while
keeping the bitmap indices in use under such conditions, potentially
giving wrong output (e.g. including objects from a non-local or
.keep'ed pack).
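
An invocation of the affected kind looks roughly like (the revision is
arbitrary):

    # with a bitmap index present, --local/--incremental used to be ignored
    echo HEAD | git pack-objects --revs --stdout --local --incremental >out.pack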

We can easily fix this by noting the following: when an object comes to
add_object_entry_from_bitmap(), it can come for one of two reasons:

    1. it is an entry coming from the main pack covered by the bitmap
       index, or
    2. it is an object coming from a (possibly alternate) loose or
       other pack.

Case 2 can already be handled by want_object_in_pack(), and to cover
case 1 we can teach want_object_in_pack() to expect that *found_pack
can be non-NULL, meaning the calling client has already found the
object's pack entry.

In want_object_in_pack() we take care to start the checks from the
already-found pack, if we have one, determining the answer right away
when neither --local nor --honor-pack-keep is active. In particular, as
p5310-pack-bitmaps.sh shows (3 consecutive runs), we do no harm to
served-with-bitmap clones performance-wise:

    Test                      56dfeb62          this tree
    -----------------------------------------------------------------
    5310.2: repack to disk    9.08(8.20+0.25)   9.09(8.14+0.32) +0.1%
    5310.3: simulated clone   1.92(2.12+0.08)   1.93(2.12+0.09) +0.5%
    5310.4: simulated fetch   0.82(1.07+0.04)   0.82(1.06+0.04) +0.0%
    5310.6: partial bitmap    1.96(2.42+0.13)   1.95(2.40+0.15) -0.5%

    Test                      56dfeb62          this tree
    -----------------------------------------------------------------
    5310.2: repack to disk    9.11(8.16+0.32)   9.11(8.19+0.28) +0.0%
    5310.3: simulated clone   1.93(2.14+0.07)   1.92(2.11+0.10) -0.5%
    5310.4: simulated fetch   0.82(1.06+0.04)   0.82(1.04+0.05) +0.0%
    5310.6: partial bitmap    1.95(2.38+0.16)   1.94(2.39+0.14) -0.5%

    Test                      56dfeb62          this tree
    -----------------------------------------------------------------
    5310.2: repack to disk    9.13(8.17+0.31)   9.07(8.13+0.28) -0.7%
    5310.3: simulated clone   1.92(2.13+0.07)   1.91(2.12+0.06) -0.5%
    5310.4: simulated fetch   0.82(1.08+0.03)   0.82(1.08+0.03) +0.0%
    5310.6: partial bitmap    1.96(2.43+0.14)   1.96(2.42+0.14) +0.0%

with delta timings showing they are all within noise from run to run.

In the general case we do not want to call find_pack_entry_one() more than
once, because it is expensive. This patch splits the loop in
want_object_in_pack() into two parts: finding the object and seeing if it
impacts our choice to include it in the pack. We may call the inexpensive
want_found_object() twice, but we will never call find_pack_entry_one() if we
do not need to.

I appreciate the help of Junio C Hamano and Jeff King in discussing
this change.

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-12 13:47:41 -07:00
Johannes Schindelin 321459439e cat-file: support --textconv/--filters in batch mode
With this patch, --batch can be combined with --textconv or --filters.
For this to work, the input needs to have the form

	<object name><single white space><path>

so that the filters can be chosen appropriately.
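
For example (the object name and path are placeholders):

	echo "HEAD:README README" | git cat-file --filters --batch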

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-11 14:48:15 -07:00
Johannes Schindelin 7bcf341453 cat-file --textconv/--filters: allow specifying the path separately
There are circumstances when it is relatively easy to figure out the
object name for a given path, but not the name of the containing tree.
For example, when looking at a diff generated by Git, the object names
are recorded, but not the revision. As a matter of fact, the revisions
from which the diff was generated may not even exist locally.

In such a case, the user would have to generate a fake revision just to
be able to use --textconv or --filters.

Let's simplify this dramatically, because we do not really need that
revision at all: all we care about is that we know the path. In the
scenario described above, we do know the path, and we just want to
specify it separately from the object name.

Example usage:

	git cat-file --textconv --path=main.c 0f1937fd

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-11 14:48:15 -07:00
Johannes Schindelin b9e62f6011 cat-file: introduce the --filters option
The --filters option applies the convert_to_working_tree() filter for
the path when showing the contents of a regular file blob object;
the contents are written out as-is for other types of objects.

This feature comes in handy when a 3rd-party tool wants to work with
the contents of files from past revisions as if they had been checked
out, but without detouring via temporary files.
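
For example (the path is just a placeholder):

	git cat-file --filters HEAD:path/to/file.txt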

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-11 14:47:46 -07:00
Jonathan Tan eb398797cd connect: advertized capability is not a ref
When cloning an empty repository served by standard git, "git clone" produces
the following reassuring message:

	$ git clone git://localhost/tmp/empty
	Cloning into 'empty'...
	warning: You appear to have cloned an empty repository.
	Checking connectivity... done.

Meanwhile when cloning an empty repository served by JGit, the output is more
haphazard:

	$ git clone git://localhost/tmp/empty
	Cloning into 'empty'...
	Checking connectivity... done.
	warning: remote HEAD refers to nonexistent ref, unable to checkout.

This is a common command to run immediately after creating a remote repository
as preparation for adding content to populate it and pushing. The warning is
confusing and needlessly worrying.

The cause is that, since v3.1.0.201309270735-rc1~22 (Advertise capabilities
with no refs in upload service., 2013-08-08), JGit's ref advertisement includes
a ref named capabilities^{} to advertise its capabilities on, while git's ref
advertisement is empty in this case. This allows the client to learn about the
server's capabilities and is needed, for example, for fetch-by-sha1 to work
when no refs are advertised.

This also affects "ls-remote". For example, against an empty repository served
by JGit:

	$ git ls-remote git://localhost/tmp/empty
	0000000000000000000000000000000000000000        capabilities^{}

Git advertises the same capabilities^{} ref in its ref advertisement for push
but since it never did so for fetch, the client didn't need to handle this
case.  Handle it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-09 13:40:36 -07:00
Jonathan Tan 63b747ce1a tests: move test_lazy_prereq JGIT to test-lib.sh
This enables JGIT to be used as a prereq in invocations of
test_expect_success (and other functions) in other test scripts.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-09 13:37:52 -07:00
Junio C Hamano 02c6c14d6c Merge branch 'sb/submodule-clone-rr'
"git clone --resurse-submodules --reference $path $URL" is a way to
reduce network transfer cost by borrowing objects in an existing
$path repository when cloning the superproject from $URL; it
learned to also peek into $path for presense of corresponding
repositories of submodules and borrow objects from there when able.

* sb/submodule-clone-rr:
  clone: recursive and reference option triggers submodule alternates
  clone: implement optional references
  clone: clarify option_reference as required
  clone: factor out checking for an alternate path
  submodule--helper update-clone: allow multiple references
  submodule--helper module-clone: allow multiple references
  t7408: merge short tests, factor out testing method
  t7408: modernize style
2016-09-08 21:49:50 -07:00
Junio C Hamano 00d27937bf Merge branch 'jh/status-v2-porcelain'
Enhance "git status --porcelain" output by collecting more data on
the state of the index and the working tree files, which may
further be used to teach git-prompt (in contrib/) to make fewer
calls to git.
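
For reference, the new format is requested with:

    git status --porcelain=v2 --branch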

* jh/status-v2-porcelain:
  status: unit tests for --porcelain=v2
  test-lib-functions.sh: add lf_to_nul helper
  git-status.txt: describe --porcelain=v2 format
  status: print branch info with --porcelain=v2 --branch
  status: print per-file porcelain v2 status data
  status: collect per-file data for --porcelain=v2
  status: support --porcelain[=<version>]
  status: cleanup API to wt_status_print
  status: rename long-format print routines
2016-09-08 21:49:50 -07:00
Junio C Hamano d7ed183a91 Merge branch 'rt/help-unknown'
"git nosuchcommand --help" said "No manual entry for gitnosuchcommand",
which was not intuitive, given that "git nosuchcommand" said "git:
'nosuchcommand' is not a git command".

* rt/help-unknown:
  help: make option --help open man pages only for Git commands
  help: introduce option --exclude-guides
2016-09-08 21:49:48 -07:00
Junio C Hamano da3b6f06e1 Merge branch 'cc/receive-pack-limit'
An incoming "git push" that attempts to push too many bytes can now
be rejected by setting a new configuration variable at the receiving
end.
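
Setting the new limit on the receiving repository would look something
like (the value, in bytes, is arbitrary):

    # per this series, the knob is receive.maxInputSize
    git config receive.maxInputSize 100000000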

* cc/receive-pack-limit:
  receive-pack: allow a maximum input size to be specified
  unpack-objects: add --max-input-size=<size> option
  index-pack: add --max-input-size=<size> option
2016-09-08 21:49:47 -07:00
Junio C Hamano 452a9073ba Merge branch 'jk/format-patch-number-singleton-patch-with-cover'
"git format-patch --cover-letter HEAD^" to format a single patch
with a separate cover letter now numbers the output as [PATCH 0/1]
and [PATCH 1/1] by default.

* jk/format-patch-number-singleton-patch-with-cover:
  format-patch: show 0/1 and 1/1 for singleton patch with cover letter
2016-09-08 21:49:47 -07:00
Junio C Hamano c4071eace9 Merge branch 'jk/delta-base-cache'
The delta-base-cache mechanism has been a key to the performance in
a repository with a tightly packed packfile, but it did not scale
well even with a larger value of core.deltaBaseCacheLimit.
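
For reference, that knob is an ordinary config setting (the value here
is arbitrary):

    git config core.deltaBaseCacheLimit 512m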

* jk/delta-base-cache:
  t/perf: add basic perf tests for delta base cache
  delta_base_cache: use hashmap.h
  delta_base_cache: drop special treatment of blobs
  delta_base_cache: use list.h for LRU
  release_delta_base_cache: reuse existing detach function
  clear_delta_base_cache_entry: use a more descriptive name
  cache_or_unpack_entry: drop keep_cache parameter
2016-09-08 21:49:46 -07:00
Jeff King d63ed6ef24 remote-curl: handle URLs without protocol
Generally remote-curl would never see a URL that did not
have "proto:" at the beginning, as that is what tells git to
run the "git-remote-proto" helper (and git-remote-http, etc,
are aliases for git-remote-curl).

However, the special syntax "proto::something" will run
git-remote-proto with only "something" as the URL. So a
malformed URL like:

  http::/example.com/repo.git

will feed the URL "/example.com/repo.git" to
git-remote-http. The resulting URL has no protocol, but the
code added by 372370f (http: use credential API to handle
proxy authentication, 2016-01-26) does not handle this case
and segfaults.

For the purposes of this code, we don't really care what the
exact protocol is; only whether or not it is https. So let's
just assume that a missing protocol is not, and curl will
handle the real error (which is that the URL is nonsense).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-08 11:23:43 -07:00
brian m. carlson 99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
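
Such a semantic patch could be applied with Coccinelle along these
lines (the .cocci filename here is made up):

    spatch --sp-file cache_entry_oid.cocci --in-place --dir .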

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Jeff King b773ddea2c pack-objects: walk tag chains for --include-tag
When pack-objects is given --include-tag, it peels each tag
ref down to a non-tag object, and if that non-tag object is
going to be packed, we include the tag, too. But what
happens if we have a chain of tags (e.g., tag "A" points to
tag "B", which points to commit "C")?

We'll peel down to "C" and realize that we want to include
tag "A", but we do not ever consider tag "B", leading to a
broken pack (assuming "B" was not otherwise selected).
Instead, we have to walk the whole chain, adding any tags we
find to the pack.

Interestingly, it doesn't seem possible to trigger this
problem with "git fetch", but you can with "git clone
--single-branch". The reason is that we generate the correct
pack when the client explicitly asks for "A" (because we do
a real reachability analysis there), and "fetch" is more
willing to do so. There are basically two cases:

  1. If "C" is already a ref tip, then the client can deduce
     that it needs "A" itself (via find_non_local_tags), and
     will ask for it explicitly rather than relying on the
     include-tag capability. Everything works.

  2. If "C" is not already a ref tip, then we hope for
     include-tag to send us the correct tag. But it doesn't;
     it generates a broken pack. However, the next step is
     to do a follow-up run of find_non_local_tags(),
     followed by fetch_refs() to backfill any tags we
     learned about.

     In the normal case, fetch_refs() calls quickfetch(),
     which does a connectivity check and sees we have no
     new objects to fetch. We just write the refs.

     But for the broken-pack case, the connectivity check
     fails, and quickfetch will follow-up with the remote,
     asking explicitly for each of the ref tips. This picks
     up the missing object in a new pack.

For a regular "git clone", we are similarly OK, because we
explicitly request all of the tag refs, and get a correct
pack. But with "--single-branch", we kick in tag
auto-following via "include-tag", but do _not_ do a
follow-up backfill. We just take whatever the server sent us
via include-tag and write out tag refs for any tag objects
we were sent. So prior to c6807a4 (clone: open a shortcut
for connectivity check, 2013-05-26), we actually claimed the
clone was a success, but the result was silently
corrupted!  Since c6807a4, index-pack's connectivity
check catches this case, and we correctly complain.

The included test directly checks that pack-objects does not
generate a broken pack, but also confirms that "clone
--single-branch" does not hit the bug.

Note that tag chains introduce another interesting question:
if we are packing the tag "B" but not the commit "C", should
"A" be included?

Both before and after this patch, we do not include "A",
because the initial peel_ref() check only knows about the
bottom-most level, "C". To realize that "B" is involved at
all, we would have to switch to an incremental peel, in
which we examine each tagged object, asking if it is being
packed (and including the outer tag if so).

But that runs contrary to the optimizations in peel_ref(),
which avoid accessing the objects at all, in favor of using
the value we pull from packed-refs. It's OK to walk the
whole chain once we know we're going to include the tag (we
have to access it anyway, so the effort is proportional to
the pack we're generating). But for the initial selection,
we have to look at every ref. If we're only packing a few
objects, we'd still have to parse every single referenced
tag object just to confirm that it isn't part of a tag
chain.

This could be addressed if packed-refs stored the complete
tag chain for each peeled ref (in most cases, this would be
the same cost as now, as each "chain" is only a single
link). But given the size of that project, it's out of scope
for this fix (and probably nobody cares enough anyway, as
it's such an obscure situation). This commit limits itself
to just avoiding the creation of a broken pack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:45:31 -07:00
Jeff King ab5178356c t5305: simplify packname handling
We generate a series of packfiles test-1-$pack,
test-2-$pack, with different properties and then examine
them. However we always store the packname generated by
pack-objects in the variable packname_1. This probably was
meant to be packname_2 in the second test, but it turns out
that it doesn't matter: once we are done with the first
pack, we can just keep using the same $packname variable.

So let's drop the confusing "_1" parameter. At the same
time, let's give test-1 and test-2 more descriptive names,
which can help keep them straight (note that we _could_
likewise overwrite the packfiles in each test, but by using
separate filenames, we are sure that test 2 does not
accidentally use the packfile from test 1).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:45:29 -07:00
Jeff King 948a7fd242 t5305: use "git -C"
This test unpacks objects into a separate repository, and
accesses it by setting GIT_DIR in a subshell. We can do the
same thing these days by using "git init <repo>" and "git
-C". In most cases this is shorter, though when there are
multiple commands, we may end up repeating the "-C".

However, this repetition can actually be a good thing. This
patch also fixes a bug introduced by 512477b (tests: use
"env" to run commands with temporary env-var settings,
2014-03-18). That commit essentially converted:

   (GIT_DIR=...; export GIT_DIR
    cmd1 &&
    cmd2)

into:

   (GIT_DIR=... cmd1 &&
    cmd2)

which obviously loses the GIT_DIR setting for cmd2 (we never
noticed the bug because it simply runs "cmd2" in the parent
repo, which means we were simply failing to test anything
interesting). By using "git -C" rather than a subshell, it
becomes quite obvious where each command is supposed to be
running.
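
For comparison, the "git -C" form reads roughly like (the repository
name and commands are illustrative):

   git -C clone.git fsck &&
   git -C clone.git rev-parse --git-dir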

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:45:28 -07:00
Jeff King 2076353f47 t5305: drop "dry-run" of unpack-objects
For each test we do a dry-run of unpack-objects, followed by
a real run, followed by confirming that it contained the
objects we expected. The dry-run is telling us nothing, as
any errors it encounters would be found in the real run.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:45:27 -07:00
Jeff King 1962d9fbe3 t5305: move cleanup into test block
We usually try to avoid doing any significant actions
outside of test blocks. Although "rm -rf" is unlikely to
either fail or to generate output, moving these to the
point of use makes it more clear that they are part of the
overall setup of "clone.git".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:45:26 -07:00
Elia Pinto 14e24114d9 t5551-http-fetch-smart.sh: use the GIT_TRACE_CURL environment var
Use the new GIT_TRACE_CURL environment variable instead
of the deprecated GIT_CURL_VERBOSE.
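
For example (the URL is a placeholder):

    GIT_TRACE_CURL=true git ls-remote https://example.com/repo.git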

Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:41:45 -07:00
Elia Pinto 81590bf77d t5550-http-fetch-dumb.sh: use the GIT_TRACE_CURL environment var
Use the new GIT_TRACE_CURL environment variable instead
of the deprecated GIT_CURL_VERBOSE.

Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:41:42 -07:00
Elia Pinto 4527aa10a6 test-lib.sh: preserve GIT_TRACE_CURL from the environment
Turning on this variable can be useful when debugging http
tests. It can break a few tests in t5541 if it is not set
to an absolute path, but it is not a variable
that the user is likely to have enabled accidentally.

Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:41:40 -07:00
Elia Pinto 4eee6c6ddc t5541-http-push-smart.sh: use the GIT_TRACE_CURL environment var
Use the new GIT_TRACE_CURL environment variable instead
of the deprecated GIT_CURL_VERBOSE.

Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:41:39 -07:00
Johannes Sixt 5babb5bdb3 t6026-merge-attr: clean up background process at end of test case
The process spawned in the hook uses the test's trash directory as CWD.
As long as it is alive, the directory cannot be removed on Windows.
Although the test succeeds, the 'test_done' that follows produces an
error message and leaves the trash directory around. Kill the process
before the test case advances.

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:40:22 -07:00
Johannes Sixt c00bfc9d1b t9903: fix broken && chain
We might wonder why our && chain check does not catch this case:
The && chain check uses a strange exit code with the expectation that
the second or later part of a broken && chain would not exit with this
particular code.

This expectation does not work in this case because __git_ps1, being
the first command in the second part of the broken && chain, records
the current exit code, does its work, and finally returns to the caller
with the recorded exit code. This fools our && chain check.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:35:08 -07:00
Junio C Hamano 12cfa792b8 symbolic-ref -d: do not allow removal of HEAD
If you delete the symbolic-ref HEAD from a repository, Git no longer
considers the repository valid, and even "git symbolic-ref HEAD
refs/heads/master" would not be able to recover from that state
(although "git init" can, but that is a sure sign that you are
talking about a "broken" repository).

In the spirit similar to afe5d3d5 ("symbolic ref: refuse non-ref
targets in HEAD", 2009-01-29), forbid removal of HEAD to avoid
corrupting a repository.
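
After this change (the second ref is just an example of what remains
allowed):

    git symbolic-ref -d HEAD                        # now refused
    git symbolic-ref -d refs/remotes/origin/HEAD    # still allowed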

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-02 09:01:38 -07:00