Commit graph

46928 commits

Author SHA1 Message Date
Junio C Hamano dfe46c5ce6 Merge branch 'jk/loose-object-info-report-error'
Update error handling for codepath that deals with corrupt loose
objects.

* jk/loose-object-info-report-error:
  index-pack: detect local corruption in collision check
  sha1_loose_object_info: return error for corrupted objects
2017-04-16 23:29:30 -07:00
Junio C Hamano 3c833cae44 Merge branch 'jc/bs-t-is-not-a-tab-for-sed'
Code cleanup.

* jc/bs-t-is-not-a-tab-for-sed:
  contrib/git-resurrect.sh: do not write \t for HT in sed scripts
2017-04-16 23:29:29 -07:00
Junio C Hamano 93a96cced3 Merge branch 'jc/unused-symbols'
Code cleanup.

* jc/unused-symbols:
  remote.[ch]: parse_push_cas_option() can be static
2017-04-16 23:29:27 -07:00
Junio C Hamano cb054eb264 Merge branch 'jk/snprintf-cleanups'
Code clean-up.

* jk/snprintf-cleanups:
  daemon: use an argv_array to exec children
  gc: replace local buffer with git_path
  transport-helper: replace checked snprintf with xsnprintf
  convert unchecked snprintf into xsnprintf
  combine-diff: replace malloc/snprintf with xstrfmt
  replace unchecked snprintf calls with heap buffers
  receive-pack: print --pack-header directly into argv array
  name-rev: replace static buffer with strbuf
  create_branch: use xstrfmt for reflog message
  create_branch: move msg setup closer to point of use
  avoid using mksnpath for refs
  avoid using fixed PATH_MAX buffers for refs
  fetch: use heap buffer to format reflog
  tag: use strbuf to format tag header
  diff: avoid fixed-size buffer for patch-ids
  odb_mkstemp: use git_path_buf
  odb_mkstemp: write filename into strbuf
  do not check odb_mkstemp return value for errors
2017-04-16 23:29:26 -07:00
Michael Haggerty f890db83ee do_for_each_entry_in_dir(): delete function
Its only remaining caller was itself.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty 50c2d8555b files_pack_refs(): use reference iteration
Use reference iteration rather than `do_for_each_entry_in_dir()` in
the definition of `files_pack_refs()`. This makes the code shorter and
easier to follow, because the logic can be inline rather than spread
between the main function and a callback function, and it removes the
need to use `pack_refs_cb_data` to preserve intermediate state.

This removes the last callers of `entry_resolves_to_object()` and
`get_loose_ref_dir()`, so delete those functions.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty 1710fbafb6 commit_packed_refs(): use reference iteration
Use reference iteration rather than do_for_each_entry_in_dir() in the
definition of commit_packed_refs().

Note that an internal consistency check that was previously done in
`write_packed_entry_fn()` is not there anymore. This is actually an
improvement:

The old error message was emitted when there is an entry in the
packed-ref cache that is not `REF_KNOWS_PEELED`, and when we attempted
to peel the reference, the result was `PEEL_INVALID`,
`PEEL_IS_SYMREF`, or `PEEL_BROKEN`. Since a packed ref cannot be a
symref, `PEEL_IS_SYMREF` and `PEEL_BROKEN` can be ruled out. So we're
left with `PEEL_INVALID`.

An entry without `REF_KNOWS_PEELED` can get into the packed-refs cache
in the following two ways:

* The reference was read from a `packed-refs` file that didn't have
  the `fully-peeled` attribute. In that case, we *don't want* to emit
  an error, because the broken value is presumably a stale value of
  the reference that is now masked by a loose version of the same
  reference (which we just don't happen to be packing this time). This
  is a perfectly legitimate situation and doesn't indicate that the
  repository is corrupt. The old code incorrectly emits an error
  message in this case. (It was probably never reported as a bug
  because this scenario is rare.)

* The reference was a loose reference that was just added to the
  packed ref cache by `files_packed_refs()` via
  `pack_if_possible_fn()` in preparation for being packed. The latter
  function refuses to pack a reference for which
  `entry_resolves_to_object()` returns false, and otherwise calls
  `peel_entry()` itself and checks the return value. So an entry added
  this way should always have `REF_KNOWS_PEELED` and shouldn't trigger
  the error message in either the old code or the new.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty 059ae35a48 cache_ref_iterator_begin(): make function smarter
Change `cache_ref_iterator_begin()` to take two new arguments:

* `prefix` -- to iterate only over references with the specified
  prefix.

* `prime_dir` -- to "prime" (i.e., pre-load) the cache before starting
  the iteration.

The new functionality makes it possible for
`files_ref_iterator_begin()` to be made more ignorant of the internals
of `ref_cache`, and `find_containing_dir()` and `prime_ref_dir()` to
be made private.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty a714b19ca8 get_loose_ref_cache(): new function
Extract a new function, `get_loose_ref_cache()`, from
get_loose_ref_dir(). The function returns the `ref_cache` for the
loose refs of a `files_ref_store`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty 86f423584b get_loose_ref_dir(): function renamed from get_loose_refs()
The new name is more analogous to `get_packed_ref_dir()`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty 5c7bba77b2 do_for_each_entry_in_dir(): eliminate offset argument
It was never used.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty e3bf2989ca refs: handle "refs/bisect/" in loose_fill_ref_dir()
That "refs/bisect/" has to be handled specially when filling the
ref_cache for loose references is a peculiarity of the files backend,
and the ref-cache code shouldn't need to know about it. So move this
code to the callback function, `loose_fill_ref_dir()`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty df30875987 ref-cache: use a callback function to fill the cache
It is a leveling violation for `ref_cache` to know about
`files_ref_store` or that it should call `read_loose_refs()` to lazily
fill cache directories. So instead, have its constructor take as an
argument a callback function that it should use for lazy-filling, and
change `files_ref_store` to supply a pointer to function
`read_loose_refs` (renamed to `loose_fill_ref_dir`) when creating the
ref cache for its loose refs.

This means that we can generify the type of the back-pointer in
`struct ref_cache` from `files_ref_store` to `ref_store`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty e00d1a4ff7 refs: record the ref_store in ref_cache, not ref_dir
Instead of keeping a pointer to the `ref_store` in every `ref_dir`
entry, store it once in `struct ref_cache`, and change `struct
ref_dir` to include a pointer to its containing `ref_cache` instead.
This makes it easier to add to the information that is accessible from
a `ref_dir` without increasing the size of every `ref_dir` instance.

Note that previously, every `ref_dir` pointed at the containing
`files_ref_store` regardless of whether it was a part of the loose or
packed reference cache. Now we have to be sure to initialize the
instances to point at the correct containing `ref_cache`. So change
`create_dir_entry()` to take a `ref_cache` parameter, and change its
callers to pass the correct `ref_cache` depending on the purpose of
the new `dir_entry`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty 7c22bc8a18 ref-cache: introduce a new type, ref_cache
For now, it just wraps a `ref_entry *` that points at the root of the
tree. Soon it will hold more information.

Add two new functions, `create_ref_cache()` and `free_ref_cache()`.
Make `free_ref_entry()` private.

Change files-backend to use this type to hold its caches.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:46 -07:00
Michael Haggerty 958f964691 refs: split ref_cache code into separate files
The `ref_cache` code is currently too tightly coupled to
`files-backend`, making the code harder to understand and making it
awkward for new code to use `ref_cache` (as we indeed have planned).
Start loosening that coupling by splitting `ref_cache` into a separate
module.

This commit moves code, adds declarations, and changes the visibility
of some functions, but doesn't change any code.

The modules are still too tightly coupled, but the situation will be
improved in subsequent commits.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:45 -07:00
Michael Haggerty 9fc3b06311 ref-cache: rename remove_entry() to remove_entry_from_dir()
This function's visibility is about to be increased, so give it a more
distinctive name.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:45 -07:00
Michael Haggerty bc1c696e89 ref-cache: rename find_ref() to find_ref_entry()
This function's visibility is about to be increased, so give it a more
distinctive name.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:45 -07:00
Michael Haggerty a3ade2baba ref-cache: rename add_ref() to add_ref_entry()
This function's visibility is about to be increased, so give it a more
distinctive name.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:45 -07:00
Michael Haggerty 524a9fdb51 refs_verify_refname_available(): use function in more places
Change `lock_raw_ref()` and `lock_ref_sha1_basic()` to use
`refs_verify_refname_available()` instead of
`verify_refname_available_dir()`. This means that those callsites now
check for conflicts with all references rather than just packed refs,
but the performance cost shouldn't be significant (and will be
regained later).

These were the last callers of `verify_refname_available_dir()`, so
also delete that (very complicated) function.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:45 -07:00
Michael Haggerty b05855b5bc refs_verify_refname_available(): implement once for all backends
It turns out that we can now implement
`refs_verify_refname_available()` based on the other virtual
functions, so there is no need for it to be defined at the backend
level. Instead, define it once in `refs.c` and remove the
`files_backend` definition.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:32:45 -07:00
Sebastian Schuberth 0747fb49fd sha1_file: remove an used fd variable
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:29:18 -07:00
René Scharfe ac8ce18d89 am: close stream on error, but not stdin
Avoid closing stdin, but do close an actual input file on error exit.

Found with Cppcheck.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:27:39 -07:00
Giuseppe Bilotta 0fb3c4fc9a builtin/am: fold am_signoff() into am_append_signoff()
There are no more direct calls to am_signoff(), so we can fold its
logic  in am_append_signoff().

(This is done in a separate commit rather than in the previous one, to
make it easier to revert this specific change if additional calls are
ever introduced.)

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:19:09 -07:00
Giuseppe Bilotta b7cc7051f7 builtin/am: honor --signoff also when --rebasing
Signoff is handled in parse_mail(), but not in parse_mail_rebasing(),
since the latter is only used when git-rebase calls git-am with the
--rebasing option, and --signoff is never passed in this case.

In order to introduce (in the upcoming commits) support for
`git-rebase --signoff`, we must make git-am pay attention to it also
in the rebase case. This can be done by moving the conditional
addition of the signoff from parse_mail() to the caller am_run(),
after either of the parse_mail*() functions were called.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:19:09 -07:00
Luke Diamand eff451101d git-p4: don't use name-rev to get current branch
git-p4 was using "git name-rev" to find out the current branch.

That is not safe, since if multiple branches or tags point at
the same revision, the result obtained might not be what is
expected.

Instead use "git symbolic-ref".

Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:13:26 -07:00
Luke Diamand 78871bf46f git-p4: add read_pipe_text() internal function
The existing read_pipe() function returns an empty string on
error, but also returns an empty string if the command returns
an empty string.

This leads to ugly constructions trying to detect error cases.

Add read_pipe_text() which just returns None on error.

Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:13:24 -07:00
Luke Diamand 3d553cceb5 git-p4: add failing test for name-rev rather than symbolic-ref
Using name-rev to find the current git branch means that git-p4
does not correctly get the current branch name if there are
multiple branches pointing at HEAD, or a tag.

This change adds a test case which demonstrates the problem.
Configuring which branches are allowed to be submitted from goes
wrong, as git-p4 gets confused about which branch is in use.

This appears to be the only place that git-p4 actually cares
about the current branch.

Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 21:13:23 -07:00
Brandon Williams cf9e55f494 submodule: prevent backslash expantion in submodule names
When attempting to add a submodule with backslashes in its name 'git
submodule' fails in a funny way.  We can see that some of the
backslashes are expanded resulting in a bogus path:

git -C main submodule add ../sub\\with\\backslash
fatal: repository '/tmp/test/sub\witackslash' does not exist
fatal: clone of '/tmp/test/sub\witackslash' into submodule path

To solve this, convert calls to 'read' to 'read -r' in git-submodule.sh
in order to prevent backslash expantion in submodule names.

Reported-by: Joachim Durchholz <jo@durchholz.org>
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 20:09:36 -07:00
René Scharfe bccb22cbb1 test-read-cache: setup git dir
b1ef400e (setup_git_env: avoid blind fall-back to ".git") made programs
that tried to access a repository without initializing properly die with
a diagnostic message.  One offender is test-read-cache, which is used in
p0002.  Fix it by calling setup_git_directory() before accessing the
index.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 20:05:11 -07:00
Jeff King d8f4481c4f refs: reject ref updates while GIT_QUARANTINE_PATH is set
As documented in git-receive-pack(1), updating a ref from
within the pre-receive hook is dangerous and can corrupt
your repo. This patch forbids ref updates entirely during
the hook to make it harder for adventurous hook writers to
shoot themselves in the foot.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 18:19:18 -07:00
Jeff King eaeed077a6 receive-pack: document user-visible quarantine effects
Commit 722ff7f87 (receive-pack: quarantine objects until
pre-receive accepts, 2016-10-03) changed the underlying
details of how we take in objects. This is mostly
transparent to the user, but there are a few things they
might notice. Let's document them.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 18:15:17 -07:00
Jeff King 360244a51a receive-pack: drop tmp_objdir_env from run_update_hook
Since 722ff7f87 (receive-pack: quarantine objects until
pre-receive accepts, 2016-10-03), we have to feed the
pre-receive hook the tmp_objdir environment, so that git
programs run from the hook know where to find the objects.

That commit modified run_update_hook() to do the same, but
there it is a noop. By the time we get to the update hooks,
we have already migrated the objects from quarantine, and so
tmp_objdir_env() will always return NULL. We can drop this
useless call.

Note that the ordering here and the lack of support for the
update hook is intentional. The update hook calls are
interspersed with actual ref updates, and we must migrate
the objects before any refs are updated (since otherwise
those refs would appear broken to outside processes). So the
only other options are:

  - remain in quarantine for the _first_ ref, but not the
    others. This is sufficiently confusing that it can be
    rejected outright.

  - run all the individual update hooks first, then migrate,
    then update all the refs. But this changes the repository
    state that the update hooks see (i.e., whether or not
    refs from the same push are updated yet or not).

So the functionality is fine and remains unchanged with this
patch; we're just cleaning up a useless and confusing line
of code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 18:13:12 -07:00
SZEDER Gábor ef09036cf3 t6500: wait for detached auto gc at the end of the test script
The last test in 't6500-gc', 'background auto gc does not run if
gc.log is present and recent but does if it is old', added in
a831c06a2 (gc: ignore old gc.log files, 2017-02-10), may sporadically
trigger an error message from the test harness:

  rm: cannot remove 'trash directory.t6500-gc/.git/objects': Directory not empty

The test in question ends with executing an auto gc in the backround,
which occasionally takes so long that it's still running when
'test_done' is about to remove the trash directory.  This 'rm -rf
$trash' in the foreground might race with the detached auto gc to
create and delete files and directories, and gc might (re-)create a
path that 'rm' already visited and removed, triggering the above error
message when 'rm' attempts to remove its parent directory.

Commit bb05510e5 (t5510: run auto-gc in the foreground, 2016-05-01)
fixed the same problem in a different test script by simply
disallowing background gc.  Unfortunately, what worked there is not
applicable here, because the purpose of this test is to check the
behavior of a detached auto gc.

Make sure that the test doesn't continue before the gc is finished in
the background with a clever bit of shell trickery:

  - Open fd 9 in the shell, to be inherited by the background gc
    process, because our daemonize() only closes the standard fds 0,
    1 and 2.
  - Duplicate this fd 9 to stdout.
  - Read 'git gc's stdout, and thus fd 9, through a command
    substitution.  We don't actually care about gc's output, but this
    construct has two useful properties:
  - This read blocks until stdout or fd 9 are open.  While stdout is
    closed after the main gc process creates the background process
    and exits, fd 9 remains open until the backround process exits.
  - The variable assignment from the command substitution gets its
    exit status from the command executed within the command
    substitution, i.e. a failing main gc process will cause the test
    to fail.

Note, that this fd trickery doesn't work on Windows, because due to
MSYS limitations the git process only inherits the standard fds 0, 1
and 2 from the shell.  Luckily, it doesn't matter in this case,
because on Windows daemonize() is basically a noop, thus 'git gc
--auto' always runs in the foreground.

And since we can now continue the test reliably after the detached gc
finished, check that there is only a single packfile left at the end,
i.e. that the detached gc actually did what it was supposed to do.
Also add a comment at the end of the test script to warn developers of
future tests about this issue of long running detached gc processes.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 18:06:50 -07:00
Brandon Williams 5ce10c0a29 pathspec: fix segfault in clear_pathspec
In 'clear_pathspec()' the incorrect index parameter is used to bound an
inner-loop which is used to free a 'struct attr_match' value field.
Using the incorrect index parameter (in addition to being incorrect)
occasionally causes segmentation faults when attempting to free an
invalid pointer.  Fix this by using the correct index parameter 'i'.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 18:04:06 -07:00
Ævar Arnfjörð Bjarmason 7861fa07c3 grep: plug a trivial memory leak
Change the cleanup phase for the grep command to free the pathspec
struct that's allocated earlier in the same block, and used just a few
lines earlier.

With "grep hi README.md" valgrind reports a loss of 239 bytes now,
down from 351.

The relevant --num-callers=40 --leak-check=full --show-leak-kinds=all
backtrace is:

    [...] 187 (112 direct, 75 indirect) bytes in 1 blocks are definitely lost in loss record 70 of 110
    [...]    at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    [...]    by 0x60B339: do_xmalloc (wrapper.c:59)
    [...]    by 0x60B2F6: xmalloc (wrapper.c:86)
    [...]    by 0x576B37: parse_pathspec (pathspec.c:652)
    [...]    by 0x4519F0: cmd_grep (grep.c:1215)
    [...]    by 0x4062EF: run_builtin (git.c:371)
    [...]    by 0x40544D: handle_builtin (git.c:572)
    [...]    by 0x4060A2: run_argv (git.c:624)
    [...]    by 0x4051C6: cmd_main (git.c:701)
    [...]    by 0x4C5901: main (common-main.c:43)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 17:56:47 -07:00
Jeff King 22e5ae5c8e connect.c: handle errors from split_cmdline
Commit e9d9a8a4d (connect: handle putty/plink also in
GIT_SSH_COMMAND, 2017-01-02) added a call to
split_cmdline(), but checks only for a non-zero return to
see if we got any output. Since the function returns
negative values (and a NULL argv) on error, we end up
dereferencing NULL and segfaulting.

Arguably we could report on the parsing error here, but it's
probably not worth it. This is a best-effort attempt to see
if we are using plink. So we can simply return here with
"no, it wasn't plink" and let the shell actually complain
about the bogus quoting.

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 17:48:00 -07:00
Lars Schneider d8245bb3fd travis-ci: add static analysis build job to run coccicheck
Add a dedicated build job for static analysis. As a starter we only run
coccicheck but in the future we could run Clang Static Analyzer or
similar tools, too.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16 17:31:50 -07:00
Jeff Hostetler d12a8cf0af unpack-trees: avoid duplicate ODB lookups during checkout
Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.

In operations such as read-tree and checkout, there will
likely be many peer directories that have the same OID when
the differences between the commits are relatively small.
In these cases we can avoid hitting the ODB multiple times
for the same OID.

This patch handles n=2 and n=3 cases and simply copies the
data rather than repeating the fill_tree_descriptor().

================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.

(avg) before: 22.699
(avg) after:  21.955
===============

================
On Linux using p0006-read-tree-checkout.sh with linux.git:

Test                                                          HEAD^              HEAD
-------------------------------------------------------------------------------------------------------
0006.2: read-tree br_base br_ballast (57994)                  0.24(0.20+0.03)    0.24(0.22+0.01) +0.0%
0006.3: switch between br_base br_ballast (57994)             10.58(6.23+2.86)   10.67(5.94+2.87) +0.9%
0006.4: switch between br_ballast br_ballast_plus_1 (57994)   0.60(0.44+0.17)    0.57(0.44+0.14) -5.0%
0006.5: switch between aliases (57994)                        0.59(0.48+0.13)    0.57(0.44+0.15) -3.4%
================

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 02:50:44 -07:00
Jeff Hostetler a6db3fbb6e read-cache: add strcmp_offset function
Add strcmp_offset() function to also return the offset of the
first change.

Add unit test and helper to verify.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 02:21:12 -07:00
Jeff Hostetler 950a234cbd string-list: use ALLOC_GROW macro when reallocing string_list
Use ALLOC_GROW() macro when reallocing a string_list array
rather than simply increasing it by 32.  This is a performance
optimization.

During status on a very large repo and there are many changes,
a significant percentage of the total run time is spent
reallocing the wt_status.changes array.

This change decreases the time in wt_status_collect_changes_worktree()
from 125 seconds to 45 seconds on my very large repository.

This produced a modest gain on my 1M file artificial repo, but
broke even on linux.git.

Test                                            HEAD^^            HEAD
---------------------------------------------------------------------------------------
0005.2: read-tree status br_ballast (1000001)   8.29(5.62+2.62)   8.22(5.57+2.63) -0.8%

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 02:04:41 -07:00
Jeff Hostetler a33fc72fe9 read-cache: force_verify_index_checksum
Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.

Teach fsck to force this verification.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

These effect can be seen using t/perf/p0002-read-cache.sh:

Test                                          HEAD~1            HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.66(0.44+0.20)   0.30(0.27+0.02) -54.5%

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 00:58:36 -07:00
Patrick Steinhardt be4dbbbed9 pathspec: honor PATHSPEC_PREFIX_ORIGIN with empty prefix
Previous to commit 5d8f084a5 (pathspec: simpler logic to prefix original
pathspec elements, 2017-01-04), we were always using the computed
`match` variable to perform pathspec matching whenever
`PATHSPEC_PREFIX_ORIGIN` is set. This is for example useful when passing
the parsed pathspecs to other commands, as the computed `match` may
contain a pathspec relative to the repository root. The commit changed
this logic to only do so when we do have an actual prefix and when
literal pathspecs are deactivated.

But this change may actually break some commands which expect passed
pathspecs to be relative to the repository root. One such case is `git
add --patch`, which now fails when using relative paths from a
subdirectory. For example if executing "git add -p ../foo.c" in a
subdirectory, the `git-add--interactive` command will directly pass
"../foo.c" to `git-ls-files`. As ls-files is executed at the
repository's root, the command will notice that "../foo.c" is outside
the repository and fail.

Fix the issue by again using the computed `match` variable when
`PATHSPEC_PREFIX_ORIGIN` is set and global literal pathspecs are
deactivated. Note that in contrast to previous behavior, we will now
always call `prefix_magic` regardless of whether a prefix is actually
set. But this is the right thing to do: when the `match` variable has
been resolved to the repository's root, it will be set to an empty
string. When passing the empty string directly to other commands, it
will result in a warning regarding deprecated empty pathspecs. By always
adding the prefix magic, we will end up with at least the string
":(prefix:0)" and thus avoid the warning.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Duy Nguyen <pclouds@gmail.com>
2017-04-14 23:55:24 -07:00
Nguyễn Thái Ngọc Duy 86f9515708 config: resolve symlinks in conditional include's patterns
$GIT_DIR returned by get_git_dir() is normalized, with all symlinks
resolved (see setup_work_tree function). In order to match paths (or
patterns) against $GIT_DIR char-by-char, they have to be normalized
too. There is a note in config.txt about this, that the user need to
resolve symlinks by themselves if needed.

The problem is, we allow certain path expansion, '~/' and './', for
convenience and can't ask the user to resolve symlinks in these
expansions. Make sure the expanded paths have all symlinks resolved.

PS. The strbuf_realpath(&text, get_git_dir(), 1) is still needed because
get_git_dir() may return relative path.

Noticed-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-14 23:51:38 -07:00
Nguyễn Thái Ngọc Duy 4aad2f1627 path.c: and an option to call real_path() in expand_user_path()
In the next patch we need the ability to expand '~' to
real_path($HOME). But we can't do that from outside because '~' is part
of a pattern, not a true path. Add an option to expand_user_path() to do
so.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-14 23:51:38 -07:00
Prathamesh Chavan 210e5dba0b t2027: avoid using pipes
Whenever a git command is present in the upstream of a pipe, its failure
gets masked by piping. Hence we should avoid it for testing the
upstream git command. By writing out the output of the git command to
a file, we can test the exit codes of both the commands as a failure exit
code in any command is able to stop the && chain.

Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-14 16:55:24 -07:00
Michael Haggerty e121b9cb5f refs_ref_iterator_begin(): new function
Extract a new function from `do_for_each_ref()`. It will be useful
elsewhere.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-14 03:54:31 -07:00
Michael Haggerty 470be51862 refs_read_raw_ref(): new function
Extract a new function from `refs_resolve_ref_unsafe()`. It will be
useful elsewhere.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-14 03:54:31 -07:00
Michael Haggerty 68fb02e40d get_ref_dir(): don't call read_loose_refs() for "refs/bisect"
Since references under "refs/bisect/" are per-worktree, they have to
be sought in the worktree rather than in the main repository. But
since loose references are found by traversing directories, the
reference iterator won't even get the idea to look for a
"refs/bisect/" directory in the worktree if there is not a directory
with that name in the main repository. Thus `get_ref_dir()` manually
inserts a dir_entry for "refs/bisect/" whenever it reads the entry for
"refs/".

The current code then immediately calls `read_loose_refs()` on that
directory. But since the dir_entry is created with its `incomplete`
flag set, any traversal that gets to this point will read the
directory automatically. So there is no need to call
`read_loose_refs()` explicitly; the lazy mechanism suffices.

And in fact, the attempt to `read_loose_refs()` was broken anyway.
That function needs its `dirname` argument to have a trailing `/`
character, but the invocation here was passing it "refs/bisect"
without a trailing slash. So `read_loose_refs()` would read
`$GIT_DIR/refs/bisect" correctly, but if it found an entry "foo" in
that directory, it would try to read "$GIT_DIR/refs/bisectfoo".
Normally it wouldn't find anything at that path, but the failure was
canceled out because `get_ref_dir()` *also* forgot to reset the
`REF_INCOMPLETE` bit on the dir_entry. So the read was attempted again
when it was accessed, via the lazy mechanism, and this time the read
was done correctly.

This code has been broken since it was first introduced.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-14 03:54:31 -07:00
Nguyễn Thái Ngọc Duy adac8115a6 refs.h: add a note about sorting order of for_each_ref_*
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-14 03:53:25 -07:00