Commit graph

2100 commits

Author SHA1 Message Date
Jonathan Tan
dade47c06c commit-graph: add repo arg to graph readers
Add a struct repository argument to the functions in commit-graph.h that
read the commit graph. (This commit does not affect functions that write
commit graphs.)

Because the commit graph functions can now read the commit graph of any
repository, the global variable core_commit_graph has been removed.
Instead, the config option core.commitGraph is now read the first time
an attempt is made to parse a commit in a repository using its commit
graph.

This commit includes a test that exercises the functionality on an
arbitrary repository that is not the_repository.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 15:47:48 -07:00
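The lazy, per-repository lookup described in dade47c06c can be pictured with a small model. This is only a sketch in Python, not Git's C code: the `Repository` class, its `config` dict, the `config_reads` counter, and `use_commit_graph()` are hypothetical names, and the "false" default is an assumption made for the example.

```python
class Repository:
    """Toy stand-in for a repository; not Git's struct repository."""

    def __init__(self, config):
        self.config = config               # e.g. {"core.commitGraph": "true"}
        self._commit_graph_enabled = None  # unknown until first use
        self.config_reads = 0              # counts lookups, for demonstration

    def use_commit_graph(self):
        # Read core.commitGraph only on the first attempt to parse a commit
        # through the commit graph, then remember the answer per repository.
        if self._commit_graph_enabled is None:
            self.config_reads += 1
            value = self.config.get("core.commitGraph", "false")  # assumed default
            self._commit_graph_enabled = value.lower() == "true"
        return self._commit_graph_enabled


main_repo = Repository({"core.commitGraph": "true"})
other_repo = Repository({})
```

Because the cached answer lives on the repository object rather than in a global, two repositories in one process can hold different settings.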
brian m. carlson
509f6f62a4 cache: update object ID functions for the_hash_algo
Most of our code has been converted to use struct object_id for object
IDs.  However, there are some places that still have not, and there are
a variety of places that compare equivalently sized hashes that are not
object IDs.  All of these hashes are artifacts of the internal hash
algorithm in use, and when we switch to NewHash for object storage, all
of these uses will also switch.

Update the hashcpy, hashclr, and hashcmp functions to use the_hash_algo,
since they are used in a variety of places to copy and manipulate
buffers that need to move data into or out of struct object_id.  This
has the effect of making the corresponding oid* functions use
the_hash_algo as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-16 14:27:38 -07:00
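The idea in 509f6f62a4, that the copy, clear, and compare helpers take their length from the active hash algorithm instead of a hard-coded 20, might be modelled as follows. This is a Python sketch under stated assumptions: the `HashAlgo` class, its `rawsz` field, and the 32-byte `NEWHASH` entry are illustrative, and only the helper names come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashAlgo:
    name: str
    rawsz: int  # size of the binary hash in bytes (illustrative field name)


SHA1 = HashAlgo("sha1", 20)
# Hypothetical 32-byte successor, standing in for "NewHash".
NEWHASH = HashAlgo("newhash", 32)

the_hash_algo = SHA1  # the single place that selects the algorithm


def hashcmp(a: bytes, b: bytes) -> int:
    """Compare the first rawsz bytes; returns -1, 0, or 1."""
    n = the_hash_algo.rawsz
    x, y = a[:n], b[:n]
    return (x > y) - (x < y)


def hashcpy(dst: bytearray, src: bytes) -> None:
    """Copy exactly rawsz bytes from src into dst (both at least rawsz long)."""
    n = the_hash_algo.rawsz
    dst[:n] = src[:n]


def hashclr(buf: bytearray) -> None:
    """Zero the first rawsz bytes of buf."""
    n = the_hash_algo.rawsz
    buf[:n] = bytes(n)
```

With `the_hash_algo` set to `SHA1`, bytes past offset 20 are ignored by all three helpers; switching the selection to a longer hash changes every caller at once.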
Elijah Newren
e1f8694f33 merge-recursive: fix assumption that head tree being merged is HEAD
`git merge-recursive` does a three-way merge between user-specified trees
base, head, and remote.  Since the user is allowed to specify head, we
cannot necessarily assume that head == HEAD.

Modify index_has_changes() to take an extra argument specifying the tree
to compare against.  If NULL, it will compare to HEAD.  We then use this
from merge-recursive to make sure we compare to the user-specified head.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-11 09:38:36 -07:00
Elijah Newren
1b9fbefbe0 index_has_changes(): avoid assuming operating on the_index
Modify index_has_changes() to take a struct istate* instead of just
operating on the_index.  This is only a partial conversion, though,
because we call do_diff_cache() which implicitly assumes work is to be
done on the_index.  Ongoing work is being done elsewhere to do the
remainder of the conversion, and thus is not duplicated here.  Instead,
a simple check is put in place until that work is complete.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 13:13:18 -07:00
Jameson Miller
8616a2d0cb block alloc: add validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra
validations on mem_pool allocated cache entries. When set:

  1) Invalidate cache_entry memory when discarding cache_entry.

  2) When discarding index_state struct, verify that all cache_entries
     were allocated from expected mem_pool.

  3) When discarding mem_pools, invalidate mem_pool memory.

This should provide extra checks that mem_pools and their allocated
cache_entries are being used as expected.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:27 -07:00
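The three validations listed in 8616a2d0cb can be illustrated with a toy pool. This is a Python sketch, not Git's implementation: the environment variable name `DEMO_VALIDATE_CACHE_ENTRIES`, the poison byte, and every class and function name here are hypothetical, since the commit message does not name them.

```python
import os

POISON = 0xCD  # arbitrary poison byte chosen for this sketch


def should_validate(env=os.environ):
    # Hypothetical variable name; the commit message does not name the real one.
    return env.get("DEMO_VALIDATE_CACHE_ENTRIES", "") == "1"


class MemPool:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.used = 0

    def alloc(self, n):
        if self.used + n > len(self.buf):
            raise MemoryError("pool exhausted")
        start = self.used
        self.used += n
        return (self, start, n)  # an "entry" is a (pool, offset, length) triple

    def discard(self, validate):
        if validate:
            # Rule 3: invalidate pool memory when the pool is discarded.
            self.buf[:] = bytes([POISON]) * len(self.buf)
        self.used = 0


def discard_entry(entry, validate):
    pool, start, n = entry
    if validate:
        # Rule 1: invalidate the entry's memory so stale reads stand out.
        pool.buf[start:start + n] = bytes([POISON]) * n


def verify_entries(entries, expected_pool):
    # Rule 2: every entry must have been allocated from the expected pool.
    return all(pool is expected_pool for pool, _, _ in entries)
```

Poisoning does not prevent a use-after-discard, but it turns a silent stale read into obviously wrong data, which is the point of an opt-in debugging check.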
Jameson Miller
8e72d67529 block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is
spent in malloc() calls. This can be mitigated by allocating a
large block of memory and managing it ourselves via memory pools.

This change moves the cache entry allocation to be on top of memory
pools.

Design:

The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.

In the case of a Split Index, the following rules apply. First,
some terminology is defined:

Terminology:
  - 'the_index': represents the logical view of the index

  - 'split_index': represents the "base" cache entries. Read from the
    split index file.

'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is.  This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool.  This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.

Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:

Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.

Alternatives:

1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
  Pro: No extra memory overhead to track this state
  Con: Extra complexity in callers to handle this correctly.

The extra complexity and burden to not regress this behavior in the
future was more than we wanted.

2) cache_entry would gain knowledge about which mem_pool allocated it
  Pro: Could (potentially) do extra logic to know when a mem_pool no
       longer had references to any cache_entry
  Con: cache_entry would grow heavier by a pointer, instead of int

We didn't see a tangible benefit to this approach.

3) Do not add any extra information to a cache_entry, but when freeing a
   cache entry, check if the memory exists in a region managed by existing
   mem_pools.
  Pro: No extra memory overhead to track state
  Con: Extra computation is performed when freeing cache entries

We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this state.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:27 -07:00
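The design in 8e72d67529 (a per-index pool, allocation from the split index's pool when one is present, and an extra field recording whether an entry is pool-backed) might be modelled roughly like this. It is a Python sketch of the lifetime rules only: real pools hand out raw memory, and all names below other than `discard_cache_entry` are hypothetical.

```python
class CacheEntry:
    def __init__(self, name, mem_pool_allocated):
        self.name = name
        # The extra field from the chosen approach: records where the
        # entry came from so one discard function can handle both kinds.
        self.mem_pool_allocated = mem_pool_allocated


class MemPool:
    def __init__(self, capacity_hint=0):
        self.capacity_hint = capacity_hint  # e.g. derived from the entry count
        self.entries = []
        self.live = True

    def alloc(self, name):
        entry = CacheEntry(name, mem_pool_allocated=True)
        self.entries.append(entry)
        return entry

    def discard(self):
        # Releasing the pool releases every entry allocated from it.
        self.entries.clear()
        self.live = False


class IndexState:
    def __init__(self, expected_entries=0, split_index=None):
        self.pool = MemPool(capacity_hint=expected_entries)
        self.split_index = split_index

    def alloc_pool(self):
        # With a split index, allocate from its pool: the split index
        # outlives the logical index that references its entries.
        return self.split_index.pool if self.split_index else self.pool

    def make_entry(self, name):
        return self.alloc_pool().alloc(name)

    def discard(self):
        self.pool.discard()


def discard_cache_entry(entry, freed):
    # Pool-backed entries are released with their pool; only transient
    # entries are released individually (recorded in `freed` here).
    if not entry.mem_pool_allocated:
        freed.append(entry.name)
```

Discarding the logical index leaves entries allocated from the split index's pool untouched, which matches the rule that entries must not be freed while still referenced.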
Jameson Miller
a849735bfb block alloc: add lifecycle APIs for cache_entry structs
It has been observed that the time spent loading an index with a large
number of entries is partly dominated by malloc() calls. This change
is in preparation for using memory pools to reduce the number of
malloc() calls made to allocate cache entries when loading an index.

Add an API to allocate and discard cache entries, abstracting the
details of managing the memory backing the cache entries. This commit
does not actually change how memory is managed - this will be done in a
later commit in the series.

This change makes the distinction between cache entries that are
associated with an index and cache entries that are not associated with
an index. A main use of cache entries is with an index, and we can
optimize the memory management around this. We still have other cases
where a cache entry is not persisted with an index, and so we need to
handle the "transient" use case as well.

To keep the cognitive overhead of managing the cache entries low, there
will be only a single discard function. This means enough information
must be kept with each cache entry so that we know how to discard it.

A summary of the main functions in the API is:

make_cache_entry: Create cache entry for use in an index. Uses specified
                  parameters to populate cache_entry fields.

make_empty_cache_entry: Create an empty cache entry for use in an index.
                        Returns cache entry with empty fields.

make_transient_cache_entry: Create cache entry that is not used in an
                            index. Uses specified parameters to populate
                            cache_entry fields.

make_empty_transient_cache_entry: Create cache entry that is not used in
                                  an index. Returns cache entry with
                                  empty fields.

discard_cache_entry: A single function that knows how to discard a cache
                     entry regardless of how it was allocated.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:27 -07:00
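The API summarized in a849735bfb can be sketched as four constructors and one discard function. This is a Python model with simplified, assumed parameter lists (the commit message does not give the signatures); the `CacheEntry` fields and the string returned by `discard_cache_entry` exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    mode: int = 0
    oid: str = ""
    path: str = ""
    index: Optional[object] = None  # owning index, or None if transient
    discarded: bool = False


def make_empty_cache_entry(index):
    """Entry for use in an index, with empty fields."""
    return CacheEntry(index=index)


def make_cache_entry(index, mode, oid, path):
    """Entry for use in an index, populated from the parameters."""
    entry = make_empty_cache_entry(index)
    entry.mode, entry.oid, entry.path = mode, oid, path
    return entry


def make_empty_transient_cache_entry():
    """Entry not tied to any index, with empty fields."""
    return CacheEntry(index=None)


def make_transient_cache_entry(mode, oid, path):
    """Entry not tied to any index, populated from the parameters."""
    entry = make_empty_transient_cache_entry()
    entry.mode, entry.oid, entry.path = mode, oid, path
    return entry


def discard_cache_entry(entry):
    # One function for both kinds: the entry itself carries enough
    # information to tell how it was created.
    entry.discarded = True
    return "index" if entry.index is not None else "transient"
```

Callers never choose between two discard routines; the distinction is made once, at creation time.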
Jameson Miller
825ed4d9a0 read-cache: teach make_cache_entry to take object_id
Teach make_cache_entry function to take object_id instead of a SHA-1.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:15 -07:00
Jameson Miller
768d796506 read-cache: teach refresh_cache_entry to take istate
Refactor refresh_cache_entry() to work on a specific index, instead of
implicitly using the_index. This is in preparation for making the
make_cache_entry function apply to a specific index.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:15 -07:00
Junio C Hamano
b16b60f71b Merge branch 'sb/object-store-grafts' into sb/object-store-lookup
* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-06-29 10:43:28 -07:00
Junio C Hamano
110240588d Merge branch 'sb/object-store-alloc'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-alloc:
  alloc: allow arbitrary repositories for alloc functions
  object: allow create_object to handle arbitrary repositories
  object: allow grow_object_hash to handle arbitrary repositories
  alloc: add repository argument to alloc_commit_index
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_blob_node
  object: add repository argument to grow_object_hash
  object: add repository argument to create_object
  repository: introduce parsed objects field
2018-06-25 13:22:38 -07:00
Junio C Hamano
2289880f78 Merge branch 'nd/command-list'
The list of commands with their various attributes was spread
across a few places in the build procedure, but it is now getting a
bit more consolidated to allow more automation.

* nd/command-list:
  completion: allow to customize the completable command list
  completion: add and use --list-cmds=alias
  completion: add and use --list-cmds=nohelpers
  Move declaration for alias.c to alias.h
  completion: reduce completable command list
  completion: let git provide the completable command list
  command-list.txt: documentation and guide line
  help: use command-list.txt for the source of guides
  help: add "-a --verbose" to list all commands with synopsis
  git: support --list-cmds=list-<category>
  completion: implement and use --list-cmds=main,others
  git --list-cmds: collect command list in a string_list
  git.c: convert --list-* to --list-cmds=*
  Remove common-cmds.h
  help: use command-list.h for common command list
  generate-cmds.sh: export all commands to command-list.h
  generate-cmds.sh: factor out synopsis extract code
2018-06-01 15:06:37 +09:00
Junio C Hamano
42c8ce1c49 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (42 commits)
  merge-one-file: compute empty blob object ID
  add--interactive: compute the empty tree value
  Update shell scripts to compute empty tree object ID
  sha1_file: only expose empty object constants through git_hash_algo
  dir: use the_hash_algo for empty blob object ID
  sequencer: use the_hash_algo for empty tree object ID
  cache-tree: use is_empty_tree_oid
  sha1_file: convert cached object code to struct object_id
  builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
  builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
  wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
  submodule: convert several uses of EMPTY_TREE_SHA1_HEX
  sequencer: convert one use of EMPTY_TREE_SHA1_HEX
  merge: convert empty tree constant to the_hash_algo
  builtin/merge: switch tree functions to use object_id
  builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
  sha1-file: add functions for hex empty tree and blob OIDs
  builtin/receive-pack: avoid hard-coded constants for push certs
  diff: specify abbreviation size in terms of the_hash_algo
  upload-pack: replace use of several hard-coded constants
  ...
2018-05-30 14:04:10 +09:00
Junio C Hamano
7913f53b56 Sync with Git 2.17.1
* maint: (25 commits)
  Git 2.17.1
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  fsck: complain when .gitmodules is a symlink
  index-pack: check .gitmodules files with --strict
  unpack-objects: call fsck_finish() after fscking objects
  fsck: call fsck_finish() after fscking objects
  fsck: check .gitmodules content
  fsck: handle promisor objects in .gitmodules check
  fsck: detect gitmodules files
  fsck: actually fsck blob data
  fsck: simplify ".git" check
  index-pack: make fsck error message more specific
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  ...
2018-05-29 17:10:05 +09:00
Junio C Hamano
ad635e82d6 Merge branch 'nd/pack-objects-pack-struct'
"git pack-objects" needs to allocate tons of "struct object_entry"
while doing its work, and shrinking its size helps the performance
quite a bit.

* nd/pack-objects-pack-struct:
  ci: exercise the whole test suite with uncommon code in pack-objects
  pack-objects: reorder members to shrink struct object_entry
  pack-objects: shrink delta_size field in struct object_entry
  pack-objects: shrink size field in struct object_entry
  pack-objects: clarify the use of object_entry::size
  pack-objects: don't check size when the object is bad
  pack-objects: shrink z_delta_size field in struct object_entry
  pack-objects: refer to delta objects by index instead of pointer
  pack-objects: move in_pack out of struct object_entry
  pack-objects: move in_pack_pos out of struct object_entry
  pack-objects: use bitfield for object_entry::depth
  pack-objects: use bitfield for object_entry::dfs_state
  pack-objects: turn type and in_pack_type to bitfields
  pack-objects: a bit of document about struct object_entry
  read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean
2018-05-23 14:38:19 +09:00
Junio C Hamano
fcb6df3254 Merge branch 'sb/oid-object-info'
The codepath around the object-info API has been taught to take the
repository object (which in turn tells the API which object store
the objects are to be located in).

* sb/oid-object-info:
  cache.h: allow oid_object_info to handle arbitrary repositories
  packfile: add repository argument to cache_or_unpack_entry
  packfile: add repository argument to unpack_entry
  packfile: add repository argument to read_object
  packfile: add repository argument to packed_object_info
  packfile: add repository argument to packed_to_object_type
  packfile: add repository argument to retry_bad_packed_offset
  cache.h: add repository argument to oid_object_info
  cache.h: add repository argument to oid_object_info_extended
2018-05-23 14:38:16 +09:00
Junio C Hamano
b577198526 Merge branch 'nd/pack-format-doc'
Doc update.

* nd/pack-format-doc:
  pack-format.txt: more details on pack file format
2018-05-23 14:38:11 +09:00
Junio C Hamano
68f95b26e4 Sync with Git 2.16.4
* maint-2.16:
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:25:26 +09:00
Junio C Hamano
023020401d Sync with Git 2.15.2
* maint-2.15:
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:18:06 +09:00
Junio C Hamano
9e0f06d55d Sync with Git 2.14.4
* maint-2.14:
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:15:14 +09:00
Junio C Hamano
7b01c71b64 Sync with Git 2.13.7
* maint-2.13:
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:10:49 +09:00
Jeff King
10ecfa7649 verify_path: disallow symlinks in .gitmodules
There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:

  1. It won't be portable to systems without symlinks.

  2. It may behave inconsistently, since Git may look at
     this file in the index or a tree without bothering to
     resolve any symbolic links. We don't do this _yet_, but
     the config infrastructure is there and it's planned for
     the future.

With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:

  a. A symlinked .gitmodules file may circumvent any fsck
     checks of the content.

  b. Git may read and write from the on-disk file without
     sanity checking the symlink target. So for example, if
     you link ".gitmodules" to "../oops" and run "git
     submodule add", we'll write to the file "oops" outside
     the repository.

Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.

Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
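The rule from 10ecfa7649 reduces to one extra condition on the entry's mode. Below is a deliberately simplified Python sketch: it looks only at the final path component and compares names case-insensitively, whereas the real check covers more equivalent spellings. The function body is an assumption for illustration; only the name `verify_path` and the mode-0 behaviour come from the commit message.

```python
import stat


def verify_path(path: str, mode: int) -> bool:
    """Return True if the path may be recorded with the given mode."""
    last = path.rsplit("/", 1)[-1]
    # A mode of 0 (no mode known, e.g. when removing a file) is not a
    # symlink, so the extra check is skipped, as the commit describes.
    if stat.S_ISLNK(mode) and last.lower() == ".gitmodules":
        return False
    return True
```

Here `0o120000` is the mode Git records for a symbolic link and `0o100644` the mode for a regular file.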
Johannes Schindelin
e7cb0b4455 is_ntfs_dotgit: match other .git files
When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.

However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before `.gitmodules` and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.

It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to
number 4. After that, a *different* prefix is used, calculated from the
long file name using an undocumented, but stable algorithm.

For example, the short name of .gitmodules would be GITMOD~1, but if it
is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
GI7EBA~1 will be used. From there, collisions are handled by
incrementing the number, shortening the prefix as needed (until ~9999999
is reached, in which case NTFS will not allow the file to be created).

We'd also want to handle .gitignore and .gitattributes, which suffer
from a similar problem, using the fall-back short names GI250A~1 and
GI7D29~1, respectively.

To accommodate that, we could reimplement the hashing algorithm, but
it is just safer and simpler to provide the known prefixes. This
algorithm has been reverse-engineered and described at
https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
still available via https://web.archive.org/.

These can be recomputed by running the following Perl script:

-- snip --
use warnings;
use strict;

sub compute_short_name_hash ($) {
        my $checksum = 0;
        foreach (split('', $_[0])) {
                $checksum = ($checksum * 0x25 + ord($_)) & 0xffff;
        }

        $checksum = ($checksum * 314159269) & 0xffffffff;
        $checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000);
        $checksum -= (($checksum * 1152921497) >> 60) * 1000000007;

        return scalar reverse sprintf("%x", $checksum & 0xffff);
}

print compute_short_name_hash($ARGV[0]);
-- snap --

E.g., running that with the argument ".gitignore" will
result in "250a" (which then becomes "gi250a" in the code).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
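For readers without Perl at hand, the script in e7cb0b4455 ports directly to Python. This is a line-for-line translation, on the assumption that Python's unbounded integers behave like the script's arithmetic for these value ranges (every intermediate product stays below 2^64).

```python
def compute_short_name_hash(name: str) -> str:
    """Python port of the Perl script from the commit message."""
    checksum = 0
    for ch in name:
        checksum = (checksum * 0x25 + ord(ch)) & 0xFFFF

    checksum = (checksum * 314159269) & 0xFFFFFFFF
    if checksum & 0x80000000:
        checksum = 1 + (~checksum & 0x7FFFFFFF)
    checksum -= ((checksum * 1152921497) >> 60) * 1000000007

    # Low 16 bits as hex, then reversed, as in the Perl original.
    return format(checksum & 0xFFFF, "x")[::-1]


print(compute_short_name_hash(".gitignore"))  # prints 250a
```

The result for ".gitignore" matches the "250a" quoted in the commit message.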
Nguyễn Thái Ngọc Duy
65b5f9483e Move declaration for alias.c to alias.h
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21 13:23:14 +09:00
Stefan Beller
0437a2e365 cache: convert get_graft_file to handle arbitrary repositories
This conversion was done without the #define trick used in the earlier
series refactoring to have better repository access, because this function
is easy to review, as all lines are converted and it has only one caller.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller
cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Stefan Beller
14ba97f81c alloc: allow arbitrary repositories for alloc functions
We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.

We record all memory allocation in alloc.c, and free them in
clear_alloc_state, which is called for all repositories except
the_repository.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:16:50 +09:00
Nguyễn Thái Ngọc Duy
011b648646 pack-format.txt: more details on pack file format
The current document mentions OBJ_* constants without their actual
values. A git developer would know these are from cache.h but that's
not very friendly to a person who wants to read this file to implement
a pack file parser.

Similarly, the deltified representation is not documented at all (the
"document" is basically patch-delta.c). Translate that C code to
English with a bit more about what ofs-delta and ref-delta mean.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13 10:20:03 +09:00
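As an illustration of what 011b648646 aims to make implementable from the documentation alone, here is a small Python sketch that decodes the type and size header of a pack entry. The numeric values are Git's object type codes and the layout follows the pack format's variable-length size encoding; the function name `parse_entry_header` is made up for this example.

```python
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG = 1, 2, 3, 4
OBJ_OFS_DELTA, OBJ_REF_DELTA = 6, 7


def parse_entry_header(data: bytes):
    """Return (type, size, bytes_consumed) for one pack entry header."""
    byte = data[0]
    obj_type = (byte >> 4) & 0x7  # bits 6..4 hold the type
    size = byte & 0x0F            # bits 3..0 hold the low size bits
    shift = 4
    pos = 1
    while byte & 0x80:            # MSB set: another size byte follows
        byte = data[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    return obj_type, size, pos
```

For example, the two bytes `0x95 0x0A` decode to a commit of size 165: the first byte gives type 1 and low size bits 5, and the second contributes 10 << 4.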
Stefan Beller
dd5d9deb01 alloc: add repository argument to alloc_commit_index
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
17bfe87369 alloc: add repository argument to alloc_report
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
13e3fdcb76 alloc: add repository argument to alloc_object_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
a0bd9086bb alloc: add repository argument to alloc_tag_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
8ba0e5ec57 alloc: add repository argument to alloc_commit_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
cf7203bdc6 alloc: add repository argument to alloc_tree_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
f0de1d62ae alloc: add repository argument to alloc_blob_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Junio C Hamano
174774cd51 Merge branch 'sb/object-store-replace'
The effort to pass the repository in-core structure throughout the
API continues.  This round deals with the code that implements the
refs/replace/ mechanism.

* sb/object-store-replace:
  replace-object: allow lookup_replace_object to handle arbitrary repositories
  replace-object: allow do_lookup_replace_object to handle arbitrary repositories
  replace-object: allow prepare_replace_object to handle arbitrary repositories
  refs: allow for_each_replace_ref to handle arbitrary repositories
  refs: store the main ref store inside the repository struct
  replace-object: add repository argument to lookup_replace_object
  replace-object: add repository argument to do_lookup_replace_object
  replace-object: add repository argument to prepare_replace_object
  refs: add repository argument to for_each_replace_ref
  refs: add repository argument to get_main_ref_store
  replace-object: check_replace_refs is safe in multi repo environment
  replace-object: eliminate replace objects prepared flag
  object-store: move lookup_replace_object to replace-object.h
  replace-object: move replace_map to object store
  replace_object: use oidmap
2018-05-08 15:59:21 +09:00
Junio C Hamano
b10edb2df5 Merge branch 'ds/commit-graph'
Precompute and store information necessary for ancestry traversal
in a separate file to optimize graph walking.

* ds/commit-graph:
  commit-graph: implement "--append" option
  commit-graph: build graph from starting commits
  commit-graph: read only from specific pack-indexes
  commit: integrate commit graph with commit parsing
  commit-graph: close under reachability
  commit-graph: add core.commitGraph setting
  commit-graph: implement git commit-graph read
  commit-graph: implement git-commit-graph write
  commit-graph: implement write_commit_graph()
  commit-graph: create git-commit-graph builtin
  graph: add commit graph design document
  commit-graph: add format document
  csum-file: refactor finalize_hashfile() method
  csum-file: rename hashclose() to finalize_hashfile()
2018-05-08 15:59:20 +09:00
Junio C Hamano
92034a9cd5 Merge branch 'dj/runtime-prefix'
A build-time option has been added for Linux, BSDs and Darwin that
allows Git to be told to refer to its associated files relative to
the main binary, in the same way that has been possible on Windows
for quite some time.

* dj/runtime-prefix:
  Makefile: quote $INSTLIBDIR when passing it to sed
  Makefile: remove unused @@PERLLIBDIR@@ substitution variable
  mingw/msvc: use the new-style RUNTIME_PREFIX helper
  exec_cmd: provide a new-style RUNTIME_PREFIX helper for Windows
  exec_cmd: RUNTIME_PREFIX on some POSIX systems
  Makefile: add Perl runtime prefix support
  Makefile: generate Perl header from template file
2018-05-08 15:59:17 +09:00
brian m. carlson
e1ccd7e2b1 sha1_file: only expose empty object constants through git_hash_algo
There really isn't any case in which we want to expose the constants for
empty trees and blobs outside of using the hash algorithm abstraction.
Make these constants static and stop exposing the defines in cache.h.
Remove the constants which are no longer in use.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:53 +09:00
brian m. carlson
d8a92ced62 sha1-file: add functions for hex empty tree and blob OIDs
Oftentimes, we'll want to refer to an empty tree or empty blob by its
hex name without having to call oid_to_hex or explicitly refer to
the_hash_algo.  Add helper functions that format these values into
static buffers and return them for easy use.
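
A minimal sketch of such helpers, assuming SHA-1 and using hypothetical names (`to_hex`, `empty_tree_hex_sketch`, `empty_blob_hex_sketch`) rather than the functions this commit actually adds. Each helper owns its static buffer, so the returned string stays valid until that same helper is called again:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RAWSZ 20            /* SHA-1 digest size; an assumption of this sketch */
#define HEXSZ (2 * RAWSZ)

/* Well-known SHA-1 names of the empty tree and the empty blob, in raw form. */
static const unsigned char empty_tree_raw[RAWSZ] = {
	0x4b, 0x82, 0x5d, 0xc6, 0x42, 0xcb, 0x6e, 0xb9, 0xa0, 0x60,
	0xe5, 0x4b, 0xf8, 0xd6, 0x92, 0x88, 0xfb, 0xee, 0x49, 0x04
};
static const unsigned char empty_blob_raw[RAWSZ] = {
	0xe6, 0x9d, 0xe2, 0x9b, 0xb2, 0xd1, 0xd6, 0x43, 0x4b, 0x8b,
	0x29, 0xae, 0x77, 0x5a, 0xd8, 0xc2, 0xe4, 0x8c, 0x53, 0x91
};

/* Write the hex form of a raw hash into out (HEXSZ + 1 bytes) and return out. */
static const char *to_hex(const unsigned char *raw, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < RAWSZ; i++) {
		out[2 * i] = digits[raw[i] >> 4];
		out[2 * i + 1] = digits[raw[i] & 0x0f];
	}
	out[HEXSZ] = '\0';
	return out;
}

/* Hypothetical helper: hex name of the empty tree in a static buffer. */
static const char *empty_tree_hex_sketch(void)
{
	static char buf[HEXSZ + 1];
	return to_hex(empty_tree_raw, buf);
}

/* Hypothetical helper: hex name of the empty blob in a static buffer. */
static const char *empty_blob_hex_sketch(void)
{
	static char buf[HEXSZ + 1];
	return to_hex(empty_blob_raw, buf);
}
```

Separate buffers let a caller use both names in one format string without one overwriting the other.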

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:51 +09:00
brian m. carlson
75691ea345 Update struct index_state to use struct object_id
Adjust struct index_state to use struct object_id instead of unsigned
char [20].

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:50 +09:00
brian m. carlson
6862ebbfcb sha1-file: convert freshen functions to object_id
Convert the various functions for freshening objects and
has_loose_object_nonlocal to use struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:49 +09:00
brian m. carlson
c51c39418b packfile: remove unused member from struct pack_entry
The sha1 member in struct pack_entry is unused except for one instance
in which we store a value in it.  Since nobody ever reads this value,
don't bother to compute it and remove the member from struct pack_entry.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:49 +09:00
brian m. carlson
6f13fd0ec6 Remove unused member in struct object_context
The tree member of struct object_context is unused except in one place
where we write to it.  Since there are no users of this member, remove
it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:49 +09:00
brian m. carlson
69d124255e cache: add a function to read an object ID from a buffer
In various places throughout the codebase, we need to read data into a
struct object_id from a pack or other unsigned char buffer.  Add an
inline function that does this based on the current hash algorithm in
use, and use it in several places.
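
As a rough illustration only, such a reader could look like the sketch below. The names `oid_read_sketch`, `current_algo` and `MAX_RAWSZ` are stand-ins invented for this example, not the identifiers used in cache.h:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RAWSZ 32   /* hypothetical upper bound that also fits a longer hash */

struct hash_algo { size_t rawsz; };                   /* stand-in algorithm table */
struct object_id { unsigned char hash[MAX_RAWSZ]; };

static const struct hash_algo sha1_algo = { 20 };
static const struct hash_algo *current_algo = &sha1_algo;

/* Copy exactly one hash worth of bytes from a raw buffer into an object ID. */
static inline void oid_read_sketch(struct object_id *oid, const unsigned char *buf)
{
	assert(current_algo->rawsz <= MAX_RAWSZ);
	memcpy(oid->hash, buf, current_algo->rawsz);
}
```

Because the length comes from the algorithm in use, callers parsing a pack or index buffer do not need to hard-code 20.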

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:48 +09:00
Stefan Beller
9d98354f48 cache.h: allow oid_object_info to handle arbitrary repositories
This involves also adapting oid_object_info_extended and some
internal functions that are used to implement these. It all has to
happen in one patch, because a single recursive chain of calls visits
all these functions.

oid_object_info_extended is also used in partial clones, which allow
fetching missing objects. As this series will not add the repository
struct to the transport code and fetch_object(), add a TODO note and
omit fetching if a user tries to use a partial clone in a repository
other than the_repository.

Among the functions modified to handle arbitrary repositories,
unpack_entry() is one of them. Note that it still references the globals
"delta_base_cache" and "delta_base_cached", but those are safe to be
referenced (the former is indexed partly by "struct packed_git *", which
is repo-specific, and the latter is only used to limit the size of the
former as an optimization).

Helped-by: Brandon Williams <bmwill@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26 10:54:28 +09:00
Stefan Beller
0df8e96566 cache.h: add repository argument to oid_object_info
Add a repository argument to allow the callers of oid_object_info
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
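
One way such a macro can work is token pasting: the repository argument becomes part of the function name, so only the literal token `the_repository` resolves to an existing function. The sketch below uses hypothetical names (`object_kind`, `object_kind_the_repository`) and a toy return value; it is not the code from this series:

```c
#include <assert.h>
#include <stddef.h>

struct repository { int id; };
static struct repository the_repo_storage = { 1 };
static struct repository *the_repository = &the_repo_storage;

/* The implementation still relies on the global repository for now. */
static int object_kind_the_repository(int object)
{
	assert(the_repository != NULL);
	return object % 4;   /* toy result, stands in for a real lookup */
}

/*
 * Paste the first argument into the function name. A call such as
 * object_kind(other_repo, 7) expands to object_kind_other_repo(7),
 * which does not exist, so the build fails instead of silently
 * operating on the wrong repository.
 */
#define object_kind(r, object) object_kind_##r(object)
```

Once the implementation really honors its argument, the macro can be dropped and the function renamed back.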

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26 10:54:27 +09:00
Stefan Beller
7ecd869060 cache.h: add repository argument to oid_object_info_extended
Add a repository argument to allow oid_object_info_extended callers
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26 10:54:27 +09:00
Junio C Hamano
ff6eb825f0 Merge branch 'jk/relative-directory-fix'
Some codepaths, including the refs API, get and keep relative
paths, that go out of sync when the process does chdir(2).  The
chdir-notify API is introduced to let these codepaths adjust these
cached paths to the new current directory.

* jk/relative-directory-fix:
  refs: use chdir_notify to update cached relative paths
  set_work_tree: use chdir_notify
  add chdir-notify API
  trace.c: export trace_setup_key
  set_git_dir: die when setenv() fails
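
The idea of the chdir-notify API described above can be sketched as a small callback registry. All names here (`chdir_notify_register_sketch`, `chdir_notify_sketch`, `count_cb`) are hypothetical, and the sketch only dispatches callbacks; it does not call chdir(2) itself:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*chdir_cb)(const char *old_cwd, const char *new_cwd, void *data);

struct chdir_entry { chdir_cb cb; void *data; };

#define MAX_ENTRIES 8
static struct chdir_entry entries[MAX_ENTRIES];
static size_t entries_nr;

/* Remember a callback that wants to hear about directory changes. */
static int chdir_notify_register_sketch(chdir_cb cb, void *data)
{
	if (entries_nr == MAX_ENTRIES)
		return -1;
	entries[entries_nr].cb = cb;
	entries[entries_nr].data = data;
	entries_nr++;
	return 0;
}

/* Tell every registered callback about the old and new directory. */
static void chdir_notify_sketch(const char *old_cwd, const char *new_cwd)
{
	size_t i;

	for (i = 0; i < entries_nr; i++)
		entries[i].cb(old_cwd, new_cwd, entries[i].data);
}

/* Example callback: count notifications; a real one would rewrite a cached path. */
static void count_cb(const char *old_cwd, const char *new_cwd, void *data)
{
	(void)old_cwd;
	(void)new_cwd;
	++*(int *)data;
}
```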
2018-04-25 13:28:52 +09:00
Nguyễn Thái Ngọc Duy
fd9b1baef8 pack-objects: turn type and in_pack_type to bitfields
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.

A note about accepting OBJ_NONE as "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry() where current code skips setting
"type" field if the incoming type is zero. This does not have any bad
side effects because "type" field should be memset()'d anyway.

But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger

    fatal: unable to get type of object ...

Accepting OBJ_NONE [2] does sound wrong, but this is how it has
been for a very long time and I haven't had time to dig in further.

[1] See 5c49c11686 (pack-objects: better check_object() performances -
    2007-04-16)

[2] 21666f1aae (convert object type handling from a string to a number
    - 2007-02-26)
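
The interaction between the zeroed entry, type_valid and OBJ_NONE can be shown with a small sketch. The struct and accessor names are hypothetical simplifications of the pack-objects entry, and the enum lists only the values needed here:

```c
#include <assert.h>

enum object_type_sketch {
	OBJ_BAD = -1, OBJ_NONE = 0, OBJ_COMMIT = 1,
	OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4
};

struct entry_sketch {
	unsigned type:3;          /* cannot hold -1, hence the extra flag */
	unsigned type_valid:1;    /* carries the equivalent of OBJ_BAD */
	unsigned in_pack_type:3;  /* always a valid type */
};

/* Read the type back; an entry whose type was never set reports OBJ_BAD. */
static enum object_type_sketch entry_type(const struct entry_sketch *e)
{
	return e->type_valid ? (enum object_type_sketch)e->type : OBJ_BAD;
}

/* Store a type; OBJ_NONE counts as valid, anything negative does not. */
static void entry_set_type(struct entry_sketch *e, enum object_type_sketch t)
{
	e->type_valid = t >= OBJ_NONE;
	e->type = e->type_valid ? (unsigned)t : 0;
}
```

With this layout, skipping the setter for a zero type leaves type_valid at 0, so the reader sees OBJ_BAD rather than OBJ_NONE, which is the behavior change the message warns about.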

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
Stefan Beller
47f351e9b3 object-store: move lookup_replace_object to replace-object.h
lookup_replace_object is a low-level function that most users of the
object store do not need to use directly.

Move it to replace-object.h to avoid a dependency loop in an upcoming
change to its inline definition that will make use of repository.h.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-12 11:38:56 +09:00
Dan Jacques
226c0ddd0d exec_cmd: RUNTIME_PREFIX on some POSIX systems
Enable Git to resolve its own binary location using a variety of
OS-specific and generic methods, including:

- procfs via "/proc/self/exe" (Linux)
- _NSGetExecutablePath (Darwin)
- KERN_PROC_PATHNAME sysctl on BSDs.
- argv0, if absolute (all, including Windows).

This is used to enable RUNTIME_PREFIX support for non-Windows systems,
notably Linux and Darwin. When configured with RUNTIME_PREFIX, Git will
do a best-effort resolution of its executable path and automatically use
this as its "exec_path" for relative helper and data lookups, unless
explicitly overridden.
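
A reduced sketch of the best-effort lookup, covering only two of the methods listed above (procfs and an absolute argv0). The function name `resolve_exe_sketch` is hypothetical, and the sketch ignores the case where the link target is longer than the buffer:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Try /proc/self/exe first (available on Linux), then fall back to
 * argv0 when it is an absolute path. Returns NULL if neither works.
 */
static const char *resolve_exe_sketch(const char *argv0, char *buf, size_t len)
{
	ssize_t n;

	assert(len > 1);
	n = readlink("/proc/self/exe", buf, len - 1);
	if (n > 0) {
		buf[n] = '\0';   /* readlink() does not terminate the string */
		return buf;
	}
	if (argv0 && argv0[0] == '/')
		return argv0;
	return NULL;
}
```

Trying the OS-specific source before argv0 matters because argv0 is supplied by the caller of exec and may be relative or misleading.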

Small incidental formatting cleanup of "exec_cmd.c".

Signed-off-by: Dan Jacques <dnj@google.com>
Thanks-to: Robbie Iannucci <iannucci@google.com>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 18:10:28 +09:00
Junio C Hamano
cf0b1793ea Merge branch 'sb/object-store'
Refactoring the internal global data structure to make it possible
to open multiple repositories, work with and then close them.

Rerolled by Duy on top of a separate preliminary clean-up topic.
The resulting structure of the topics looked very sensible.

* sb/object-store: (27 commits)
  sha1_file: allow sha1_loose_object_info to handle arbitrary repositories
  sha1_file: allow map_sha1_file to handle arbitrary repositories
  sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
  sha1_file: allow open_sha1_file to handle arbitrary repositories
  sha1_file: allow stat_sha1_file to handle arbitrary repositories
  sha1_file: allow sha1_file_name to handle arbitrary repositories
  sha1_file: add repository argument to sha1_loose_object_info
  sha1_file: add repository argument to map_sha1_file
  sha1_file: add repository argument to map_sha1_file_1
  sha1_file: add repository argument to open_sha1_file
  sha1_file: add repository argument to stat_sha1_file
  sha1_file: add repository argument to sha1_file_name
  sha1_file: allow prepare_alt_odb to handle arbitrary repositories
  sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
  sha1_file: add repository argument to prepare_alt_odb
  sha1_file: add repository argument to link_alt_odb_entries
  sha1_file: add repository argument to read_info_alternates
  sha1_file: add repository argument to link_alt_odb_entry
  sha1_file: add raw_object_store argument to alt_odb_usable
  pack: move approximate object count to object store
  ...
2018-04-11 13:09:55 +09:00
Derrick Stolee
1b70dfd594 commit-graph: add core.commitGraph setting
The commit graph feature is controlled by the new core.commitGraph config
setting. This defaults to 0, so the feature is opt-in.

The intention of core.commitGraph is that a user can always stop checking
for or parsing commit graph files if core.commitGraph=0.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:43:01 +09:00
Junio C Hamano
0873c393c7 Merge branch 'nd/remove-ignore-env-field'
Code clean-up for the "repository" abstraction.

* nd/remove-ignore-env-field:
  repository.h: add comment and clarify repo_set_gitdir
  repository: delete ignore_env member
  sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
  repository.c: delete dead functions
  repository.c: move env-related setup code back to environment.c
  repository: initialize the_repository in main()
2018-04-10 16:28:20 +09:00
Junio C Hamano
a5bbc29994 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (36 commits)
  convert: convert to struct object_id
  sha1_file: introduce a constant for max header length
  Convert lookup_replace_object to struct object_id
  sha1_file: convert read_sha1_file to struct object_id
  sha1_file: convert read_object_with_reference to object_id
  tree-walk: convert tree entry functions to object_id
  streaming: convert istream internals to struct object_id
  tree-walk: convert get_tree_entry_follow_symlinks internals to object_id
  builtin/notes: convert static functions to object_id
  builtin/fmt-merge-msg: convert remaining code to object_id
  sha1_file: convert sha1_object_info* to object_id
  Convert remaining callers of sha1_object_info_extended to object_id
  packfile: convert unpack_entry to struct object_id
  sha1_file: convert retry_bad_packed_offset to struct object_id
  sha1_file: convert assert_sha1_type to object_id
  builtin/mktree: convert to struct object_id
  streaming: convert open_istream to use struct object_id
  sha1_file: convert check_sha1_signature to struct object_id
  sha1_file: convert read_loose_object to use struct object_id
  builtin/index-pack: convert struct ref_delta_entry to object_id
  ...
2018-04-10 08:25:45 +09:00
Junio C Hamano
5d806b74d5 Merge branch 'ti/fetch-everything-local-optim'
A "git fetch" from a repository with an insane number of refs into a
repository that is already up-to-date still wasted too many cycles
making many lstat(2) calls to see if these objects at the tips
exist as loose objects locally.  These lstat(2) calls are optimized
away by enumerating all loose objects beforehand.

It is unknown if the new strategy negatively affects existing use
cases, fetching into a repository with many loose objects from a
repository with small number of refs.

* ti/fetch-everything-local-optim:
  fetch-pack.c: use oidset to check existence of loose object
2018-04-10 08:25:43 +09:00
Jeff King
48988c4d0c set_git_dir: die when setenv() fails
The set_git_dir() function returns an error if setenv()
fails, but there are zero callers who pay attention to this
return value. If this ever were to happen, it could cause
confusing results, as sub-processes would see a potentially
stale GIT_DIR (e.g., if it is relative and we chdir()-ed to
the root of the working tree).

We _could_ try to fix each caller, but there's really
nothing useful to do after this failure except die. Let's
just lump setenv() failure into the same category as malloc
failure: things that should never happen and cause us to
abort catastrophically.
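
In the spirit of the paragraph above, a "succeed or die" wrapper might look like this sketch. The name `setenv_or_die` is hypothetical, and plain fprintf()/exit() stand in for Git's own error helpers:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Set an environment variable, or abort the process if that fails. */
static void setenv_or_die(const char *name, const char *value)
{
	assert(name && value);
	if (setenv(name, value, 1) != 0) {
		fprintf(stderr, "fatal: could not set %s: %s\n",
			name, strerror(errno));
		exit(128);
	}
}
```

Callers then have no return value to forget, which is the point of treating this failure like an allocation failure.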

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-30 12:49:57 -07:00
Stefan Beller
e35454fa62 sha1_file: add repository argument to map_sha1_file
Add a repository argument to allow map_sha1_file callers to be more
specific about which repository to handle. This is a small mechanical
change; it doesn't change the implementation to handle repositories
other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

While at it, move the declaration to object-store.h, where it should
be easier to find.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:05:55 -07:00
Stefan Beller
cf78ae4f3d sha1_file: add repository argument to sha1_file_name
Add a repository argument to allow sha1_file_name callers to be more
specific about which repository to handle. This is a small mechanical
change; it doesn't change the implementation to handle repositories
other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

While at it, move the declaration to object-store.h, where it should
be easier to find.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:05:55 -07:00
Stefan Beller
a80d72db2a object-store: move packed_git and packed_git_mru to object store
In a process with multiple repositories open, packfile accessors
should be associated to a single repository and not shared globally.
Move packed_git and packed_git_mru into the_repository and adjust
callers to reflect this.

[nd: while at it, wrap access to these two fields in get_packed_git()
and get_packed_git_mru(). This allows us to lazily initialize these
fields without callers doing that explicitly]
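
The accessor pattern mentioned in the note can be sketched as follows. Every name here (`object_store_sketch`, `prepare_packs`, `get_packs`, `demo_pack`) is hypothetical, and the "scan" is replaced by a fixed entry so the example stays self-contained:

```c
#include <assert.h>
#include <stddef.h>

struct pack_sketch { const char *name; struct pack_sketch *next; };

struct object_store_sketch {
	struct pack_sketch *packs;
	int packs_initialized;
};

static struct pack_sketch demo_pack = { "pack-demo.pack", NULL };

/* Populate the list once; later calls are no-ops. */
static void prepare_packs(struct object_store_sketch *o)
{
	if (o->packs_initialized)
		return;
	o->packs = &demo_pack;   /* a real version would scan the pack directory */
	o->packs_initialized = 1;
}

/* Callers go through the accessor and never touch the field directly. */
static struct pack_sketch *get_packs(struct object_store_sketch *o)
{
	assert(o != NULL);
	prepare_packs(o);
	return o->packs;
}
```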

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:05:46 -07:00
Stefan Beller
0d4a132144 object-store: migrate alternates struct and functions from cache.h
Migrate the struct alternate_object_database and all its related
functions to the object store, as these functions are easier to find
in that header. The migration is just a verbatim copy; there is no need
to include the object store header in any C file, because cache.h
includes repository.h, which in turn includes object-store.h.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-23 11:06:01 -07:00
Junio C Hamano
b0e0fc267b Merge branch 'tg/split-index-fixes' into maint
The split-index mode had a few corner case bugs fixed.

* tg/split-index-fixes:
  travis: run tests with GIT_TEST_SPLIT_INDEX
  split-index: don't write cache tree with null oid entries
  read-cache: fix reading the shared index for other repos
2018-03-22 14:24:10 -07:00
Junio C Hamano
beb2cdf504 Merge branch 'ma/skip-writing-unchanged-index'
Internal API clean-up to allow write_locked_index() to optionally skip
writing the in-core index when it is not modified.

* ma/skip-writing-unchanged-index:
  write_locked_index(): add flag to avoid writing unchanged index
2018-03-21 11:30:10 -07:00
Takuto Ikuta
024aa4696c fetch-pack.c: use oidset to check existence of loose object
When fetching from a repository with a large number of refs, 'git fetch'
checks whether each ref exists in the local repository as a packed or
loose object. It ends up making a lot of lstat(2) calls for loose
objects that do not exist, which makes it slow.

Instead of making as many lstat(2) calls as the refs the remote side
advertised to see if these objects exist in the loose form, first
enumerate all the existing loose objects in hashmap beforehand and use
it to check existence of them if the number of refs is larger than the
number of loose objects.

With this patch, the number of lstat(2) calls in `git fetch` is reduced
from 411412 to 13794 for the chromium repository, which has more than
480000 remote refs.

I timed `git fetch` three times, with fetch-pack running, on the
chromium repository on Linux with an SSD.
* with this patch
8.105s
8.309s
7.640s
avg: 8.018s

* master
12.287s
11.175s
12.227s
avg: 11.896s

On my MacBook Air, which has a slower lstat(2):
* with this patch
14.501s

* master
1m16.027s

`git fetch` on a slow disk will improve considerably.
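
The membership test behind this optimization can be sketched as below. Note the swap: the patch uses an oidset, while this sketch uses a sorted array with bsearch() to stay short. The names are hypothetical, and the enumeration of the loose object directories is assumed to have filled the array already:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RAWSZ 20   /* SHA-1 digest size; an assumption of this sketch */

struct loose_set {
	unsigned char (*hashes)[RAWSZ];   /* filled by one directory scan */
	size_t nr;
};

static int cmp_hash(const void *a, const void *b)
{
	return memcmp(a, b, RAWSZ);
}

/* Sort once after enumeration so lookups need no system calls. */
static void loose_set_finalize(struct loose_set *s)
{
	qsort(s->hashes, s->nr, RAWSZ, cmp_hash);
}

/* Replaces one lstat(2) per advertised ref with an in-memory lookup. */
static int loose_set_contains(const struct loose_set *s, const unsigned char *hash)
{
	return s->nr > 0 &&
	       bsearch(hash, s->hashes, s->nr, RAWSZ, cmp_hash) != NULL;
}
```

The up-front scan costs one pass over the loose objects, which pays off only when the remote advertises more refs than there are loose objects, matching the condition stated in the message.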

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 11:17:26 -07:00
brian m. carlson
b383a13cc0 Convert lookup_replace_object to struct object_id
Convert both the argument and the return value to be pointers to struct
object_id.  Update the callers and their internals to deal with the new
type.  Remove several temporaries which are no longer needed.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
brian m. carlson
b4f5aca40e sha1_file: convert read_sha1_file to struct object_id
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file.  Do the same for read_sha1_file_extended.

Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id.  Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
brian m. carlson
02f0547eaa sha1_file: convert read_object_with_reference to object_id
Convert read_object_with_reference to take pointers to struct object_id.
Update the internals of the function accordingly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
brian m. carlson
abef9020e3 sha1_file: convert sha1_object_info* to object_id
Convert sha1_object_info and sha1_object_info_extended to take pointers
to struct object_id and rename them to use "oid" instead of "sha1" in
their names.  Update the declaration and definition and apply the
following semantic patch, plus the standard object_id transforms:

@@
expression E1, E2;
@@
- sha1_object_info(E1.hash, E2)
+ oid_object_info(&E1, E2)

@@
expression E1, E2;
@@
- sha1_object_info(E1->hash, E2)
+ oid_object_info(E1, E2)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1.hash, E2, E3)
+ oid_object_info_extended(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1->hash, E2, E3)
+ oid_object_info_extended(E1, E2, E3)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson
e816caa07b sha1_file: convert assert_sha1_type to object_id
Convert this function to take a pointer to struct object_id and rename
it to assert_oid_type.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson
17e65451e3 sha1_file: convert check_sha1_signature to struct object_id
Convert this function to take a pointer to struct object_id and rename
it check_object_signature.  Introduce temporaries to convert the return
values of lookup_replace_object and lookup_replace_object_extended into
struct object_id.

The temporaries are needed because in order to convert
lookup_replace_object, open_istream needs to be converted, and
open_istream needs check_sha1_signature to be converted, causing a loop
of dependencies.  The temporaries will be removed in a future patch.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson
d61d87bd15 sha1_file: convert read_loose_object to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:48 -07:00
brian m. carlson
aab9583f7b Convert find_unique_abbrev* to struct object_id
Convert find_unique_abbrev and find_unique_abbrev_r to each take a
pointer to struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:48 -07:00
Junio C Hamano
169c9c0169 Merge branch 'bw/c-plus-plus'
Avoid using identifiers that clash with C++ keywords.  Even though
it is not a goal to compile Git with C++ compilers, changes like
this help the use of code analysis tools that target C++ on our
codebase.

* bw/c-plus-plus: (37 commits)
  replace: rename 'new' variables
  trailer: rename 'template' variables
  tempfile: rename 'template' variables
  wrapper: rename 'template' variables
  environment: rename 'namespace' variables
  diff: rename 'template' variables
  environment: rename 'template' variables
  init-db: rename 'template' variables
  unpack-trees: rename 'new' variables
  trailer: rename 'new' variables
  submodule: rename 'new' variables
  split-index: rename 'new' variables
  remote: rename 'new' variables
  ref-filter: rename 'new' variables
  read-cache: rename 'new' variables
  line-log: rename 'new' variables
  imap-send: rename 'new' variables
  http: rename 'new' variables
  entry: rename 'new' variables
  diffcore-delta: rename 'new' variables
  ...
2018-03-06 14:54:07 -08:00
Nguyễn Thái Ngọc Duy
357a03ebe9 repository.c: move env-related setup code back to environment.c
It does not make sense that generic repository code contains handling
of environment variables, which are specific for the main repository
only. Refactor repo_set_gitdir() function to take $GIT_DIR and
optionally _all_ other customizable paths. These optional paths can be
NULL and will be calculated according to the default directory layout.

Note that some dead functions are left behind to reduce diff
noise. They will be deleted in the next patch.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05 11:14:03 -08:00
Martin Ågren
610008146e write_locked_index(): add flag to avoid writing unchanged index
We have several callers like

	if (active_cache_changed && write_locked_index(...))
		handle_error();
	rollback_lock_file(...);

where the final rollback is needed because "!active_cache_changed"
shortcuts the if-expression. There are also a few variants of this,
including some if-else constructs that make it more clear when the
explicit rollback is really needed.

Teach `write_locked_index()` to take a new flag SKIP_IF_UNCHANGED and
simplify the callers. Leave the most complicated of the callers (in
builtin/update-index.c) unchanged. Rewriting it to use this new flag
would end up duplicating logic.

We could have made the new flag behave the other way round
("FORCE_WRITE"), but that could break existing users behind their backs.
Let's take the more conservative approach. We can still migrate existing
callers to use our new flag. Later we might even be able to flip the
default, possibly without entirely ignoring the risk to in-flight or
out-of-tree topics.
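
How the flag removes the caller-side check can be sketched with toy types. The `_SKETCH` names and structs below are hypothetical simplifications; in particular, rolling the lock back only when the commit flag is also set is an assumption of this sketch:

```c
#include <assert.h>

#define COMMIT_LOCK_SKETCH        (1u << 0)
#define SKIP_IF_UNCHANGED_SKETCH  (1u << 1)

struct index_sketch { int changed; int writes; };
struct lock_sketch { int held; };

static void rollback_lock_sketch(struct lock_sketch *lock)
{
	lock->held = 0;
}

/* Write the index unless the caller asked to skip unchanged ones. */
static int write_locked_index_sketch(struct index_sketch *istate,
				     struct lock_sketch *lock, unsigned flags)
{
	assert(lock->held);
	if ((flags & SKIP_IF_UNCHANGED_SKETCH) && !istate->changed) {
		if (flags & COMMIT_LOCK_SKETCH)
			rollback_lock_sketch(lock);
		return 0;
	}
	istate->writes++;        /* stands in for writing the file */
	istate->changed = 0;
	if (flags & COMMIT_LOCK_SKETCH)
		lock->held = 0;  /* stands in for committing the lock file */
	return 0;
}
```

A caller then needs a single call with both flags instead of the "if changed, write, then roll back" sequence quoted above.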

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-01 13:28:01 -08:00
Brandon Williams
a63b5fca9b environment: rename 'template' variables
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-22 10:08:05 -08:00
Junio C Hamano
0fd90daba8 Merge branch 'bc/hash-algo'
More abstraction of hash function from the codepath.

* bc/hash-algo:
  hash: update obsolete reference to SHA1_HEADER
  bulk-checkin: abstract SHA-1 usage
  csum-file: abstract uses of SHA-1
  csum-file: rename sha1file to hashfile
  read-cache: abstract away uses of SHA-1
  pack-write: switch various SHA-1 values to abstract forms
  pack-check: convert various uses of SHA-1 to abstract forms
  fast-import: switch various uses of SHA-1 to the_hash_algo
  sha1_file: switch uses of SHA-1 to the_hash_algo
  builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo
  builtin/index-pack: improve hash function abstraction
  hash: create union for hash context allocation
  hash: move SHA-1 macros to hash.h
2018-02-15 14:55:47 -08:00
Junio C Hamano
8be8342b4c Merge branch 'po/object-id'
Conversion from uchar[20] to struct object_id continues.

* po/object-id:
  sha1_file: rename hash_sha1_file_literally
  sha1_file: convert write_loose_object to object_id
  sha1_file: convert force_object_loose to object_id
  sha1_file: convert write_sha1_file to object_id
  notes: convert write_notes_tree to object_id
  notes: convert combine_notes_* to object_id
  commit: convert commit_tree* to object_id
  match-trees: convert splice_tree to object_id
  cache: clear whole hash buffer with oidclr
  sha1_file: convert hash_sha1_file to object_id
  dir: convert struct sha1_stat to use object_id
  sha1_file: convert pretend_sha1_file to object_id
2018-02-15 14:55:43 -08:00
Brandon Williams
6ca32f4714 object_info: change member name from 'typename' to 'type_name'
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-14 13:10:05 -08:00
Junio C Hamano
e75c862125 Merge branch 'tg/split-index-fixes'
The split-index mode had a few corner case bugs fixed.

* tg/split-index-fixes:
  travis: run tests with GIT_TEST_SPLIT_INDEX
  split-index: don't write cache tree with null oid entries
  read-cache: fix reading the shared index for other repos
2018-02-13 13:39:13 -08:00
Junio C Hamano
9238941618 Merge branch 'cc/sha1-file-name'
Code clean-up.

* cc/sha1-file-name:
  sha1_file: improve sha1_file_name() perfs
  sha1_file: remove static strbuf from sha1_file_name()
2018-02-13 13:39:10 -08:00
Junio C Hamano
867622398f Merge branch 'gs/retire-mru'
Retire mru API as it does not give enough abstraction over
underlying list API to be worth it.

* gs/retire-mru:
  mru: Replace mru.[ch] with list.h implementation
2018-02-13 13:39:06 -08:00
Junio C Hamano
6bed209a20 Merge branch 'jh/partial-clone'
The machinery to clone & fetch, which in turn involves packing and
unpacking objects, has been taught how to omit certain objects using
the filtering mechanism introduced by the jh/object-filtering
topic, and also mark the resulting pack as a promisor pack to
tolerate missing objects, taking advantage of the mechanism
introduced by the jh/fsck-promisors topic.

* jh/partial-clone:
  t5616: test bulk prefetch after partial fetch
  fetch: inherit filter-spec from partial clone
  t5616: end-to-end tests for partial clone
  fetch-pack: restore save_commit_buffer after use
  unpack-trees: batch fetching of missing blobs
  clone: partial clone
  partial-clone: define partial clone settings in config
  fetch: support filters
  fetch: refactor calculation of remote list
  fetch-pack: test support excluding large blobs
  fetch-pack: add --no-filter
  fetch-pack, index-pack, transport: partial clone
  upload-pack: add object filtering for partial clone
2018-02-13 13:39:04 -08:00
Junio C Hamano
f3d618d2bf Merge branch 'jh/fsck-promisors'
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.

* jh/fsck-promisors:
  gc: do not repack promisor packfiles
  rev-list: support termination at promisor objects
  sha1_file: support lazily fetching missing objects
  introduce fetch-object: fetch one promisor object
  index-pack: refactor writing of .keep files
  fsck: support promisor objects as CLI argument
  fsck: support referenced promisor objects
  fsck: support refs pointing to promisor objects
  fsck: introduce partialclone extension
  extension.partialclone: introduce partial clone extension
2018-02-13 13:39:03 -08:00
brian m. carlson
164e716330 hash: move SHA-1 macros to hash.h
Most of the other code dealing with SHA-1 and other hashes is located in
hash.h, which is in turn loaded by cache.h.  Move the SHA-1 macros to
hash.h as well, so we can use them in additional hash-related items in
the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:40 -08:00
Patryk Obara
1752cbbc44 sha1_file: rename hash_sha1_file_literally
This function was already converted to use struct object_id earlier.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Patryk Obara
4bdb70a4f7 sha1_file: convert force_object_loose to object_id
Convert the definition and declaration of force_object_loose to
struct object_id and adjust usage of this function.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Patryk Obara
a09c985eae sha1_file: convert write_sha1_file to object_id
Convert the definition and declaration of write_sha1_file to
struct object_id and adjust usage of this function.

This commit also converts static function write_sha1_file_prepare, as it
is closely related.

Rename these functions to write_object_file and
write_object_file_prepare respectively.

Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents
wherever possible.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Patryk Obara
97a41a0c01 cache: clear whole hash buffer with oidclr
As long as GIT_SHA1_RAWSZ is equal to GIT_MAX_RAWSZ there's no problem,
but once a new hashing algorithm is in place this memset will clear
only the 20-byte prefix of the hash buffer.

Alternatively, the hashclr implementation could be adjusted, but this
function is almost removed from the codebase already.  A separate
implementation of oidclr prevents a potential buffer overrun in case
someone incorrectly uses hashclr on an object_id in the future.
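
The difference can be sketched as follows.  This is an illustration
only, not the code from this patch: every sketch_ and SKETCH_ name is
hypothetical, and the 32-byte maximum is an assumed stand-in for a
future, longer hash.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes: SHA-1 is 20 bytes; 32 stands in for a longer hash. */
#define SKETCH_SHA1_RAWSZ 20
#define SKETCH_MAX_RAWSZ 32

struct sketch_oid {
	unsigned char hash[SKETCH_MAX_RAWSZ];
};

/* Clears only the SHA-1-sized prefix, like a hashclr-style helper. */
void sketch_hashclr(unsigned char *hash)
{
	memset(hash, 0, SKETCH_SHA1_RAWSZ);
}

/* Clears the whole buffer, independent of the hash size in use. */
void sketch_oidclr(struct sketch_oid *oid)
{
	assert(oid != NULL);
	memset(oid->hash, 0, SKETCH_MAX_RAWSZ);
}

/* Returns 1 if every byte of the buffer is zero, 0 otherwise. */
int sketch_is_null_oid(const struct sketch_oid *oid)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_RAWSZ; i++)
		if (oid->hash[i])
			return 0;
	return 1;
}
```

With these assumed sizes, a buffer cleared through sketch_hashclr()
keeps stale bytes past offset 20 and does not compare as null, while
one cleared through sketch_oidclr() does.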

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Patryk Obara
f070faccc1 sha1_file: convert hash_sha1_file to object_id
Convert the declaration and definition of hash_sha1_file to use
struct object_id and adjust all function calls.

Rename this function to hash_object_file.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Patryk Obara
4b33e60201 dir: convert struct sha1_stat to use object_id
Convert the declaration of struct sha1_stat. Adjust all usages of this
struct and replace hash{clr,cmp,cpy} with oid{clr,cmp,cpy} wherever
possible.  Rename it to struct oid_stat.

Rename static function load_sha1_stat to load_oid_stat.

Remove macro EMPTY_BLOB_SHA1_BIN, as it's no longer used.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Patryk Obara
829e5c3b92 sha1_file: convert pretend_sha1_file to object_id
Convert the declaration and definition of pretend_sha1_file to use
struct object_id and adjust all usages of this function.  Rename it to
pretend_object_file.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:35 -08:00
Gargi Sharma
ec2dd32c70 mru: Replace mru.[ch] with list.h implementation
Replace the custom calls to mru.[ch] with calls to list.h. This patch is
the final step in removing the mru API completely and inlining the logic.
This patch leads to a significant code reduction, and hence the mru API
is no longer a useful abstraction.

Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-24 09:52:16 -08:00
Thomas Gummerer
4bddd98311 split-index: don't write cache tree with null oid entries
In a96d3cc3f6 ("cache-tree: reject entries with null sha1", 2017-04-21)
we made sure that broken cache entries do not get propagated to new
trees.  Part of that was making sure not to re-use an existing cache
tree that includes a null oid.

It did so by dropping the cache tree in 'do_write_index()' if one of
the entries contains a null oid.  In split index mode however, there
are two invocations to 'do_write_index()', one for the shared index
and one for the split index.  The cache tree is only written once, to
the split index.

As we only loop through the elements that are effectively being
written by the current invocation, that may not include the entry with
a null oid in the split index (when it is already written to the
shared index), where we write the cache tree.  Therefore in split
index mode we may still end up writing the cache tree, even though
there is an entry with a null oid in the index.

Fix this by checking for null oids in prepare_to_write_split_index,
where we loop over the entries of the shared index as well as the
entries for the split index.

This fixes t7009 with GIT_TEST_SPLIT_INDEX.  Also add a new test that's
more specifically showing the problem.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19 10:36:39 -08:00
Thomas Gummerer
a125a22334 read-cache: fix reading the shared index for other repos
read_index_from() takes a path argument for the location of the index
file.  For reading the shared index in split index mode however it just
ignores that path argument, and reads it from the gitdir of the current
repository.

This works as long as an index in the_repository is read.  Once that
changes, such as when we read the index of a submodule, or of a
different working tree than the current one, the gitdir of
the_repository will no longer contain the appropriate shared index,
and git will fail to read it.

For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.

be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) breaks with split index mode in a similar
manner, not erroring out when it can't read the index, but instead
carrying on with pruning, without taking the index of the worktree into
account.

Fix this by passing an additional gitdir parameter to read_index_from,
to indicate where it should look for and read the shared index from.

read_cache_from() defaults to using the gitdir of the_repository.  As it
is mostly a convenience macro, having to pass get_git_dir() for every
call seems overkill, and if necessary users can have more control by
using read_index_from().

Helped-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19 10:36:34 -08:00
Christian Couder
ea6577303f sha1_file: remove static strbuf from sha1_file_name()
Using a static buffer in sha1_file_name() is error prone
and the performance improvements it gives are not needed
in many of the callers.

So let's get rid of this static buffer and, if necessary
or helpful, let's use one in the caller.

Suggested-by: Jeff Hostetler <git@jeffhostetler.com>
Helped-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-17 12:21:32 -08:00
Junio C Hamano
b6825b5c8e Merge branch 'ew/empty-merge-with-dirty-index-maint' into ew/empty-merge-with-dirty-index
* ew/empty-merge-with-dirty-index-maint:
  merge-recursive: avoid incorporating uncommitted changes in a merge
  move index_has_changes() from builtin/am.c to merge.c for reuse
  t6044: recursive can silently incorporate dirty changes in a merge
2017-12-22 12:48:38 -08:00
Elijah Newren
b101793c43 move index_has_changes() from builtin/am.c to merge.c for reuse
index_has_changes() is a function we want to reuse outside of just am,
making it also available for merge-recursive and merge-ort.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-22 12:20:29 -08:00
Junio C Hamano
0c69a132cb Merge branch 'ls/editor-waiting-message'
Git shows a message to tell the user that it is waiting for the
user to finish editing when spawning an editor, in case the editor
opens to a hidden window or somewhere obscure and the user gets
lost.

* ls/editor-waiting-message:
  launch_editor(): indicate that Git waits for user input
  refactor "dumb" terminal determination
2017-12-19 11:33:59 -08:00
Junio C Hamano
8d7fefaac4 Merge branch 'ar/unconfuse-three-dots'
An ancient part of the codebase still shows dots after an abbreviated
object name just to show that it is not a full object name, but
these ellipses confuse people who have newly discovered Git, who
are used to seeing abbreviated object names and mistake the dots
for the range syntax.

* ar/unconfuse-three-dots:
  t2020: test variations that matter
  t4013: test new output from diff --abbrev --raw
  diff: diff_aligned_abbrev: remove ellipsis after abbreviated SHA-1 value
  t4013: prepare for upcoming "diff --raw --abbrev" output format change
  checkout: describe_detached_head: remove ellipsis after committish
  print_sha1_ellipsis: introduce helper
  Documentation: user-manual: limit usage of ellipsis
  Documentation: revisions: fix typo: "three dot" ---> "three-dot" (in line with "two-dot").
2017-12-19 11:33:58 -08:00
Junio C Hamano
721cc4314c Merge branch 'bc/hash-algo'
An infrastructure to define what hash function is used in Git is
introduced, and an effort to plumb that throughout various
codepaths has been started.

* bc/hash-algo:
  repository: fix a sparse 'using integer as NULL pointer' warning
  Switch empty tree and blob lookups to use hash abstraction
  Integrate hash algorithm support with repo setup
  Add structure representing hash algorithm
  setup: expose enumerated repo info
2017-12-13 13:28:54 -08:00
Jeff Hostetler
1e1e39b308 partial-clone: define partial clone settings in config
Create get and set routines for "partial clone" config settings.
These will be used in a future commit by clone and fetch to
remember the promisor remote and the default filter-spec.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:58:51 -08:00
Jonathan Tan
8b4c0103a9 sha1_file: support lazily fetching missing objects
Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
     regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
     about the loose/packed distinction and operate on both differently,
     and ensure that they can handle the concept of objects that are
     neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
     - this searches a single pack that is provided as an argument; the
       caller already knows (through other means) that the sought object
       is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
     it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Junio C Hamano
4c6dad0059 Merge branch 'bw/protocol-v1'
A new mechanism to upgrade the wire protocol in place is proposed
and demonstrated to work with older versions of Git without
harming them.

* bw/protocol-v1:
  Documentation: document Extra Parameters
  ssh: introduce a 'simple' ssh variant
  i5700: add interop test for protocol transition
  http: tell server that the client understands v1
  connect: tell server that the client understands v1
  connect: teach client to recognize v1 server response
  upload-pack, receive-pack: introduce protocol version 1
  daemon: recognize hidden request arguments
  protocol: introduce protocol extension mechanisms
  pkt-line: add packet_write function
  connect: in ref advertisement, shallows are last
2017-12-06 09:23:44 -08:00
Jonathan Tan
498f1f61f1 fsck: introduce partialclone extension
Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage. In such an arrangement, the full set of objects is usually
available in remote storage, ready to be lazily downloaded.

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing promisor objects referenced from the reflog are not an
error case; in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Jonathan Tan
75b97fec17 extension.partialclone: introduce partial clone extension
Introduce new repository extension option:
    `extensions.partialclone`

See the update to Documentation/technical/repository-version.txt
in this patch for more information.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Lars Schneider
a64f213d3f refactor "dumb" terminal determination
Move the code to detect "dumb" terminals into a single location. This
avoids duplicating the terminal detection code yet again in a subsequent
commit.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-04 09:38:30 -08:00
Ann T Ropea
a2cd709de3 print_sha1_ellipsis: introduce helper
Introduce a helper print_sha1_ellipsis() that pays attention to the
GIT_PRINT_SHA1_ELLIPSIS environment variable, and prepare the tests to
unconditionally set it for the test pieces that will be broken once the code
stops showing the extra dots by default.

The removal of these dots is merely a plan at this step and has not happened
yet but soon will.

Document GIT_PRINT_SHA1_ELLIPSIS.

Signed-off-by: Ann T Ropea <bedhanger@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-04 08:25:35 -08:00
Junio C Hamano
af6e0fe3a5 Merge branch 'tb/add-renormalize'
"git add --renormalize ." is a new and safer way to record the fact
that you are correcting the end-of-line convention and other
"convert_to_git()" glitches in the in-repository data.

* tb/add-renormalize:
  add: introduce "--renormalize"
2017-11-27 11:06:37 +09:00
Junio C Hamano
c9fdbca92c Merge branch 'av/fsmonitor'
Various fixes to bp/fsmonitor topic.

* av/fsmonitor:
  fsmonitor: simplify determining the git worktree under Windows
  fsmonitor: store fsmonitor bitmap before splitting index
  fsmonitor: read from getcwd(), not the PWD environment variable
  fsmonitor: delay updating state until after split index is merged
  fsmonitor: document GIT_TRACE_FSMONITOR
  fsmonitor: don't bother pretty-printing JSON from watchman
  fsmonitor: set the PWD to the top of the working tree
2017-11-21 14:07:51 +09:00
Junio C Hamano
e05336bdda Merge branch 'bp/fsmonitor'
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.

* bp/fsmonitor:
  fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
  fsmonitor: read entirety of watchman output
  fsmonitor: MINGW support for watchman integration
  fsmonitor: add a performance test
  fsmonitor: add a sample integration script for Watchman
  fsmonitor: add test cases for fsmonitor extension
  split-index: disable the fsmonitor extension when running the split index test
  fsmonitor: add a test tool to dump the index extension
  update-index: add fsmonitor support to update-index
  ls-files: Add support in ls-files to display the fsmonitor valid bit
  fsmonitor: add documentation for the fsmonitor extension.
  fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  update-index: add a new --force-write-index option
  preload-index: add override to enable testing preload-index
  bswap: add 64 bit endianness helper get_be64
2017-11-21 14:07:50 +09:00
Torsten Bögershausen
9472935d81 add: introduce "--renormalize"
Make it safer to normalize the line endings in a repository.
Files that had been committed with CRLF will be committed with LF.

The old way to normalize a repo was like this:

 # Make sure that there are no untracked files
 $ echo "* text=auto" >.gitattributes
 $ git read-tree --empty
 $ git add .
 $ git commit -m "Introduce end-of-line normalization"

The user must make sure that there are no untracked files,
otherwise they would have been added and tracked from now on.

The new "add --renormalize" does not add untracked files:

 $ echo "* text=auto" >.gitattributes
 $ git add --renormalize .
 $ git commit -m "Introduce end-of-line normalization"

Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".

While at it, document that the same renormalization may be needed,
whenever a clean filter is added or changed.

Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-17 10:31:05 +09:00
Junio C Hamano
e539a83455 Merge branch 'bp/read-index-from-skip-verification'
Drop (perhaps overly cautious) sanity check before using the index
read from the filesystem at runtime.

* bp/read-index-from-skip-verification:
  read_index_from(): speed index loading by skipping verification of the entry order
2017-11-15 12:14:37 +09:00
brian m. carlson
eb0ccfd7f5 Switch empty tree and blob lookups to use hash abstraction
Switch the uses of empty_tree_oid and empty_blob_oid to use the
current_hash abstraction that represents the current hash algorithm in
use.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-13 13:20:44 +09:00
brian m. carlson
78a6766802 Integrate hash algorithm support with repo setup
In future versions of Git, we plan to support an additional hash
algorithm.  Integrate the enumeration of hash algorithms with repository
setup, and store a pointer to the enumerated data in struct repository.
Of course, we currently only support SHA-1, so hard-code this value in
read_repository_format.  In the future, we'll enumerate this value from
the configuration.

Add a constant, the_hash_algo, which points to the hash_algo structure
pointer in the repository global.  Note that this is the hash which is
used to serialize data to disk, not the hash which is used to display
items to the user.  The transition plan anticipates that these may be
different.  We can add an additional element in the future (say,
ui_hash_algo) to provide for this case.

Include repository.h in cache.h since we now need to have access to
these struct and variable definitions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-13 13:20:44 +09:00
Junio C Hamano
bde1370010 Merge branch 'rs/hex-to-bytes-cleanup'
Code cleanup.

* rs/hex-to-bytes-cleanup:
  sha1_file: use hex_to_bytes()
  http-push: use hex_to_bytes()
  notes: move hex_to_bytes() to hex.c and export it
2017-11-09 14:31:27 +09:00
Ben Peart
00ec50e56d read_index_from(): speed index loading by skipping verification of the entry order
There is code in post_read_index_from() to catch out of order
entries when reading an index file.  This order verification is ~13%
of the cost of every call to read_index_from().

Update check_ce_order() so that it skips this verification unless
the "verify_ce_order" global variable is set.

Teach fsck to force this verification.
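
The gating can be sketched roughly as below.  This is a simplified
illustration, not the code in read-cache.c: sketch_verify_order is a
hypothetical stand-in for the "verify_ce_order" global, entries are
compared by name only, and an error is returned where the real check
would presumably be stricter.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the "verify_ce_order" global named above. */
int sketch_verify_order = 0;

/*
 * Returns 0 if the names are strictly ascending or if verification is
 * disabled, and -1 at the first out-of-order or duplicate pair.
 */
int sketch_check_order(const char *const *names, size_t nr)
{
	size_t i;

	assert(names != NULL || nr == 0);
	if (!sketch_verify_order)
		return 0;	/* skip the scan entirely */

	for (i = 1; i < nr; i++)
		if (strcmp(names[i - 1], names[i]) >= 0)
			return -1;
	return 0;
}
```

A caller that wants the check, as fsck does, would set the flag
before reading the index; everyone else pays nothing for the scan.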

The effect can be seen using t/perf/p0002-read-cache.sh:

Test                                          HEAD              HEAD~1
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.41(0.04+0.04)   0.50(0.00+0.10) +22.0%

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-08 10:39:41 +09:00
Junio C Hamano
0b646bcac9 Merge branch 'ma/lockfile-fixes'
An earlier update made it possible to use an on-stack in-core
lockfile structure (as opposed to having to deliberately leak an
on-heap one).  Many codepaths have been updated to take advantage
of this new facility.

* ma/lockfile-fixes:
  read_cache: roll back lock in `update_index_if_able()`
  read-cache: leave lock in right state in `write_locked_index()`
  read-cache: drop explicit `CLOSE_LOCK`-flag
  cache.h: document `write_locked_index()`
  apply: remove `newfd` from `struct apply_state`
  apply: move lockfile into `apply_state`
  cache-tree: simplify locking logic
  checkout-index: simplify locking logic
  tempfile: fix documentation on `delete_tempfile()`
  lockfile: fix documentation on `close_lock_file_gently()`
  treewide: prefer lockfiles on the stack
  sha1_file: do not leak `lock_file`
2017-11-06 13:11:21 +09:00
Alex Vandiver
ba1b9caca6 fsmonitor: delay updating state until after split index is merged
If the fsmonitor extension is used in conjunction with the split index
extension, the set of entries in the index when it is first loaded is
only a subset of the real index.  This leads to only the non-"base"
index being marked as CE_FSMONITOR_VALID.

Delay the expansion of the ewah bitmap until after tweak_split_index
has been called to merge in the base index as well.

The new fsmonitor_dirty is kept from being leaked by dint of being
cleaned up in post_read_index_from, which is guaranteed to be called
after do_read_index in read_index_from.

Signed-off-by: Alex Vandiver <alexmv@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 13:28:20 +09:00
René Scharfe
0ec218656a notes: move hex_to_bytes() to hex.c and export it
Make the function for converting pairs of hexadecimal digits to binary
available to other call sites.
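
For readers unfamiliar with the helper, a self-contained sketch of
such a conversion might look like this.  The sketch_ names are
hypothetical and the body is not copied from hex.c; it only assumes
the behavior described above, with len counting output bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
int sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Converts len pairs of hex digits from hex into len bytes in binary.
 * Returns 0 on success and -1 on the first non-hex character, which
 * includes a NUL reached before len pairs have been read.
 */
int sketch_hex_to_bytes(unsigned char *binary, const char *hex, size_t len)
{
	assert(binary != NULL && hex != NULL);
	for (; len; len--, hex += 2) {
		int hi = sketch_hexval(hex[0]);
		int lo;

		if (hi < 0)
			return -1;
		lo = sketch_hexval(hex[1]);
		if (lo < 0)
			return -1;
		*binary++ = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

Checking the first digit before touching the second keeps the sketch
from reading past the end of a short string.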

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 10:35:35 +09:00
Brandon Williams
19113a26b6 http: tell server that the client understands v1
Tell a server that protocol v1 can be used by sending the http header
'Git-Protocol' with 'version=1' indicating this.

Also teach the apache http server to pass through the 'Git-Protocol'
header as an environment variable 'GIT_PROTOCOL'.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-17 10:51:29 +09:00
Brandon Williams
373d70efb2 protocol: introduce protocol extension mechanisms
Create protocol.{c,h} and provide functions which future servers and
clients can use to determine which protocol to use or is being used.

Also introduce the 'GIT_PROTOCOL' environment variable which will be
used to communicate a colon separated list of keys with optional values
to a server.  Unknown keys and values must be tolerated.  This mechanism
is used to communicate which version of the wire protocol a client would
like to use with a server.
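
As an illustration of the format, a server-side reader of such a list
could be sketched as below.  This is not the code in protocol.c: the
function name, the "version" key spelling with an "=" separator, and
the cap at version 1 are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed highest version this sketch understands. */
#define SKETCH_MAX_KNOWN_VERSION 1

/*
 * Scans a colon-separated list such as "version=1:foo:bar=baz" and
 * returns the highest understood value of the "version" key, or 0 if
 * there is none.  Unknown keys and unknown values are skipped.
 */
int sketch_protocol_version(const char *git_protocol)
{
	const char *p = git_protocol;
	int version = 0;

	assert(git_protocol != NULL);
	while (*p) {
		size_t len = strcspn(p, ":");	/* length of this item */

		if (len > 8 && !strncmp(p, "version=", 8)) {
			char *end;
			long v = strtol(p + 8, &end, 10);

			/* accept only values that fill the whole item */
			if (end == p + len && v > version &&
			    v <= SKETCH_MAX_KNOWN_VERSION)
				version = (int)v;
		}
		p += len;
		if (*p == ':')
			p++;
	}
	return version;
}
```

A caller would pass the value of the GIT_PROTOCOL environment
variable, substituting an empty string when it is unset, and fall
back to the original protocol when 0 is returned.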

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-17 10:51:29 +09:00
Martin Ågren
b74c90fb41 read_cache: roll back lock in update_index_if_able()
`update_index_if_able()` used to always commit the lock or roll it back.
Commit 03b866477 (read-cache: new API write_locked_index instead of
write_index/write_cache, 2014-06-13) stopped rolling it back in case a
write was not even attempted. This change in behavior is not motivated
in the commit message and appears to be accidental: the `else`-path was
removed, although that changed the behavior in case the `if` shortcuts.

Reintroduce the rollback and document this behavior. While at it, move
the documentation on this function from the function definition to the
function declaration in cache.h.

If `write_locked_index(..., COMMIT_LOCK)` fails, it will roll back the
lock for us (see the previous commit).

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-07 10:20:56 +09:00
Martin Ågren
df60cf5789 read-cache: leave lock in right state in write_locked_index()
If the original version of `write_locked_index()` returned with an
error, it didn't roll back the lockfile unless the error occurred at the
very end, during closing/committing. See commit 03b866477 (read-cache:
new API write_locked_index instead of write_index/write_cache,
2014-06-13).

In commit 9f41c7a6b (read-cache: close index.lock in do_write_index,
2017-04-26), we learned to close the lock slightly earlier in the
callstack. That was mostly a side-effect of lockfiles being implemented
using temporary files, but didn't cause any real harm.

Recently, commit 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05) introduced a subtle bug. If the temporary file is deleted
(i.e., the lockfile is rolled back), the tempfile-pointer in the `struct
lock_file` will be left dangling. Thus, an attempt to reuse the
lockfile, or even just to roll it back, will induce undefined behavior
-- most likely a crash.

Besides not crashing, we clearly want to make things consistent. The
guarantees which the lockfile-machinery itself provides are A) if we ask
to commit and it fails, roll back, and B) if we ask to close and it
fails, do _not_ roll back. Let's do the same for consistency.

Do not delete the temporary file in `do_write_index()`. One of its
callers, `write_locked_index()` will thereby avoid rolling back the
lock. The other caller, `write_shared_index()`, will delete its
temporary file anyway. Both of these callers will avoid undefined
behavior (crashing).

Teach `write_locked_index(..., COMMIT_LOCK)` to roll back the lock
before returning. If we have already succeeded and committed, it will be
a noop. Simplify the existing callers where we now have a superfluous
call to `rollback_lockfile()`. That should keep future readers from
wondering why the callers are inconsistent.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-07 10:20:56 +09:00
Martin Ågren
812d6b0075 read-cache: drop explicit CLOSE_LOCK-flag
`write_locked_index()` takes two flags: `COMMIT_LOCK` and `CLOSE_LOCK`.
At most one is allowed. But it is also possible to use no flag, i.e.,
`0`. But when `write_locked_index()` calls `do_write_index()`, the
temporary file, a.k.a. the lockfile, will be closed. So passing `0` is
effectively the same as `CLOSE_LOCK`, which seems like a bug.

We might feel tempted to restructure the code in order to close the file
later, or conditionally. It also feels a bit unfortunate that we simply
"happen" to close the lock by way of an implementation detail of
lockfiles. But note that we need to close the temporary file before
`stat`-ing it, at least on Windows. See 9f41c7a6b (read-cache: close
index.lock in do_write_index, 2017-04-26).

Drop `CLOSE_LOCK` and make it explicit that `write_locked_index()`
always closes the lock. Whether it is also committed is governed by the
remaining flag, `COMMIT_LOCK`.

This means we neither have nor suggest that we have a mode to write the
index and leave the file open. Whatever extra contents we might
eventually want to write, we should probably write it from within
`write_locked_index()` itself anyway.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-07 10:20:56 +09:00
Martin Ågren
8dc3834610 cache.h: document write_locked_index()
The next patches will tweak the behavior of this function. Document it
in order to establish a basis for those patches.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 10:07:18 +09:00
Junio C Hamano
d4e93836a6 Merge branch 'jk/no-optional-locks'
Some commands (most notably "git status") make an opportunistic
update when performing a read-only operation to help optimize later
operations in the same repository.  The new "--no-optional-locks"
option can be passed to Git to disable them.

* jk/no-optional-locks:
  git: add --no-optional-locks option
2017-10-03 15:42:49 +09:00
Ben Peart
883e248b8a fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.

We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.

In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.

To keep the CE_FSMONITOR_VALID state accurate during git operations,
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.

Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.
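
The bookkeeping described above can be sketched with a single flag
bit.  This is a simplified model, not the code in this patch: the
struct, the flag value, and the function names are hypothetical, and
the untracked cache is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for CE_FSMONITOR_VALID. */
#define SKETCH_FSMONITOR_VALID (1u << 0)

struct sketch_entry {
	const char *path;
	unsigned int flags;
};

/* Git brought the entry in line with the working tree: mark it valid. */
void sketch_mark_valid(struct sketch_entry *e)
{
	e->flags |= SKETCH_FSMONITOR_VALID;
}

/* The entry changed, or the monitor reported its path: clear the bit. */
void sketch_mark_invalid(struct sketch_entry *e)
{
	e->flags &= ~SKETCH_FSMONITOR_VALID;
}

/*
 * Counts how many entries a refresh would still have to lstat():
 * entries with the valid bit set are trusted and skipped.
 */
size_t sketch_count_lstat_needed(const struct sketch_entry *entries, size_t nr)
{
	size_t i, needed = 0;

	assert(entries != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (!(entries[i].flags & SKETCH_FSMONITOR_VALID))
			needed++;
	return needed;
}
```

The saving comes from the count returned by the last function: in a
large working tree where the monitor reports few changes, almost
every entry keeps its bit and is never passed to lstat().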

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01 17:23:01 +09:00
Junio C Hamano
14a8168e2f Merge branch 'rj/no-sign-compare'
Many codepaths have been updated to squelch -Wsign-compare
warnings.

* rj/no-sign-compare:
  ALLOC_GROW: avoid -Wsign-compare warnings
  cache.h: hex2chr() - avoid -Wsign-compare warnings
  commit-slab.h: avoid -Wsign-compare warnings
  git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
2017-09-29 11:23:42 +09:00
Jeff King
27344d6a6c git: add --no-optional-locks option
Some tools like IDEs or fancy editors may periodically run
commands like "git status" in the background to keep track
of the state of the repository. Some of these commands may
refresh the index and write out the result in an
opportunistic way: if they can get the index lock, then they
update the on-disk index with any updates they find. And if
not, then their in-core refresh is lost and just has to be
recomputed by the next caller.

But taking the index lock may conflict with other operations
in the repository. Especially ones that the user is doing
themselves, which _aren't_ opportunistic. In other words,
"git status" knows how to back off when somebody else is
holding the lock, but other commands don't know that status
would be happy to drop the lock if somebody else wanted it.

There are a few possible solutions:

  1. Have some kind of "pseudo-lock" that allows other
     commands to tell status that they want the lock.

     This is likely to be complicated and error-prone to
     implement (and maybe even impossible with just
     dotlocks to work from, as it requires some
     inter-process communication).

  2. Avoid background runs of commands like "git status"
     that want to do opportunistic updates, preferring
     instead plumbing like diff-files, etc.

     This is awkward for a couple of reasons. One is that
     "status --porcelain" reports a lot more about the
     repository state than is available from individual
     plumbing commands. And two is that we actually _do_
     want to see the refreshed index. We just don't want to
     take a lock or write out the result. Whereas commands
     like diff-files expect us to refresh the index
     separately and write it to disk so that they can depend
     on the result. But that write is exactly what we're
     trying to avoid.

  3. Ask "status" not to lock or write the index.

     This is easy to implement. The big downside is that any
     work done in refreshing the index for such a call is
     lost when the process exits. So a background process
     may end up re-hashing a changed file multiple times
     until the user runs a command that does an index
     refresh themselves.

This patch implements option 3. The idea (and the test)
is largely stolen from a Git for Windows patch by Johannes
Schindelin, 67e5ce7f63 (status: offer *not* to lock the
index and update it, 2016-08-12). The twist here is that
instead of making this an option to "git status", it becomes
a "git" option and matching environment variable.

The reason there is two-fold:

  1. An environment variable is carried through to
     sub-processes. And whether an invocation is a
     background process or not should apply to the whole
     process tree. So you could do "git --no-optional-locks
     foo", and if "foo" is a script or alias that calls
     "status", you'll still get the effect.

  2. There may be other programs that want the same
     treatment.

     I've punted here on finding more callers to convert,
     since "status" is the obvious one to call as a repeated
     background job. But "git diff"'s opportunistic refresh
     of the index may be a good candidate.

The test is taken from 67e5ce7f63, and it's worth repeating
Johannes's explanation:

  Note that the regression test added in this commit does
  not *really* verify that no index.lock file was written;
  that test is not possible in a portable way. Instead, we
  verify that .git/index is rewritten *only* when `git
  status` is run without `--no-optional-locks`.
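
As a rough sketch of how a caller might consult the environment
variable before attempting an opportunistic index write: the variable
is documented in Git as GIT_OPTIONAL_LOCKS, but every function name
below is hypothetical and the boolean parsing is deliberately
simplified compared to what Git accepts.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Simplified boolean parsing (an assumption of this sketch); Git's
 * own parser accepts more spellings. NULL means "variable not set".
 */
static int toy_optional_locks_from_value(const char *v)
{
	if (!v)
		return 1; /* default: opportunistic updates allowed */
	if (!strcmp(v, "0") || !strcmp(v, "false") ||
	    !strcmp(v, "no") || !strcmp(v, "off"))
		return 0;
	return 1;
}

/* The variable is inherited by child processes, so scripts see it too. */
static int toy_use_optional_locks(void)
{
	return toy_optional_locks_from_value(getenv("GIT_OPTIONAL_LOCKS"));
}

/* Write the refreshed index back only if it changed and locks are allowed. */
static int toy_should_write_index(int index_changed)
{
	return index_changed && toy_use_optional_locks();
}
```

Because the decision hangs off the environment rather than a "status"
flag, any command in the process tree can make the same check.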

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-27 16:11:01 +09:00
Ramsay Jones
356a293f39 cache.h: hex2chr() - avoid -Wsign-compare warnings
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 13:00:38 +09:00
Jonathan Nieder
607bd8315c pack: make packed_git_mru global a value instead of a pointer
The MRU cache that keeps track of recently used packs is represented
using two global variables:

	struct mru packed_git_mru_storage;
	struct mru *packed_git_mru = &packed_git_mru_storage;

Callers never assign to the packed_git_mru pointer, though, so we can
simplify by eliminating it and using &packed_git_mru_storage (renamed
to &packed_git_mru) directly.  This variable is only used by the
packfile subsystem, making this a relatively uninvasive change (and
any new unadapted callers would trigger a compile error).

Noticed while moving these globals to the object_store struct.
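
In miniature, the change has roughly the following shape. The struct
below is a toy stand-in rather than Git's struct mru, and all names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for Git's struct mru; the real one tracks pack entries. */
struct toy_mru {
	size_t nr;
};

/*
 * Before: storage plus a pointer that is never reassigned.
 *
 *   struct toy_mru toy_mru_storage;
 *   struct toy_mru *toy_mru = &toy_mru_storage;
 *
 * After: one global value; callers take its address explicitly.
 */
struct toy_mru toy_packed_mru;

static void toy_mru_note_use(struct toy_mru *mru)
{
	assert(mru);
	mru->nr++;
}

/* A caller that used to pass "toy_mru" now passes "&toy_packed_mru". */
static size_t toy_touch_twice(void)
{
	toy_mru_note_use(&toy_packed_mru);
	toy_mru_note_use(&toy_packed_mru);
	return toy_packed_mru.nr;
}
```

A caller that still passes the bare name where a pointer is expected
fails to compile, which is the safety net the message mentions.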

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14 15:05:48 +09:00
Junio C Hamano
f04f860dfa Merge branch 'sb/sha1-file-cleanup' into maint
Code clean-up.

* sb/sha1-file-cleanup:
  sha1_file: make read_info_alternates static
2017-09-10 17:03:04 +09:00
Junio C Hamano
822a4d4178 Merge branch 'jk/hashcmp-memcmp' into maint
Code clean-up.

* jk/hashcmp-memcmp:
  hashcmp: use memcmp instead of open-coded loop
2017-09-10 17:02:59 +09:00
Junio C Hamano
eabdcd4ab4 Merge branch 'jt/packmigrate'
Code movement to make it easier to hack later.

* jt/packmigrate: (23 commits)
  pack: move for_each_packed_object()
  pack: move has_pack_index()
  pack: move has_sha1_pack()
  pack: move find_pack_entry() and make it global
  pack: move find_sha1_pack()
  pack: move find_pack_entry_one(), is_pack_valid()
  pack: move check_pack_index_ptr(), nth_packed_object_offset()
  pack: move nth_packed_object_{sha1,oid}
  pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry()
  pack: move unpack_object_header()
  pack: move get_size_from_delta()
  pack: move unpack_object_header_buffer()
  pack: move {,re}prepare_packed_git and approximate_object_count
  pack: move install_packed_git()
  pack: move add_packed_git()
  pack: move unuse_pack()
  pack: move use_pack()
  pack: move pack-closing functions
  pack: move release_pack_memory()
  pack: move open_pack_index(), parse_pack_index()
  ...
2017-08-26 22:55:09 -07:00
Junio C Hamano
6b8aa3294e Merge branch 'po/object-id'
* po/object-id:
  sha1_file: convert index_stream to struct object_id
  sha1_file: convert hash_sha1_file_literally to struct object_id
  sha1_file: convert index_fd to struct object_id
  sha1_file: convert index_path to struct object_id
  read-cache: convert to struct object_id
  builtin/hash-object: convert to struct object_id
2017-08-26 22:55:07 -07:00
Junio C Hamano
b6c4058f97 Merge branch 'sb/diff-color-move'
"git diff" has been taught to optionally paint new lines that are
the same as deleted lines elsewhere differently from genuinely new
lines.

* sb/diff-color-move: (25 commits)
  diff: document the new --color-moved setting
  diff.c: add dimming to moved line detection
  diff.c: color moved lines differently, plain mode
  diff.c: color moved lines differently
  diff.c: buffer all output if asked to
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP
  diff.c: convert word diffing to use emit_diff_symbol
  diff.c: convert show_stats to use emit_diff_symbol
  diff.c: convert emit_binary_diff_body to use emit_diff_symbol
  submodule.c: migrate diff output to use emit_diff_symbol
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF
  diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS}
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN]
  diff.c: migrate emit_line_checked to use emit_diff_symbol
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF
  diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO
  ...
2017-08-26 22:55:03 -07:00
Jonathan Tan
7709f468fd pack: move for_each_packed_object()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
f9a8672a81 pack: move has_pack_index()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
150e3001d0 pack: move has_sha1_pack()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
d6fe0036fd pack: move find_sha1_pack()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
a2551953b9 pack: move find_pack_entry_one(), is_pack_valid()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
9e0f45f5a6 pack: move check_pack_index_ptr(), nth_packed_object_offset()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
d5a1676182 pack: move nth_packed_object_{sha1,oid}
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
f1d8130be0 pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry()
Both sha1_file.c and packfile.c now need read_object(), so a copy of
read_object() was created in packfile.c.

This patch makes both mark_bad_packed_object() and has_packed_and_bad()
global. Unlike most of the other patches in this series, these 2
functions need to remain global.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
3588dd6e99 pack: move unpack_object_header()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
7b3aa75df7 pack: move get_size_from_delta()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
32b42e152f pack: move unpack_object_header_buffer()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
0abe14f6a5 pack: move {,re}prepare_packed_git and approximate_object_count
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
e65f186242 pack: move install_packed_git()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
9a42865374 pack: move add_packed_git()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
97de1803f8 pack: move unuse_pack()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Jonathan Tan
84f80ad5e1 pack: move use_pack()
The function open_packed_git() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:06 -07:00
Jonathan Tan
3836d88ae5 pack: move pack-closing functions
The function close_pack_fd() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:06 -07:00
Jonathan Tan
0317f45576 pack: move open_pack_index(), parse_pack_index()
alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a
subsequent commit, alloc_packed_git() will be removed from sha1_file.c.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:06 -07:00
Jonathan Tan
8e21176c3c pack: move pack_report()
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:06 -07:00
Jonathan Tan
4f39cd821d pack: move pack name-related functions
Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.

Create a new file, packfile.c, to hold all packfile-related functions
currently in sha1_file.c. It has a corresponding header packfile.h.

In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:06 -07:00
Junio C Hamano
df2dd28316 Merge branch 'jt/subprocess-handshake' into maint
Code cleanup.

* jt/subprocess-handshake:
  sub-process: refactor handshake to common function
  Documentation: migrate sub-process docs to header
  convert: add "status=delayed" to filter process protocol
  convert: refactor capabilities negotiation
  convert: move multiple file filter error handling to separate function
  convert: put the flags field before the flag itself for consistent style
  t0021: write "OUT <size>" only on success
  t0021: make debug log file name configurable
  t0021: keep filter log files on comparison
2017-08-23 14:33:52 -07:00
Junio C Hamano
3830759c1c Merge branch 'sb/sha1-file-cleanup'
Code clean-up.

* sb/sha1-file-cleanup:
  sha1_file: make read_info_alternates static
2017-08-23 14:13:10 -07:00
Junio C Hamano
e45bbfc584 Merge branch 'jk/hashcmp-memcmp'
Code clean-up.

* jk/hashcmp-memcmp:
  hashcmp: use memcmp instead of open-coded loop
2017-08-22 10:29:09 -07:00
Junio C Hamano
5aa0b6c506 Merge branch 'bw/grep-recurse-submodules'
"git grep --recurse-submodules" has been reworked to give a more
consistent output across submodule boundaries (and do its thing
without having to fork a separate process).

* bw/grep-recurse-submodules:
  grep: recurse in-process using 'struct repository'
  submodule: merge repo_read_gitmodules and gitmodules_config
  submodule: check for unmerged .gitmodules outside of config parsing
  submodule: check for unstaged .gitmodules outside of config parsing
  submodule: remove fetch.recursesubmodules from submodule-config parsing
  submodule: remove submodule.fetchjobs from submodule-config parsing
  config: add config_from_gitmodules
  cache.h: add GITMODULES_FILE macro
  repository: have the_repository use the_index
  repo_read_index: don't discard the index
2017-08-22 10:29:01 -07:00
Patryk Obara
da77611d73 sha1_file: convert hash_sha1_file_literally to struct object_id
Convert all remaining callers as well.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20 21:52:53 -07:00
Patryk Obara
e3506559d4 sha1_file: convert index_fd to struct object_id
Convert all remaining callers as well.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20 21:52:08 -07:00
Patryk Obara
98e019b067 sha1_file: convert index_path to struct object_id
Convert all remaining callers as well.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20 21:51:38 -07:00
Stefan Beller
2456990dfd sha1_file: make read_info_alternates static
read_info_alternates is not used from outside, so let's make it static.

We have to declare the function before link_alt_odb_entry instead of
moving the code around, because link_alt_odb_entry calls
read_info_alternates, which in turn calls link_alt_odb_entry.
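
The mutual recursion is why reordering alone cannot help. A toy
version of the pattern, with hypothetical names and a made-up depth
counter standing in for the real work, might look like this:

```c
#include <assert.h>

/* Forward declaration: needed because the two functions call each other. */
static int toy_read_alternates(int depth);

/* Linking an entry may reveal further alternates to read. */
static int toy_link_entry(int depth)
{
	return depth > 0 ? toy_read_alternates(depth - 1) + 1 : 1;
}

/* Reading an alternates list links each entry it finds. */
static int toy_read_alternates(int depth)
{
	assert(depth >= 0);
	return toy_link_entry(depth);
}
```

Whichever function is defined first, it needs a prior declaration of
the other, so a static forward declaration is the smallest change.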

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-15 14:39:25 -07:00
Junio C Hamano
51b8aecabe Merge branch 'ls/filter-process-delayed'
The filter-process interface learned to allow a process with long
latency to give a "delayed" response.

* ls/filter-process-delayed:
  convert: add "status=delayed" to filter process protocol
  convert: refactor capabilities negotiation
  convert: move multiple file filter error handling to separate function
  convert: put the flags field before the flag itself for consistent style
  t0021: write "OUT <size>" only on success
  t0021: make debug log file name configurable
  t0021: keep filter log files on comparison
2017-08-11 13:27:00 -07:00
Junio C Hamano
df422678a8 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id:
  sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ
  sha1_name: convert GET_SHA1* flags to GET_OID*
  sha1_name: convert get_sha1* to get_oid*
  Convert remaining callers of get_sha1 to get_oid.
  builtin/unpack-file: convert to struct object_id
  bisect: convert bisect_checkout to struct object_id
  builtin/update_ref: convert to struct object_id
  sequencer: convert to struct object_id
  remote: convert struct push_cas to struct object_id
  submodule: convert submodule config lookup to use object_id
  builtin/merge-tree: convert remaining caller of get_sha1 to object_id
  builtin/fsck: convert remaining caller of get_sha1 to object_id
2017-08-11 13:26:55 -07:00
Jeff King
0b006014c8 hashcmp: use memcmp instead of open-coded loop
In 1a812f3a70 (hashcmp(): inline memcmp() by hand to
optimize, 2011-04-28), it was reported that an open-coded
loop outperformed memcmp() for comparing sha1s.

Discussion[1] a few years later in 2013 showed that this
depends on your libc's version of memcmp(). In particular,
glibc 2.13 optimized their memcmp around 2011. Here are
current timings with glibc 2.24 (best-of-five, on
linux.git):

  [before this patch, open-coded]
  $ time git rev-list --objects --all
  real	0m35.357s
  user	0m35.016s
  sys	0m0.340s

  [after this patch, memcmp]
  real	0m32.930s
  user	0m32.630s
  sys	0m0.300s

Now that we've had 6 years for that version of glibc to
make its way onto people's machines, it's worth revisiting
our benchmarks and switching to memcmp().

It may be that there are other non-glibc systems where
memcmp() isn't as well optimized. But since our single data
point in favor of open-coding was on a now-ancient glibc, we
should probably assume the system memcmp is good unless
proven otherwise. We may end up with a SLOW_MEMCMP Makefile
knob, but we can hold off on that until we actually find
such a system in practice.

[1] https://public-inbox.org/git/20130318073229.GA5551@sigill.intra.peff.net/
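
For reference, the two styles being compared look roughly like this.
It is a sketch with hypothetical names and a hard-coded 20-byte hash
size, not a copy of the code before or after this patch.

```c
#include <string.h>

#define TOY_RAWSZ 20 /* size of a binary SHA-1 */

/* Open-coded comparison, in the style the patch removes. */
static int toy_hashcmp_loop(const unsigned char *a, const unsigned char *b)
{
	int i;

	for (i = 0; i < TOY_RAWSZ; i++, a++, b++) {
		if (*a != *b)
			return *a - *b;
	}
	return 0;
}

/* memcmp()-based comparison, in the style the patch adopts. */
static int toy_hashcmp_memcmp(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, TOY_RAWSZ);
}
```

Both return zero for equal hashes and agree on the sign otherwise, so
callers that only test the sign are unaffected by the switch.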

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-09 11:03:25 -07:00
Junio C Hamano
230ce07d13 Git 2.13.5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZgNa+AAoJELC16IaWr+bLaMIP/1tHYbkQ/iMvYE8RpV5SXZOC
 nKm8IHP4Hu+05gmp874Gw3XtF+FELC53Q2nMc3L4mJ/ZSjuJOuein9aVisapBluw
 IZ8UaxmgN1NUA8gDVkXULNMJGDaOQw+VckMrEAI3A0uYXGY2eAiHR3Q+p0txhHb9
 jfhSsnl7Rv3q6LeDOPMKpwPVT0+uxBklrli7YcIn9IssbQhAvDUpbZ0Ab/fEOH6j
 NDIsZZ8opEESsUE5WBCOVXKUYjZOpLLpU4dQXa+JBj019LRmUYxLgjGVt2BSuUh/
 K8xe6/3P1FOQF1tMY4Bjb2iIUnc0wzIQYULn9dqJthV0Ybz0qwT5bTt4IYYKs86I
 /XjJPI9cAQHNirafyUyTrWy95HGnvYSyvmNC4a2ElvD24i/GKCuRQY7O5MCT3fjB
 5jUH2VxxA5E1TvkeG4VHl0d8WZib+/4CWd0OwSXk9LJJC/C/OTUlBa2dakOpwtgS
 RNGM+8+gzzd5rv1/UL+vAiqtCYjDfU+uqsjP5fRnMyTZiCmbhRcdW9b1TRc4OMoe
 wpbSbz0L18IAsyqZ+KLhyZOCr5mxjrVCxV++efI+NhsRecmO5nbPNtRGKf7/AtAQ
 +e5hROZRSFwf8/bXoobcOvhpuvW36+0mVXxIOGIoYtXB6AdtvGFXi9TnC/rTLBZG
 zuj/z2fmgo3F0G2tnNxk
 =t0hU
 -----END PGP SIGNATURE-----

Merge tag 'v2.13.5' into maint
2017-08-04 12:40:37 -07:00
Brandon Williams
f9ee2fcdfa grep: recurse in-process using 'struct repository'
Convert grep to use 'struct repository' which enables recursing into
submodules to be handled in-process.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-02 14:26:46 -07:00
Brandon Williams
4c0eeafe47 cache.h: add GITMODULES_FILE macro
Add a macro to be used when specifying the '.gitmodules' file and
convert any existing hard coded '.gitmodules' file strings to use the
new macro.
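
A minimal sketch of the idea, where the macro name and value come
from the message above and the helper is hypothetical:

```c
#include <string.h>

#define GITMODULES_FILE ".gitmodules"

/* Hypothetical helper: is "path" the top-level .gitmodules file? */
static int toy_is_gitmodules(const char *path)
{
	return !strcmp(path, GITMODULES_FILE);
}
```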

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-02 14:26:46 -07:00
Junio C Hamano
e312af164c Git 2.12.4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZflhUAAoJELC16IaWr+bLDyAP/jWDc9ic8S1ZH8W4ijAB24vP
 YRyQ1gbnRLhpEpbHYCUp7Uw9mrJBfdwYFlqxGJPH4JZL9qYLJUe5DJMWi5uAEptg
 tYPpPMLV5hgvGICwJbOaS5NlNf2NzLjRvzziOpUnE5CcR5Bw7doCPk4Uw6AVvAvK
 0x/6KDNLdKCBl3ZIoLdp9eW2PrTfYx6AK+Wf9oEgdMSB9+23acL7R/QEmH7oh9gl
 BS0riRQVHnku5akybMnRjeba7SvdhJlIV8rPc4WpuMRz0g2lPzOKQ+okeRtdQrfi
 REdEZ920EJR65KtxUgxYLrpPpmdRBxNI0jXC3Sm2Kac85MLvjFqhaosBWhTQuoOf
 tra68Gb9WSVkKLwRhRBYOG+dx00m1UETs7cYm6pw37RiMss1pcZWNdzjNNouVEEp
 3LBXcPJSpCbEjI+U/H2CqLqCk9gMfKLJXB9hK4b9jBcB9yrON2d75tPMhOcNx+Ej
 x6vZ4Zql6r1Bhe8y7T6KMnLe6vdli8Vrd7Tj5btogcEUmVfRQVHZzV94utevv9A5
 UEXLeCjJSjcY7rYtTdSLXgESioHW8WNfG+TPiyxjujSybtxGKmkcrSGCrugT26K8
 UT5VH2mYJOuHRtWnjWEEEhjayaXLv0mHNQ5XVfNDNPEFqRBQmIhLhcIf/aOF6r+F
 4Q6qN9QceJUEiaFnHsyO
 =ZBXN
 -----END PGP SIGNATURE-----

Merge tag 'v2.12.4' into maint
2017-08-01 12:27:31 -07:00
Junio C Hamano
3def5e9a8d Git 2.11.3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZfleHAAoJELC16IaWr+bLxB8QANsdFCtO+/PFnda2CmadVt/J
 d4AGMSu+cD74aUp5wzMscROCggn3vMHVeDMdVJ3ihcY6nLjJRy0EC/VJ5yTpSGli
 iq2GjmoH/oTS2tq2JWbTe86VMVYAzuWlWyowwH6OymDLkBQcAOap1WfUHTmKehUi
 BV2br1x15c7hRGToFqN8yed39iVmQoDJ5ETTBgFqkVyVHDdlyc81FRt0RfiA2x3N
 nm5/gOOWvH5X4Cyu7yP2C9GSV9p1mufEtw1DNwp+MV3n3wa2P4wJeNnYYmW85hpS
 ZzuWEM9QcU3fbShHxHcwHCyy2imXUUsfm1/Y6rCH3ZVSzo1icz5ghL2rnmcxdZvS
 JMp60EKbaapUiIkI23R2Yvlh81J5frwOp739DYytlai3rZF7le9KYGQnsUrv95Ie
 CvFGr3Btiy3oEVOP7xRiGnGtThmVRP4mFsIIIgf3YsBJqRXRwxqn1D6jbkHBqu7z
 VfFnpp63BsKY59Udo1qilkxS2qQ35gAS+TNczPV9D0m3n3bZ5UXEMuonahAE5YwG
 d20wBNOd86oK4khtMWcxXx4BBx+tlA99FfQOgxvn3XWnHmTAJE3+L0uEajZpEpcU
 gkHLo0EutMY+xmX9+jwszmBS9gNL9xzFADtAoYIoAsmpaD7jBJsTjwyzstTyXLvr
 5jcZT/hyX4iZtOUlC67J
 =fCBm
 -----END PGP SIGNATURE-----

Merge tag 'v2.11.3' into maint-2.12

Git 2.11.3
2017-07-30 15:04:22 -07:00
Junio C Hamano
05bb78abc1 Git 2.10.4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZflbsAAoJELC16IaWr+bLUUAQAONDi4Ty/I9K79Nwv9/15HcV
 4oMfCC56Y8nLUs83GsS0aZadX15iABsOACtZUA0kPQB8PV2XYcUnM6rPFtpeovRI
 sfs0XqfK5l+cVDoMMb2sGAfQYEBIXRy8sUb1EXIuJ4MzHxfRbmm1sKd7ko3lg6hN
 JHhGsNpzIVRspuUZh+yXp0Qa8CKKnekhwEntVd5b71eahG3lJNBO7UXvDAkDyl33
 amoc5eqKdoGvjs3yYBvOV0qX8ePV53wieKwL5uBG6LdjMrjtWpLJOuMk6IYR18Sm
 ++A+WiCb14lQ/6Wfu+r7WhjaWIXHHMPV/5YMhm1OzrWKiw+DuucLVaorl3cSPA2G
 zNPoHGUGxfnKz0NLiMkpbjUfB0gYqqLKts5pcnKeTconUcLZlpYKEYNpypfgbJyr
 XvIgkjAt3KwRa8mrGvCURkelmYKzFzd+hZdxvXiJ/flk4CcssgMgYorWCMwwy86a
 uErlgWDcGh9wtV9Pwy8M7EwXcRDggBND5jqH2dpFUaQ+8Kzm11lX5BRseZIOASzL
 ++MuZGEQiETz2HkWb+DWMIDAJMej2N2DF1eq7DnsmEUZgOarf2ZP3Lsd84W43WLI
 PdLhA1zpL2YVz9EEeFT/hLSX3fC16+lkeVQhtV5pJlIiLumHOdWYBElsnX694Nv3
 JTE4X1l38kCBQ4on8eEo
 =R9MM
 -----END PGP SIGNATURE-----

Merge tag 'v2.10.4' into maint-2.11

Git 2.10.4
2017-07-30 15:01:31 -07:00
Junio C Hamano
d78f06a1b7 Git 2.9.5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZflVaAAoJELC16IaWr+bL/5QP/1NoUGqrwB+zwJ8+oDqd+Djl
 PX8qyafoMXJr/w/fACk8r/tCSGKgK8Gx9FqZ9GIBCAZVNXkQnheRElOjiuRg4rbl
 +USiN2XM4ue/X7GqEBc7YVAmd+ifFFQ+ckm1g72A53B4Qh4/Ca4MnPYLOi7eKfC1
 85f+/zMj/5pYsmboFZzFiUPq+Khyb2e85Mm9ok+l/8zAXt4ER5cf4mhY3KSEtnfA
 6qGVUJ3fS9FzE4ud+/cx2qidsTrzZI/Hpv+3TVVXzSv5j32D3srnumLs+XnVIarV
 nJFoVUZV/XSC80YUkwbcdY6Rs2gVfhHJK6zVcs8MfHC28o+ZJDM+ceGVnUKcdpDW
 Gejsc7l0Blt0IodLoHAemBOsF3eeQBh5M5vodHdEFTiCdGRcCX3lvPxikCILW1Fv
 4FPmrjfOlWEz0ktV4eKacX+DVAa2p9P09v0B6pKFt/l5MiHKla8qdYXLjEnEHHaN
 ywIJPK0Lbgr+rjf3XcEQ96sjP+2XOcmtwTxychEcQ7Z2IwqyJA/GtdyCh1/jinap
 0M9odRHtYHRk1qUcZBLosM3C3Y0rgc2k1RZJRgdAY1kiBezctoU6FkH5Pb7LFRtH
 hr3/llk9X1ivh6fruLZ6Lu2EZ/vJVOwtUNLFqPO8fLP4cABkhDdxX13o5PS+qYMJ
 THXReDUV4vgtmzKrgJ+7
 =w1+M
 -----END PGP SIGNATURE-----

Merge tag 'v2.9.5' into maint-2.10

Git 2.9.5
2017-07-30 14:57:33 -07:00
Junio C Hamano
af0178aec7 Git 2.8.6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZflRcAAoJELC16IaWr+bLOnAQAIEtjMActDfpYb+tXftBIzzm
 Od/tBG3WZMRPyq/fWExV9nPO5xYOf6O9PlU6H7rNMDh+2n5/ypxqEXDvjzNHRMyh
 TIk1oAjG0zDiSe/fHO4v3fcCeIne0C0ZDwzYjS9+mSnybmPRLMQ1j8ykV7oBIUlB
 A081Tcb86bxG9kdxO4Sih+0zIglZ1lNA9fH7PqY5v/DqBY9TkaZIuoEjCIo7wUYu
 k+kSrNjXWz8HdYovpO/snhgtU7TFS7OtWmYEvXBg4+p6R1nGCuSWejHeWrbqx3fI
 QPXdLXIua/NqZKdd6ad4K+K91XW1OaqnK49IY58sSzHXYiDRnfnmBDzduyuagEE1
 C3BQhALMvkGZBmkNI1unZBqxsz4E7hviyxeOt1W3Z/I8mt6IGGnLWg+oVEy4b3yj
 TAx4rQJs1xmGU5maR25yBnQI/ElZWHNg+vrtGhdt5XvklASwn8egukjAjUWJodie
 hs/BiMKf+Rk7dVPY6RnK94pHWtNpkTlD9VCaLXhmFN863Zc3DwYBcbUF2D78d5G8
 zLG1pQRtWizAjF9XJ/q01JAutHUyyoYGWwa8lKJvplxQaXwe0bntzPILZN81G1Cy
 mC955bsbyIGv+88elRAeYpu7SxQJ1uGmpMYcamdLr7irDF2bUZp7n55Ogia4IKvK
 LgvwELkejo1WgDBYvqET
 =iOsd
 -----END PGP SIGNATURE-----

Merge tag 'v2.8.6' into maint-2.9

Git 2.8.6
2017-07-30 14:52:14 -07:00
Junio C Hamano
7720c33f63 Git 2.7.6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZflNxAAoJELC16IaWr+bLpSsQAIT1s4c/uKAXJBw8CegM4SP1
 SeB5NMnjz7VVtBsdXKPy6fVXBHCjffON/MvNXcXwGqzx3lh6SiMAVNjYknBkQcKN
 b639dD9HEEBRFf62a+QAyRYbFeg0NONVydB25s7RfR57HUNxFibaJDT5SoymO0/5
 YCdmMENuvijvCYcwyb3MSjAKCkwDDErPzyI4NZ2YZpC7IG46Uoxq8BCdHpKhXa5I
 3TNEDruBAd/UJCIQiMW1HP3OMQXzXmCTL5i4QSr/uloO1kNzkWgCZDkkFrSGFPdx
 UeTRXOM0r5QdFXZC36zZNoL5ELflgzrYFSerj6VkCAbiG4FAWL+43CCxuUcq5OkZ
 JsTYObieBMFiaowTn9hKo3ix1xDSjR2+p0bfZbOPy5jMB85oegnjV3Rp/eBoXsDm
 h4qo+5kv0h8H2wKdxcBfVg6LkpBZGsvEOveAtWZIcFIVIOyULj9UAsnTwOotwQiL
 NHO4J2fJhcvSYUj6oGB3SpabKZfcbVXRE2fzZq+3+Mt4DdzSdSmx5CEJfUmxN7sQ
 YLb8UKSr2vv03YfKRghCGxqjOcmQL5vY79O8+QSN3cCDFFAwxzNYaGeHJ+/chvh2
 NySOkUf/uA7H1xQiZmJI1mfwQvi527MEzblCPDButm6n8ty6QyWOQ+kQYzcW5jjI
 kPWdqc5pCZQ+Q+q6lQc0
 =rNay
 -----END PGP SIGNATURE-----

Merge tag 'v2.7.6' into maint-2.8

Git 2.7.6
2017-07-30 14:46:43 -07:00
Jeff King
2491f77b90 connect: factor out "looks like command line option" check
We reject hostnames that start with a dash because they may
be confused for command-line options. Let's factor out that
notion into a helper function, as we'll use it in more
places. And while it's simple now, it's not clear if some
systems might need more complex logic to handle all cases.
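
In its simplest form such a helper might look like the sketch below.
The name is hypothetical, and the check mirrors only the
"starts with a dash" rule described above.

```c
#include <stddef.h>

/* Returns nonzero when "str" could be mistaken for an option. */
static int toy_looks_like_option(const char *str)
{
	return str != NULL && str[0] == '-';
}
```

Keeping the rule in one function means a platform that needs a
stricter check only has to change one place.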

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-28 15:51:56 -07:00
Junio C Hamano
487fe1ffcd Merge branch 'ls/filter-process-delayed' into jt/subprocess-handshake
* ls/filter-process-delayed:
  convert: add "status=delayed" to filter process protocol
  convert: refactor capabilities negotiation
  convert: move multiple file filter error handling to separate function
  convert: put the flags field before the flag itself for consistent style
  t0021: write "OUT <size>" only on success
  t0021: make debug log file name configurable
  t0021: keep filter log files on comparison
2017-07-26 12:56:19 -07:00
brian m. carlson
321c89bf5f sha1_name: convert GET_SHA1* flags to GET_OID*
Convert the flags for get_oid_with_context and friends to use "OID"
instead of "SHA1" in their names.

This transform was made by running the following one-liner on the
affected files:

  perl -pi -e 's/GET_SHA1/GET_OID/g'

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-17 13:54:51 -07:00
brian m. carlson
e82caf384b sha1_name: convert get_sha1* to get_oid*
Now that all the callers of get_sha1 directly or indirectly use struct
object_id, rename the functions starting with get_sha1 to start with
get_oid.  Convert the internals in sha1_name.c to use struct object_id
as well, and eliminate explicit length checks where possible.  Convert a
use of 40 in get_oid_basic to GIT_SHA1_HEXSZ.

Outside of sha1_name.c and cache.h, this transition was made with the
following semantic patch:

@@
expression E1, E2;
@@
- get_sha1(E1, E2.hash)
+ get_oid(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1(E1, E2->hash)
+ get_oid(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_committish(E1, E2.hash)
+ get_oid_committish(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_committish(E1, E2->hash)
+ get_oid_committish(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_treeish(E1, E2.hash)
+ get_oid_treeish(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_treeish(E1, E2->hash)
+ get_oid_treeish(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_commit(E1, E2.hash)
+ get_oid_commit(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_commit(E1, E2->hash)
+ get_oid_commit(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_tree(E1, E2.hash)
+ get_oid_tree(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_tree(E1, E2->hash)
+ get_oid_tree(E1, E2)

@@
expression E1, E2;
@@
- get_sha1_blob(E1, E2.hash)
+ get_oid_blob(E1, &E2)

@@
expression E1, E2;
@@
- get_sha1_blob(E1, E2->hash)
+ get_oid_blob(E1, E2)

@@
expression E1, E2, E3, E4;
@@
- get_sha1_with_context(E1, E2, E3.hash, E4)
+ get_oid_with_context(E1, E2, &E3, E4)

@@
expression E1, E2, E3, E4;
@@
- get_sha1_with_context(E1, E2, E3->hash, E4)
+ get_oid_with_context(E1, E2, E3, E4)
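
To make the effect of the rules above concrete, here is a toy C
version of one call site before and after the rewrite. All toy_*
names are hypothetical stand-ins, and the "lookup" is faked.

```c
#include <string.h>

#define TOY_RAWSZ 20

struct toy_object_id {
	unsigned char hash[TOY_RAWSZ];
};

/* Old style: the callee sees only a bare byte buffer. */
static int toy_get_sha1(const char *name, unsigned char *sha1)
{
	memset(sha1, name[0], TOY_RAWSZ); /* fake lookup for illustration */
	return 0;
}

/* New style: the callee receives the whole struct. */
static int toy_get_oid(const char *name, struct toy_object_id *oid)
{
	return toy_get_sha1(name, oid->hash);
}

/* Resolves the same name both ways and checks that the results match. */
static int toy_resolve_both_ways(const char *name)
{
	struct toy_object_id a, b;

	toy_get_sha1(name, a.hash); /* before: pass the .hash member */
	toy_get_oid(name, &b);      /* after: pass the struct's address */
	return !memcmp(a.hash, b.hash, TOY_RAWSZ);
}
```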

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-17 13:54:51 -07:00
Junio C Hamano
00b7cf2379 Merge branch 'jt/unify-object-info'
Code clean-ups.

* jt/unify-object-info:
  sha1_file: refactor has_sha1_file_with_flags
  sha1_file: do not access pack if unneeded
  sha1_file: teach sha1_object_info_extended more flags
  sha1_file: refactor read_object
  sha1_file: move delta base cache code up
  sha1_file: rename LOOKUP_REPLACE_OBJECT
  sha1_file: rename LOOKUP_UNKNOWN_OBJECT
  sha1_file: teach packed_object_info about typename
2017-07-05 13:32:57 -07:00
Junio C Hamano
5ab148dda0 Merge branch 'rs/sha1-name-readdir-optim'
Optimize "what are the object names already taken in an alternate
object database?" query that is used to derive the length of prefix
an object name is uniquely abbreviated to.

* rs/sha1-name-readdir-optim:
  sha1_file: guard against invalid loose subdirectory numbers
  sha1_file: let for_each_file_in_obj_subdir() handle subdir names
  p4205: add perf test script for pretty log formats
  sha1_name: cache readdir(3) results in find_short_object_filename()
2017-07-05 13:32:56 -07:00
Lars Schneider
2841e8f81c convert: add "status=delayed" to filter process protocol
Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.

Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.

Git has multiple code paths that check out a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.
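
The control flow can be sketched as a two-pass loop. This is a
simulation with hypothetical names and no real filter process: a
"slow" flag stands in for the filter answering "delayed".

```c
#include <stddef.h>

enum toy_status { TOY_SUCCESS, TOY_DELAYED };

struct toy_blob {
	int slow;    /* the filter needs extra time for this blob */
	int written; /* set once the blob is in the working tree */
};

/* First pass: the filter may answer "delayed" instead of blocking. */
static enum toy_status toy_filter_request(const struct toy_blob *b)
{
	return b->slow ? TOY_DELAYED : TOY_SUCCESS;
}

/* Checks out what it can; returns how many blobs were delayed. */
static size_t toy_checkout(struct toy_blob *blobs, size_t nr)
{
	size_t i, delayed = 0;

	for (i = 0; i < nr; i++) {
		if (toy_filter_request(&blobs[i]) == TOY_DELAYED)
			delayed++; /* keep going, revisit later */
		else
			blobs[i].written = 1;
	}
	return delayed;
}

/* Final pass: collect every blob that was delayed earlier. */
static void toy_finish_delayed(struct toy_blob *blobs, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!blobs[i].written)
			blobs[i].written = 1; /* a real filter would block here */
}
```

The gain comes from the first pass not stalling on slow blobs; the
waiting is pushed to the end, after all other files are written.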

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:50:41 -07:00
Stefan Beller
091f8e28b4 diff.c: migrate emit_line_checked to use emit_diff_symbol
Add a new flags field to emit_diff_symbol, that will be used by
context lines for:
* white space rules that are applicable (The first 12 bits)
  Take a note in cahe.c as well, when this ws rules are extended we have
  to fix the bits in the flags field.
* how the rules are evaluated (actually this double encodes the sign
  of the line, but the code is easier to keep this way, bits 13,14,15)
* if the line is a blank line at EOF (bit 16)

The check whether new lines need to be marked up as extra lines at the end
of the file is now done unconditionally. That should be ok, as
'new_blank_line_at_eof' has a quick early return.
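
The flags layout described above can be sketched as a packed bit field. This is only an illustration: the EX_* names, the helper functions, and the exact bit positions are assumptions made for this example, not the definitions used in diff.c.

```c
#include <stdint.h>

/* Hypothetical layout; names and exact bit positions are illustrative. */
#define EX_WS_RULE_MASK  0x0fffu     /* low 12 bits: whitespace rules */
#define EX_SIGN_NEW      (1u << 12)  /* line is an added line */
#define EX_SIGN_CONTEXT  (1u << 13)  /* line is a context line */
#define EX_SIGN_OLD      (1u << 14)  /* line is a removed line */
#define EX_SIGN_MASK     (EX_SIGN_NEW | EX_SIGN_CONTEXT | EX_SIGN_OLD)
#define EX_BLANK_AT_EOF  (1u << 15)  /* blank line at end of file */

/* Combine the three pieces of information into one flags word. */
static uint32_t ex_pack_flags(uint32_t ws_rule, uint32_t sign, int blank_at_eof)
{
	uint32_t flags = ws_rule & EX_WS_RULE_MASK;

	flags |= sign & EX_SIGN_MASK;
	if (blank_at_eof)
		flags |= EX_BLANK_AT_EOF;
	return flags;
}

/* Accessors that mask each piece back out of the flags word. */
static uint32_t ex_ws_rule(uint32_t flags)
{
	return flags & EX_WS_RULE_MASK;
}

static uint32_t ex_sign(uint32_t flags)
{
	return flags & EX_SIGN_MASK;
}

static int ex_blank_at_eof(uint32_t flags)
{
	return !!(flags & EX_BLANK_AT_EOF);
}
```

Because the whitespace rules share the word with the other bits, any growth of the rule mask beyond 12 bits would collide with the sign bits, which is why the message asks for a note next to the rule definitions.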

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:13:01 -07:00
Jonathan Tan
e83e71c5e1 sha1_file: refactor has_sha1_file_with_flags
has_sha1_file_with_flags() implements many mechanisms in common with
sha1_object_info_extended(). Make has_sha1_file_with_flags() a
convenience function for sha1_object_info_extended() instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-26 10:28:58 -07:00
Jonathan Tan
dfdd4afcf9 sha1_file: teach sha1_object_info_extended more flags
Improve sha1_object_info_extended() by supporting additional
flags. This allows has_sha1_file_with_flags() to be modified to use
sha1_object_info_extended() in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-26 10:28:42 -07:00
Junio C Hamano
f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Junio C Hamano
5812b3f73b Merge branch 'bw/ls-files-sans-the-index'
Code clean-up.

* bw/ls-files-sans-the-index:
  ls-files: factor out tag calculation
  ls-files: factor out debug info into a function
  ls-files: convert show_files to take an index
  ls-files: convert show_ce_entry to take an index
  ls-files: convert prune_cache to take an index
  ls-files: convert ce_excluded to take an index
  ls-files: convert show_ru_info to take an index
  ls-files: convert show_other_files to take an index
  ls-files: convert show_killed_files to take an index
  ls-files: convert write_eolinfo to take an index
  ls-files: convert overlay_tree_on_cache to take an index
  tree: convert read_tree to take an index parameter
  convert: convert renormalize_buffer to take an index
  convert: convert convert_to_git to take an index
  convert: convert convert_to_git_filter_fd to take an index
  convert: convert crlf_to_git to take an index
  convert: convert get_cached_convert_stats_ascii to take an index
2017-06-24 14:28:40 -07:00
René Scharfe
70c49050d4 sha1_file: guard against invalid loose subdirectory numbers
Loose object subdirectories have hexadecimal names based on the first
byte of the hash of contained objects, thus their numerical
representation can range from 0 (0x00) to 255 (0xff).  Change the type
of the corresponding variable in for_each_file_in_obj_subdir() and
associated callback functions to unsigned int and add a range check.
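
A minimal sketch of such a range check, assuming a hypothetical helper ex_subdir_name() that is not part of Git: it maps a subdirectory number to its two-digit hexadecimal name and rejects anything above 0xff.

```c
#include <stdio.h>

/*
 * Format a loose object subdirectory number as its two-digit hex name.
 * Returns 0 on success, -1 if subdir_nr is outside 0..255.
 * (ex_subdir_name is a hypothetical helper, not a Git function.)
 */
static int ex_subdir_name(unsigned int subdir_nr, char out[3])
{
	if (subdir_nr > 0xff)
		return -1;
	snprintf(out, 3, "%02x", subdir_nr);
	return 0;
}
```

Using unsigned int means a negative value cannot slip past the check; a single upper-bound comparison covers every invalid input.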

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24 11:09:52 -07:00
Brandon Williams
e7d72d0753 path: create path.h
Move all path related declarations from cache.h to a new path.h header
file.  This makes cache.h smaller and makes it easier to add new path
related functions.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-23 18:24:34 -07:00
Brandon Williams
c14c234f22 environment: place key repository state in the_repository
Migrate 'git_dir', 'git_common_dir', 'git_object_dir', 'git_index_file',
'git_graft_file', and 'namespace' to be stored in 'the_repository'.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-23 18:24:34 -07:00
Brandon Williams
73f192c991 setup: don't perform lazy initialization of repository state
Under some circumstances (bogus GIT_DIR value or the discovered gitdir
is '.git') 'setup_git_directory()' won't initialize key repository
state.  This leads to inconsistent state after running the setup code.
To account for this inconsistent state, lazy initialization is done once
a caller asks for the repository's gitdir or some other piece of
repository state.  This is confusing and can be error prone.

Instead let's tighten the expected outcome of 'setup_git_directory()'
and ensure that it initializes repository state in all cases that would
have been handled by lazy initialization.

This also lets us drop the requirement to have 'have_git_dir()' check if
the environment variable GIT_DIR was set as that will be handled by the
end of the setup code.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-23 18:24:34 -07:00
Junio C Hamano
25bf951381 Merge branches 'bw/ls-files-sans-the-index' and 'bw/config-h' into bw/repo-object
* bw/ls-files-sans-the-index:
  ls-files: factor out tag calculation
  ls-files: factor out debug info into a function
  ls-files: convert show_files to take an index
  ls-files: convert show_ce_entry to take an index
  ls-files: convert prune_cache to take an index
  ls-files: convert ce_excluded to take an index
  ls-files: convert show_ru_info to take an index
  ls-files: convert show_other_files to take an index
  ls-files: convert show_killed_files to take an index
  ls-files: convert write_eolinfo to take an index
  ls-files: convert overlay_tree_on_cache to take an index
  tree: convert read_tree to take an index parameter
  convert: convert renormalize_buffer to take an index
  convert: convert convert_to_git to take an index
  convert: convert convert_to_git_filter_fd to take an index
  convert: convert crlf_to_git to take an index
  convert: convert get_cached_convert_stats_ascii to take an index

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
  alias: use the early config machinery to expand aliases
  t7006: demonstrate a problem with aliases in subdirectories
  t1308: relax the test verifying that empty alias values are disallowed
  help: use early config when autocorrecting aliases
  config: report correct line number upon error
  discover_git_directory(): avoid setting invalid git_dir
2017-06-23 18:24:00 -07:00
René Scharfe
cc817ca3ef sha1_name: cache readdir(3) results in find_short_object_filename()
Read each loose object subdirectory at most once when looking for unique
abbreviated hashes.  This speeds up commands like "git log --pretty=%h"
considerably, which previously caused one readdir(3) call for each
candidate, even for subdirectories that were visited before.

The new cache is kept until the program ends and never invalidated.  The
same is already true for pack indexes.  The inherent racy nature of
finding unique short hashes makes it still fit for this purpose -- a
conflicting new object may be added at any time.  Tasks with higher
consistency requirements should not use it, though.

The cached object names are stored in an oid_array, which is quite
compact.  The bitmap for remembering which subdir was already read is
stored as a char array, with one char per directory -- that's not quite
as compact, but really simple and incurs only an overhead equivalent to
11 hashes after all.
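
The "read each subdirectory at most once" idea can be sketched as follows. The struct, the callback type, and the function names are assumptions for illustration; the real code stores object names in an oid_array and scans with readdir(3), which is replaced here by a caller-supplied loader.

```c
#define EX_NUM_SUBDIRS 256

/* Hypothetical cache state: one char per subdirectory, 1 = already read. */
struct ex_loose_cache {
	char subdir_seen[EX_NUM_SUBDIRS];
};

/* Callback standing in for the readdir(3) scan of one subdirectory. */
typedef void (*ex_load_fn)(unsigned int subdir_nr, void *data);

/*
 * Ensure subdir_nr has been scanned, calling load() at most once per
 * subdirectory for the lifetime of the cache. Returns -1 on a bad number.
 */
static int ex_cache_subdir(struct ex_loose_cache *c, unsigned int subdir_nr,
			   ex_load_fn load, void *data)
{
	if (subdir_nr >= EX_NUM_SUBDIRS)
		return -1;
	if (!c->subdir_seen[subdir_nr]) {
		load(subdir_nr, data);
		c->subdir_seen[subdir_nr] = 1;
	}
	return 0;
}

/* Trivial loader for demonstration: counts how often it is called. */
static void ex_count_loads(unsigned int subdir_nr, void *data)
{
	(void)subdir_nr;
	++*(int *)data;
}
```

Nothing in this sketch ever clears subdir_seen, mirroring the "never invalidated" behavior described above.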

Suggested-by: Jeff King <peff@peff.net>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-22 12:07:51 -07:00
Jonathan Tan
c84a1f3ed4 sha1_file: refactor read_object
read_object() and sha1_object_info_extended() both implement mechanisms
such as object replacement, retrying the packed store after failing to
find the object in the packed store then the loose store, and being able
to mark a packed object as bad and then retrying the whole process.
Consolidating these mechanisms would be a great help to maintainability.

Therefore, consolidate them by extending sha1_object_info_extended() to
support the functionality needed, and then modifying read_object() to
use sha1_object_info_extended().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-21 18:54:43 -07:00
Jonathan Tan
1f0c0d36c1 sha1_file: rename LOOKUP_REPLACE_OBJECT
The LOOKUP_REPLACE_OBJECT flag controls whether the
lookup_replace_object() function is invoked by
sha1_object_info_extended(), read_sha1_file_extended(), and
lookup_replace_object_extended(), but it is not immediately clear which
functions accept that flag.

Therefore restrict this flag to only sha1_object_info_extended(),
renaming it appropriately to OBJECT_INFO_LOOKUP_REPLACE and adding some
documentation. Update read_sha1_file_extended() to have a boolean
parameter instead, and delete lookup_replace_object_extended().

parse_sha1_header() also passes this flag to
parse_sha1_header_extended() since commit 46f0344 ("sha1_file: support
reading from a loose object of unknown type", 2015-05-03), but that has
had no effect since that commit. Therefore this patch also removes this
flag from that invocation.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-21 18:54:43 -07:00
Jonathan Tan
19fc5e84a7 sha1_file: rename LOOKUP_UNKNOWN_OBJECT
The LOOKUP_UNKNOWN_OBJECT flag was introduced in commit 46f0344
("sha1_file: support reading from a loose object of unknown type",
2015-05-03) in order to support a feature in cat-file subsequently
introduced in commit 39e4ae3 ("cat-file: teach cat-file a
'--allow-unknown-type' option", 2015-05-03). Despite its name and
location in cache.h, this flag is used neither in
read_sha1_file_extended() nor in any of the lookup functions, but used
only in sha1_object_info_extended().

Therefore rename this flag to OBJECT_INFO_ALLOW_UNKNOWN_TYPE, taking the
name of the cat-file flag that invokes this feature, and move it closer
to the declaration of sha1_object_info_extended(). Also add
documentation for this flag.

OBJECT_INFO_ALLOW_UNKNOWN_TYPE is defined to 2, not 1, to avoid
conflicting with LOOKUP_REPLACE_OBJECT. Avoidance of this conflict is
necessary because sha1_object_info_extended() supports both flags.
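
A small sketch of why the two values must not collide, using the values stated above (1 and 2). The ex_* helpers are hypothetical and exist only to show that both flags can be tested independently in one flags word.

```c
/* Values as described above: the two flags occupy distinct bits. */
#define LOOKUP_REPLACE_OBJECT          1
#define OBJECT_INFO_ALLOW_UNKNOWN_TYPE 2

/* Hypothetical helpers that inspect a combined flags word. */
static int ex_wants_replace(unsigned flags)
{
	return !!(flags & LOOKUP_REPLACE_OBJECT);
}

static int ex_allows_unknown_type(unsigned flags)
{
	return !!(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE);
}
```

Had both been defined to 1, a caller asking only for replacement lookup would silently also allow unknown types.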

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-21 18:54:43 -07:00
Junio C Hamano
a6f38c109b Merge branch 'bw/object-id'
Conversion from uchar[20] to struct object_id continues.

* bw/object-id: (33 commits)
  diff: rename diff_fill_sha1_info to diff_fill_oid_info
  diffcore-rename: use is_empty_blob_oid
  tree-diff: convert path_appendnew to object_id
  tree-diff: convert diff_tree_paths to struct object_id
  tree-diff: convert try_to_follow_renames to struct object_id
  builtin/diff-tree: cleanup references to sha1
  diff-tree: convert diff_tree_sha1 to struct object_id
  notes-merge: convert write_note_to_worktree to struct object_id
  notes-merge: convert verify_notes_filepair to struct object_id
  notes-merge: convert find_notes_merge_pair_ps to struct object_id
  notes-merge: convert merge_from_diffs to struct object_id
  notes-merge: convert notes_merge* to struct object_id
  tree-diff: convert diff_root_tree_sha1 to struct object_id
  combine-diff: convert find_paths_* to struct object_id
  combine-diff: convert diff_tree_combined to struct object_id
  diff: convert diff_flush_patch_id to struct object_id
  patch-ids: convert to struct object_id
  diff: finish conversion for prepare_temp_file to struct object_id
  diff: convert reuse_worktree_file to struct object_id
  diff: convert fill_filespec to struct object_id
  ...
2017-06-19 12:38:44 -07:00
Brandon Williams
d3fb71b3cb setup: teach discover_git_directory to respect the commondir
Currently 'discover_git_directory' only looks at the gitdir to determine
if a git directory was discovered.  This causes a problem in the event
that the gitdir which was discovered was in fact a per-worktree git
directory and not the common git directory.  This is because the
repository config, which is checked to verify the repository's format,
is stored in the commondir and not in the per-worktree gitdir.  Correct
this behavior by checking the config stored in the commondir.

It will also be of use for callers to have access to the commondir, so
let's also return that upon successfully discovering a git directory.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Brandon Williams
b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Brandon Williams
e67a57fc51 config: create config.h
Move all config related declarations from cache.h to a new config.h
header file.  This makes cache.h smaller and allows for the opportunity
in a following patch to only include config.h when needed.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Brandon Williams
312c984a02 ls-files: convert overlay_tree_on_cache to take an index
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13 11:40:51 -07:00
Junio C Hamano
7ef0d04738 Merge branch 'jk/diff-blob'
The result from "git diff" that compares two blobs, e.g. "git diff
$commit1:$path $commit2:$path", used to be shown with the full
object name as given on the command line, but it is more natural to
use the $path in the output and use it to look up .gitattributes.

* jk/diff-blob:
  diff: use blob path for blob/file diffs
  diff: use pending "path" if it is available
  diff: use the word "path" instead of "name" for blobs
  diff: pass whole pending entry in blobinfo
  handle_revision_arg: record paths for pending objects
  handle_revision_arg: record modes for "a..b" endpoints
  t4063: add tests of direct blob diffs
  get_sha1_with_context: dynamically allocate oc->path
  get_sha1_with_context: always initialize oc->symlink_path
  sha1_name: consistently refer to object_context as "oc"
  handle_revision_arg: add handle_dotdot() helper
  handle_revision_arg: hoist ".." check out of range parsing
  handle_revision_arg: stop using "dotdot" as a generic pointer
  handle_revision_arg: simplify commit reference lookups
  handle_revision_arg: reset "dotdot" consistently
2017-06-02 15:06:05 +09:00
Brandon Williams
1c41c82bc4 grep: convert to struct object_id
Convert the remaining parts of grep to use struct object_id.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:06 +09:00
Junio C Hamano
fa0624f79f Merge branch 'dt/unpack-save-untracked-cache-extension'
When "git checkout", "git merge", etc. manipulates the in-core
index, various pieces of information in the index extensions are
discarded from the original state, as it is usually not the case
that they are kept up-to-date and in-sync with the operation on the
main index.  The untracked cache extension is copied across these
operations now, which would speed up "git status" (as long as the
cache is properly invalidated).

* dt/unpack-save-untracked-cache-extension:
  unpack-trees: preserve index extensions
2017-05-30 11:16:45 +09:00
Junio C Hamano
6b526ced6f Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (53 commits)
  object: convert parse_object* to take struct object_id
  tree: convert parse_tree_indirect to struct object_id
  sequencer: convert do_recursive_merge to struct object_id
  diff-lib: convert do_diff_cache to struct object_id
  builtin/ls-tree: convert to struct object_id
  merge: convert checkout_fast_forward to struct object_id
  sequencer: convert fast_forward_to to struct object_id
  builtin/ls-files: convert overlay_tree_on_cache to object_id
  builtin/read-tree: convert to struct object_id
  sha1_name: convert internals of peel_onion to object_id
  upload-pack: convert remaining parse_object callers to object_id
  revision: convert remaining parse_object callers to object_id
  revision: rename add_pending_sha1 to add_pending_oid
  http-push: convert process_ls_object and descendants to object_id
  refs/files-backend: convert many internals to struct object_id
  refs: convert struct ref_update to use struct object_id
  ref-filter: convert some static functions to struct object_id
  Convert struct ref_array_item to struct object_id
  Convert the verify_pack callback to struct object_id
  Convert lookup_tag to struct object_id
  ...
2017-05-29 12:34:43 +09:00
Jeff King
dc944b65f1 get_sha1_with_context: dynamically allocate oc->path
When a sha1 lookup returns the tree path via "struct
object_context", it just copies it into a fixed-size buffer.
This means the result can be truncated, and it means our
"struct object_context" consumes a lot of stack space.

Instead, let's allocate a string on the heap. Because most
callers don't care about this information, we'll avoid doing
it by default (so they don't all have to start calling
free() on the result). There are basically two options for
the caller to signal to us that it's interested:

  1. By setting a pointer to storage in the object_context.

  2. By passing a flag in another parameter.

Doing (1) would match the way that sha1_object_info_extended()
works. But it would mean that every caller would have to
initialize the object_context, which they don't currently
have to do.

This patch does (2), and adds a new bit to the function's
flags field. All of the callers that look at the "path"
field are updated to pass the new flag.
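
Option (2) can be sketched roughly like this. The flag name EX_WANT_PATH, the struct, and ex_lookup() are assumptions made for the example; only the shape (a flag bit that opts in to a heap-allocated path) follows the description above.

```c
#include <stdlib.h>
#include <string.h>

#define EX_WANT_PATH 1u /* hypothetical flag bit: caller wants oc->path */

struct ex_object_context {
	char *path; /* heap-allocated when EX_WANT_PATH is set, else NULL */
};

/*
 * Stand-in for a lookup that has discovered `found_path`.
 * Returns 0 on success, -1 on allocation failure.
 */
static int ex_lookup(const char *found_path, unsigned flags,
		     struct ex_object_context *oc)
{
	oc->path = NULL;
	if (flags & EX_WANT_PATH) {
		size_t len = strlen(found_path) + 1;

		oc->path = malloc(len);
		if (!oc->path)
			return -1;
		memcpy(oc->path, found_path, len);
	}
	return 0;
}
```

Callers that do not pass the flag never receive an allocation and so have nothing to free; callers that do pass it own the string and must free() it.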

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-24 10:59:27 +09:00
Jeff King
c0a487eafb sha1_name: consistently refer to object_context as "oc"
An early version of the patch to add object_context used the
name object_resolve_context. This was later shortened to
just object_context, but the "orc" variable name stuck in a
few places.  Let's use "oc", which is used elsewhere in the
code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-24 10:59:27 +09:00
David Turner
edf3b90553 unpack-trees: preserve index extensions
Make git checkout (and other unpack_tree operations) preserve the
untracked cache. This is valuable for two reasons:

1. Often, an unpack_tree operation will not touch large parts of the
working tree, and thus most of the untracked cache will continue to be
valid.

2. Even if the untracked cache were entirely invalidated by such an
operation, the user has signaled their intention to have such a cache,
and we don't want to throw it away.

[jes: backed out the watchman-specific parts]

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-20 18:26:45 +09:00
Junio C Hamano
b15667bbdc Merge branch 'js/larger-timestamps'
Some platforms have ulong that is smaller than time_t, and our
historical use of ulong for timestamp would mean they cannot
represent some timestamp that the platform allows.  Invent a
separate and dedicated timestamp_t (so that we can distinguish
timestamps from vanilla ulongs, which alone is already a good
move), and then declare uintmax_t is the type to be used as the
timestamp_t.

* js/larger-timestamps:
  archive-tar: fix a sparse 'constant too large' warning
  use uintmax_t for timestamps
  date.c: abort if the system time cannot handle one of our timestamps
  timestamp_t: a new data type for timestamps
  PRItime: introduce a new "printf format" for timestamps
  parse_timestamp(): specify explicitly where we parse timestamps
  t0006 & t5000: skip "far in the future" test when time_t is too limited
  t0006 & t5000: prepare for 64-bit timestamps
  ref-filter: avoid using `unsigned long` for catch-all data type
2017-05-16 11:51:59 +09:00
brian m. carlson
f06e90dac1 merge: convert checkout_fast_forward to struct object_id
Converting checkout_fast_forward is required to convert
parse_tree_indirect.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:58 +09:00
Johannes Schindelin
dddbad728c timestamp_t: a new data type for timestamps
Git's source code assumes that unsigned long is at least as precise as
time_t. Which is incorrect, and causes a lot of problems, in particular
where unsigned long is only 32-bit (notably on Windows, even in 64-bit
versions).

So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.

By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.

As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.
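
The split between `timestamp_t` and `time_t` might look like the sketch below. It assumes the uintmax_t definition and the PRItime format that the js/larger-timestamps series in this log arrives at; the ex_* helpers are hypothetical, and the round-trip check relies on the common behavior of out-of-range integer conversions rather than on anything guaranteed by the C standard.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef uintmax_t timestamp_t; /* assumption: the type the series settles on */
#define PRItime PRIuMAX        /* printf format matching timestamp_t */

/*
 * Convert to time_t only at the boundary to system functions.
 * Returns 0 on success, -1 if the value does not survive the round trip
 * (that is, time_t cannot represent it on this platform).
 */
static int ex_to_time_t(timestamp_t ts, time_t *out)
{
	time_t t = (time_t)ts;

	if (t < 0 || (timestamp_t)t != ts)
		return -1;
	*out = t;
	return 0;
}

/* Print a timestamp without assuming it fits in unsigned long. */
static int ex_format_timestamp(timestamp_t ts, char *buf, size_t len)
{
	return snprintf(buf, len, "%" PRItime, ts);
}
```

Keeping the conversion in one helper makes the places where a timestamp can be rejected by the platform easy to audit.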

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-27 13:07:39 +09:00
Junio C Hamano
6cbc478d83 Merge branch 'jh/add-index-entry-optim'
"git checkout" that handles a lot of paths has been optimized by
reducing the number of unnecessary checks of paths in the
has_dir_name() function.

* jh/add-index-entry-optim:
  read-cache: speed up has_dir_name (part 2)
  read-cache: speed up has_dir_name (part 1)
  read-cache: speed up add_index_entry during checkout
  p0006-read-tree-checkout: perf test to time read-tree
  read-cache: add strcmp_offset function
2017-04-26 15:39:07 +09:00
Junio C Hamano
c9672ba4c8 Merge branch 'nd/conditional-config-in-early-config'
The recently introduced conditional inclusion of configuration did
not work well when early-config mechanism was involved.

* nd/conditional-config-in-early-config:
  config: correct file reading order in read_early_config()
  config: handle conditional include when $GIT_DIR is not set up
  config: prepare to pass more info in git_config_with_options()
2017-04-26 15:39:05 +09:00
Junio C Hamano
cdfe138b36 Merge branch 'jh/verify-index-checksum-only-in-fsck'
The index file has a trailing SHA-1 checksum to detect file
corruption, and historically we checked it every time the index
file is used.  Omit the validation during normal use, and instead
verify only in "git fsck".

* jh/verify-index-checksum-only-in-fsck:
  read-cache: force_verify_index_checksum
2017-04-23 22:07:49 -07:00
Junio C Hamano
a2e2c04683 Merge branch 'nd/conditional-config-include'
$GIT_DIR may in some cases be normalized with all symlinks resolved
while "gitdir" path expansion in the pattern does not receive the
same treatment, leading to incorrect mismatch.  This has been fixed.

* nd/conditional-config-include:
  config: resolve symlinks in conditional include's patterns
  path.c: add an option to call real_path() in expand_user_path()
2017-04-23 22:07:46 -07:00
Junio C Hamano
4c01f67d91 Merge branch 'dt/http-postbuffer-can-be-large'
Allow the http.postbuffer configuration variable to be set to a
size that can be expressed in size_t, which can be larger than
ulong on some platforms.

* dt/http-postbuffer-can-be-large:
  http.postbuffer: allow full range of ssize_t values
2017-04-23 22:07:45 -07:00
Junio C Hamano
b1081e4004 Merge branch 'bc/object-id'
Conversion from unsigned char [20] to struct object_id continues.

* bc/object-id:
  Documentation: update and rename api-sha1-array.txt
  Rename sha1_array to oid_array
  Convert sha1_array_for_each_unique and for_each_abbrev to object_id
  Convert sha1_array_lookup to take struct object_id
  Convert remaining callers of sha1_array_lookup to object_id
  Make sha1_array_append take a struct object_id *
  sha1-array: convert internal storage for struct sha1_array to object_id
  builtin/pull: convert to struct object_id
  submodule: convert check_for_new_submodule_commits to object_id
  sha1_name: convert disambiguate_hint_fn to take object_id
  sha1_name: convert struct disambiguate_state to object_id
  test-sha1-array: convert most code to struct object_id
  parse-options-cb: convert sha1_array_append caller to struct object_id
  fsck: convert init_skiplist to struct object_id
  builtin/receive-pack: convert portions to struct object_id
  builtin/pull: convert portions to struct object_id
  builtin/diff: convert to struct object_id
  Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
  Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
  Define new hash-size constants for allocating memory
2017-04-19 21:37:13 -07:00
Nguyễn Thái Ngọc Duy
2185fde563 config: handle conditional include when $GIT_DIR is not set up
If setup_git_directory() and friends have not been called,
get_git_dir() (because of includeIf.gitdir:XXX) would lead to

    die("BUG: setup_git_env called without repository");

There are two cases when a config file could be read before $GIT_DIR
is located.

The first one is check_repository_format(), where we read just the one
file $GIT_DIR/config to check if we could understand this
repository. This case should be safe. We do not parse include
directives, which can only be triggered from git_config_with_options,
but setup code uses a lower-level function. The concerned variables
should never be hidden away behind includes anyway.

The second one is triggered in check_pager_config() when we're about
to run an external git command. We might be able to find $GIT_DIR in
this case, which is exactly what read_early_config() does (and also is
what check_pager_config() uses). Conditional includes and
get_git_dir() could be triggered by the first
git_config_with_options() call there, before discover_git_directory()
is used as a fallback $GIT_DIR detection.

Detect this special "early reading" case, pass down the $GIT_DIR,
either from previous setup or detected by discover_git_directory(),
and make conditional include use it.

Noticed-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-17 19:18:43 -07:00
Nguyễn Thái Ngọc Duy
c48f4b379e config: prepare to pass more info in git_config_with_options()
So far we can only pass one flag, respect_includes, to this function. We
need to pass some more (non-flag even), so let's make it accept a struct
instead of an integer.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-17 19:18:40 -07:00
Junio C Hamano
cb054eb264 Merge branch 'jk/snprintf-cleanups'
Code clean-up.

* jk/snprintf-cleanups:
  daemon: use an argv_array to exec children
  gc: replace local buffer with git_path
  transport-helper: replace checked snprintf with xsnprintf
  convert unchecked snprintf into xsnprintf
  combine-diff: replace malloc/snprintf with xstrfmt
  replace unchecked snprintf calls with heap buffers
  receive-pack: print --pack-header directly into argv array
  name-rev: replace static buffer with strbuf
  create_branch: use xstrfmt for reflog message
  create_branch: move msg setup closer to point of use
  avoid using mksnpath for refs
  avoid using fixed PATH_MAX buffers for refs
  fetch: use heap buffer to format reflog
  tag: use strbuf to format tag header
  diff: avoid fixed-size buffer for patch-ids
  odb_mkstemp: use git_path_buf
  odb_mkstemp: write filename into strbuf
  do not check odb_mkstemp return value for errors
2017-04-16 23:29:26 -07:00
Jeff Hostetler
a6db3fbb6e read-cache: add strcmp_offset function
Add strcmp_offset() function to also return the offset of the
first change.

Add unit test and helper to verify.
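
A comparison that also reports where the strings first differ could look like the following sketch. The name ex_strcmp_offset() is chosen for this example, and the body is one plausible way to write it, not necessarily the code added by this commit.

```c
#include <stddef.h>

/*
 * Compare s1 and s2 like strcmp(3) and store in *first_change the index
 * of the first differing byte (or the string length when they are equal).
 */
static int ex_strcmp_offset(const char *s1, const char *s2, size_t *first_change)
{
	size_t k = 0;

	while (s1[k] == s2[k] && s1[k] != '\0')
		k++;
	*first_change = k;
	return (unsigned char)s1[k] - (unsigned char)s2[k];
}
```

Knowing the offset lets a caller that compares sorted paths skip the common prefix on a later comparison instead of rescanning it.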

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 02:21:12 -07:00
Jeff Hostetler
a33fc72fe9 read-cache: force_verify_index_checksum
Teach git to skip verification of the SHA-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.

Teach fsck to force this verification.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

This effect can be seen using t/perf/p0002-read-cache.sh:

Test                                          HEAD~1            HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.66(0.44+0.20)   0.30(0.27+0.02) -54.5%
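
The opt-in verification can be sketched as below. Everything here is an assumption for illustration: the global's name, the helper functions, and in particular the toy one-byte checksum, which stands in for SHA-1 only to keep the example self-contained.

```c
#include <stddef.h>

static int ex_force_verify_checksum; /* hypothetical global; 0 = skip */

/* Toy checksum (byte sum modulo 256) standing in for SHA-1. */
static unsigned char ex_checksum(const unsigned char *buf, size_t len)
{
	unsigned char sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = (unsigned char)(sum + buf[i]);
	return sum;
}

/*
 * Validate a buffer whose last byte is the checksum of the bytes before it.
 * Returns 0 if accepted, -1 if rejected.
 */
static int ex_verify_trailer(const unsigned char *buf, size_t len)
{
	if (len < 1)
		return -1;
	if (!ex_force_verify_checksum)
		return 0; /* fast path: skip the expensive check */
	return ex_checksum(buf, len - 1) == buf[len - 1] ? 0 : -1;
}
```

In this sketch a checker such as fsck would set the global before reading, while ordinary commands leave it at zero and accept the trailer unexamined.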

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 00:58:36 -07:00
Nguyễn Thái Ngọc Duy
4aad2f1627 path.c: add an option to call real_path() in expand_user_path()
In the next patch we need the ability to expand '~' to
real_path($HOME). But we can't do that from outside because '~' is part
of a pattern, not a true path. Add an option to expand_user_path() to do
so.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-14 23:51:38 -07:00
David Turner
37ee680d9b http.postbuffer: allow full range of ssize_t values
Unfortunately, in order to push some large repos where a server does
not support chunked encoding, the http postbuffer must sometimes
exceed two gigabytes.  On a 64-bit system, this is OK: we just malloc
a larger buffer.

This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
buffer size.

Signed-off-by: David Turner <dturner@twosigma.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-13 18:24:32 -07:00
brian m. carlson
1b7ba794d2 Convert sha1_array_for_each_unique and for_each_abbrev to object_id
Make sha1_array_for_each_unique take a callback using struct object_id.
Since one of these callbacks is an argument to for_each_abbrev, convert
those as well.  Rename various functions, replacing "sha1" with "oid".

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 08:33:55 -07:00
Junio C Hamano
3736c92558 Merge branch 'bw/recurse-submodules-relative-fix'
A few commands that recently learned the "--recurse-submodules"
option misbehaved when started from a subdirectory of the
superproject.

* bw/recurse-submodules-relative-fix:
  ls-files: fix bug when recursing with relative pathspec
  ls-files: fix typo in variable name
  grep: fix bug when recursing with relative pathspec
  setup: allow for prefix to be passed to git commands
  grep: fix help text typo
2017-03-30 14:07:15 -07:00
Jeff King
594fa9998c odb_mkstemp: write filename into strbuf
The odb_mkstemp() function expects the caller to provide a
fixed buffer to write the resulting tempfile name into. But
it creates the template using snprintf without checking the
return value. This means we could silently truncate the
filename.

In practice, it's unlikely that the truncation would end in
the template-pattern that mkstemp needs to open the file. So
we'd probably end up failing either way, unless the path was
specially crafted.

The simplest fix would be to notice the truncation and die.
However, we can observe that most callers immediately
xstrdup() the result anyway. So instead, let's switch to
using a strbuf, which is easier for them (and isn't a big
deal for the other 2 callers, who can just strbuf_release
when they're done with it).

Note that many of the callers used static buffers, but this
was purely to avoid putting a large buffer on the stack. We
never passed the static buffers out of the function, so
there's no complicated memory handling we need to change.
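
As a rough illustration of the truncation hazard described above (not the code from this patch), the sketch below checks the return value of snprintf() when filling a fixed-size buffer. The helper name format_template() and its return convention are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's API): build "<dir>/<pattern>" in a
 * fixed-size buffer. Returns 0 on success, -1 if the result would not fit.
 */
static int format_template(char *buf, size_t size,
			   const char *dir, const char *pattern)
{
	int len;

	assert(buf && size > 0);
	len = snprintf(buf, size, "%s/%s", dir, pattern);

	/*
	 * snprintf() returns the length the complete string would have
	 * had; a value >= size means the output was truncated.
	 */
	if (len < 0 || (size_t)len >= size)
		return -1;
	return 0;
}
```

A growable buffer such as a strbuf avoids the check altogether, which is the direction this patch takes.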

Signed-off-by: Jeff King <peff@peff.net>
2017-03-28 15:28:04 -07:00
Jeff King
892e723afd do not check odb_mkstemp return value for errors
The odb_mkstemp function does not return an error; it dies
on failure instead. But many of its callers compare the
resulting descriptor against -1 and die themselves.

Mostly this is just pointless, but it does raise a question
when looking at the callers: if they show the results of the
"template" buffer after a failure, what's in it? The answer
is: it doesn't matter, because it cannot happen.

So let's make that clear by removing the bogus error checks.
In bitmap_writer_finish(), we can drop the error-handling
code entirely. In the other two cases, it's shared with the
open() in another code path; we can just move the
error-check next to that open() call.

And while we're at it, let's flesh out the function's
docstring a bit to make the error behavior clear.

Signed-off-by: Jeff King <peff@peff.net>
2017-03-28 15:28:04 -07:00
Junio C Hamano
0330344e0f Merge branch 'jh/memihash-opt'
The name-hash used for detecting paths that are different only in
cases (which matter on case insensitive filesystems) has been
optimized to take advantage of multi-threading when it makes sense.

* jh/memihash-opt:
  name-hash: add test-lazy-init-name-hash to .gitignore
  name-hash: add perf test for lazy_init_name_hash
  name-hash: add test-lazy-init-name-hash
  name-hash: perf improvement for lazy_init_name_hash
  hashmap: document memihash_cont, hashmap_disallow_rehash api
  hashmap: add disallow_rehash setting
  hashmap: allow memihash computation to be continued
  name-hash: specify initial size for istate.dir_hash table
2017-03-28 14:06:00 -07:00
Junio C Hamano
ba5e05ffef Merge branch 'jk/pack-name-cleanups' into maint
Code clean-up.

* jk/pack-name-cleanups:
  index-pack: make pointer-alias fallbacks safer
  replace snprintf with odb_pack_name()
  odb_pack_keep(): stop generating keepfile name
  sha1_file.c: make pack-name helper globally accessible
  move odb_* declarations out of git-compat-util.h
2017-03-28 13:52:25 -07:00
Junio C Hamano
41534b626e Merge branch 'jk/interpret-branch-name' into maint
"git branch @" created refs/heads/@ as a branch, and in general the
code that handled @{-1} and @{upstream} was a bit too loose in
disambiguating.

* jk/interpret-branch-name:
  checkout: restrict @-expansions when finding branch
  strbuf_check_ref_format(): expand only local branches
  branch: restrict @-expansions when deleting
  t3204: test git-branch @-expansion corner cases
  interpret_branch_name: allow callers to restrict expansions
  strbuf_branchname: add docstring
  strbuf_branchname: drop return value
  interpret_branch_name: move docstring to header file
  interpret_branch_name(): handle auto-namelen for @{-1}
2017-03-28 13:52:22 -07:00
Junio C Hamano
c772d1bcdc Merge branch 'jk/parse-config-key-cleanup' into maint
The "parse_config_key()" API function has been cleaned up.

* jk/parse-config-key-cleanup:
  parse_hide_refs_config: tell parse_config_key we don't want a subsection
  parse_config_key: allow matching single-level config
  parse_config_key: use skip_prefix instead of starts_with
  refs: parse_hide_refs_config to use parse_config_key
2017-03-28 13:52:19 -07:00
Junio C Hamano
a026bde1ac Merge branch 'jk/prefix-filename'
Code clean-up with minor bugfixes.

* jk/prefix-filename:
  bundle: use prefix_filename with bundle path
  prefix_filename: simplify windows #ifdef
  prefix_filename: return newly allocated string
  prefix_filename: drop length parameter
  prefix_filename: move docstring to header file
  hash-object: fix buffer reuse with --path in a subdirectory
2017-03-27 10:59:26 -07:00
brian m. carlson
cd02599c48 Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
Since we will likely be introducing a new hash function at some point,
and that hash function might be longer than 20 bytes, use the constant
GIT_MAX_RAWSZ, which is designed to be suitable for allocations, instead
of GIT_SHA1_RAWSZ.  This will ease the transition down the line by
distinguishing between places where we need to allocate memory suitable
for the largest hash from those where we need to handle the current
hash.
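
The distinction between "size for allocation" and "size of the current hash" might be sketched as follows. The SKETCH_* names and the 32-byte maximum are assumptions for illustration only; they are not the values defined by this series.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_SHA1_RAWSZ 20 /* size of a raw SHA-1, in bytes */
#define SKETCH_MAX_RAWSZ  32 /* assumed largest supported hash (illustrative) */

struct sketch_oid {
	/* allocation: always sized for the largest supported hash */
	unsigned char hash[SKETCH_MAX_RAWSZ];
};

/* Copy a hash of the currently active size into maximum-sized storage. */
static void sketch_oid_set(struct sketch_oid *oid,
			   const unsigned char *raw, size_t rawsz)
{
	assert(rawsz <= SKETCH_MAX_RAWSZ);
	memset(oid->hash, 0, sizeof(oid->hash));
	memcpy(oid->hash, raw, rawsz); /* handling: uses the current hash size */
}
```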

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-26 22:08:21 -07:00
brian m. carlson
5028bf628c Define new hash-size constants for allocating memory
Since we will want to transition to a new hash at some point in the
future, and that hash may be larger in size than 160 bits, introduce two
constants that can be used for allocating a sufficient amount of memory.
They can be increased to reflect the largest supported hash size.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-26 22:08:21 -07:00
Junio C Hamano
78cf8efec3 Merge branch 'dl/credential-cache-socket-in-xdg-cache'
The default location "~/.git-credential-cache/socket" for the
socket used to communicate with the credential-cache daemon has
been moved to "~/.cache/git/credential/socket".

* dl/credential-cache-socket-in-xdg-cache:
  credential-cache: add tests for XDG functionality
  credential-cache: use XDG_CACHE_HOME for socket
  path.c: add xdg_cache_home
2017-03-24 13:07:34 -07:00
Jeff Hostetler
ea19489532 name-hash: add test-lazy-init-name-hash
Add t/helper/test-lazy-init-name-hash.c test code
to demonstrate performance times for lazy_init_name_hash()
using the original single-threaded and the new multi-threaded
code paths.

Includes a --dump option to dump the created hashmaps to
stdout.  You can use this to run both code paths and
confirm that they generate the same hashmaps.

Includes a --analyze option to analyze performance of both
code paths over a range of index sizes to help you find a
lower bound for the LAZY_THREAD_COST in name-hash.c.
For example, passing "-a 4000" will set "istate.cache_nr"
to 4000 and then try the multi-threaded code -- probably
giving 2 threads with 2000 entries each.  It will then
run both the single-threaded (1x4000) and the multi-threaded
(2x2000) and compare the times.  It will then repeat the
test with 8000, 12000, and so on, so that you can see the
crossover.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-24 11:00:03 -07:00
Junio C Hamano
45cbc37c5f Merge branch 'jk/pack-name-cleanups'
Code clean-up.

* jk/pack-name-cleanups:
  index-pack: make pointer-alias fallbacks safer
  replace snprintf with odb_pack_name()
  odb_pack_keep(): stop generating keepfile name
  sha1_file.c: make pack-name helper globally accessible
  move odb_* declarations out of git-compat-util.h
2017-03-21 15:07:17 -07:00
Junio C Hamano
f56a4390ee Merge branch 'rj/remove-unused-mktemp' into maint
Code cleanup.

* rj/remove-unused-mktemp:
  wrapper.c: remove unused gitmkstemps() function
  wrapper.c: remove unused git_mkstemp() function
2017-03-21 15:03:24 -07:00
Jeff King
e4da43b1f0 prefix_filename: return newly allocated string
The prefix_filename() function returns a pointer to static
storage, which makes it easy to use dangerously. We already
fixed one buggy caller in hash-object recently, and the
calls in apply.c are suspicious (I didn't dig in enough to
confirm that there is a bug, but we call the function once
in apply_all_patches() and then again indirectly from
parse_chunk()).

Let's make it harder to get wrong by allocating the return
value. For simplicity, we'll do this even when the prefix is
empty (and we could just return the original file pointer).
That will cause us to allocate sometimes when we wouldn't
otherwise need to, but this function isn't called in
performance critical code-paths (and it already _might_
allocate on any given call, so a caller that cares about
performance is questionable anyway).

The downside is that the callers need to remember to free()
the result to avoid leaking. Most of them already used
xstrdup() on the result, so we know they are OK. The
remainder have been converted to use free() as appropriate.

I considered retaining a prefix_filename_unsafe() for cases
where we know the static lifetime is OK (and handling the
cleanup is awkward). This is only a handful of cases,
though, and it's not worth the mental energy in worrying
about whether the "unsafe" variant is OK to use in any
situation.
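
The hazard of returning static storage, and the allocating alternative, can be illustrated with a small sketch. Both functions below are hypothetical stand-ins that simply concatenate prefix and path; they are not the real prefix_filename().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Old style (illustrative): the result lives in static storage and is
 * overwritten by the next call. */
static const char *sketch_prefix_static(const char *prefix, const char *path)
{
	static char buf[256];

	assert(prefix && path);
	snprintf(buf, sizeof(buf), "%s%s", prefix, path);
	return buf;
}

/* New style (illustrative): the result is heap-allocated and the
 * caller must free() it. */
static char *sketch_prefix_alloc(const char *prefix, const char *path)
{
	size_t len;
	char *out;

	assert(prefix && path);
	len = strlen(prefix) + strlen(path) + 1;
	out = malloc(len);
	if (!out)
		return NULL;
	snprintf(out, len, "%s%s", prefix, path);
	return out;
}
```

With the static variant, two results held at the same time alias the same buffer, so the first silently changes when the second call is made. The allocating variant trades that risk for an obligation to free().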

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-21 11:18:41 -07:00
Jeff King
116fb64e43 prefix_filename: drop length parameter
This function takes the prefix as a ptr/len pair, but in
every caller the length is exactly strlen(ptr). Let's
simplify the interface and just take the string. This saves
callers specifying it (and in some cases handling a NULL
prefix).

In a handful of cases we had the length already without
calling strlen, so this is technically slower. But it's not
likely to matter (after all, if the prefix is non-empty
we'll allocate and copy it into a buffer anyway).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-21 11:12:53 -07:00
Jeff King
598019769c prefix_filename: move docstring to header file
This is a public function, so we should make its
documentation available near the declaration.

While we're at it, we can give a few details about how it
works.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-21 11:12:52 -07:00
Junio C Hamano
a0393a298f Merge branch 'js/early-config'
The start-up sequence of "git" needs to figure out some configured
settings before it finds and sets itself up in the location of the
repository and was quite messy due to its "chicken-and-egg" nature.
The code has been restructured.

* js/early-config:
  setup.c: mention unresolved problems
  t1309: document cases where we would want early config not to die()
  setup_git_directory_gently_1(): avoid die()ing
  t1309: test read_early_config()
  read_early_config(): really discover .git/
  read_early_config(): avoid .git/config hack when unneeded
  setup: make read_early_config() reusable
  setup: introduce the discover_git_directory() function
  setup_git_directory_1(): avoid changing global state
  setup: prepare setup_discovered_git_dir() for the root directory
  setup_git_directory(): use is_dir_sep() helper
  t7006: replace dubious test
2017-03-17 13:50:28 -07:00
Junio C Hamano
81944e9b54 Merge branch 'bc/sha1-header-selection-with-cpp-macros'
Our source code has used the SHA1_HEADER cpp macro after "#include"
in the C code to switch among the SHA-1 implementations. Instead,
list the exact header file names and switch among implementations
using "#ifdef BLK_SHA1/#include "block-sha1/sha1.h"/.../#endif";
this helps some IDE tools.

* bc/sha1-header-selection-with-cpp-macros:
  hash.h: move SHA-1 implementation selection into a header file
2017-03-17 13:50:27 -07:00
Junio C Hamano
e1fae93019 Merge branch 'bc/object-id'
"uchar [40]" to "struct object_id" conversion continues.

* bc/object-id:
  wt-status: convert to struct object_id
  builtin/merge-base: convert to struct object_id
  Convert object iteration callbacks to struct object_id
  sha1_file: introduce an nth_packed_object_oid function
  refs: simplify parsing of reflog entries
  refs: convert each_reflog_ent_fn to struct object_id
  reflog-walk: convert struct reflog_info to struct object_id
  builtin/replace: convert to struct object_id
  Convert remaining callers of resolve_refdup to object_id
  builtin/merge: convert to struct object_id
  builtin/clone: convert to struct object_id
  builtin/branch: convert to struct object_id
  builtin/grep: convert to struct object_id
  builtin/fmt-merge-message: convert to struct object_id
  builtin/fast-export: convert to struct object_id
  builtin/describe: convert to struct object_id
  builtin/diff-tree: convert to struct object_id
  builtin/commit: convert to struct object_id
  hex: introduce parse_oid_hex
2017-03-17 13:50:25 -07:00
Junio C Hamano
94c9b5af70 Merge branch 'cc/split-index-config'
The experimental "split index" feature has gained a few
configuration variables to make it easier to use.

* cc/split-index-config: (22 commits)
  Documentation/git-update-index: explain splitIndex.*
  Documentation/config: add splitIndex.sharedIndexExpire
  read-cache: use freshen_shared_index() in read_index_from()
  read-cache: refactor read_index_from()
  t1700: test shared index file expiration
  read-cache: unlink old sharedindex files
  config: add git_config_get_expiry() from gc.c
  read-cache: touch shared index files when used
  sha1_file: make check_and_freshen_file() non static
  Documentation/config: add splitIndex.maxPercentChange
  t1700: add tests for splitIndex.maxPercentChange
  read-cache: regenerate shared index if necessary
  config: add git_config_get_max_percent_split_change()
  Documentation/git-update-index: talk about core.splitIndex config var
  Documentation/config: add information for core.splitIndex
  t1700: add tests for core.splitIndex
  update-index: warn in case of split-index incoherency
  read-cache: add and then use tweak_split_index()
  split-index: add {add,remove}_split_index() functions
  config: add git_config_get_split_index()
  ...
2017-03-17 13:50:23 -07:00
Brandon Williams
b58a68c1c1 setup: allow for prefix to be passed to git commands
In a future patch, child processes which act on submodules need a little
more context about the original command that was invoked.  This patch
teaches git to use the prefix stored in `GIT_INTERNAL_TOPLEVEL_PREFIX`
instead of the prefix that was potentially found during the git directory
setup process.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-17 11:54:50 -07:00
Jeff King
eaeefc3276 odb_pack_keep(): stop generating keepfile name
The odb_pack_keep() function generates the name of a .keep
file and opens it. This has two problems:

  1. It requires a fixed-size buffer to create the filename
     and doesn't notice when the result is truncated.

  2. Of the two callers, one sometimes wants to open a
     filename it already has, which makes things awkward (it
     has to do so manually, and skips the leading-directory
     creation).

Instead, let's have odb_pack_keep() just open the file.
Generating the name isn't hard, and a future patch will
switch callers over to odb_pack_name() anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:17:00 -07:00
Jeff King
1cec8c634f sha1_file.c: make pack-name helper globally accessible
We provide sha1_pack_name() and sha1_pack_index_name(), but
the more generic form (which takes its own strbuf and an
arbitrary extension) is only used to implement the other
two.  Let's make it available, but clean up a few things:

  1. Name it odb_pack_name(), as the original
     sha1_get_pack_name() is long but not all that
     descriptive.

  2. Switch the strbuf argument to the beginning, so that it
     matches similar path-building functions like
     git_path_buf().

  3. Clean up the out-dated docstring and move it to the
     public declaration.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:05:17 -07:00
Jeff King
82c9d6614b move odb_* declarations out of git-compat-util.h
These functions were originally conceived as wrapper
functions similar to xmkstemp(). They were later moved by
463db9b10 (wrapper: move odb_* to environment.c,
2010-11-06). The more appropriate place for a declaration is
in cache.h.

While we're at it, let's add some basic docstrings.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:04:34 -07:00
brian m. carlson
f18f816cb1 hash.h: move SHA-1 implementation selection into a header file
Many developers use functionality in their editors that allows for quick
syntax checks, including warning about questionable constructs.  This
functionality allows rapid development with fewer errors.  However, such
functionality generally does not allow the specification of
project-specific defines or command-line options.

Since the SHA1_HEADER include is not defined in such a case,
developers see spurious errors when using these tools.  Furthermore,
there are known implementations of "cc" whose '#include' is unhappy
with this construct.

Instead of using SHA1_HEADER, create a hash.h header and use #if
and #elif to select the desired header.  Have the Makefile pass an
appropriate option to help the header select the right implementation to
use.
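
The selection pattern might look roughly like the sketch below. The SKETCH_* macro names are assumptions, and real code would #include an implementation header in each branch rather than define a string; the string only keeps the example self-contained.

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative only. A build system would pass one of the selector
 * macros on the command line (e.g. -DSKETCH_SHA1_OPENSSL); with none
 * given, the #else branch acts as the fallback default.
 */
#if defined(SKETCH_SHA1_OPENSSL)
#define SKETCH_SHA1_BACKEND "openssl"
#elif defined(SKETCH_SHA1_PPC)
#define SKETCH_SHA1_BACKEND "ppc"
#else
#define SKETCH_SHA1_BACKEND "block-sha1"
#endif

/* Return 1 if the backend chosen at compile time is called "name". */
static int sketch_backend_is(const char *name)
{
	assert(name);
	return strcmp(SKETCH_SHA1_BACKEND, name) == 0;
}
```

Because every branch is spelled out in the header, an editor or syntax checker that knows nothing about the build flags still sees a valid translation unit.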

[jc: make BLK_SHA1 the fallback default as discussed on list,
e.g. <20170314201424.vccij5z2ortq4a4o@sigill.intra.peff.net>; also
remove SHA1_HEADER and SHA1_HEADER_SQ that are no longer used].

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-15 11:00:09 -07:00
Junio C Hamano
c809496c97 Merge branch 'jk/interpret-branch-name'
"git branch @" created refs/heads/@ as a branch, and in general the
code that handled @{-1} and @{upstream} was a bit too loose in
disambiguating.

* jk/interpret-branch-name:
  checkout: restrict @-expansions when finding branch
  strbuf_check_ref_format(): expand only local branches
  branch: restrict @-expansions when deleting
  t3204: test git-branch @-expansion corner cases
  interpret_branch_name: allow callers to restrict expansions
  strbuf_branchname: add docstring
  strbuf_branchname: drop return value
  interpret_branch_name: move docstring to header file
  interpret_branch_name(): handle auto-namelen for @{-1}
2017-03-14 15:23:18 -07:00
Johannes Schindelin
0654aa57f3 setup: make read_early_config() reusable
The pager configuration needs to be read early, possibly before
discovering any .git/ directory.

Let's not hide this function in pager.c, but make it available to other
callers.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-14 14:24:16 -07:00
Johannes Schindelin
16ac8b8db6 setup: introduce the discover_git_directory() function
We modified the setup_git_directory_gently_1() function earlier to make
it possible to discover the GIT_DIR without changing global state.

However, it is still a bit cumbersome to use if you only need to figure
out the (possibly absolute) path of the .git/ directory. Let's just
provide a convenient wrapper function with an easier signature that
*just* discovers the .git/ directory.

We will use it in a subsequent patch to fix the early config.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-14 14:24:16 -07:00
Devin Lehmacher
e7f136bf93 path.c: add xdg_cache_home
We already have xdg_config_home to format paths relative to
XDG_CONFIG_HOME. Let's provide a similar function xdg_cache_home to do
the same for paths relative to XDG_CACHE_HOME.
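
A minimal sketch of the lookup, assuming the usual XDG rule that an unset or empty XDG_CACHE_HOME falls back to $HOME/.cache, and assuming a "git/" subdirectory as used for the config path. The function name and its parameters are hypothetical; the environment values are passed in as arguments only to keep the example easy to test.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch only (not the function added by this patch): build the path
 * of "filename" under Git's cache directory. A caller would typically
 * pass getenv("XDG_CACHE_HOME") and getenv("HOME").
 * Returns a malloc'd string, or NULL if neither location is usable.
 */
static char *sketch_xdg_cache_path(const char *xdg_cache_home,
				   const char *home, const char *filename)
{
	const char *base, *middle;
	size_t len;
	char *out;

	assert(filename);
	if (xdg_cache_home && *xdg_cache_home) {
		base = xdg_cache_home;
		middle = "/git/";
	} else if (home && *home) {
		base = home; /* XDG default: $HOME/.cache */
		middle = "/.cache/git/";
	} else {
		return NULL;
	}

	len = strlen(base) + strlen(middle) + strlen(filename) + 1;
	out = malloc(len);
	if (out)
		snprintf(out, len, "%s%s%s", base, middle, filename);
	return out;
}
```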

Signed-off-by: Devin Lehmacher <lehmacdj@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-13 14:39:36 -07:00
Junio C Hamano
ba37c92df9 Merge branch 'js/realpath-pathdup-fix'
Git v2.12 was shipped with an embarrassing breakage where various
operations that verify paths given by the user stopped dying when
seeing an issue, and instead later triggered a segfault.

* js/realpath-pathdup-fix:
  real_pathdup(): fix callsites that wanted it to die on error
  t1501: demonstrate NULL pointer access with invalid GIT_WORK_TREE
2017-03-12 23:21:33 -07:00
Junio C Hamano
fb907176de Merge branch 'rj/remove-unused-mktemp'
Code cleanup.

* rj/remove-unused-mktemp:
  wrapper.c: remove unused gitmkstemps() function
  wrapper.c: remove unused git_mkstemp() function
2017-03-10 13:24:24 -08:00
Junio C Hamano
963792ed27 Merge branch 'jk/parse-config-key-cleanup'
The "parse_config_key()" API function has been cleaned up.

* jk/parse-config-key-cleanup:
  parse_hide_refs_config: tell parse_config_key we don't want a subsection
  parse_config_key: allow matching single-level config
  parse_config_key: use skip_prefix instead of starts_with
2017-03-10 13:24:22 -08:00
Johannes Schindelin
ce83eadd9a real_pathdup(): fix callsites that wanted it to die on error
In 4ac9006f83 (real_path: have callers use real_pathdup and
strbuf_realpath, 2016-12-12), we changed the xstrdup(real_path())
pattern to use real_pathdup() directly.

The problem with this change is that real_path() calls
strbuf_realpath() with die_on_error = 1 while real_pathdup() calls
it with die_on_error = 0. Meaning that in cases where real_path()
causes Git to die() with an error message, real_pathdup() is silent
and returns NULL instead.

The callers, however, are ill-prepared for that change, as they expect
the return value to be non-NULL (and otherwise the function died
with an appropriate error message).

Fix this by extending real_pathdup()'s signature to accept the
die_on_error flag and simply pass it through to strbuf_realpath(),
and then adjust all callers after a careful audit of whether they would
handle NULLs well.
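
The shape of the fix can be sketched like this. The names are hypothetical and the "resolution" step is a stand-in that fails only for an empty path; the point is merely that the caller chooses between dying and receiving NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for path resolution (assumption: an empty path is the only
 * failure). Returns a malloc'd copy, or NULL on failure.
 */
static char *sketch_resolve(const char *path)
{
	char *copy;

	assert(path);
	if (!*path)
		return NULL;
	copy = malloc(strlen(path) + 1);
	if (copy)
		strcpy(copy, path);
	return copy;
}

/* The caller decides: die with a message, or get NULL and handle it. */
static char *sketch_pathdup(const char *path, int die_on_error)
{
	char *resolved = sketch_resolve(path);

	if (!resolved && die_on_error) {
		fprintf(stderr, "fatal: unable to resolve path '%s'\n", path);
		exit(128);
	}
	return resolved;
}
```

Callers that dereference the result unconditionally would pass 1; only callers that check for NULL should pass 0.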

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-08 14:38:41 -08:00
Jeff King
0e9f62dab9 interpret_branch_name: allow callers to restrict expansions
The interpret_branch_name() function converts names like
@{-1} and @{upstream} into branch names. The expanded ref
names are not fully qualified, and may be outside of the
refs/heads/ namespace (e.g., "@" expands to "HEAD", and
"@{upstream}" is likely to be in "refs/remotes/").

This is OK for callers like dwim_ref() which are primarily
interested in resolving the resulting name, no matter where
it is. But callers like "git branch" treat the result as a
branch name in refs/heads/.  When we expand to a ref outside
that namespace, the results are very confusing (e.g., "git
branch @" tries to create refs/heads/HEAD, which is
nonsense).

Callers can't know from the returned string how the
expansion happened (e.g., did the user really ask for a
branch named "HEAD", or did we do a bogus expansion?). One
fix would be to return some out-parameters describing the
types of expansion that occurred. This has the benefit that
the caller can generate precise error messages ("I
understood @{upstream} to mean origin/master, but that is a
remote tracking branch, so you cannot create it as a local
name").

However, out-parameters make the function interface somewhat
cumbersome. Instead, let's do the opposite: let the caller
tell us which elements to expand. That's easier to pass in,
and none of the callers give more precise error messages
than "@{upstream} isn't a valid branch name" anyway (which
should be sufficient).

The strbuf_branchname() function needs a similar parameter,
as most of the callers access interpret_branch_name()
through it.

We can break the callers down into two groups:

  1. Callers that are happy with any kind of ref in the
     result. We pass "0" here, so they continue to work
     without restrictions. This includes merge_name(),
     the reflog handling in add_pending_object_with_path(),
     and substitute_branch_name(). This last is what powers
     dwim_ref().

  2. Callers that have funny corner cases (mostly in
     git-branch and git-checkout). These need to make use of
     the new parameter, but I've left them as "0" in this
     patch, and will address them individually in follow-on
     patches.
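
To make the "caller tells us which elements to expand" idea concrete, here is a small sketch. The flag names and the helper functions are hypothetical and handle only the "@" case; "0" keeps its meaning of "no restriction" from the description above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag names; an "allowed" value of 0 means "no restriction". */
#define SKETCH_EXPAND_LOCAL  (1u << 0) /* results under refs/heads/ */
#define SKETCH_EXPAND_REMOTE (1u << 1) /* results under refs/remotes/ */
#define SKETCH_EXPAND_HEAD   (1u << 2) /* "@" -> "HEAD" */

/* Return 1 if an expansion of the given kind is permitted by "allowed". */
static int sketch_expansion_ok(unsigned kind, unsigned allowed)
{
	if (!allowed)
		return 1; /* group 1 callers: any kind of ref is fine */
	return (kind & allowed) != 0;
}

/*
 * Expand "@" to "HEAD" only when the caller permits it; in this sketch
 * a disallowed expansion leaves the name untouched.
 */
static const char *sketch_expand_at(const char *name, unsigned allowed)
{
	assert(name);
	if (!strcmp(name, "@") &&
	    sketch_expansion_ok(SKETCH_EXPAND_HEAD, allowed))
		return "HEAD";
	return name;
}
```

A caller that only wants local branch names would pass SKETCH_EXPAND_LOCAL and so never see "@" turned into "HEAD".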

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 11:05:04 -08:00
Jeff King
e322b60d65 interpret_branch_name: move docstring to header file
We generally put docstrings with function declarations,
because it's the callers who need to know how the function
works. Let's do so for interpret_branch_name().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 11:05:03 -08:00
Christian Couder
77d67977ca config: add git_config_get_expiry() from gc.c
This function will be used in a following commit to get the expiration
time of the shared index files from the config, and it is generic
enough to be put in "config.c".

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:34:54 -08:00
Christian Couder
6a5e6f5e44 sha1_file: make check_and_freshen_file() non static
This function will be used in a commit soon, so let's make
it available globally.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:24:22 -08:00
Christian Couder
72dcb7b360 config: add git_config_get_max_percent_split_change()
This new function will be used in a following commit to get the
value of the "splitIndex.maxPercentChange" config variable.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:24:22 -08:00
Christian Couder
1f44b09b58 config: add git_config_get_split_index()
This new function will be used in a following commit to know
if we want to use the split index feature or not.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:24:21 -08:00
Ramsay Jones
34de5e4bb0 wrapper.c: remove unused git_mkstemp() function
The last caller of git_mkstemp() was removed in commit 6fec0a89
("verify_signed_buffer: use tempfile object", 16-06-2016). Since
the introduction of the 'tempfile' APIs, along with git_mkstemp_mode,
it is unlikely that new callers will materialize. Remove the dead
code.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-28 11:54:14 -08:00
Junio C Hamano
3ad8b5bf26 Merge branch 'mh/ref-remove-empty-directory'
Deletion of a branch "foo/bar" could remove .git/refs/heads/foo
once there no longer is any other branch whose name begins with
"foo/", but until now we did not do so.  Now we do.

* mh/ref-remove-empty-directory: (23 commits)
  files_transaction_commit(): clean up empty directories
  try_remove_empty_parents(): teach to remove parents of reflogs, too
  try_remove_empty_parents(): don't trash argument contents
  try_remove_empty_parents(): rename parameter "name" -> "refname"
  delete_ref_loose(): inline function
  delete_ref_loose(): derive loose reference path from lock
  log_ref_write_1(): inline function
  log_ref_setup(): manage the name of the reflog file internally
  log_ref_write_1(): don't depend on logfile argument
  log_ref_setup(): pass the open file descriptor back to the caller
  log_ref_setup(): improve robustness against races
  log_ref_setup(): separate code for create vs non-create
  log_ref_write(): inline function
  rename_tmp_log(): improve error reporting
  rename_tmp_log(): use raceproof_create_file()
  lock_ref_sha1_basic(): use raceproof_create_file()
  lock_ref_sha1_basic(): inline constant
  raceproof_create_file(): new function
  safe_create_leading_directories(): set errno on SCLD_EXISTS
  safe_create_leading_directories_const(): preserve errno
  ...
2017-02-27 13:57:12 -08:00
Jeff King
48f8d9f732 parse_config_key: allow matching single-level config
The parse_config_key() function was introduced to make it
easier to match "section.subsection.key" variables. It also
handles the simpler "section.key", and the caller is
responsible for distinguishing the two from its
out-parameters.

Most callers who _only_ want "section.key" would just use a
strcmp(var, "section.key"), since there is no parsing
required. However, they may still use parse_config_key() if
their "section" variable isn't a constant (an example of
this is in parse_hide_refs_config).

Using parse_config_key() this way is a bit clunky, though:

  const char *subsection;
  int subsection_len;
  const char *key;

  if (!parse_config_key(var, section, &subsection, &subsection_len, &key) &&
      !subsection) {
	  /* matched! */
  }

Instead, let's treat a NULL subsection as an indication that
the caller does not expect one. That lets us write:

  const char *key;

  if (!parse_config_key(var, section, NULL, NULL, &key)) {
	  /* matched! */
  }

Existing callers should be unaffected, as passing a NULL
subsection would currently segfault.
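
For readers who want to see the convention end to end, here is a simplified sketch of a parser with the same calling convention. It is not the implementation from this patch: the function name is hypothetical and edge cases are handled only as far as the comments state.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified sketch. Splits "var" into section[.subsection].key.
 * Returns 0 on a match, -1 otherwise. If "subsection" is NULL, only
 * the two-level "section.key" form matches.
 */
static int sketch_parse_config_key(const char *var, const char *section,
				   const char **subsection,
				   int *subsection_len, const char **key)
{
	size_t section_len;
	const char *dot;

	assert(var && section && key);
	section_len = strlen(section);

	/* "var" must start with "<section>." */
	if (strncmp(var, section, section_len) || var[section_len] != '.')
		return -1;

	/* The key is whatever follows the last dot. */
	dot = strrchr(var, '.');
	*key = dot + 1;

	if (dot == var + section_len) {
		/* Two-level form: there is no subsection. */
		if (subsection) {
			*subsection = NULL;
			*subsection_len = 0;
		}
		return 0;
	}

	/* Three-level form: reject it if the caller did not ask for one. */
	if (!subsection)
		return -1;
	*subsection = var + section_len + 1;
	*subsection_len = (int)(dot - *subsection);
	return 0;
}
```

Note that the subsection is returned as a pointer plus length into "var" and is not NUL-terminated.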

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-24 13:22:11 -08:00
brian m. carlson
76c1d9a096 Convert object iteration callbacks to struct object_id
Convert each_loose_object_fn and each_packed_object_fn to take a pointer
to struct object_id.  Update the various callbacks.  Convert several
40-based constants to use GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-22 10:12:15 -08:00
brian m. carlson
068f85e313 sha1_file: introduce an nth_packed_object_oid function
There are places in the code where we would like to provide a struct
object_id *, yet read the hash directly from the pack.  Provide an
nth_packed_object_oid function that is similar to the
nth_packed_object_sha1 function.

In order to avoid a potentially invalid cast, nth_packed_object_oid
provides a variable into which to store the value, which it returns on
success; on error, it returns NULL, as nth_packed_object_sha1 does.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-22 10:12:15 -08:00
brian m. carlson
605f430efb hex: introduce parse_oid_hex
Introduce a function, parse_oid_hex, which parses a hexadecimal object
ID and if successful, sets a pointer to just beyond the last character.
This allows for simpler, more robust parsing without needing to
hard-code integer values throughout the codebase.
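
The "set a pointer to just beyond the last character" convention can be sketched as below. This is an illustration with hypothetical names and a fixed 20-byte hash, not the function introduced by this patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RAWSZ 20                 /* SHA-1: 20 raw bytes */
#define SKETCH_HEXSZ (2 * SKETCH_RAWSZ) /* 40 hex digits */

static int sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse SKETCH_HEXSZ hex digits from "hex" into "out". On success,
 * return 0 and point *end just past the last digit. On failure, return
 * -1 and leave *end untouched ("out" may be partially written).
 */
static int sketch_parse_oid_hex(const char *hex, unsigned char *out,
				const char **end)
{
	size_t i;

	assert(hex && out && end);
	for (i = 0; i < SKETCH_RAWSZ; i++) {
		int hi = sketch_hexval(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1; /* also stops at an early NUL */
		lo = sketch_hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	*end = hex + SKETCH_HEXSZ;
	return 0;
}
```

A caller parsing a line such as "<hex> <refname>" can then continue from *end without writing the number 40 anywhere.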

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-20 01:11:26 -08:00
Junio C Hamano
fafca0f72a Merge branch 'cw/log-updates-for-all-refs-really'
The "core.logAllRefUpdates" that used to be boolean has been
enhanced to take 'always' as well, to record ref updates to refs
other than the ones that are expected to be updated (i.e. branches,
remote-tracking branches and notes).

* cw/log-updates-for-all-refs-really:
  doc: add note about ignoring '--no-create-reflog'
  update-ref: add test cases for bare repository
  refs: add option core.logAllRefUpdates = always
  config: add markup to core.logAllRefUpdates doc
2017-02-03 11:25:19 -08:00
Junio C Hamano
5348021c67 Merge branch 'sb/submodule-recursive-absorb'
When a submodule "A", which has another submodule "B" nested within
it, is "absorbed" into the top-level superproject, the inner
submodule "B" used to be left in a strange state.  The logic to
adjust the .git pointers in these submodules has been corrected.

* sb/submodule-recursive-absorb:
  submodule absorbing: fix worktree/gitdir pointers recursively for non-moves
  cache.h: expose the dying procedure for reading gitlinks
  setup: add gentle version of resolve_git_dir
2017-02-03 11:25:18 -08:00
Junio C Hamano
6f1c08bdb7 Merge branch 'rs/absolute-pathdup'
Code cleanup.

* rs/absolute-pathdup:
  use absolute_pathdup()
  abspath: add absolute_pathdup()
2017-02-02 13:36:55 -08:00
Junio C Hamano
1ac2ec6dd8 Merge branch 'sb/in-core-index-doc' into maint
Documentation and in-code comments updates.

* sb/in-core-index-doc:
  documentation: retire unfinished documentation
  cache.h: document add_[file_]to_index
  cache.h: document remove_index_entry_at
  cache.h: document index_name_pos
2017-01-31 13:32:11 -08:00
Junio C Hamano
feaad0eec7 Merge branch 'sb/in-core-index-doc'
Documentation and in-code comments updates.

* sb/in-core-index-doc:
  documentation: retire unfinished documentation
  cache.h: document add_[file_]to_index
  cache.h: document remove_index_entry_at
  cache.h: document index_name_pos
2017-01-31 13:14:59 -08:00
Junio C Hamano
42ace93e41 Merge branch 'jk/loose-object-fsck'
"git fsck" inspects loose objects more carefully now.

* jk/loose-object-fsck:
  fsck: detect trailing garbage in all object types
  fsck: parse loose object paths directly
  sha1_file: add read_loose_object() function
  t1450: test fsck of packed objects
  sha1_file: fix error message for alternate objects
  t1450: refactor loose-object removal
2017-01-31 13:14:57 -08:00
Cornelius Weig
341fb28621 refs: add option core.logAllRefUpdates = always
When core.logallrefupdates is true, we only create a new reflog for refs
that are under certain well-known hierarchies. The reason is that we
know that some hierarchies (like refs/tags) are not meant to change, and
that unknown hierarchies might not want reflogs at all (e.g., a
hypothetical refs/foo might be meant to change often and drop old
history immediately).

However, sometimes it is useful to override this decision and simply log
for all refs, because the safety and audit trail are more important than
the performance implications of keeping the log around.

This patch introduces a new "always" mode for the core.logallrefupdates
option which will log updates to everything under refs/, regardless of
where in the hierarchy it is (we still will not log things like
ORIG_HEAD and FETCH_HEAD, which are known to be transient).

Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-31 10:01:24 -08:00
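The decision described in the core.logAllRefUpdates commit above might look roughly like the sketch below. The enum, the function names and the exact prefix list are assumptions for illustration (the prefixes follow the "branches, remote-tracking branches and notes" wording of the merge message), and special handling of HEAD is deliberately left out.

```c
#include <string.h>

/* Hypothetical modes mirroring false / true / always. */
enum sketch_log_refs {
	SKETCH_LOG_NONE,
	SKETCH_LOG_NORMAL,
	SKETCH_LOG_ALWAYS
};

static int sketch_has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Return 1 if a new reflog should be created for "refname". */
static int sketch_should_autocreate_reflog(enum sketch_log_refs mode,
					   const char *refname)
{
	switch (mode) {
	case SKETCH_LOG_ALWAYS:
		/* Everything under refs/, but not ORIG_HEAD, FETCH_HEAD, ... */
		return sketch_has_prefix(refname, "refs/");
	case SKETCH_LOG_NORMAL:
		/* Only the well-known hierarchies. */
		return sketch_has_prefix(refname, "refs/heads/") ||
		       sketch_has_prefix(refname, "refs/remotes/") ||
		       sketch_has_prefix(refname, "refs/notes/");
	default:
		return 0;
	}
}
```

With this shape, refs/tags/v1.0 gets a reflog only in the "always" mode, while a transient name such as ORIG_HEAD is skipped in every mode.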
René Scharfe
b1edb40f25 abspath: add absolute_pathdup()
Add a function that returns a buffer containing the absolute path of its
argument and a semantic patch for its intended use.  It avoids an extra
string copy to a static buffer.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-26 14:51:06 -08:00
Stefan Beller
5f29433f1c cache.h: expose the dying procedure for reading gitlinks
In a later patch we want to react to only a subset of errors, defaulting
the rest to die as usual. Separate the block that takes care of dying
into its own function so we have easy access to it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-26 11:00:58 -08:00
Stefan Beller
40d9632514 setup: add gentle version of resolve_git_dir
This follows a93bedada (setup: add gentle version of read_gitfile,
2015-06-09), and assumes the same reasoning. resolve_git_dir is unsuited
for speculative calls, so we want to use the gentle version to find out
about potential errors.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-26 11:00:24 -08:00
Junio C Hamano
a06b4c337c Merge branch 'bw/read-blob-data-does-not-modify-index-state'
Code clean-up.

* bw/read-blob-data-does-not-modify-index-state:
  index: improve constness for reading blob data
2017-01-23 15:59:19 -08:00
Stefan Beller
20cf41d021 cache.h: document add_[file_]to_index
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-19 12:18:06 -08:00
Stefan Beller
3bd72adff1 cache.h: document remove_index_entry_at
Do this by moving the existing documentation from
read-cache.c to cache.h.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-19 12:17:57 -08:00
Stefan Beller
12733e9dd3 cache.h: document index_name_pos
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-19 12:13:46 -08:00
Junio C Hamano
fe9ec8bdf6 Merge branch 'bw/pathspec-cleanup'
Code clean-up in the pathspec API.

* bw/pathspec-cleanup:
  pathspec: rename prefix_pathspec to init_pathspec_item
  pathspec: small readability changes
  pathspec: create strip submodule slash helpers
  pathspec: create parse_element_magic helper
  pathspec: create parse_long_magic function
  pathspec: create parse_short_magic function
  pathspec: factor global magic into its own function
  pathspec: simpler logic to prefix original pathspec elements
  pathspec: always show mnemonic and name in unsupported_magic
  pathspec: remove unused variable from unsupported_magic
  pathspec: copy and free owned memory
  pathspec: remove the deprecated get_pathspec function
  ls-tree: convert show_recursive to use the pathspec struct interface
  dir: convert fill_directory to use the pathspec struct interface
  dir: remove struct path_simplify
  mv: remove use of deprecated 'get_pathspec()'
2017-01-18 15:12:15 -08:00
Junio C Hamano
55d128ae06 Merge branch 'bw/grep-recurse-submodules'
"git grep" has been taught to optionally recurse into submodules.

* bw/grep-recurse-submodules:
  grep: search history of moved submodules
  grep: enable recurse-submodules to work on <tree> objects
  grep: optionally recurse into submodules
  grep: add submodules as a grep source type
  submodules: load gitmodules file from commit sha1
  submodules: add helper to determine if a submodule is initialized
  submodules: add helper to determine if a submodule is populated
  real_path: canonicalize directory separators in root parts
  real_path: have callers use real_pathdup and strbuf_realpath
  real_path: create real_pathdup
  real_path: convert real_path_internal to strbuf_realpath
  real_path: resolve symlinks by hand
2017-01-18 15:12:11 -08:00
Jeff King
f6371f9210 sha1_file: add read_loose_object() function
It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.

However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.

The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.

We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().

Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-15 15:59:03 -08:00
Brandon Williams
875425080d index: improve constness for reading blob data
Improve constness of the index_state parameter to the
'read_blob_data_from_index' function.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-11 13:35:13 -08:00
Junio C Hamano
02d0457eb4 Merge branch 'jc/git-open-cloexec'
The codeflow of setting NOATIME and CLOEXEC on file descriptors Git
opens has been simplified.
We may want to drop the tip one, but we'll see.

* jc/git-open-cloexec:
  sha1_file: stop opening files with O_NOATIME
  git_open_cloexec(): use fcntl(2) w/ FD_CLOEXEC fallback
  git_open(): untangle possible NOATIME and CLOEXEC interactions
2017-01-10 15:24:26 -08:00
Brandon Williams
34305f7753 pathspec: remove the deprecated get_pathspec function
Now that all callers of the old 'get_pathspec' interface have been
migrated to use the new pathspec struct interface it can be removed
from the codebase.

Since there are no more users of the '_raw' field in the pathspec struct
it can also be removed.  This patch also removes the old functionality
of modifying the const char **argv array that was passed into
parse_pathspec.  Instead the constructed 'match' string (which is a
pathspec element with the prefix prepended) is only stored in its
corresponding pathspec_item entry.

Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-08 18:04:17 -08:00
Michael Haggerty
177978f56a raceproof_create_file(): new function
Add a function that tries to create a file and any containing
directories in a way that is robust against races with other processes
that might be cleaning up empty directories at the same time.

The actual file creation is done by a callback function, which, if it
fails, should set errno to EISDIR or ENOENT according to the convention
of open(). raceproof_create_file() detects such failures, and
respectively either tries to delete empty directories that might be in
the way of the file or tries to create the containing directories. Then
it retries the callback function.

This function is not yet used.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-07 19:30:09 -08:00
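The retry loop described in the raceproof_create_file() commit above can be sketched as follows. This is a simplified illustration under stated assumptions: the names are hypothetical, the retry budget of three attempts is arbitrary, and a plain rmdir(2) stands in for "delete empty directories that might be in the way".

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Callback type: returns 0 on success, or -1 with errno set as open() would. */
typedef int sketch_create_fn(const char *path, void *cb);

/* Create every directory leading up to (but not including) "path". */
static int sketch_make_leading_dirs(const char *path)
{
	char buf[4096];
	size_t len = strlen(path);
	char *p;

	if (len >= sizeof(buf)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memcpy(buf, path, len + 1);
	for (p = buf; *p; p++) {
		if (*p != '/' || p == buf)
			continue;
		*p = '\0';
		/* Another process may have created it already; that is fine. */
		if (mkdir(buf, 0777) < 0 && errno != EEXIST)
			return -1;
		*p = '/';
	}
	return 0;
}

/* Call "fn"; on EISDIR or ENOENT, repair the situation and retry. */
static int sketch_raceproof_create(const char *path, sketch_create_fn *fn,
				   void *cb)
{
	int tries = 3;

	for (;;) {
		int ret = fn(path, cb);

		if (!ret)
			return 0;
		if (--tries <= 0)
			return ret;
		if (errno == EISDIR) {
			/* An (empty) directory sits where the file should go. */
			if (rmdir(path) < 0)
				return -1;
		} else if (errno == ENOENT) {
			/* A containing directory vanished or never existed. */
			if (sketch_make_leading_dirs(path) < 0)
				return -1;
		} else {
			return ret;
		}
	}
}

/* Example callback: create an empty file. */
static int sketch_touch(const char *path, void *cb)
{
	int fd = open(path, O_WRONLY | O_CREAT, 0666);

	(void)cb;
	if (fd < 0)
		return -1;
	close(fd);
	return 0;
}
```

Keeping the file creation in a callback lets the same loop serve different kinds of creation (plain open, lockfile, rename target), as long as the callback reports failures through errno in the way open() does.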
Michael Haggerty
204a047f23 safe_create_leading_directories(): set errno on SCLD_EXISTS
The exit path for SCLD_EXISTS wasn't setting errno, which some callers
use to generate error messages for the user. Fix the problem and
document that the function sets errno correctly to help avoid similar
regressions in the future.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-07 19:30:08 -08:00
Brandon Williams
9ebf689aad submodules: load gitmodules file from commit sha1
Teach submodules to load a '.gitmodules' file from a commit sha1.  This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-22 11:47:33 -08:00
Brandon Williams
7241764076 real_path: create real_pathdup
Create real_pathdup which returns a caller-owned string of the resolved
realpath based on the provided path.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-12 15:22:32 -08:00
Brandon Williams
a1ae48410d real_path: convert real_path_internal to strbuf_realpath
Change the name of real_path_internal to strbuf_realpath.  In addition
push the static strbuf up to its callers and instead take as a
parameter a pointer to a strbuf to use for the final result.

This change makes strbuf_realpath reentrant.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-12 15:22:32 -08:00
Junio C Hamano
8de7eeb54b compression: unify pack.compression configuration parsing
There are three codepaths that use a variable whose name is
pack_compression_level to affect how objects and deltas sent to a
packfile are compressed.  Unlike zlib_compression_level that controls
the loose object compression, however, this variable was static to
each of these codepaths.  Two of them read the pack.compression
configuration variable, using core.compression as the default, and
one of them also allowed overriding it from the command line.

The other codepath in bulk-checkin did not pay any attention to the
configuration.

Unify the configuration parsing to git_default_config(), where we
implement the parsing of core.loosecompression and core.compression
and make the former override the latter, by moving code to parse
pack.compression and also allow core.compression to give default to
this variable.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-15 21:16:22 -08:00
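The precedence rules described in the pack.compression commit above can be sketched as a single config callback. The struct and function names here are hypothetical, and the "seen" flags are one assumed way to make the result independent of the order in which the keys appear.

```c
#include <string.h>

struct sketch_compression {
	int zlib_level; /* loose objects */
	int pack_level; /* objects and deltas written to packfiles */
	int zlib_seen;  /* core.loosecompression was given explicitly */
	int pack_seen;  /* pack.compression was given explicitly */
};

/* Feed one "key = level" pair, in whatever order the config lists them. */
static void sketch_compression_config(struct sketch_compression *c,
				      const char *key, int level)
{
	if (!strcmp(key, "core.loosecompression")) {
		c->zlib_level = level;
		c->zlib_seen = 1;
	} else if (!strcmp(key, "core.compression")) {
		/* Only a default: never clobber the more specific keys. */
		if (!c->zlib_seen)
			c->zlib_level = level;
		if (!c->pack_seen)
			c->pack_level = level;
	} else if (!strcmp(key, "pack.compression")) {
		c->pack_level = level;
		c->pack_seen = 1;
	}
}
```

Because every packfile-writing codepath would read the same pack_level, a codepath such as bulk-checkin no longer has a private copy that ignores the configuration.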
Junio C Hamano
b4d065df03 sha1_file: stop opening files with O_NOATIME
When we open object files, we try to do so with O_NOATIME.
This dates back to 144bde78e9 (Use O_NOATIME when opening
the sha1 files., 2005-04-23), which is an optimization to
avoid creating a bunch of dirty inodes when we're accessing
many objects.  But a few things have changed since then:

  1. In June 2005, git learned about packfiles, which means
     we would do a lot fewer atime updates (rather than one
     per object access, we'd generally get one per packfile).

  2. In late 2006, Linux learned about "relatime", which is
     generally the default on modern installs. So
     performance around atime updates is a non-issue there
     these days.

     All the world isn't Linux, but as it turns out, Linux
     is the only platform to implement O_NOATIME in the
     first place.

So it's very unlikely that this code is helping anybody
these days.

Helped-by: Jeff King <peff@peff.net>
[jc: took idea and log message from peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-02 19:34:41 -07:00
Junio C Hamano
906d6906fb Merge branch 'ls/git-open-cloexec'
Git generally does not explicitly close file descriptors that were
open in the parent process when spawning a child process, but most
of the time the child does not want to access them. As Windows does
not allow removing or renaming a file that has a file descriptor
open, a slow-to-exit child can even break the parent process by
holding onto them.  Use O_CLOEXEC flag to open files in various
codepaths.

* ls/git-open-cloexec:
  read-cache: make sure file handles are not inherited by child processes
  sha1_file: open window into packfiles with O_CLOEXEC
  sha1_file: rename git_open_noatime() to git_open()
2016-10-31 13:15:21 -07:00
Junio C Hamano
39000e8499 Merge branch 'jk/fetch-quick-tag-following' into maint
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.

* jk/fetch-quick-tag-following:
  fetch: use "quick" has_sha1_file for tag following
2016-10-28 09:01:17 -07:00
Junio C Hamano
1b8ac5ead5 git_open(): untangle possible NOATIME and CLOEXEC interactions
The way we structured the fallback/retry mechanism for opening with
O_NOATIME and O_CLOEXEC meant that if we failed due to lack of
support to open the file with O_NOATIME option (i.e. EINVAL), we
would still try to drop O_CLOEXEC first and retry, and then drop
O_NOATIME.  A platform on which O_NOATIME is defined in the header
without support from the kernel wouldn't have a chance to open with
O_CLOEXEC option due to this code structure.

Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter
is mostly about performance, while the former can affect correctness.

Instead use O_CLOEXEC to open the file, and then use fcntl(2) to set
O_NOATIME on the resulting file descriptor.  open(2) itself does not
cause atime to be updated according to Linus [*1*].

The helper to do the former can be usable in the codepath in
ce_compare_data() that was recently added to open a file descriptor
with O_CLOEXEC; use it while we are at it.

*1* <CA+55aFw83E+zOd+z5h-CA-3NhrLjVr-anL6pubrSWttYx3zu8g@mail.gmail.com>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-28 06:23:07 -07:00
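The ordering described in the git_open() commit above (get O_CLOEXEC at open time, then add O_NOATIME afterwards) can be sketched as below. The function name is hypothetical, and the sketch assumes a platform that provides O_CLOEXEC; it does not show any fallback for platforms without it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes O_NOATIME only with this defined */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Open "path" read-only with close-on-exec set atomically, then try to
 * add O_NOATIME as a best-effort optimization.
 */
static int sketch_open_cloexec_noatime(const char *path)
{
	int fd = open(path, O_RDONLY | O_CLOEXEC);

	if (fd < 0)
		return -1;
#ifdef O_NOATIME
	{
		int flags = fcntl(fd, F_GETFL);

		/* Failure (e.g. EPERM) is ignored: atime only affects performance. */
		if (flags >= 0)
			(void)fcntl(fd, F_SETFL, flags | O_NOATIME);
	}
#endif
	return fd;
}
```

The point of this shape is that a kernel or filesystem rejecting O_NOATIME can no longer cost the caller the close-on-exec behaviour, which matters for correctness.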
Junio C Hamano
0d9c527d59 Merge branch 'jk/no-looking-at-dotgit-outside-repo'
Update "git diff --no-index" codepath not to try to peek into .git/
directory that happens to be under the current directory, when we
know we are operating outside any repository.

* jk/no-looking-at-dotgit-outside-repo:
  diff: handle sha1 abbreviations outside of repository
  diff_aligned_abbrev: use "struct oid"
  diff_unique_abbrev: rename to diff_aligned_abbrev
  find_unique_abbrev: use 4-buffer ring
  test-*-cache-tree: setup git dir
  read info/{attributes,exclude} only when in repository
2016-10-27 14:58:48 -07:00
Junio C Hamano
d7ae013a31 Merge branch 'jk/abbrev-auto'
Updates the way approximate count of total objects is computed
while attempting to come up with a unique abbreviated object name,
which in turn needs to estimate how many hexdigits are necessary to
ensure uniqueness.

* jk/abbrev-auto:
  find_unique_abbrev: move logic out of get_short_sha1()
2016-10-27 14:58:47 -07:00
Junio C Hamano
580d820ece Merge branch 'lt/abbrev-auto'
Allow the default abbreviation length, which has historically been
7, to scale as the repository grows.  The logic suggests to use 12
hexdigits for the Linux kernel, and 9 to 10 for Git itself.

* lt/abbrev-auto:
  abbrev: auto size the default abbreviation
  abbrev: prepare for new world order
  abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
2016-10-27 14:58:47 -07:00
Jeff King
ef2ed5013c find_unique_abbrev: use 4-buffer ring
Some code paths want to format multiple abbreviated sha1s in
the same output line. Because we use a single static buffer
for our return value, they have to either break their output
into several calls or allocate their own arrays and use
find_unique_abbrev_r().

Instead, let's mimic sha1_to_hex() and use a ring of several
buffers, so that the return value stays valid through
multiple calls. This shortens some of the callers, and makes
it harder for them to make a silly mistake.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-26 13:30:51 -07:00
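The buffer-ring idea from the find_unique_abbrev commit above can be illustrated with the sketch below. It only truncates a hex string and does not check uniqueness; the function name and the 40-hexdigit limit are assumptions for the example.

```c
#include <string.h>

#define SKETCH_HEXSZ 40
#define SKETCH_RING 4

/*
 * Return the first "len" hex digits of "hex" in one of four static
 * buffers.  A result stays valid until the fourth later call reuses
 * its slot, so several results can appear in the same printf().
 */
static const char *sketch_abbrev(const char *hex, size_t len)
{
	static char ring[SKETCH_RING][SKETCH_HEXSZ + 1];
	static int next;
	char *buf = ring[next];
	size_t n = strlen(hex);

	next = (next + 1) % SKETCH_RING;
	if (len > n)
		len = n;
	if (len > SKETCH_HEXSZ)
		len = SKETCH_HEXSZ;
	memcpy(buf, hex, len);
	buf[len] = '\0';
	return buf;
}
```

With a single static buffer, a call such as printf("%s..%s", f(a), f(b)) would print the same string twice; with the ring, the two return values live in different slots.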
Junio C Hamano
9fcd14491d Merge branch 'jk/fetch-quick-tag-following'
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.

* jk/fetch-quick-tag-following:
  fetch: use "quick" has_sha1_file for tag following
2016-10-26 13:14:47 -07:00
Junio C Hamano
1c2b1f7018 Merge branch 'bw/ls-files-recurse-submodules'
"git ls-files" learned "--recurse-submodules" option that can be
used to get a listing of tracked files across submodules (i.e. this
only works with "--cached" option, not for listing untracked or
ignored files).  This would be a useful tool to sit on the upstream
side of a pipe that is read with xargs to work on all working tree
files from the top-level superproject.

* bw/ls-files-recurse-submodules:
  ls-files: add pathspec matching for submodules
  ls-files: pass through safe options for --recurse-submodules
  ls-files: optionally recurse into submodules
  git: make super-prefix option
2016-10-26 13:14:44 -07:00
Lars Schneider
a5436b5794 sha1_file: rename git_open_noatime() to git_open()
This function is meant to be used when reading from files in the
object store, and the original objective was to avoid smudging atime
of loose object files too often, hence its name.  Because we'll be
extending its role in the next commit to also arrange the file
descriptors they return auto-closed in the child processes, rename
it to lose "noatime" part that is too specific.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-25 10:59:13 -07:00
Junio C Hamano
dec040192f Merge branch 'jk/alt-odb-cleanup'
Codepaths involved in interacting with alternate object stores have
been cleaned up.

* jk/alt-odb-cleanup:
  alternates: use fspathcmp to detect duplicates
  sha1_file: always allow relative paths to alternates
  count-objects: report alternates via verbose mode
  fill_sha1_file: write into a strbuf
  alternates: store scratch buffer as strbuf
  fill_sha1_file: write "boring" characters
  alternates: use a separate scratch space
  alternates: encapsulate alt->base munging
  alternates: provide helper for allocating alternate
  alternates: provide helper for adding to alternates list
  link_alt_odb_entry: refactor string handling
  link_alt_odb_entry: handle normalize_path errors
  t5613: clarify "too deep" recursion tests
  t5613: do not chdir in main process
  t5613: whitespace/style cleanups
  t5613: use test_must_fail
  t5613: drop test_valid_repo function
  t5613: drop reachable_via function
2016-10-17 13:25:20 -07:00
Junio C Hamano
25ab004c53 Merge branch 'jk/quarantine-received-objects'
In order for the receiving end of "git push" to inspect the
received history and decide to reject the push, the objects sent
from the sending end need to be made available to the hook and
the mechanism for the connectivity check, and this was done
traditionally by storing the objects in the receiving repository
and letting "git gc" expire them.  Instead, store the newly
received objects in a temporary area, and make them available
through the alternate object store mechanism only while we
decide whether to accept the push; once we decide, either migrate
them to the repository or purge them immediately.

* jk/quarantine-received-objects:
  tmp-objdir: do not migrate files starting with '.'
  tmp-objdir: put quarantine information in the environment
  receive-pack: quarantine objects until pre-receive accepts
  tmp-objdir: introduce API for temporary object directories
  check_connected: accept an env argument
2016-10-17 13:25:20 -07:00
Jeff King
5827a03545 fetch: use "quick" has_sha1_file for tag following
When we auto-follow tags in a fetch, we look at all of the
tags advertised by the remote and fetch ones where we don't
already have the tag, but we do have the object it peels to.
This involves a lot of calls to has_sha1_file(), some of
which we can reasonably expect to fail. Since 45e8a74
(has_sha1_file: re-check pack directory before giving up,
2013-08-30), this may cause many calls to
reprepare_packed_git(), which is potentially expensive.

This has gone unnoticed for several years because it
requires a fairly unique setup to matter:

  1. You need to have a lot of packs on the client side to
     make reprepare_packed_git() expensive (the most
     expensive part is finding duplicates in an unsorted
     list, which is currently quadratic).

  2. You need a large number of tag refs on the server side
     that are candidates for auto-following (i.e., that the
     client doesn't have). Each one triggers a re-read of
     the pack directory.

  3. Under normal circumstances, the client would
     auto-follow those tags and after one large fetch, (2)
     would no longer be true. But if those tags point to
     history which is disconnected from what the client
     otherwise fetches, then it will never auto-follow, and
     those candidates will impact it on every fetch.

So when all three are true, each fetch pays an extra
O(nr_tags * nr_packs^2) cost, mostly in string comparisons
on the pack names. This was exacerbated by 47bf4b0
(prepare_packed_git_one: refactor duplicate-pack check,
2014-06-30) which uses a slightly more expensive string
check, under the assumption that the duplicate check doesn't
happen very often (and it shouldn't; the real problem here
is how often we are calling reprepare_packed_git()).

This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
accuracy for speed, in cases where we might be racy with a
simultaneous repack. This is similar to the fix in 0eeb077
(index-pack: avoid excessive re-reading of pack directory,
2015-06-09). As with that case, it's OK for has_sha1_file() to
occasionally say "no I don't have it" when we do, because
the worst case is not a corruption, but simply that we may
fail to auto-follow a tag that points to it.

Here are results from the included perf script, which sets
up a situation similar to the one described above:

Test            HEAD^               HEAD
----------------------------------------------------------
5550.4: fetch   11.21(10.42+0.78)   0.08(0.04+0.02) -99.3%

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-14 11:31:32 -07:00
Junio C Hamano
e6e24c94df Merge branch 'jk/pack-objects-optim-mru'
"git pack-objects" in a repository with many packfiles used to
spend a lot of time looking for/at objects in them; the accesses to
the packfiles are now optimized by checking the most-recently-used
packfile first.

* jk/pack-objects-optim-mru:
  pack-objects: use mru list when iterating over packs
  pack-objects: break delta cycles before delta-search phase
  sha1_file: make packed_object_info public
  provide an initializer for "struct object_info"
2016-10-10 14:03:47 -07:00
Jeff King
e34c2e010f tmp-objdir: put quarantine information in the environment
The presence of the GIT_QUARANTINE_PATH variable lets any
called programs know that they're operating in a temporary
object directory (and where that directory is).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:54:02 -07:00
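A program run under the quarantine described in the commit above could detect it roughly as follows. Only the GIT_QUARANTINE_PATH variable name comes from the commit message; the helper name is hypothetical, and treating an empty value as "not quarantined" is an assumption.

```c
#include <stdlib.h>

/*
 * Return the temporary object directory if we are running inside a
 * quarantine, or NULL otherwise.
 */
static const char *sketch_quarantine_path(void)
{
	const char *path = getenv("GIT_QUARANTINE_PATH");

	return (path && *path) ? path : NULL;
}
```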
Jeff King
a5b34d2152 alternates: provide helper for adding to alternates list
The submodule code wants to temporarily add an alternate
object store to our in-memory alt_odb list, but does it
manually. Let's provide a helper so it can reuse the code in
link_alt_odb_entry().

While we're adding our new add_to_alternates_memory(), let's
document add_to_alternates_file(), as the two are related.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Jeff King
38dbe5f078 alternates: store scratch buffer as strbuf
We pre-size the scratch buffer to hold a loose object
filename of the form "xx/yyyy...", which leads to allocation
code that is hard to verify. We have to use some magic
numbers during the initial allocation, and then writers must
blindly assume that the buffer is big enough. Using a strbuf
makes it more clear that we cannot overflow.

Unfortunately, we do still need some magic numbers to grow
our strbuf before calling fill_sha1_path(), but the strbuf
growth is much closer to the point of use. This makes it
easier to see that it's correct, and opens the possibility
of pushing it even further down if fill_sha1_path() learns
to work on strbufs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Jeff King
597f9134de alternates: use a separate scratch space
The alternate_object_database struct uses a single buffer
both for storing the path to the alternate, and as a scratch
buffer for forming object names. This is efficient (since
otherwise we'd end up storing the path twice), but it makes
life hard for callers who just want to know the path to the
alternate. They have to remember to stop reading after
"alt->name - alt->base" bytes, and to subtract one for the
trailing '/'.

It would be much simpler if they could simply access a
NUL-terminated path string. We could encapsulate this in a
function which puts a NUL in the scratch buffer and returns
the string, but that opens up questions about the lifetime
of the result. The first time another caller uses the
alternate, the scratch buffer may get other data tacked onto
it.

Let's instead just store the root path separately from the
scratch buffer. There aren't enough alternates being stored
for the duplicated data to matter for performance, and this
keeps things simple and safe for the callers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
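The layout argued for in the "separate scratch space" commit above can be sketched like this. The struct and function names are hypothetical, and the "xx/yyyy..." split assumes 40-hexdigit SHA-1 names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical alternate: root path and scratch space kept apart. */
struct sketch_alternate {
	char *path;          /* NUL-terminated root, readable at any time */
	char *scratch;       /* reused for forming loose object file names */
	size_t scratch_size;
};

static struct sketch_alternate *sketch_alloc_alternate(const char *dir)
{
	struct sketch_alternate *alt = malloc(sizeof(*alt));
	size_t len = strlen(dir);

	if (!alt)
		return NULL;
	/* dir + '/' + "xx" + '/' + 38 hex digits + NUL */
	alt->scratch_size = len + 1 + 2 + 1 + 38 + 1;
	alt->path = malloc(len + 1);
	alt->scratch = malloc(alt->scratch_size);
	if (!alt->path || !alt->scratch) {
		free(alt->path);
		free(alt->scratch);
		free(alt);
		return NULL;
	}
	memcpy(alt->path, dir, len + 1);
	alt->scratch[0] = '\0';
	return alt;
}

/* Form "<path>/xx/yyyy..." in the scratch buffer; alt->path is untouched. */
static const char *sketch_alt_object_file(struct sketch_alternate *alt,
					  const char *hex)
{
	if (strlen(hex) != 40)
		return NULL;
	snprintf(alt->scratch, alt->scratch_size, "%s/%.2s/%s",
		 alt->path, hex, hex + 2);
	return alt->scratch;
}
```

The root path is stored twice in effect (once on its own, once as the scratch prefix), which is the small duplication the commit message accepts in exchange for callers being able to read alt->path as an ordinary string.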
Jeff King
7f0fa2c02a alternates: provide helper for allocating alternate
Allocating a struct alternate_object_database is tricky, as
we must over-allocate the buffer to provide scratch space,
and then put in particular '/' and NUL markers.

Let's encapsulate this in a function so that the complexity
doesn't leak into callers (and so that we can modify it
later).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 13:52:36 -07:00
Brandon Williams
74866d7579 git: make super-prefix option
Add a super-prefix environment variable 'GIT_INTERNAL_SUPER_PREFIX'
which can be used to specify a path from above a repository down to its
root.  When such a super-prefix is specified, the paths reported by Git
are prefixed with it to make them relative to that directory "above".
The paths given by the user on the command line
(e.g. "git subcmd --output-file=path/to/a/file" and pathspecs) are taken
relative to the directory "above" to match.

The immediate use of this option is by commands which have a
--recurse-submodule option in order to give context to submodules about
how they were invoked.  This option is currently only allowed for
builtins which support a super-prefix.

Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 12:14:58 -07:00
Junio C Hamano
66c22ba6fb Merge branch 'jk/ambiguous-short-object-names'
When given an abbreviated object name that is not (or more
realistically, "no longer") unique, we gave a fatal error
"ambiguous argument".  This error is now accompanied by hints that
list the objects that begin with the given prefix.  During the
course of development of this new feature, numerous minor bugs were
uncovered and corrected, the most notable one of which is that we
gave "short SHA1 xxxx is ambiguous." twice without good reason.

* jk/ambiguous-short-object-names:
  get_short_sha1: make default disambiguation configurable
  get_short_sha1: list ambiguous objects on error
  for_each_abbrev: drop duplicate objects
  sha1_array: let callbacks interrupt iteration
  get_short_sha1: mark ambiguity error for translation
  get_short_sha1: NUL-terminate hex prefix
  get_short_sha1: refactor init of disambiguation code
  get_short_sha1: parse tags when looking for treeish
  get_sha1: propagate flags to child functions
  get_sha1: avoid repeating ourselves via ONLY_TO_DIE
  get_sha1: detect buggy calls with multiple disambiguators
2016-10-06 14:53:10 -07:00
Jeff King
8e3f52d778 find_unique_abbrev: move logic out of get_short_sha1()
The get_short_sha1() is only about reading short sha1s; we
do call it in a loop to check "is this long enough" for each
object, but otherwise it should not need to know about
things like our default_abbrev setting.

So instead of asking it to set default_automatic_abbrev as a
side-effect, let's just have find_unique_abbrev() pick the
right place to start its loop.  This requires a separate
approximate_object_count() function, but that naturally
belongs with the rest of sha1_file.c.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-03 21:03:14 -07:00
Junio C Hamano
53eb85e623 Merge branch 'nd/init-core-worktree-in-multi-worktree-world'
"git init" tried to record core.worktree in the repository's
'config' file when GIT_WORK_TREE environment variable was set and
it was different from where GIT_DIR appears as ".git" at its top,
but the logic was faulty when .git is a "gitdir:" file that points
at the real place, causing trouble in working trees that are
managed by "git worktree".  This has been corrected.

* nd/init-core-worktree-in-multi-worktree-world:
  init: kill git_link variable
  init: do not set unnecessary core.worktree
  init: kill set_git_dir_init()
  init: call set_git_dir_init() from within init_db()
  init: correct re-initialization from a linked worktree
2016-10-03 13:30:35 -07:00
Linus Torvalds
e6c587c733 abbrev: auto size the default abbreviation
In fairly early days we somehow decided to abbreviate object names
down to 7-hexdigits, but as projects grow, it is becoming more and
more likely that such short object names, made in earlier days and
recorded in log messages, are no longer unique.

Currently the Linux kernel project needs 11 to 12 hexdigits, while
Git itself needs 10 hexdigits to uniquely identify the objects they
have, while many smaller projects may still be fine with the
original 7-hexdigit default.  One-size does not fit all projects.

Introduce a mechanism, where we estimate the number of objects in
the repository upon the first request to abbreviate an object name
with the default setting and come up with a sane default for the
repository.  Based on the expectation that we would see a collision in
a repository with 2^N objects when using object names shortened to
the first 2N bits, use a sufficient number of hexdigits to cover the
number of objects in the repository.  Each hexdigit (4-bits) we add
to the shortened name allows us to have four times (2-bits) as many
objects in the repository.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-03 12:54:29 -07:00
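The sizing rule in e6c587c733 (each extra hexdigit allows four times as many objects, i.e. about 2 bits of name per bit of object count) might be sketched like this. The names FALLBACK_ABBREV_SKETCH, bits_needed() and auto_abbrev_len() are hypothetical, and the rounding is an assumption made for illustration rather than a statement of what Git does.

```c
/* The 7-hexdigit default mentioned in the message, used as a floor. */
#define FALLBACK_ABBREV_SKETCH 7

/* Number of bits needed to represent `count` (0 for count == 0). */
static int bits_needed(unsigned long count)
{
	int bits = 0;

	while (count) {
		bits++;
		count >>= 1;
	}
	return bits;
}

/*
 * With fewer than 2^bits objects we want about 2*bits bits of name.
 * A hexdigit carries 4 bits, so that is bits/2 hexdigits, rounded up.
 */
static int auto_abbrev_len(unsigned long count)
{
	int hexdigits = (bits_needed(count) + 1) / 2;

	return hexdigits < FALLBACK_ABBREV_SKETCH
		? FALLBACK_ABBREV_SKETCH
		: hexdigits;
}
```

Under these assumptions a repository with about a thousand objects stays at 7 hexdigits, while one with a few million objects moves to 11 or 12.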
Junio C Hamano
65acfeacaa abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
We'll be introducing a new way to decide the default abbreviation
length by initialising DEFAULT_ABBREV to -1 to signal the first call
to "find unique abbreviation" codepath to compute a reasonable value
based on the number of objects we have to avoid collisions.

We have long relied on DEFAULT_ABBREV being a positive concrete
value that is used as the abbreviation length when no extra
configuration or command line option has overridden it.  Some
codepaths wants to use such a positive concrete default value
even before making their first request to actually trigger the
computation for the auto sized default.

Introduce FALLBACK_DEFAULT_ABBREV and use it in the code that
attempts to align the report from "git fetch".  For now, this
macro is also used to initialize the default_abbrev variable,
but the auto-sizing code will use -1 and then use the value of
FALLBACK_DEFAULT_ABBREV as the starting point of auto-sizing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-03 12:54:21 -07:00
Junio C Hamano
36f64036f6 Merge branch 'tg/add-chmod+x-fix' into maint
"git add --chmod=+x <pathspec>" added recently only toggled the
executable bit for paths that are either new or modified. This has
been corrected to flip the executable bit for all paths that match
the given pathspec.

* tg/add-chmod+x-fix:
  t3700-add: do not check working tree file mode without POSIXPERM
  t3700-add: create subdirectory gently
  add: modify already added files when --chmod is given
  read-cache: introduce chmod_index_entry
  update-index: add test for chmod flags
2016-09-29 16:49:47 -07:00
Jeff King
5b33cb1fd7 get_short_sha1: make default disambiguation configurable
When we find ambiguous short sha1s, we may get a
disambiguation rule from our caller's context. But if we
don't, we fall back to treating all sha1s the same, even
though most projects will tend to refer only to commits by
their short sha1s.

This patch introduces a configuration option that lets the
user pick a different fallback (e.g., only commits). It's
possible that we may want to make this the default, but it's
a good idea to start as a config option for two reasons:

  1. It lets people experiment with this and see if it's a
     good idea (i.e., the "tend to" above is an assumption;
     we don't really know if this will break some obscure
     cases).

  2. Even if we do flip the default, it gives people an
     escape hatch if it causes problems (you can sometimes
     override it by asking for "1234^{tree}", but not all
     combinations are possible).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27 10:29:56 -07:00
Junio C Hamano
e683f17e63 Merge branch 'rs/checkout-init-macro'
Code cleanup.

* rs/checkout-init-macro:
  introduce CHECKOUT_INIT
2016-09-26 16:09:21 -07:00
Junio C Hamano
ebc63580a1 Merge branch 'tg/add-chmod+x-fix'
"git add --chmod=+x <pathspec>" added recently only toggled the
executable bit for paths that are either new or modified. This has
been corrected to flip the executable bit for all paths that match
the given pathspec.

* tg/add-chmod+x-fix:
  t3700-add: do not check working tree file mode without POSIXPERM
  t3700-add: create subdirectory gently
  add: modify already added files when --chmod is given
  read-cache: introduce chmod_index_entry
  update-index: add test for chmod flags
2016-09-26 16:09:20 -07:00
Jeff King
259942f549 get_sha1: detect buggy calls with multiple disambiguators
The get_sha1() family of functions takes a flags field, but
some of the flags are mutually exclusive. In particular, we
can only handle one disambiguating function, and the flags
quietly override each other. Let's instead detect these as
programming bugs.

Technically some of the flags are supersets of the others,
so treating COMMITTISH|TREEISH as just COMMITTISH is not
wrong, but it's a good sign the caller is confused. And
certainly asking for BLOB|TREE does not work.

We can do the check easily with some bit-twiddling, and as a
bonus, the bit-mask of disambiguators will come in handy in
a future patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 11:21:28 -07:00
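The "bit-twiddling" check mentioned in 259942f549 is presumably along the lines of the classic "more than one bit set" test. The sketch below uses made-up flag names and values (they are not Git's real constants) to show the idea: mask out the disambiguator bits, then test whether clearing the lowest set bit leaves anything behind.

```c
/* Hypothetical flag values for illustration only. */
enum {
	SKETCH_QUIETLY    = 1u << 0,	/* not a disambiguator */
	SKETCH_COMMIT     = 1u << 1,
	SKETCH_COMMITTISH = 1u << 2,
	SKETCH_TREE       = 1u << 3,
	SKETCH_TREEISH    = 1u << 4,
	SKETCH_BLOB       = 1u << 5
};

#define SKETCH_DISAMBIGUATORS \
	(SKETCH_COMMIT | SKETCH_COMMITTISH | SKETCH_TREE | \
	 SKETCH_TREEISH | SKETCH_BLOB)

/*
 * Returns nonzero when more than one disambiguator bit is set.
 * d & (d - 1) clears the lowest set bit of d, so the result is
 * nonzero only if at least two bits were set.
 */
static int has_multiple_disambiguators(unsigned flags)
{
	unsigned d = flags & SKETCH_DISAMBIGUATORS;

	return (d & (d - 1)) != 0;
}
```

A caller could treat a nonzero result as a programming error rather than letting one flag quietly win over another.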
Nguyễn Thái Ngọc Duy
33158701e2 init: call set_git_dir_init() from within init_db()
The next commit requires that set_git_dir_init() must be called before
init_db(). Let's make sure nobody can do otherwise.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-25 16:32:35 -07:00
René Scharfe
68e3d6292f introduce CHECKOUT_INIT
Add a static initializer for struct checkout and use it throughout the
code base.  It's shorter, avoids a memset(3) call and makes sure the
base_dir member is initialized to a valid (empty) string.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-22 13:42:18 -07:00
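The pattern behind 68e3d6292f can be illustrated with a trimmed-down, hypothetical struct (checkout_sketch and CHECKOUT_SKETCH_INIT are stand-ins, not the real struct checkout or CHECKOUT_INIT). A static initializer sets the string member to "" and leaves every other member zeroed, so no memset(3) call is needed and the string is never NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct checkout, reduced to a few members. */
struct checkout_sketch {
	const char *base_dir;
	int base_dir_len;
	unsigned force:1,
		 quiet:1;
};

/* base_dir becomes ""; all remaining members are zero-initialized. */
#define CHECKOUT_SKETCH_INIT { "" }

static size_t base_dir_length(void)
{
	struct checkout_sketch state = CHECKOUT_SKETCH_INIT;

	/* Safe without further setup: base_dir is "", not NULL. */
	return strlen(state.base_dir);
}
```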
Junio C Hamano
d845d727cb Merge branch 'jk/setup-sequence-update'
There were numerous corner cases in which the configuration files
are read and used or not read at all depending on the directory in
which a Git command was run, leading to inconsistent behaviour.  The
code to set up repository access at the beginning of a Git process
has been updated to fix them.

* jk/setup-sequence-update:
  t1007: factor out repeated setup
  init: reset cached config when entering new repo
  init: expand comments explaining config trickery
  config: only read .git/config from configured repos
  test-config: setup git directory
  t1302: use "git -C"
  pager: handle early config
  pager: use callbacks instead of configset
  pager: make pager_program a file-local static
  pager: stop loading git_default_config()
  pager: remove obsolete comment
  diff: always try to set up the repository
  diff: handle --no-index prefixes consistently
  diff: skip implicit no-index check when given --no-index
  patch-id: use RUN_SETUP_GENTLY
  hash-object: always try to set up the git repository
2016-09-21 15:15:24 -07:00
Junio C Hamano
c3befaeab9 Merge branch 'rs/hex2chr' into maint
Code cleanup.

* rs/hex2chr:
  introduce hex2chr() for converting two hexadecimal digits to a character
2016-09-19 13:51:43 -07:00
Junio C Hamano
4af9a7d344 Merge branch 'bc/object-id'
The "unsigned char sha1[20]" to "struct object_id" conversion
continues.  Notable changes in this round include that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.

It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2").  Extra sets of eyes double-checking for
mismerges are highly appreciated.

* bc/object-id:
  builtin/reset: convert to use struct object_id
  builtin/commit-tree: convert to struct object_id
  builtin/am: convert to struct object_id
  refs: add an update_ref_oid function.
  sha1_name: convert get_sha1_mb to struct object_id
  builtin/update-index: convert file to struct object_id
  notes: convert init_notes to use struct object_id
  builtin/rm: convert to use struct object_id
  builtin/blame: convert file to use struct object_id
  Convert read_mmblob to take struct object_id.
  notes-merge: convert struct notes_merge_pair to struct object_id
  builtin/checkout: convert some static functions to struct object_id
  streaming: make stream_blob_to_fd take struct object_id
  builtin: convert textconv_object to use struct object_id
  builtin/cat-file: convert some static functions to struct object_id
  builtin/cat-file: convert struct expand_data to use struct object_id
  builtin/log: convert some static functions to use struct object_id
  builtin/blame: convert struct origin to use struct object_id
  builtin/apply: convert static functions to struct object_id
  cache: convert struct cache_entry to use struct object_id
2016-09-19 13:47:19 -07:00
Thomas Gummerer
610d55af0f add: modify already added files when --chmod is given
When the chmod option was added to git add, it was hooked up to the diff
machinery, meaning that it only works when the version in the index
differs from the version on disk.

As the option was supposed to mirror the chmod option in update-index,
which always changes the mode in the index, regardless of the status of
the file, make sure the option behaves the same way in git add.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
Thomas Gummerer
d9d7096662 read-cache: introduce chmod_index_entry
As there are chmod options for both add and update-index, introduce a
new chmod_index_entry function to do the work.  Use it in update-index,
while it will be used in add in the next patch.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
Jeff King
4543926ba8 init: reset cached config when entering new repo
After we copy the templates into place, we re-read the
config in case we copied in a default config file. But since
git_config() is backed by a cache these days, it's possible
that the call will not actually touch the filesystem at all;
we need to tell it that something has changed behind the
scenes.

Note that we also need to reset the shared_repository
config. At first glance, it seems like this should probably
just be folded into git_config_clear(). But unfortunately
that is not quite right. The shared repository value may
come from config, _or_ it may have been set manually. So
only the caller who knows whether or not they set it is the
one who can clear it (and indeed, if you _do_ put it into
git_config_clear(), then many tests fail, as we have to
clear the config cache any time we set a new config
variable).

There are three tests here. The first two actually pass
already, though it's largely luck: they just don't happen to
actually read any config before we enter the new repo.

But the third one does fail without this patch; we look at
core.sharedrepository while creating the directory, but need
to make sure the value from the template config overrides
it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King
b9605bc4f2 config: only read .git/config from configured repos
When git_config() runs, it looks in the system, user-wide,
and repo-level config files. It gets the latter by calling
git_pathdup(), which in turn calls get_git_dir(). If we
haven't set up the git repository yet, this may simply
return ".git", and we will look at ".git/config".  This
seems like it would be helpful (presumably we haven't set up
the repository yet, so it tries to find it), but it turns
out to be a bad idea for a few reasons:

  - it's not sufficient, and therefore hides bugs in a
    confusing way. Config will be respected if commands are
    run from the top-level of the working tree, but not from
    a subdirectory.

  - it's not always true that we haven't set up the
    repository _yet_; we may not want to do it at all. For
    instance, if you run "git init /some/path" from inside
    another repository, it should not load config from the
    existing repository.

  - there might be a path ".git/config", but it is not the
    actual repository we would find via setup_git_directory().
    This may happen, e.g., if you are storing a git
    repository inside another git repository, but have
    munged one of the files in such a way that the
    inner repository is not valid (e.g., by removing HEAD).

We have at least two bugs of the second type in git-init,
introduced by ae5f677 (lazily load core.sharedrepository,
2016-03-11). It causes init to use git_configset(), which
loads all of the config, including values from the current
repo (if any).  This shows up in two ways:

  1. If we happen to be in an existing repository directory,
     we'll read and respect core.sharedrepository from it,
     even though it should have no bearing on the new
     repository. A new test in t1301 covers this.

  2. Similarly, if we're in an existing repo that sets
     core.logallrefupdates, that will cause init to fail to
     set it in a newly created repository (because it thinks
     that the user's templates already did so). A new test
     in t0001 covers this.

We also need to adjust an existing test in t1302, which
gives another example of why this patch is an improvement.

That test creates an embedded repository with a bogus
core.repositoryformatversion of "99". It wants to make sure
that we actually stop at the bogus repo rather than
continuing upward to find the outer repo. So it checks that
"git config core.repositoryformatversion" returns 99. But
that only works because we blindly read ".git/config", even
though we _know_ we're in a repository whose vintage we do
not understand.

After this patch, we avoid reading config from the unknown
vintage repository at all, which is a safer choice.  But we
need to tweak the test, since core.repositoryformatversion
will not return 99; it will claim that it could not find the
variable at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King
c0c08897c4 pager: make pager_program a file-local static
This variable is only ever used by the routines in pager.c,
and other parts of the code should always use those routines
(like git_pager()) to make decisions about which pager to
use. Let's reduce its scope to prevent accidents.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Junio C Hamano
e4ec05ed93 Merge branch 'rs/hex2chr'
* rs/hex2chr:
  introduce hex2chr() for converting two hexadecimal digits to a character
2016-09-12 15:34:36 -07:00
Junio C Hamano
305d7f1339 Merge branch 'jk/diff-submodule-diff-inline'
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.

* jk/diff-submodule-diff-inline:
  diff: teach diff to display submodule difference with an inline diff
  submodule: refactor show_submodule_summary with helper function
  submodule: convert show_submodule_summary to use struct object_id *
  allow do_submodule_path to work even if submodule isn't checked out
  diff: prepare for additional submodule formats
  graph: add support for --line-prefix on all graph-aware output
  diff.c: remove output_prefix_length field
  cache: add empty_tree_oid object and helper function
2016-09-12 15:34:31 -07:00
Junio C Hamano
02c6c14d6c Merge branch 'sb/submodule-clone-rr'
"git clone --recurse-submodules --reference $path $URL" is a way to
reduce network transfer cost by borrowing objects in an existing
$path repository when cloning the superproject from $URL; it
learned to also peek into $path for presence of corresponding
repositories of submodules and borrow objects from there when able.

* sb/submodule-clone-rr:
  clone: recursive and reference option triggers submodule alternates
  clone: implement optional references
  clone: clarify option_reference as required
  clone: factor out checking for an alternate path
  submodule--helper update-clone: allow multiple references
  submodule--helper module-clone: allow multiple references
  t7408: merge short tests, factor out testing method
  t7408: modernize style
2016-09-08 21:49:50 -07:00
Junio C Hamano
f59c6e6ccb Merge branch 'jk/reflog-date' into maint
The reflog output format is documented better, and a new format
--date=unix to report the seconds-since-epoch (without timezone)
has been added.

* jk/reflog-date:
  date: clarify --date=raw description
  date: add "unix" format
  date: document and test "raw-local" mode
  doc/pretty-formats: explain shortening of %gd
  doc/pretty-formats: describe index/time formats for %gd
  doc/rev-list-options: explain "-g" output formats
  doc/rev-list-options: clarify "commit@{Nth}" for "-g" option
2016-09-08 21:35:52 -07:00
Junio C Hamano
7f5885ad2a Merge branch 'jc/renormalize-merge-kill-safer-crlf' into maint
"git merge" with renormalization did not work well with
merge-recursive, due to "safer crlf" conversion kicking in when it
shouldn't.

* jc/renormalize-merge-kill-safer-crlf:
  merge: avoid "safer crlf" during recording of merge results
  convert: unify the "auto" handling of CRLF
2016-09-08 21:35:52 -07:00
brian m. carlson
151b2911c1 sha1_name: convert get_sha1_mb to struct object_id
All of the callers of this function use struct object_id, so rename it
to get_oid_mb and make it take struct object_id instead of
unsigned char *.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:43 -07:00
brian m. carlson
99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
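In C terms, the effect of the semantic patch in 99d1a9861a on calling code can be sketched as below. The types are simplified, hypothetical stand-ins (object_id_sketch, cache_entry_sketch) with a hard-coded 20-byte hash; they only show how an access that used to read ce->sha1 now goes through ce->oid.hash.

```c
#include <string.h>

#define SKETCH_RAWSZ 20

/* Hypothetical stand-in for struct object_id. */
struct object_id_sketch {
	unsigned char hash[SKETCH_RAWSZ];
};

/* Hypothetical stand-in for struct cache_entry, reduced to two members. */
struct cache_entry_sketch {
	struct object_id_sketch oid;	/* was: unsigned char sha1[20]; */
	const char *name;
};

/* Before the conversion this would have compared a->sha1 and b->sha1. */
static int same_object(const struct cache_entry_sketch *a,
		       const struct cache_entry_sketch *b)
{
	return !memcmp(a->oid.hash, b->oid.hash, SKETCH_RAWSZ);
}
```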
René Scharfe
d23309733a introduce hex2chr() for converting two hexadecimal digits to a character
Add and use a helper function that decodes the char value of two
hexadecimal digits.  It returns a negative number on error, avoids
running over the end of the given string and doesn't shift negative
values.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 10:42:46 -07:00
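A helper with the properties listed in d23309733a could look roughly like the sketch below. It is an illustration, not the code from the commit: hexval_sketch() and hex2chr_sketch() are hypothetical names. The second digit is only read after the first one has been validated, so a string that ends early is never overrun, and no negative value is ever shifted.

```c
/* Return 0..15 for a hex digit, or -1 for anything else (including NUL). */
static int hexval_sketch(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode the two hex digits at s[0] and s[1] into a value 0..255.
 * Returns -1 on error.
 */
static int hex2chr_sketch(const char *s)
{
	int hi = hexval_sketch((unsigned char)s[0]);
	int lo;

	if (hi < 0)
		return -1;	/* also stops at NUL before touching s[1] */
	lo = hexval_sketch((unsigned char)s[1]);
	if (lo < 0)
		return -1;
	return (hi << 4) | lo;
}
```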
Jacob Keller
99b43a61f2 allow do_submodule_path to work even if submodule isn't checked out
Currently, do_submodule_path will attempt locating the .git directory by
using read_gitfile on <path>/.git. If this fails it just assumes the
<path>/.git is actually a git directory.

This is good because it allows for handling submodules which were cloned
in a regular manner first before being added to the superproject.

Unfortunately this fails if the <path> is not actually checked out any
longer, such as by removing the directory.

Fix this by checking if the directory we found is actually a gitdir. If
it is not, attempt to look up the submodule configuration and find the
name under which it is stored in the .git/modules/ directory of the
superproject.

We may fail to locate the submodule configuration, for example because
a submodule gitlink was added but the corresponding .gitmodules file
was not properly updated.  A die() here would not be pleasant to the
users of submodule diff formats, so instead, modify
do_submodule_path() to return an error code:

 - git_pathdup_submodule() returns NULL when we fail to find a path.
 - strbuf_git_path_submodule() propagates the error code to the caller.

Modify the callers of these functions to check the error code and fail
properly. This ensures we don't attempt to use a bad path that doesn't
match the corresponding submodule.

Because this change fixes add_submodule_odb() to work even if the
submodule is not checked out, update the wording of the submodule log
diff format to correctly display that the submodule is "not initialized"
instead of "not checked out".

Add tests to ensure this change works as expected.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:10 -07:00