Commit graph

184 commits

Author SHA1 Message Date
Derrick Stolee fb0882648e cache-tree: clean up cache_tree_update()
Make the method safer by allocating a cache_tree member for the given
index_state if it is not already present. This is preferrable to a
BUG() statement or returning with an error because future callers will
want to populate an empty cache-tree using this method.

Callers can also remove their conditional allocations of cache_tree.

Also drop local variables that can be found directly from the 'istate'
parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23 17:14:07 -08:00
Derrick Stolee a4b6d202ca cache-tree: speed up consecutive path comparisons
The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.

To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:

  git -c index.threads=1 commit --amend --allow-empty --no-edit

Here are the measurements over 10 runs after a 5-run warmup:

  Benchmark #1: v2.30.0
    Time (mean ± σ):     854.5 ms ±  18.2 ms
    Range (min … max):   825.0 ms … 892.8 ms

  Benchmark #2: Previous change
    Time (mean ± σ):     833.2 ms ±  10.3 ms
    Range (min … max):   815.8 ms … 849.7 ms

  Benchmark #3: This change
    Time (mean ± σ):     815.5 ms ±  18.1 ms
    Range (min … max):   795.4 ms … 849.5 ms

This change is 2% faster than the previous change and 5% faster than
v2.30.0.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:05:13 -08:00
René Scharfe 0b72536a0b cache-tree: use ce_namelen() instead of strlen()
Use the name length field of cache entries instead of calculating its
value anew.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:05:13 -08:00
Derrick Stolee 0e5c950267 cache-tree: trace regions for prime_cache_tree
Commands such as "git reset --hard" rebuild the in-memory representation
of the cache tree index extension by parsing tree objects starting at a
known root tree. The performance of this operation can vary widely
depending on the width and depth of the repository's working directory
structure. Measure the time in this operation using trace2 regions in
prime_cache_tree().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:04:32 -08:00
Derrick Stolee 4c3e18723c cache-tree: trace regions for I/O
As we write or read the cache tree index extension, it can be good to
isolate how much of the file I/O time is spent constructing this
in-memory tree from the existing index or writing it out again to the
new index file. Use trace2 regions to indicate that we are spending time
on this operation.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:04:21 -08:00
Derrick Stolee fa7ca5d4fe cache-tree: use trace2 in cache_tree_update()
This matches a trace_performance_enter()/trace_performance_leave() pair
added by 0d1ed59 (unpack-trees: add performance tracing, 2018-08-18).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04 15:23:08 -08:00
Matheus Tavares 2dcde20e1c sha1-file: pass git_hash_algo to hash_object_file()
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Matheus Tavares a651946730 cache-tree: use given repo's hash_algo at verify_one()
verify_one() takes a struct repository argument but uses the_hash_algo
internally. Replace it with the provided repo's git_hash_algo, for
consistency. For now, this is mainly a cosmetic change, as all callers
of this function currently only pass the_repository to it.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Junio C Hamano 280bd44551 Merge branch 'en/merge-recursive-cleanup'
The merge-recursive machiery is one of the most complex parts of
the system that accumulated cruft over time.  This large series
cleans up the implementation quite a bit.

* en/merge-recursive-cleanup: (26 commits)
  merge-recursive: fix the fix to the diff3 common ancestor label
  merge-recursive: fix the diff3 common ancestor label for virtual commits
  merge-recursive: alphabetize include list
  merge-recursive: add sanity checks for relevant merge_options
  merge-recursive: rename MERGE_RECURSIVE_* to MERGE_VARIANT_*
  merge-recursive: split internal fields into a separate struct
  merge-recursive: avoid losing output and leaking memory holding that output
  merge-recursive: comment and reorder the merge_options fields
  merge-recursive: consolidate unnecessary fields in merge_options
  merge-recursive: move some definitions around to clean up the header
  merge-recursive: rename merge_options argument to opt in header
  merge-recursive: rename 'mrtree' to 'result_tree', for clarity
  merge-recursive: use common name for ancestors/common/base_list
  merge-recursive: fix some overly long lines
  cache-tree: share code between functions writing an index as a tree
  merge-recursive: don't force external callers to do our logging
  merge-recursive: remove useless parameter in merge_trees()
  merge-recursive: exit early if index != head
  Ensure index matches head before invoking merge machinery, round N
  merge-recursive: remove another implicit dependency on the_repository
  ...
2019-10-15 13:47:59 +09:00
Junio C Hamano ae203ba414 Merge branch 'jt/cache-tree-avoid-lazy-fetch-during-merge'
The cache-tree code has been taught to be less aggressive in
attempting to see if a tree object it computed already exists in
the repository.

* jt/cache-tree-avoid-lazy-fetch-during-merge:
  cache-tree: do not lazy-fetch tentative tree
2019-10-07 11:32:58 +09:00
Junio C Hamano b9ac6c59b8 Merge branch 'cc/multi-promisor'
Teach the lazy clone machinery that there can be more than one
promisor remote and consult them in order when downloading missing
objects on demand.

* cc/multi-promisor:
  Move core_partial_clone_filter_default to promisor-remote.c
  Move repository_format_partial_clone to promisor-remote.c
  Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
  remote: add promisor and partial clone config to the doc
  partial-clone: add multiple remotes in the doc
  t0410: test fetching from many promisor remotes
  builtin/fetch: remove unique promisor remote limitation
  promisor-remote: parse remote.*.partialclonefilter
  Use promisor_remote_get_direct() and has_promisor_remote()
  promisor-remote: use repository_format_partial_clone
  promisor-remote: add promisor_remote_reinit()
  promisor-remote: implement promisor_remote_get_direct()
  Add initial support for many promisor remotes
  fetch-object: make functions return an error code
  t0410: remove pipes after git commands
2019-09-18 11:50:09 -07:00
Jonathan Tan f981ec18cf cache-tree: do not lazy-fetch tentative tree
The cache-tree datastructure is used to speed up the comparison
between the HEAD and the index, and when the index is updated by
a cherry-pick (for example), a tree object that would represent
the paths in the index in a directory is constructed in-core, to
see if such a tree object exists already in the object store.

When the lazy-fetch mechanism was introduced, we converted this
"does the tree exist?" check into an "if it does not, and if we
lazily cloned, see if the remote has it" call by mistake.  Since
the whole point of this check is to repair the cache-tree by
recording an already existing tree object opportunistically, we
shouldn't even try to fetch one from the remote.

Pass the OBJECT_INFO_SKIP_FETCH_OBJECT flag to make sure we only
check for existence in the local object store without triggering the
lazy fetch mechanism.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
[jc: rewritten the proposed log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-09 14:07:35 -07:00
Junio C Hamano 1b01cdbf2e Merge branch 'jk/tree-walk-overflow'
Codepaths to walk tree objects have been audited for integer
overflows and hardened.

* jk/tree-walk-overflow:
  tree-walk: harden make_traverse_path() length computations
  tree-walk: add a strbuf wrapper for make_traverse_path()
  tree-walk: accept a raw length for traverse_path_len()
  tree-walk: use size_t consistently
  tree-walk: drop oid from traverse_info
  setup_traverse_info(): stop copying oid
2019-08-22 12:34:10 -07:00
Elijah Newren 724dd767b2 cache-tree: share code between functions writing an index as a tree
write_tree_from_memory() appeared to be a merge-recursive special that
basically duplicated write_index_as_tree().  The two have a different
signature, but the bigger difference was just that write_index_as_tree()
would always unconditionally read the index off of disk instead of
working on the current in-memory index.  So:

  * split out common code into write_index_as_tree_internal()

  * rename write_tree_from_memory() to write_inmemory_index_as_tree(),
    make it call write_index_as_tree_internal(), and move it to
    cache-tree.c

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 10:08:03 -07:00
Jeff King 9055384710 tree-walk: drop oid from traverse_info
As the previous commit shows, the presence of an oid in each level of
the traverse_info is confusing and ultimately not necessary. Let's drop
it to make it clear that it will not always be set (as well as convince
us that it's unused, and let the compiler catch any merges with other
branches that do add new uses).

Since the oid is part of name_entry, we'll actually stop embedding a
name_entry entirely, and instead just separately hold the pathname, its
length, and the mode.

This makes the resulting code slightly more verbose as we have to pass
those elements around individually. But it also makes it more clear what
each code path is going to use (and in most of the paths, we really only
care about the pathname itself).

A few of these conversions are noisier than they need to be, as they
also take the opportunity to rename "len" to "namelen" for clarity
(especially where we also have "pathlen" or "ce_len" alongside).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-31 13:34:25 -07:00
Christian Couder b14ed5adaf Use promisor_remote_get_direct() and has_promisor_remote()
Instead of using the repository_format_partial_clone global
and fetch_objects() directly, let's use has_promisor_remote()
and promisor_remote_get_direct().

This way all the configured promisor remotes will be taken
into account, not only the one specified by
extensions.partialClone.

Also when cloning or fetching using a partial clone filter,
remote.origin.promisor will be set to "true" instead of
setting extensions.partialClone to "origin". This makes it
possible to use many promisor remote just by fetching from
them.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25 14:05:37 -07:00
Jeff Hostetler e9b9cc56bb cache-tree/blame: avoid reusing the DEBUG constant
In MS Visual C, the `DEBUG` constant is set automatically whenever
compiling with debug information.

This is clearly not what was intended in `cache-tree.c` nor in
`builtin/blame.c`, so let's use a less ambiguous name there.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 14:03:05 -07:00
Junio C Hamano cba595ab1a Merge branch 'jk/loose-object-cache-oid'
Code clean-up.

* jk/loose-object-cache-oid:
  prefer "hash mismatch" to "sha1 mismatch"
  sha1-file: avoid "sha1 file" for generic use in messages
  sha1-file: prefer "loose object file" to "sha1 file" in messages
  sha1-file: drop has_sha1_file()
  convert has_sha1_file() callers to has_object_file()
  sha1-file: convert pass-through functions to object_id
  sha1-file: modernize loose header/stream functions
  sha1-file: modernize loose object file functions
  http: use struct object_id instead of bare sha1
  update comment references to sha1_object_info()
  sha1-file: fix outdated sha1 comment references
2019-02-06 22:05:27 -08:00
Junio C Hamano 371820d5f1 Merge branch 'bc/tree-walk-oid'
The code to walk tree objects has been taught that we may be
working with object names that are not computed with SHA-1.

* bc/tree-walk-oid:
  cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
  tree-walk: store object_id in a separate member
  match-trees: use hashcpy to splice trees
  match-trees: compute buffer offset correctly when splicing
  tree-walk: copy object ID before use
2019-01-29 12:47:56 -08:00
brian m. carlson ea82b2a085 tree-walk: store object_id in a separate member
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.

Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 09:57:41 -08:00
Junio C Hamano f2b6aa98be Merge branch 'nd/indentation-fix'
Code cleanup.

* nd/indentation-fix:
  Indent code with TABs
2019-01-14 15:29:32 -08:00
Jeff King 98374a07c9 convert has_sha1_file() callers to has_object_file()
The only remaining callers of has_sha1_file() actually have an object_id
already. They can use the "object" variant, rather than dereferencing
the hash themselves.

The code changes here were completely generated by the included
coccinelle patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08 09:41:06 -08:00
Nguyễn Thái Ngọc Duy ec36c42a63 Indent code with TABs
We indent with TABs and sometimes for fine alignment, TABs followed by
spaces, but never all spaces (unless the indentation is less than 8
columns). Indenting with spaces slips through in some places. Fix
them.

Imported code and compat/ are left alone on purpose. The former should
remain as close as upstream as possible. The latter pretty much has
separate maintainers, it's up to them to decide.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-09 12:37:32 +09:00
Nguyễn Thái Ngọc Duy c207e9e1f6 cache-tree.c: remove the_repository references
This case is more interesting than other boring "remove the_repo"
commits because while we need access to the object database, we cannot
simply use r->index because unpack-trees.c can operate on a temporary
index, not $GIT_DIR/index. Ideally we should be able to pass an object
database to lookup_tree() but that ship has sailed.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 14:50:06 +09:00
Junio C Hamano a08b1d62b0 Merge branch 'jt/cache-tree-allow-missing-object-in-partial-clone'
In a partial clone that will lazily be hydrated from the
originating repository, we generally want to avoid "does this
object exist (locally)?" on objects that we deliberately omitted
when we created the clone.  The cache-tree codepath (which is used
to write a tree object out of the index) however insisted that the
object exists, even for paths that are outside of the partial
checkout area.  The code has been updated to avoid such a check.

* jt/cache-tree-allow-missing-object-in-partial-clone:
  cache-tree: skip some blob checks in partial clone
2018-10-19 13:34:08 +09:00
Jonathan Tan 2f215ff10b cache-tree: skip some blob checks in partial clone
In a partial clone, whenever a sparse checkout occurs, the existence of
all blobs in the index is verified, whether they are included or
excluded by the .git/info/sparse-checkout specification. This
significantly degrades performance because a lazy fetch occurs whenever
the existence of a missing blob is checked.

This is because cache_tree_update() checks the existence of all objects
in the index, whether or not CE_SKIP_WORKTREE is set on them. Teach
cache_tree_update() to skip checking CE_SKIP_WORKTREE objects when the
repository is a partial clone. This improves performance for sparse
checkout and also other operations that use cache_tree_update().

Instead of completely removing the check, an argument could be made that
the check should instead be replaced by a check that the blob is
promised, but for performance reasons, I decided not to do this.
If the user needs to verify the repository, it can be done using fsck
(which will notify if a tree points to a missing and non-promised blob,
whether the blob is included or excluded by the sparse-checkout
specification).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-10 10:20:43 +09:00
Jeff King e43d2dcce1 more oideq/hasheq conversions
We added faster equality-comparison functions for hashes in
14438c4497 (introduce hasheq() and oideq(), 2018-08-28). A
few topics were in-flight at the time, and can now be
converted. This covers all spots found by "make coccicheck"
in master (the coccicheck results were tweaked by hand for
style).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04 03:42:48 -07:00
Junio C Hamano 769af0fd9e Merge branch 'jk/cocci'
spatch transformation to replace boolean uses of !hashcmp() to
newly introduced oideq() is added, and applied, to regain
performance lost due to support of multiple hash algorithms.

* jk/cocci:
  show_dirstat: simplify same-content check
  read-cache: use oideq() in ce_compare functions
  convert hashmap comparison functions to oideq()
  convert "hashcmp() != 0" to "!hasheq()"
  convert "oidcmp() != 0" to "!oideq()"
  convert "hashcmp() == 0" to hasheq()
  convert "oidcmp() == 0" to oideq()
  introduce hasheq() and oideq()
  coccinelle: use <...> for function exclusion
2018-09-17 13:53:57 -07:00
Junio C Hamano 7e794d0a3f Merge branch 'nd/unpack-trees-with-cache-tree'
The unpack_trees() API used in checking out a branch and merging
walks one or more trees along with the index.  When the cache-tree
in the index tells us that we are walking a tree whose flattened
contents is known (i.e. matches a span in the index), as linearly
scanning a span in the index is much more efficient than having to
open tree objects recursively and listing their entries, the walk
can be optimized, which is done in this topic.

* nd/unpack-trees-with-cache-tree:
  Document update for nd/unpack-trees-with-cache-tree
  cache-tree: verify valid cache-tree in the test suite
  unpack-trees: add missing cache invalidation
  unpack-trees: reuse (still valid) cache-tree from src_index
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: add performance tracing
  trace.h: support nested performance tracing
2018-09-17 13:53:53 -07:00
Jeff King 4a7e27e957 convert "oidcmp() == 0" to oideq()
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.

The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).

This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.

I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 11:32:49 -07:00
Nguyễn Thái Ngọc Duy 4592e6080f cache-tree: verify valid cache-tree in the test suite
This makes sure that cache-tree is consistent with the index. The main
purpose is to catch potential problems by saving the index in
unpack_trees() but the line in write_index() would also help spot
missing invalidation in other code.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18 09:47:46 -07:00
Nguyễn Thái Ngọc Duy 0d1ed5963d unpack-trees: add performance tracing
We're going to optimize unpack_trees() a bit in the following
patches. Let's add some tracing to measure how long it takes before
and after. This is the baseline ("git checkout -" on webkit.git, 275k
files on worktree)

    performance: 0.056651714 s:  read cache .git/index
    performance: 0.183101080 s:  preload index
    performance: 0.008584433 s:  refresh index
    performance: 0.633767589 s:   traverse_trees
    performance: 0.340265448 s:   check_updates
    performance: 0.381884638 s:   cache_tree_update
    performance: 1.401562947 s:  unpack_trees
    performance: 0.338687914 s:  write index, changed mask = 2e
    performance: 0.411927922 s:    traverse_trees
    performance: 0.000023335 s:    check_updates
    performance: 0.423697246 s:   unpack_trees
    performance: 0.423708360 s:  diff-index
    performance: 2.559524127 s: git command: git checkout -

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18 09:47:46 -07:00
Nguyễn Thái Ngọc Duy 07096c9696 cache-tree: wrap the_index based wrappers with #ifdef
This puts update_main_cache_tree() and write_cache_as_tree() in the
same group of "index compat" functions that assume the_index
implicitly, which should only be used within builtin/ or t/helper.

sequencer.c is also updated to not use these functions. As of now, no
files outside builtin/ use these functions anymore.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:42 -07:00
Stefan Beller f86bcc7b2c tree: add repository argument to lookup_tree
Add a repository argument to allow the callers of lookup_tree
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Junio C Hamano b16b60f71b Merge branch 'sb/object-store-grafts' into sb/object-store-lookup
* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-06-29 10:43:28 -07:00
Stefan Beller cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
brian m. carlson a055493436 cache-tree: use is_empty_tree_oid
When comparing an object ID against that of the empty tree, use the
is_empty_tree_oid function to ensure that we abstract over the hash
algorithm properly.  In addition, this is more readable than a plain
oidcmp.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:52 +09:00
brian m. carlson 69d124255e cache: add a function to read an object ID from a buffer
In various places throughout the codebase, we need to read data into a
struct object_id from a pack or other unsigned char buffer.  Add an
inline function that does this based on the current hash algorithm in
use, and use it in several places.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:48 +09:00
brian m. carlson 6dcb462530 cache-tree: convert remnants to struct object_id
Convert the remaining portions of cache-tree.c to use struct object_id.
Convert several instances of 20 to use the_hash_algo instead.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:47 -07:00
brian m. carlson fc5cb99f67 cache-tree: convert write_*_as_tree to object_id
Convert write_index_as_tree and write_cache_as_tree to use struct
object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:47 -07:00
Junio C Hamano 8be8342b4c Merge branch 'po/object-id'
Conversion from uchar[20] to struct object_id continues.

* po/object-id:
  sha1_file: rename hash_sha1_file_literally
  sha1_file: convert write_loose_object to object_id
  sha1_file: convert force_object_loose to object_id
  sha1_file: convert write_sha1_file to object_id
  notes: convert write_notes_tree to object_id
  notes: convert combine_notes_* to object_id
  commit: convert commit_tree* to object_id
  match-trees: convert splice_tree to object_id
  cache: clear whole hash buffer with oidclr
  sha1_file: convert hash_sha1_file to object_id
  dir: convert struct sha1_stat to use object_id
  sha1_file: convert pretend_sha1_file to object_id
2018-02-15 14:55:43 -08:00
Junio C Hamano cbf0240f82 Merge branch 'sg/cocci-move-array'
Code clean-up.

* sg/cocci-move-array:
  Use MOVE_ARRAY
2018-02-13 13:39:13 -08:00
Junio C Hamano e75c862125 Merge branch 'tg/split-index-fixes'
The split-index mode had a few corner case bugs fixed.

* tg/split-index-fixes:
  travis: run tests with GIT_TEST_SPLIT_INDEX
  split-index: don't write cache tree with null oid entries
  read-cache: fix reading the shared index for other repos
2018-02-13 13:39:13 -08:00
Patryk Obara a09c985eae sha1_file: convert write_sha1_file to object_id
Convert the definition and declaration of write_sha1_file to
struct object_id and adjust usage of this function.

This commit also converts static function write_sha1_file_prepare, as it
is closely related.

Rename these functions to write_object_file and
write_object_file_prepare respectively.

Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents
wherever possible.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Patryk Obara f070faccc1 sha1_file: convert hash_sha1_file to object_id
Convert the declaration and definition of hash_sha1_file to use
struct object_id and adjust all function calls.

Rename this function to hash_object_file.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
SZEDER Gábor f919ffebed Use MOVE_ARRAY
Use the helper macro MOVE_ARRAY to move arrays.  This is shorter and
safer, as it automatically infers the size of elements.

Patch generated by Coccinelle and contrib/coccinelle/array.cocci in
Travis CI's static analysis build job.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-22 11:32:51 -08:00
Thomas Gummerer a125a22334 read-cache: fix reading the shared index for other repos
read_index_from() takes a path argument for the location of the index
file.  For reading the shared index in split index mode however it just
ignores that path argument, and reads it from the gitdir of the current
repository.

This works as long as an index in the_repository is read.  Once that
changes, such as when we read the index of a submodule, or of a
different working tree than the current one, the gitdir of
the_repository will no longer contain the appropriate shared index,
and git will fail to read it.

For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.

be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) breaks with split index mode in a similar
manner, not erroring out when it can't read the index, but instead
carrying on with pruning, without taking the index of the worktree into
account.

Fix this by passing an additional gitdir parameter to read_index_from,
to indicate where it should look for and read the shared index from.

read_cache_from() defaults to using the gitdir of the_repository.  As it
is mostly a convenience macro, having to pass get_git_dir() for every
call seems overkill, and if necessary users can have more control by
using read_index_from().

Helped-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19 10:36:34 -08:00
Junio C Hamano 0b646bcac9 Merge branch 'ma/lockfile-fixes'
An earlier update made it possible to use an on-stack in-core
lockfile structure (as opposed to having to deliberately leak an
on-heap one).  Many codepaths have been updated to take advantage
of this new facility.

* ma/lockfile-fixes:
  read_cache: roll back lock in `update_index_if_able()`
  read-cache: leave lock in right state in `write_locked_index()`
  read-cache: drop explicit `CLOSE_LOCK`-flag
  cache.h: document `write_locked_index()`
  apply: remove `newfd` from `struct apply_state`
  apply: move lockfile into `apply_state`
  cache-tree: simplify locking logic
  checkout-index: simplify locking logic
  tempfile: fix documentation on `delete_tempfile()`
  lockfile: fix documentation on `close_lock_file_gently()`
  treewide: prefer lockfiles on the stack
  sha1_file: do not leak `lock_file`
2017-11-06 13:11:21 +09:00
Derrick Stolee 19716b21a4 cleanup: fix possible overflow errors in binary search
A common mistake when writing binary search is to allow possible
integer overflow by using the simple average:

	mid = (min + max) / 2;

Instead, use the overflow-safe version:

	mid = min + (max - min) / 2;

This translation is safe since the operation occurs inside a loop
conditioned on "min < max". The included changes were found using
the following git grep:

	git grep '/ *2;' '*.c'

Making this cleanup will prevent future review friction when a new
binary search is contructed based on existing code.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-10 08:57:24 +09:00
Martin Ågren 2954e5ec43 cache-tree: simplify locking logic
After we have taken the lock using `LOCK_DIE_ON_ERROR`, we know that
`newfd` is non-negative. So when we check for exactly that property
before calling `write_locked_index()`, the outcome is guaranteed.

If we write and commit successfully, we set `newfd = -1`, so that we can
later avoid calling `rollback_lock_file` on an already-committed lock.
But we might just as well unconditionally call `rollback_lock_file()` --
it will be a no-op if we have already committed.

All in all, we use `newfd` as a bool and the only benefit we get from it
is that we can avoid calling a no-op. Remove `newfd` so that we have one
variable less to reason about.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 10:07:18 +09:00
Jeff King bfffb48c5d stop leaking lock structs in some simple cases
Now that it's safe to declare a "struct lock_file" on the
stack, we can do so (and avoid an intentional leak). These
leaks were found by running t0000 and t0001 under valgrind
(though certainly other similar leaks exist and just don't
happen to be exercised by those tests).

Initializing the lock_file's inner tempfile with NULL is not
strictly necessary in these cases, but it's a good practice
to model.  It means that if we were to call a function like
rollback_lock_file() on a lock that was never taken in the
first place, it becomes a quiet noop (rather than undefined
behavior).

Likewise, it's always safe to rollback_lock_file() on a file
that has already been committed or deleted, since that
operation is a noop on an inactive lockfile (and that's why
the case in config.c can drop the "if (lock)" check as we
move away from using a pointer).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:54 +09:00
Jeff King c82c75b951 write_index_as_tree: cleanup tempfile on error
If we failed to write our new index file, we rollback our
lockfile to remove the temporary index. But if we fail
before we even get to the write step (because reading the
old index failed), we leave the lockfile in place, which
makes no sense.

In practice this hasn't been a big deal because failing at
write_index_as_tree() typically results in the whole program
exiting (and thus the tempfile handler kicking in and
cleaning up the files). But this function should
consistently take responsibility for the resources it
allocates.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:52 +09:00
René Scharfe f331ab9d4c use MOVE_ARRAY
Simplify the code for moving members inside of an array and make it more
robust by using the helper macro MOVE_ARRAY.  It calculates the size
based on the specified number of elements for us and supports NULL
pointers when that number is zero.  Raw memmove(3) calls with NULL can
cause the compiler to (over-eagerly) optimize out later NULL checks.

This patch was generated with contrib/coccinelle/array.cocci and spatch
(Coccinelle).

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-17 14:54:56 -07:00
Junio C Hamano 6b526ced6f Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (53 commits)
  object: convert parse_object* to take struct object_id
  tree: convert parse_tree_indirect to struct object_id
  sequencer: convert do_recursive_merge to struct object_id
  diff-lib: convert do_diff_cache to struct object_id
  builtin/ls-tree: convert to struct object_id
  merge: convert checkout_fast_forward to struct object_id
  sequencer: convert fast_forward_to to struct object_id
  builtin/ls-files: convert overlay_tree_on_cache to object_id
  builtin/read-tree: convert to struct object_id
  sha1_name: convert internals of peel_onion to object_id
  upload-pack: convert remaining parse_object callers to object_id
  revision: convert remaining parse_object callers to object_id
  revision: rename add_pending_sha1 to add_pending_oid
  http-push: convert process_ls_object and descendants to object_id
  refs/files-backend: convert many internals to struct object_id
  refs: convert struct ref_update to use struct object_id
  ref-filter: convert some static functions to struct object_id
  Convert struct ref_array_item to struct object_id
  Convert the verify_pack callback to struct object_id
  Convert lookup_tag to struct object_id
  ...
2017-05-29 12:34:43 +09:00
brian m. carlson 740ee055c6 Convert lookup_tree to struct object_id
Convert the lookup_tree function to take a pointer to struct object_id.

The commit was created with manual changes to tree.c, tree.h, and
object.c, plus the following semantic patch:

@@
@@
- lookup_tree(EMPTY_TREE_SHA1_BIN)
+ lookup_tree(&empty_tree_oid)

@@
expression E1;
@@
- lookup_tree(E1.hash)
+ lookup_tree(&E1)

@@
expression E1;
@@
- lookup_tree(E1->hash)
+ lookup_tree(E1)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson e0a9280404 Convert struct cache_tree to use struct object_id
Convert the sha1 member of struct cache_tree to struct object_id by
changing the definition and applying the following semantic patch, plus
the standard object_id transforms:

@@
struct cache_tree E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_tree *E1;
@@
- E1->sha1
+ E1->oid.hash

Fix up one reference to active_cache_tree which was not automatically
caught by Coccinelle.  These changes are prerequisites for converting
parse_object.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-02 10:46:41 +09:00
Jeff King a96d3cc3f6 cache-tree: reject entries with null sha1
We generally disallow null sha1s from entering the index,
due to 4337b5856 (do not write null sha1s to on-disk index,
2012-07-28). However, we loosened that in 83bd7437c
(write_index: optionally allow broken null sha1s,
2013-08-27) so that tools like filter-branch could be used
to repair broken history.

However, we should make sure that these broken entries do
not get propagated into new trees. For most entries, we'd
catch them with the missing-object check (since presumably
the null sha1 does not exist in our object database). But
gitlink entries do not need reachability, so we may blindly
copy the entry into a bogus tree.

This patch rejects all null sha1s (with the same "invalid
entry" message that missing objects get) when building trees
from the index. It does so even for non-gitlinks, and even
when "write-tree" is given the --missing-ok flag. The null
sha1 is a special sentinel value that is already rejected in
trees by fsck; whether the object exists or not, it is an
error to put it in a tree.

Note that for this to work, we must also avoid reusing an
existing cache-tree that contains the null sha1. This patch
does so by just refusing to write out any cache tree when
the index contains a null sha1. This is blunter than we need
to be; we could just reject the subtree that contains the
offending entry. But it's not worth the complexity. The
behavior is unchanged unless you have a broken index entry,
and even then we'd refuse the whole index write unless the
emergency GIT_ALLOW_NULL_SHA1 is in use. And even then the
end result is only a performance drop (any write-tree will
have to generate the whole cache-tree from scratch).

The tests bear some explanation.

The existing test in t7009 doesn't catch this problem,
because our index-filter runs "git rm --cached", which will
try to rewrite the updated index and barf on the bogus
entry. So we never even make it to write-tree.  The new test
there adds a noop index-filter, which does show the problem.

The new tests in t1601 are slightly redundant with what
filter-branch is doing under the hood in t7009. But as
they're much more direct, they're easier to reason about.
And should filter-branch ever change or go away, we'd want
to make sure that these plumbing commands behave sanely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-23 18:21:59 -07:00
brian m. carlson 99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Nguyễn Thái Ngọc Duy 6d6a782fbf cache-tree: do not generate empty trees as a result of all i-t-a subentries
If a subdirectory contains nothing but i-t-a entries, we generate an
empty tree object and add it to its parent tree. Which is wrong. Such
a subdirectory should not be added.

Note that this has a cascading effect. If subdir 'a/b/c' contains
nothing but i-t-a entries, we ignore it. But then if 'a/b' contains
only (the non-existing) 'a/b/c', then we should ignore 'a/b' while
building 'a' too. And it goes all the way up to top directory.

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18 13:45:33 -07:00
Nguyễn Thái Ngọc Duy c041d54a74 cache-tree.c: fix i-t-a entry skipping directory updates sometimes
Commit 3cf773e (cache-tree: fix writing cache-tree when CE_REMOVE is
present - 2012-12-16) skips i-t-a entries when building trees objects
from the index. Unfortunately it may skip too much.

The code in question checks if an entry is an i-t-a one, then no tree
entry will be written. But it does not take into account that
directories can also be written with the same code. Suppose we have
this in the index.

    a-file
    subdir/file1
    subdir/file2
    subdir/file3
    the-last-file

We write an entry for a-file as normal and move on to subdir/file1,
where we realize the entry name for this level is simply just
"subdir", write down an entry for "subdir" then jump three items ahead
to the-last-file.

That is what happens normally when the first file in subdir is not an
i-t-a entry. If subdir/file1 is an i-t-a, because of the broken
condition in this code, we still think "subdir" is an i-t-a file and
not writing "subdir" down and jump to the-last-file. The result tree
now only has two items: a-file and the-last-file. subdir should be
there too (even though it only records two sub-entries, file2 and
file3).

If the i-t-a entry is subdir/file2 or subdir/file3, this is not a
problem because we jump over them anyway. Which may explain why the
bug is hidden for nearly four years.

Fix it by making sure we only skip i-t-a entries when the entry in
question is actual an index entry, not a directory.

Reported-by: Yuri Kanivetsky <yuri.kanivetsky@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18 13:45:33 -07:00
brian m. carlson 7d924c9139 struct name_entry: use struct object_id instead of unsigned char sha1[20]
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 14:23:42 -07:00
Junio C Hamano 11529ecec9 Merge branch 'jk/tighten-alloc'
Update various codepaths to avoid manually-counted malloc().

* jk/tighten-alloc: (22 commits)
  ewah: convert to REALLOC_ARRAY, etc
  convert ewah/bitmap code to use xmalloc
  diff_populate_gitlink: use a strbuf
  transport_anonymize_url: use xstrfmt
  git-compat-util: drop mempcpy compat code
  sequencer: simplify memory allocation of get_message
  test-path-utils: fix normalize_path_copy output buffer size
  fetch-pack: simplify add_sought_entry
  fast-import: simplify allocation in start_packfile
  write_untracked_extension: use FLEX_ALLOC helper
  prepare_{git,shell}_cmd: use argv_array
  use st_add and st_mult for allocation size computation
  convert trivial cases to FLEX_ARRAY macros
  use xmallocz to avoid size arithmetic
  convert trivial cases to ALLOC_ARRAY
  convert manual allocations to argv_array
  argv-array: add detach function
  add helpers for allocating flex-array structs
  harden REALLOC_ARRAY and xcalloc against size_t overflow
  tree-diff: catch integer overflow in combine_diff_path allocation
  ...
2016-02-26 13:37:16 -08:00
Jeff King 96ffc06f72 convert trivial cases to FLEX_ARRAY macros
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Junio C Hamano cc14ea8cf4 Merge branch 'nd/ita-cleanup'
Paths that have been told the index about with "add -N" are not
quite yet in the index, but a few commands behaved as if they
already are in a harmful way.

* nd/ita-cleanup:
  grep: make it clear i-t-a entries are ignored
  add and use a convenience macro ce_intent_to_add()
  blame: remove obsolete comment
2016-01-20 11:43:25 -08:00
brian m. carlson ed1c9977cb Remove get_object_hash.
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson 7999b2cf77 Add several uses of get_object_hash.
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Nguyễn Thái Ngọc Duy 895ff3b2c7 add and use a convenience macro ce_intent_to_add()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-06 20:01:13 -07:00
Paul Tan d23a5117f8 cache-tree: introduce write_index_as_tree()
A caller may wish to write a temporary index as a tree. However,
write_cache_as_tree() assumes that the index was read from, and will
write to, the default index file path. Introduce write_index_as_tree()
which removes this limitation by allowing the caller to specify its own
index_state and index file path.

Signed-off-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-04 22:02:11 -07:00
Junio C Hamano e44da1bbb8 Merge branch 'jk/cache-tree-protect-from-broken-libgit2'
The code to use cache-tree trusted the on-disk data too much
and fell into an infinite loop.

* jk/cache-tree-protect-from-broken-libgit2:
  cache-tree: avoid infinite loop on zero-entry tree
2014-11-06 10:51:35 -08:00
Jeff King 729dbbd9fc cache-tree: avoid infinite loop on zero-entry tree
The loop in cache-tree's update_one iterates over all the
entries in the index. For each one, we find the cache-tree
subtree which represents our path (creating it if
necessary), and then recurse into update_one again. The
return value we get is the number of index entries that
belonged in that subtree. So for example, with entries:

    a/one
    a/two
    b/one

We start by processing the first entry, "a/one".  We would
find the subtree for "a" and recurse into update_one. That
would then handle "a/one" and "a/two", and return the value
2. The parent function then skips past the 2 handled
entries, and we continue by processing "b/one".

If the recursed-into update_one ever returns 0, then we make
no forward progress in our loop. We would process "a/one"
over and over, infinitely.

This should not happen normally. Any subtree we create must
have at least one path in it (the one that we are
processing!). However, we may also reuse a cache-tree entry
we found in the on-disk index. For the same reason, this
should also never have zero entries. However, certain buggy
versions of libgit2 could produce such bogus cache-tree
records. The libgit2 bug has since been fixed, but it does
not hurt to protect ourselves against bogus input coming
from the on-disk data structures.

Note that this is not a die("BUG") or assert, because it is
not an internal bug, but rather a corrupted on-disk
structure. It's possible that we could even recover from it
(by throwing out the bogus cache-tree entry), but it is not
worth the effort; the important thing is that we report an
error instead of looping infinitely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-30 11:17:51 -07:00
Michael Haggerty 697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Junio C Hamano 3fd13cbcd5 Merge branch 'dt/cache-tree-repair'
Add a few more places in "commit" and "checkout" that make sure
that the cache-tree is fully populated in the index.

* dt/cache-tree-repair:
  cache-tree: do not try to use an invalidated subtree info to build a tree
  cache-tree: Write updated cache-tree after commit
  cache-tree: subdirectory tests
  test-dump-cache-tree: invalid trees are not errors
  cache-tree: create/update cache-tree on checkout
2014-09-11 10:33:32 -07:00
Junio C Hamano 4ed115e9c5 cache-tree: do not try to use an invalidated subtree info to build a tree
We punt from repairing the cache-tree during a branch switching if
it involves having to create a new tree object that does not yet
exist in the object store.  "mkdir dir && >dir/file && git add dir"
followed by "git checkout" is one example, when a tree that records
the state of such "dir/" is not in the object store.

However, after discovering that we do not have a tree object that
records the state of "dir/", the caller failed to remember the fact
that it noticed the cache-tree entry it received for "dir/" is
invalidated, it already knows it should not be populating the level
that has "dir/" as its immediate subdirectory, and it is not an
error at all for the sublevel cache-tree entry gave it a bogus
object name it shouldn't even look at.

This led the caller to detect and report a non-existent error.  The
end result was the same and we avoided stuffing a non-existent tree
to the cache-tree, but we shouldn't have issued an alarming error
message to the user.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:21:33 -07:00
David Turner aecf567cbf cache-tree: create/update cache-tree on checkout
When git checkout checks out a branch, create or update the
cache-tree so that subsequent operations are faster.

update_main_cache_tree learned a new flag, WRITE_TREE_REPAIR.  When
WRITE_TREE_REPAIR is set, portions of the cache-tree which do not
correspond to existing tree objects are invalidated (and portions which
do are marked as valid).  No new tree objects are created.

Signed-off-by: David Turner <dturner@twitter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 12:30:34 -07:00
Nguyễn Thái Ngọc Duy e6c286e8b2 cache-tree: mark istate->cache_changed on prime_cache_tree()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy d0cfc3e866 cache-tree: mark istate->cache_changed on cache tree update
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy a5400efe29 cache-tree: mark istate->cache_changed on cache tree invalidation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy 03b8664772 read-cache: new API write_locked_index instead of write_index/write_cache
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:10 -07:00
Junio C Hamano 6f75e48323 Merge branch 'rm/strchrnul-not-strlen'
* rm/strchrnul-not-strlen:
  use strchrnul() in place of strchr() and strlen()
2014-03-18 13:51:18 -07:00
Junio C Hamano da2e0579ad Merge branch 'mh/simplify-cache-tree-find'
* mh/simplify-cache-tree-find:
  cache_tree_find(): use path variable when passing over slashes
  cache_tree_find(): remove early return
  cache_tree_find(): remove redundant check
  cache_tree_find(): fix comment formatting
  cache_tree_find(): find the end of path component using strchrnul()
  cache_tree_find(): remove redundant checks
2014-03-18 13:51:02 -07:00
Rohit Mani 2c5495f7b6 use strchrnul() in place of strchr() and strlen()
Avoid scanning strings twice, once with strchr() and then with
strlen(), by using strchrnul().

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rohit Mani <rohit.mani@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-10 08:35:30 -07:00
Michael Haggerty 3491047e14 cache_tree_find(): use path variable when passing over slashes
The search for the end of the slashes is part of the update of the
path variable for the next iteration as opposed to an update of the
slash variable.  So iterate using path rather than slash.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:34:26 -08:00
Michael Haggerty 8b7e5f7972 cache_tree_find(): remove early return
There is no need for an early

    return it;

from the loop if slash points at the end of the string, because that
is exactly what will happen when the while condition fails at the
start of the next iteration.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:34:05 -08:00
Michael Haggerty 03b0403b4a cache_tree_find(): remove redundant check
If *slash == '/', then it is necessarily non-NUL.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:33:53 -08:00
Michael Haggerty 79192b87ad cache_tree_find(): fix comment formatting
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:33:46 -08:00
Michael Haggerty 17e22ddc1c cache_tree_find(): find the end of path component using strchrnul()
Suggested-by: Junio Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:33:30 -08:00
Michael Haggerty 72c378d8a6 cache_tree_find(): remove redundant checks
slash is initialized to a value that cannot be NULL.  So remove the
guards against slash == NULL later in the loop.

Suggested-by: David Kastrup <dak@gnu.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:33:02 -08:00
Dmitry S. Dolzhenko bcc7a03285 cache-tree.c: use ALLOC_GROW() in find_subtree()
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03 14:45:35 -08:00
Nguyễn Thái Ngọc Duy 9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Nguyễn Thái Ngọc Duy eec3e7e406 cache-tree: invalidate i-t-a paths after generating trees
Intent-to-add entries used to forbid writing trees so it was not a
problem. After commit 3f6d56d (commit: ignore intent-to-add entries
instead of refusing - 2012-02-07), we can generate trees from an index
with i-t-a entries.

However, the commit forgets to invalidate all paths leading to i-t-a
entries. With fully valid cache-tree (e.g. after commit or
write-tree), diff operations may prefer cache-tree to index and not
see i-t-a entries in the index, because cache-tree does not have them.

Reported-by: Jonathon Mah <me@JonathonMah.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-15 23:04:22 -08:00
Nguyễn Thái Ngọc Duy 3cf773e426 cache-tree: fix writing cache-tree when CE_REMOVE is present
entry_count is used in update_one() for two purposes:

1. to skip through the number of processed entries in in-memory index
2. to record the number of entries this cache-tree covers on disk

Unfortunately when CE_REMOVE is present these numbers are not the same
because CE_REMOVE entries are automatically removed before writing to
disk but entry_count is not adjusted and still counts CE_REMOVE
entries.

Separate the two use cases into two different variables. #1 is taken
care by the new field count in struct cache_tree_sub and entry_count
is prepared for #2.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-15 23:04:22 -08:00
Nguyễn Thái Ngọc Duy 386cc8b031 cache-tree: replace "for" loops in update_one with "while" loops
The loops in update_one can be increased in two different ways: step
by one for files and by <n> for directories. "for" loop is not
suitable for this as it always steps by one and special handling is
required for directories. Replace them with "while" loops for clarity.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-15 23:04:22 -08:00
Nguyễn Thái Ngọc Duy dbc3904ebc cache-tree: remove dead i-t-a code in verify_cache()
This code is added in 331fcb5 (git add --intent-to-add: do not let an
empty blob be committed by accident - 2008-11-28) to forbid committing
when i-t-a entries are present. When we allow that, we forgot to
remove this.

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-15 23:04:21 -08:00
Junio C Hamano a7844827da Merge branch 'nd/cache-tree-api-refactor'
* nd/cache-tree-api-refactor:
  cache-tree: update API to take abitrary flags
2012-02-12 22:42:42 -08:00
Junio C Hamano 44a1020d4d Merge branch 'jc/maint-commit-ignore-i-t-a'
* jc/maint-commit-ignore-i-t-a:
  commit: ignore intent-to-add entries instead of refusing

Conflicts:
	cache-tree.c
2012-02-12 22:42:10 -08:00
Nguyễn Thái Ngọc Duy e859c69b26 cache-tree: update API to take abitrary flags
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 16:35:43 -08:00
Junio C Hamano 3f6d56de5f commit: ignore intent-to-add entries instead of refusing
Originally, "git add -N" was introduced to help users from forgetting to
add new files to the index before they ran "git commit -a".  As an attempt
to help them further so that they do not forget to say "-a", "git commit"
to commit the index as-is was taught to error out, reminding the user that
they may have forgotten to add the final contents of the paths before
running the command.

This turned out to be a false "safety" that is useless.  If the user made
changes to already tracked paths and paths added with "git add -N", and
then ran "git add" to register the final contents of the paths added with
"git add -N", "git commit" will happily create a commit out of the index,
without including the local changes made to the already tracked paths. It
was not a useful "safety" measure to prevent "forgetful" mistakes from
happening.

It turns out that this behaviour is not just a useless false "safety", but
actively hurts use cases of "git add -N" that were discovered later and
have become popular, namely, to tell Git to be aware of these paths added
by "git add -N", so that commands like "git status" and "git diff" would
include them in their output, even though the user is not interested in
including them in the next commit they are going to make.

Fix this ancient UI mistake, and instead make a commit from the index
ignoring the paths added by "git add -N" without adding real contents.

Based on the work by Nguyễn Thái Ngọc Duy, and helped by injection of
sanity from Jonathan Nieder and others on the Git mailing list.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 12:14:40 -08:00
Thomas Rast 996277c520 Refactor cache_tree_update idiom from commit
We'll need to safely create or update the cache-tree data of the_index
from other places.  While at it, give it an argument that lets us
silence the messages produced by unmerged entries (which prevent it
from working).

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-06 14:57:36 -08:00
Elijah Newren e92fa514a9 cache_tree_free: Fix small memory leak
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-06 17:32:28 -07:00
Jonathan Nieder b6b56aceb8 write-tree: Avoid leak when index refers to an invalid object
Noticed by valgrind during test t0000.35 “writing this tree without
--missing-ok”.

Even in the cherry-pick foo..bar code path, such an error is the
end of the line.  But maybe some day an interactive porcelain will
want to link to libgit, making this matter.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-11 09:58:38 -07:00