1
0
mirror of https://github.com/git/git synced 2024-07-07 19:39:27 +00:00
Commit Graph

178 Commits

Author SHA1 Message Date
Jeff King
a96d3cc3f6 cache-tree: reject entries with null sha1
We generally disallow null sha1s from entering the index,
due to 4337b5856 (do not write null sha1s to on-disk index,
2012-07-28). However, we loosened that in 83bd7437c
(write_index: optionally allow broken null sha1s,
2013-08-27) so that tools like filter-branch could be used
to repair broken history.

However, we should make sure that these broken entries do
not get propagated into new trees. For most entries, we'd
catch them with the missing-object check (since presumably
the null sha1 does not exist in our object database). But
gitlink entries do not need reachability, so we may blindly
copy the entry into a bogus tree.

This patch rejects all null sha1s (with the same "invalid
entry" message that missing objects get) when building trees
from the index. It does so even for non-gitlinks, and even
when "write-tree" is given the --missing-ok flag. The null
sha1 is a special sentinel value that is already rejected in
trees by fsck; whether the object exists or not, it is an
error to put it in a tree.

Note that for this to work, we must also avoid reusing an
existing cache-tree that contains the null sha1. This patch
does so by just refusing to write out any cache tree when
the index contains a null sha1. This is blunter than we need
to be; we could just reject the subtree that contains the
offending entry. But it's not worth the complexity. The
behavior is unchanged unless you have a broken index entry,
and even then we'd refuse the whole index write unless the
emergency GIT_ALLOW_NULL_SHA1 is in use. And even then the
end result is only a performance drop (any write-tree will
have to generate the whole cache-tree from scratch).

The tests bear some explanation.

The existing test in t7009 doesn't catch this problem,
because our index-filter runs "git rm --cached", which will
try to rewrite the updated index and barf on the bogus
entry. So we never even make it to write-tree.  The new test
there adds a noop index-filter, which does show the problem.

The new tests in t1601 are slightly redundant with what
filter-branch is doing under the hood in t7009. But as
they're much more direct, they're easier to reason about.
And should filter-branch ever change or go away, we'd want
to make sure that these plumbing commands behave sanely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-23 18:21:59 -07:00
brian m. carlson
99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Nguyễn Thái Ngọc Duy
6d6a782fbf cache-tree: do not generate empty trees as a result of all i-t-a subentries
If a subdirectory contains nothing but i-t-a entries, we generate an
empty tree object and add it to its parent tree. Which is wrong. Such
a subdirectory should not be added.

Note that this has a cascading effect. If subdir 'a/b/c' contains
nothing but i-t-a entries, we ignore it. But then if 'a/b' contains
only (the non-existing) 'a/b/c', then we should ignore 'a/b' while
building 'a' too. And it goes all the way up to top directory.

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18 13:45:33 -07:00
Nguyễn Thái Ngọc Duy
c041d54a74 cache-tree.c: fix i-t-a entry skipping directory updates sometimes
Commit 3cf773e (cache-tree: fix writing cache-tree when CE_REMOVE is
present - 2012-12-16) skips i-t-a entries when building trees objects
from the index. Unfortunately it may skip too much.

The code in question checks if an entry is an i-t-a one, then no tree
entry will be written. But it does not take into account that
directories can also be written with the same code. Suppose we have
this in the index.

    a-file
    subdir/file1
    subdir/file2
    subdir/file3
    the-last-file

We write an entry for a-file as normal and move on to subdir/file1,
where we realize the entry name for this level is simply just
"subdir", write down an entry for "subdir" then jump three items ahead
to the-last-file.

That is what happens normally when the first file in subdir is not an
i-t-a entry. If subdir/file1 is an i-t-a, because of the broken
condition in this code, we still think "subdir" is an i-t-a file and
not writing "subdir" down and jump to the-last-file. The result tree
now only has two items: a-file and the-last-file. subdir should be
there too (even though it only records two sub-entries, file2 and
file3).

If the i-t-a entry is subdir/file2 or subdir/file3, this is not a
problem because we jump over them anyway. Which may explain why the
bug is hidden for nearly four years.

Fix it by making sure we only skip i-t-a entries when the entry in
question is actual an index entry, not a directory.

Reported-by: Yuri Kanivetsky <yuri.kanivetsky@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18 13:45:33 -07:00
brian m. carlson
7d924c9139 struct name_entry: use struct object_id instead of unsigned char sha1[20]
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 14:23:42 -07:00
Junio C Hamano
11529ecec9 Merge branch 'jk/tighten-alloc'
Update various codepaths to avoid manually-counted malloc().

* jk/tighten-alloc: (22 commits)
  ewah: convert to REALLOC_ARRAY, etc
  convert ewah/bitmap code to use xmalloc
  diff_populate_gitlink: use a strbuf
  transport_anonymize_url: use xstrfmt
  git-compat-util: drop mempcpy compat code
  sequencer: simplify memory allocation of get_message
  test-path-utils: fix normalize_path_copy output buffer size
  fetch-pack: simplify add_sought_entry
  fast-import: simplify allocation in start_packfile
  write_untracked_extension: use FLEX_ALLOC helper
  prepare_{git,shell}_cmd: use argv_array
  use st_add and st_mult for allocation size computation
  convert trivial cases to FLEX_ARRAY macros
  use xmallocz to avoid size arithmetic
  convert trivial cases to ALLOC_ARRAY
  convert manual allocations to argv_array
  argv-array: add detach function
  add helpers for allocating flex-array structs
  harden REALLOC_ARRAY and xcalloc against size_t overflow
  tree-diff: catch integer overflow in combine_diff_path allocation
  ...
2016-02-26 13:37:16 -08:00
Jeff King
96ffc06f72 convert trivial cases to FLEX_ARRAY macros
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Junio C Hamano
cc14ea8cf4 Merge branch 'nd/ita-cleanup'
Paths that have been told the index about with "add -N" are not
quite yet in the index, but a few commands behaved as if they
already are in a harmful way.

* nd/ita-cleanup:
  grep: make it clear i-t-a entries are ignored
  add and use a convenience macro ce_intent_to_add()
  blame: remove obsolete comment
2016-01-20 11:43:25 -08:00
brian m. carlson
ed1c9977cb Remove get_object_hash.
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson
7999b2cf77 Add several uses of get_object_hash.
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Nguyễn Thái Ngọc Duy
895ff3b2c7 add and use a convenience macro ce_intent_to_add()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-06 20:01:13 -07:00
Paul Tan
d23a5117f8 cache-tree: introduce write_index_as_tree()
A caller may wish to write a temporary index as a tree. However,
write_cache_as_tree() assumes that the index was read from, and will
write to, the default index file path. Introduce write_index_as_tree()
which removes this limitation by allowing the caller to specify its own
index_state and index file path.

Signed-off-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-04 22:02:11 -07:00
Junio C Hamano
e44da1bbb8 Merge branch 'jk/cache-tree-protect-from-broken-libgit2'
The code to use cache-tree trusted the on-disk data too much
and fell into an infinite loop.

* jk/cache-tree-protect-from-broken-libgit2:
  cache-tree: avoid infinite loop on zero-entry tree
2014-11-06 10:51:35 -08:00
Jeff King
729dbbd9fc cache-tree: avoid infinite loop on zero-entry tree
The loop in cache-tree's update_one iterates over all the
entries in the index. For each one, we find the cache-tree
subtree which represents our path (creating it if
necessary), and then recurse into update_one again. The
return value we get is the number of index entries that
belonged in that subtree. So for example, with entries:

    a/one
    a/two
    b/one

We start by processing the first entry, "a/one".  We would
find the subtree for "a" and recurse into update_one. That
would then handle "a/one" and "a/two", and return the value
2. The parent function then skips past the 2 handled
entries, and we continue by processing "b/one".

If the recursed-into update_one ever returns 0, then we make
no forward progress in our loop. We would process "a/one"
over and over, infinitely.

This should not happen normally. Any subtree we create must
have at least one path in it (the one that we are
processing!). However, we may also reuse a cache-tree entry
we found in the on-disk index. For the same reason, this
should also never have zero entries. However, certain buggy
versions of libgit2 could produce such bogus cache-tree
records. The libgit2 bug has since been fixed, but it does
not hurt to protect ourselves against bogus input coming
from the on-disk data structures.

Note that this is not a die("BUG") or assert, because it is
not an internal bug, but rather a corrupted on-disk
structure. It's possible that we could even recover from it
(by throwing out the bogus cache-tree entry), but it is not
worth the effort; the important thing is that we report an
error instead of looping infinitely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-30 11:17:51 -07:00
Michael Haggerty
697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Junio C Hamano
3fd13cbcd5 Merge branch 'dt/cache-tree-repair'
Add a few more places in "commit" and "checkout" that make sure
that the cache-tree is fully populated in the index.

* dt/cache-tree-repair:
  cache-tree: do not try to use an invalidated subtree info to build a tree
  cache-tree: Write updated cache-tree after commit
  cache-tree: subdirectory tests
  test-dump-cache-tree: invalid trees are not errors
  cache-tree: create/update cache-tree on checkout
2014-09-11 10:33:32 -07:00
Junio C Hamano
4ed115e9c5 cache-tree: do not try to use an invalidated subtree info to build a tree
We punt from repairing the cache-tree during a branch switching if
it involves having to create a new tree object that does not yet
exist in the object store.  "mkdir dir && >dir/file && git add dir"
followed by "git checkout" is one example, when a tree that records
the state of such "dir/" is not in the object store.

However, after discovering that we do not have a tree object that
records the state of "dir/", the caller failed to remember the fact
that it noticed the cache-tree entry it received for "dir/" is
invalidated, it already knows it should not be populating the level
that has "dir/" as its immediate subdirectory, and it is not an
error at all for the sublevel cache-tree entry gave it a bogus
object name it shouldn't even look at.

This led the caller to detect and report a non-existent error.  The
end result was the same and we avoided stuffing a non-existent tree
to the cache-tree, but we shouldn't have issued an alarming error
message to the user.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:21:33 -07:00
David Turner
aecf567cbf cache-tree: create/update cache-tree on checkout
When git checkout checks out a branch, create or update the
cache-tree so that subsequent operations are faster.

update_main_cache_tree learned a new flag, WRITE_TREE_REPAIR.  When
WRITE_TREE_REPAIR is set, portions of the cache-tree which do not
correspond to existing tree objects are invalidated (and portions which
do are marked as valid).  No new tree objects are created.

Signed-off-by: David Turner <dturner@twitter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 12:30:34 -07:00
Nguyễn Thái Ngọc Duy
e6c286e8b2 cache-tree: mark istate->cache_changed on prime_cache_tree()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
d0cfc3e866 cache-tree: mark istate->cache_changed on cache tree update
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
a5400efe29 cache-tree: mark istate->cache_changed on cache tree invalidation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
03b8664772 read-cache: new API write_locked_index instead of write_index/write_cache
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:10 -07:00
Junio C Hamano
6f75e48323 Merge branch 'rm/strchrnul-not-strlen'
* rm/strchrnul-not-strlen:
  use strchrnul() in place of strchr() and strlen()
2014-03-18 13:51:18 -07:00
Junio C Hamano
da2e0579ad Merge branch 'mh/simplify-cache-tree-find'
* mh/simplify-cache-tree-find:
  cache_tree_find(): use path variable when passing over slashes
  cache_tree_find(): remove early return
  cache_tree_find(): remove redundant check
  cache_tree_find(): fix comment formatting
  cache_tree_find(): find the end of path component using strchrnul()
  cache_tree_find(): remove redundant checks
2014-03-18 13:51:02 -07:00
Rohit Mani
2c5495f7b6 use strchrnul() in place of strchr() and strlen()
Avoid scanning strings twice, once with strchr() and then with
strlen(), by using strchrnul().

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rohit Mani <rohit.mani@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-10 08:35:30 -07:00
Michael Haggerty
3491047e14 cache_tree_find(): use path variable when passing over slashes
The search for the end of the slashes is part of the update of the
path variable for the next iteration as opposed to an update of the
slash variable.  So iterate using path rather than slash.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:34:26 -08:00
Michael Haggerty
8b7e5f7972 cache_tree_find(): remove early return
There is no need for an early

    return it;

from the loop if slash points at the end of the string, because that
is exactly what will happen when the while condition fails at the
start of the next iteration.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:34:05 -08:00
Michael Haggerty
03b0403b4a cache_tree_find(): remove redundant check
If *slash == '/', then it is necessarily non-NUL.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:33:53 -08:00
Michael Haggerty
79192b87ad cache_tree_find(): fix comment formatting
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:33:46 -08:00
Michael Haggerty
17e22ddc1c cache_tree_find(): find the end of path component using strchrnul()
Suggested-by: Junio Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:33:30 -08:00
Michael Haggerty
72c378d8a6 cache_tree_find(): remove redundant checks
slash is initialized to a value that cannot be NULL.  So remove the
guards against slash == NULL later in the loop.

Suggested-by: David Kastrup <dak@gnu.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-05 12:33:02 -08:00
Dmitry S. Dolzhenko
bcc7a03285 cache-tree.c: use ALLOC_GROW() in find_subtree()
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03 14:45:35 -08:00
Nguyễn Thái Ngọc Duy
9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Nguyễn Thái Ngọc Duy
eec3e7e406 cache-tree: invalidate i-t-a paths after generating trees
Intent-to-add entries used to forbid writing trees so it was not a
problem. After commit 3f6d56d (commit: ignore intent-to-add entries
instead of refusing - 2012-02-07), we can generate trees from an index
with i-t-a entries.

However, the commit forgets to invalidate all paths leading to i-t-a
entries. With fully valid cache-tree (e.g. after commit or
write-tree), diff operations may prefer cache-tree to index and not
see i-t-a entries in the index, because cache-tree does not have them.

Reported-by: Jonathon Mah <me@JonathonMah.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-15 23:04:22 -08:00
Nguyễn Thái Ngọc Duy
3cf773e426 cache-tree: fix writing cache-tree when CE_REMOVE is present
entry_count is used in update_one() for two purposes:

1. to skip through the number of processed entries in in-memory index
2. to record the number of entries this cache-tree covers on disk

Unfortunately when CE_REMOVE is present these numbers are not the same
because CE_REMOVE entries are automatically removed before writing to
disk but entry_count is not adjusted and still counts CE_REMOVE
entries.

Separate the two use cases into two different variables. #1 is taken
care by the new field count in struct cache_tree_sub and entry_count
is prepared for #2.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-15 23:04:22 -08:00
Nguyễn Thái Ngọc Duy
386cc8b031 cache-tree: replace "for" loops in update_one with "while" loops
The loops in update_one can be increased in two different ways: step
by one for files and by <n> for directories. "for" loop is not
suitable for this as it always steps by one and special handling is
required for directories. Replace them with "while" loops for clarity.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-15 23:04:22 -08:00
Nguyễn Thái Ngọc Duy
dbc3904ebc cache-tree: remove dead i-t-a code in verify_cache()
This code is added in 331fcb5 (git add --intent-to-add: do not let an
empty blob be committed by accident - 2008-11-28) to forbid committing
when i-t-a entries are present. When we allow that, we forgot to
remove this.

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-15 23:04:21 -08:00
Junio C Hamano
a7844827da Merge branch 'nd/cache-tree-api-refactor'
* nd/cache-tree-api-refactor:
  cache-tree: update API to take abitrary flags
2012-02-12 22:42:42 -08:00
Junio C Hamano
44a1020d4d Merge branch 'jc/maint-commit-ignore-i-t-a'
* jc/maint-commit-ignore-i-t-a:
  commit: ignore intent-to-add entries instead of refusing

Conflicts:
	cache-tree.c
2012-02-12 22:42:10 -08:00
Nguyễn Thái Ngọc Duy
e859c69b26 cache-tree: update API to take abitrary flags
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 16:35:43 -08:00
Junio C Hamano
3f6d56de5f commit: ignore intent-to-add entries instead of refusing
Originally, "git add -N" was introduced to help users from forgetting to
add new files to the index before they ran "git commit -a".  As an attempt
to help them further so that they do not forget to say "-a", "git commit"
to commit the index as-is was taught to error out, reminding the user that
they may have forgotten to add the final contents of the paths before
running the command.

This turned out to be a false "safety" that is useless.  If the user made
changes to already tracked paths and paths added with "git add -N", and
then ran "git add" to register the final contents of the paths added with
"git add -N", "git commit" will happily create a commit out of the index,
without including the local changes made to the already tracked paths. It
was not a useful "safety" measure to prevent "forgetful" mistakes from
happening.

It turns out that this behaviour is not just a useless false "safety", but
actively hurts use cases of "git add -N" that were discovered later and
have become popular, namely, to tell Git to be aware of these paths added
by "git add -N", so that commands like "git status" and "git diff" would
include them in their output, even though the user is not interested in
including them in the next commit they are going to make.

Fix this ancient UI mistake, and instead make a commit from the index
ignoring the paths added by "git add -N" without adding real contents.

Based on the work by Nguyễn Thái Ngọc Duy, and helped by injection of
sanity from Jonathan Nieder and others on the Git mailing list.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 12:14:40 -08:00
Thomas Rast
996277c520 Refactor cache_tree_update idiom from commit
We'll need to safely create or update the cache-tree data of the_index
from other places.  While at it, give it an argument that lets us
silence the messages produced by unmerged entries (which prevent it
from working).

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-06 14:57:36 -08:00
Elijah Newren
e92fa514a9 cache_tree_free: Fix small memory leak
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-06 17:32:28 -07:00
Jonathan Nieder
b6b56aceb8 write-tree: Avoid leak when index refers to an invalid object
Noticed by valgrind during test t0000.35 “writing this tree without
--missing-ok”.

Even in the cherry-pick foo..bar code path, such an error is the
end of the line.  But maybe some day an interactive porcelain will
want to link to libgit, making this matter.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-11 09:58:38 -07:00
Linus Torvalds
a38837341c Improve on the 'invalid object' error message at commit time
Not that anybody should ever get it, but somebody did (probably because
of a flaky filesystem, but whatever).  And each time I see an error
message that I haven't seen before, I decide that next time it will look
better.

So this makes us write more relevant information about exactly which
file ended up having issues with a missing object.  Which will tell
whether it was a tree object, for example, or just a regular file in the
index (and which one).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-14 13:50:33 -07:00
Junio C Hamano
b65982b608 Optimize "diff-index --cached" using cache-tree
When running "diff-index --cached" after making a change to only a small
portion of the index, there is no point unpacking unchanged subtrees into
the index recursively, only to find that all entries match anyway.  Tweak
unpack_trees() logic that is used to read in the tree object to catch the
case where the tree entry we are looking at matches the index as a whole
by looking at the cache-tree.

As an exercise, after modifying a few paths in the kernel tree, here are
a few numbers on my Athlon 64X2 3800+:

    (without patch, hot cache)
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+9407minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+2446minor)pagefaults 0swaps

Cold cache numbers are very impressive, but it does not matter very much
in practice:

    (without patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps

    (with patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    18440inputs+0outputs (79major+2369minor)pagefaults 0swaps

This of course helps "git status" as well.

    (without patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+10970minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+3921minor)pagefaults 0swaps

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-25 11:35:29 -07:00
Junio C Hamano
b87fc96476 cache-tree.c::cache_tree_find(): simplify internal API
Earlier cache_tree_find() needs to be called with a valid cache_tree,
but repeated look-up may find an invalid or missing cache_tree in between.
Help simplify the callers by returning NULL to mean "nothing appropriate
found" when the input is NULL.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-25 11:35:19 -07:00
Junio C Hamano
d11b8d3425 write-tree --ignore-cache-tree
This allows you to discard the cache-tree information before writing the
tree out of the index (i.e. it always recomputes the tree object names for
all the subtrees).

This is only useful as a debug option, so I did not bother documenting it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-20 11:07:07 -07:00
Junio C Hamano
b9d37a5420 Move prime_cache_tree() to cache-tree.c
The interface to build cache-tree belongs there.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-20 04:16:41 -07:00
Junio C Hamano
331fcb598e git add --intent-to-add: do not let an empty blob be committed by accident
Writing a tree out of an index with an "intent to add" entry is blocked.
This implies that you cannot "git commit" from such a state; however you
can still do "git commit -a" or "git commit $that_path".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-30 17:59:19 -08:00
Nanako Shiraishi
7ba04d9f37 cache-tree.c: make cache_tree_find() static
This function is not used by any other file.

Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-16 08:50:27 -07:00
Junio C Hamano
31c6390d40 Merge branch 'maint-1.5.4' into maint
* maint-1.5.4:
  t5516: remove ambiguity test (1)
  Linked glossary from cvs-migration page
  write-tree: properly detect failure to write tree objects
2008-04-24 21:50:48 -07:00
Junio C Hamano
edae5f0c20 write-tree: properly detect failure to write tree objects
Tomasz Fortuna reported that "git commit" does not error out properly when
it cannot write tree objects out.  "git write-tree" shares the same issue,
as the failure to notice the error is deep in the logic to write tree
objects out recursively.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-23 10:02:44 -07:00
Junio C Hamano
c21fdf3b60 Merge branch 'jc/error-message-in-cherry-pick'
* jc/error-message-in-cherry-pick:
  Make error messages from cherry-pick/revert more sensible
2008-02-11 16:46:27 -08:00
Junio C Hamano
45525bd022 Make error messages from cherry-pick/revert more sensible
The original "rewrite in C" did somewhat a sloppy job while
stealing code from git-write-tree.

The caller pretends as if the write_tree() function would return
an error code and being able to issue a sensible error message
itself, but write_tree() function just calls die() and never
returns an error.  Worse yet, the function claims that it was
running git-write-tree (which is no longer true after
cherry-pick stole it).

Tested-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-05 00:39:19 -08:00
Linus Torvalds
7a51ed66f6 Make on-disk index representation separate from in-core one
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.

In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.

This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-21 12:44:31 -08:00
Pierre Habouzit
1dffb8fa80 Small cache_tree_write refactor.
This function cannot fail, make it void. Also make write_one act on a
const char* instead of a char*.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-26 02:27:06 -07:00
Pierre Habouzit
ba3ed09728 Now that cache.h needs strbuf.h, remove useless includes.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-16 17:30:03 -07:00
Pierre Habouzit
f1696ee398 Strbuf API extensions and fixes.
* Add strbuf_rtrim to remove trailing spaces.
  * Add strbuf_insert to insert data at a given position.
  * Off-by one fix in strbuf_addf: strbuf_avail() does not counts the final
    \0 so the overflow test for snprintf is the strict comparison. This is
    not critical as the growth mechanism chosen will always allocate _more_
    memory than asked, so the second test will not fail. It's some kind of
    miracle though.
  * Add size extension hints for strbuf_init and strbuf_read. If 0, default
    applies, else:
      + initial buffer has the given size for strbuf_init.
      + first growth checks it has at least this size rather than the
        default 8192.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-10 12:48:24 -07:00
Pierre Habouzit
5242bcbb63 Use strbuf API in cache-tree.c
Should even be marginally faster.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-06 23:57:44 -07:00
Junio C Hamano
63daae47e5 Two trivial -Wcast-qual fixes
Luiz Fernando N. Capitulino noticed the one in tree-walk.h where
we cast away constness while computing the legnth of a tree
entry.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-22 23:19:43 -07:00
Martin Waitz
302b9282c9 rename dirlink to gitlink.
Unify naming of plumbing dirlink/gitlink concept:

git ls-files -z '*.[ch]' |
xargs -0 perl -pi -e 's/dirlink/gitlink/g;' -e 's/DIRLNK/GITLINK/g;'

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-21 23:34:54 -07:00
Linus Torvalds
f35a6d3bce Teach core object handling functions about gitlinks
This teaches the really fundamental core SHA1 object handling routines
about gitlinks.  We can compare trees with gitlinks in them (although we
can not actually generate patches for them yet - just raw git diffs),
and they show up as commits in "git ls-tree".

We also know to compare gitlinks as if they were directories (ie the
normal "sort as trees" rules apply).

[jc: amended a cut&paste error]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 13:50:43 -07:00
Johannes Sixt
3d12d0cfbb Catch errors when writing an index that contains invalid objects.
If git-write-index is called without --missing-ok, it reports invalid
objects that it finds in the index. But without this patch it dies
right away or may run into an infinite loop.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-13 14:26:51 -08:00
Junio C Hamano
4903161fb8 Surround "#define DEBUG 0" with "#ifndef DEBUG..#endif"
Otherwise "make CFLAGS=-DDEBUG=1" is cumbersome to run.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-30 15:29:53 -08:00
Rene Scharfe
abdc3fc842 Add hash_sha1_file()
Most callers of write_sha1_file_prepare() are only interested in the
resulting hash but don't care about the returned file name or the header.
This patch adds a simple wrapper named hash_sha1_file() which does just
that, and converts potential callers.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-14 11:49:52 -07:00
Shawn Pearce
e702496e43 Convert memcpy(a,b,20) to hashcpy(a,b).
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.

A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*.  This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.

[jc: Splitted the patch to "master" part, to be followed by a
 patch for merge-recursive.c which is not in "master" yet.

 Fixed the cast in the latter hunk to combine-diff.c which was
 wrong in the original.

 Also converted ones left-over in combine-diff.c, diff-lib.c and
 upload-pack.c ]

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23 13:53:10 -07:00
Junio C Hamano
b0121fb3f2 Merge branch 'jc/gitlink' into next
* jc/gitlink:
  write-tree: --prefix=<path>
  read-tree: --prefix=<path>/ option.
2006-05-07 16:17:43 -07:00
Junio C Hamano
00703e6d68 cache-tree: a bit more debugging support.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-03 16:10:45 -07:00
Junio C Hamano
6bd20358a9 write-tree: --prefix=<path>
The "bind" commit can express an aggregation of multiple
projects into a single commit.

In such an organization, there would be one project, root of
whose tree object is at the same level of the root of the
aggregated projects, and other projects have their toplevel in
separate subdirectories.  Let's call that root level project the
"primary project", and call other ones just "subprojects".

You would first read-tree the primary project, and then graft
the subprojects under their appropriate location using read-tree
--prefix=<subdir>/ repeatedly.

To write out a tree object from such an index for a subproject,
write-tree --prefix=<subdir>/ is used.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-01 22:29:16 -07:00
Johannes Schindelin
0111ea38cb cache-tree: replace a sscanf() by two strtol() calls
On one of my systems, sscanf() first calls strlen() on the buffer. But
this buffer is not terminated by NUL. So git crashed.

strtol() does not share that problem, as it stops reading after the
first non-digit.

[jc: original patch was wrong and did not read the cache-tree
 structure correctly; this has been fixed up and tested minimally
 with fsck-objects. ]

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-01 22:14:03 -07:00
Junio C Hamano
7bc70a590d cache-tree.c: typefix
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 22:48:27 -07:00
Junio C Hamano
2956dd3bd7 cache_tree_update: give an option to update cache-tree only.
When the extra "dryrun" parameter is true, cache_tree_update()
recomputes the invalid entry but does not actually creates
new tree object.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 16:21:54 -07:00
Junio C Hamano
7927a55d5b read-tree: teach 1-way merege and plain read to prime cache-tree.
This teaches read-tree to fully populate valid cache-tree when
reading a tree from scratch, or reading a single tree into an
existing index, reusing only the cached stat information (i.e.
one-way merge).  We have already taught update-index about cache-tree,
so "git checkout" followed by updates to a few path followed by
a "git commit" would become very efficient.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 01:33:07 -07:00
Junio C Hamano
61fa30972c cache-tree: sort the subtree entries.
Not that this makes practical performance difference; the kernel tree
for example has 200 or so directories that have subdirectory, and the
largest ones have 57 of them (fs and drivers).  With a test to apply
600 patches with git-apply and git-write-tree, this did not make more
than one per-cent of a difference, but it is a good cleanup.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-25 17:40:02 -07:00
Junio C Hamano
bad68ec924 index: make the index file format extensible.
... and move the cache-tree data into it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-24 21:24:13 -07:00
Junio C Hamano
dd0c34c46b cache-tree: protect against "git prune".
We reused the cache-tree data without verifying the tree object
still exists.  Recompute in cache_tree_update() an otherwise
valid cache-tree entry when the tree object disappeared.

This is not usually a problem, but theoretically without this
fix things can break when the user does something like this:

	- read-index from a side branch
	- write-tree the result
	- remove the side branch with "git branch -D"
	- remove the unreachable objects with "git prune"
	- write-tree what is in the index.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-24 15:12:42 -07:00
Junio C Hamano
749864627c Add cache-tree.
The cache_tree data structure is to cache tree object names that
would result from the current index file.

The idea is to have an optional file to record each tree object
name that corresponds to a directory path in the cache when we
run write_cache(), and read it back when we run read_cache().
During various index manupulations, we selectively invalidate
the parts so that the next write-tree can bypass regenerating
tree objects for unchanged parts of the directory hierarchy.

We could perhaps make the cache-tree data an optional part of
the index file, but that would involve the index format updates,
so unless we need it for performance reasons, the current plan
is to use a separate file, $GIT_DIR/index.aux to store this
information and link it with the index file with the checksum
that is already used for index file integrity check.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-23 20:18:16 -07:00