Commit graph

410 commits

Author SHA1 Message Date
Junio C Hamano 541dc4dfa0 Merge branch 'jk/write-broken-index-with-nul-sha1'
Earlier we started rejecting an attempt to add 0{40} object name to
the index and to tree objects, but it sometimes is necessary to
allow so to be able to use tools like filter-branch to correct such
broken tree objects.

* jk/write-broken-index-with-nul-sha1:
  write_index: optionally allow broken null sha1s
2013-09-17 11:40:27 -07:00
Eric Sunshine d28eec2673 name-hash: stop storing trailing '/' on paths in index_state.dir_hash
When 5102c617 (Add case insensitivity support for directories when using
git status, 2010-10-03) added directories to the name-hash there was
only a single hash table in which both real cache entries and leading
directory prefixes were registered. To distinguish between the two types
of entries, directories were stored with a trailing '/'.

2092678c (name-hash.c: fix endless loop with core.ignorecase=true,
2013-02-28), however, moved directories to a separate hash table
(index_state.dir_hash) but retained the (now) redundant trailing '/',
thus callers continue to bear the burden of ensuring the slash's
presence before searching the index for a directory. Eliminate this
redundancy by storing paths in the dir-hash without the trailing '/'.

An important benefit of this change is that it eliminates undocumented
and dangerous behavior of dir.c:directory_exists_in_index_icase() in
which it assumes not only that it can validly access one character
beyond the end of its incoming directory argument, but also that that
character will unconditionally be a '/'. This perilous behavior was
"tolerated" because the string passed in by its lone caller always had a
'/' in that position, however, things broke [1] when 2eac2a4c (ls-files
-k: a directory only can be killed if the index has a non-directory,
2013-08-15) added a new caller which failed to respect the undocumented
assumption.

[1]: http://thread.gmane.org/gmane.comp.version-control.git/232727

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 10:08:07 -07:00
Eric Sunshine ebbd7439b1 employ new explicit "exists in index?" API
Each caller of index_name_exists() knows whether it is looking for a
directory or a file, and can avoid the unnecessary indirection of
index_name_exists() by instead calling index_dir_exists() or
index_file_exists() directly.

Invoking the appropriate search function explicitly will allow a
subsequent patch to relieve callers of the artificial burden of having
to add a trailing '/' to the pathname given to index_dir_exists().

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 10:07:37 -07:00
Junio C Hamano b0d974d6d9 Merge branch 'tg/index-struct-sizes'
The code that reads from a region that mmaps an on-disk index
assumed that "int"/"short" are always 32/16 bits.

* tg/index-struct-sizes:
  read-cache: use fixed width integer types
2013-09-09 14:50:38 -07:00
Junio C Hamano b02f5aeda6 Merge branch 'jl/submodule-mv'
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-09-09 14:36:15 -07:00
Jeff King 83bd7437ca write_index: optionally allow broken null sha1s
Commit 4337b58 (do not write null sha1s to on-disk index,
2012-07-28) added a safety check preventing git from writing
null sha1s into the index. The intent was to catch errors in
other parts of the code that might let such an entry slip
into the index (or worse, a tree).

Some existing repositories may have invalid trees that
contain null sha1s already, though.  Until 4337b58, a common
way to clean this up would be to use git-filter-branch's
index-filter to repair such broken entries.  That now fails
when filter-branch tries to write out the index.

Introduce a GIT_ALLOW_NULL_SHA1 environment variable to
relax this check and make it easier to recover from such a
history.

It is tempting to not involve filter-branch in this commit
at all, and instead require the user to manually invoke

	GIT_ALLOW_NULL_SHA1=1 git filter-branch ...

to perform an index-filter on a history with trees with null
sha1s.  That would be slightly safer, but requires some
specialized knowledge from the user.  So let's set the
GIT_ALLOW_NULL_SHA1 variable automatically when checking out
the to-be-filtered trees.  Advice on using filter-branch to
remove such entries already exists on places like
stackoverflow, and this patch makes it Just Work again on
recent versions of git.

Further commands that touch the index will still notice and
fail, unless they actually remove the broken entries.  A
filter-branch whose filters do not touch the index at all
will not error out (since we complain of the null sha1 only
on writing, not when making a tree out of the index), but
this is acceptable, as we still print a loud warning, so the
problem is unlikely to go unnoticed.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 20:54:43 -07:00
Thomas Gummerer 7800c1ebcc read-cache: use fixed width integer types
Use the fixed width integer types uint16_t and uint32_t for on-disk
structures; unsigned short and unsigned int do not have a guaranteed
size.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-20 12:29:42 -07:00
Ondřej Bílka 98e023dea4 many small typofixes
Signed-off-by: Ondřej Bílka <neleai@seznam.cz>
Reviewed-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-29 12:32:25 -07:00
Junio C Hamano 65ed8684c4 Merge branch 'rs/discard-index-discard-array' into maint
* rs/discard-index-discard-array:
  read-cache: free cache in discard_index
  read-cache: add simple performance test
2013-07-19 10:39:01 -07:00
Nguyễn Thái Ngọc Duy 9b2d61499b convert refresh_index to take struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:08 -07:00
Nguyễn Thái Ngọc Duy 9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Junio C Hamano ac5611a1cc Merge branch 'fc/do-not-use-the-index-in-add-to-index' into maint
* fc/do-not-use-the-index-in-add-to-index:
  read-cache: trivial style cleanups
  read-cache: fix wrong 'the_index' usage
2013-07-03 15:40:38 -07:00
Junio C Hamano 079424a2cf Merge branch 'mh/ref-races'
"git pack-refs" that races with new ref creation or deletion have
been susceptible to lossage of refs under right conditions, which
has been tightened up.

* mh/ref-races:
  for_each_ref: load all loose refs before packed refs
  get_packed_ref_cache: reload packed-refs file when it changes
  add a stat_validity struct
  Extract a struct stat_data from cache_entry
  packed_ref_cache: increment refcount when locked
  do_for_each_entry(): increment the packed refs cache refcount
  refs: manage lifetime of packed refs cache via reference counting
  refs: implement simple transactions for the packed-refs file
  refs: wrap the packed refs cache in a level of indirection
  pack_refs(): split creation of packed refs and entry writing
  repack_without_ref(): split list curation and entry writing
2013-06-30 15:40:05 -07:00
Junio C Hamano 08bcd774f4 Merge branch 'rs/discard-index-discard-array'
* rs/discard-index-discard-array:
  read-cache: free cache in discard_index
  read-cache: add simple performance test
2013-06-20 16:02:30 -07:00
Michael Haggerty 3861253224 add a stat_validity struct
It can sometimes be useful to know whether a path in the
filesystem has been updated without going to the work of
opening and re-reading its content. We trust the stat()
information on disk already to handle index updates, and we
can use the same trick here.

This patch introduces a "stat_validity" struct which
encapsulates the concept of checking the stat-freshness of a
file. It is implemented on top of "struct stat_data" to
reuse the logic about which stat entries to trust for a
particular platform, but hides the complexity behind two
simple functions: check and update.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-20 15:50:17 -07:00
Michael Haggerty c21d39d7c7 Extract a struct stat_data from cache_entry
Add public functions fill_stat_data() and match_stat_data() to work
with it.  This infrastructure will later be used to check the validity
of other types of file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-20 15:50:17 -07:00
Junio C Hamano 6bf2227b92 Merge branch 'fc/do-not-use-the-index-in-add-to-index'
* fc/do-not-use-the-index-in-add-to-index:
  read-cache: trivial style cleanups
  read-cache: fix wrong 'the_index' usage
2013-06-11 13:30:28 -07:00
René Scharfe a0fc4db01d read-cache: free cache in discard_index
discard_cache doesn't have to free the array of cache entries, because
the next call of read_cache can simply reuse it, as they all operate on
the global variable the_index.

discard_index on the other hand does have to free it, because it can be
used e.g. with index_state variables on the stack, in which case a
missing free would cause an unrecoverable leak.  This patch releases the
memory and removes a comment that was relevant for discard_cache but has
become outdated.

Since discard_cache is just a wrapper around discard_index nowadays, we
lose the optimization that avoids reallocation of that array within
loops of read_cache and discard_cache.  That doesn't cause a performance
regression for me, however (HEAD = this patch, HEAD^ = master + p0002):

  Test           //              HEAD^             HEAD
  ---------------\\-----------------------------------------------------
  0002.1: read_ca// 1000 times   0.62(0.58+0.04)   0.61(0.58+0.02) -1.6%

Suggested-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-09 17:03:01 -07:00
Felipe Contreras c4aa3167fe read-cache: trivial style cleanups
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03 10:10:38 -07:00
Felipe Contreras 582eb8536b read-cache: fix wrong 'the_index' usage
We are dealing with the 'istate' index, not 'the_index'.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03 10:10:25 -07:00
René Scharfe 21a6b9fa42 read-cache: mark cache_entry pointers const
ie_match_stat and ie_modified only derefence their struct cache_entry
pointers for reading.  Add const to the parameter declaration here and
do the same for the static helper function used by them, as it's the
same there as well.  This allows callers to pass in const pointers.

Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02 15:31:12 -07:00
Junio C Hamano 4b35b007a6 Merge branch 'lf/read-blob-data-from-index'
Reduce duplicated code between convert.c and attr.c.

* lf/read-blob-data-from-index:
  convert.c: remove duplicate code
  read_blob_data_from_index(): optionally return the size of blob data
  attr.c: extract read_index_data() as read_blob_data_from_index()
2013-04-21 18:39:45 -07:00
Lukas Fleischer ff36682505 read_blob_data_from_index(): optionally return the size of blob data
This allows for optionally getting the size of the returned data and
will be used in a follow-up patch.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:51:47 -07:00
Lukas Fleischer 29fb37b272 attr.c: extract read_index_data() as read_blob_data_from_index()
Extract the read_index_data() function from attr.c and move it to
read-cache.c; rename it to read_blob_data_from_index() and update
the function signature of it to align better with index/cache API
functions.

This allows for reusing the function in convert.c later.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:49:11 -07:00
Junio C Hamano c81e2c61b3 Merge branch 'kb/name-hash' into maint-1.8.1
* kb/name-hash:
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-03 08:44:54 -07:00
Junio C Hamano c044bed8f0 Merge branch 'kb/name-hash'
The code to keep track of what directory names are known to Git on
platforms with case insensitive filesystems can get confused upon
a hash collision between these pathnames and looped forever.

* kb/name-hash:
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-01 08:59:53 -07:00
Junio C Hamano 865e99b5fd Merge branch 'nd/doc-index-format'
Update the index format documentation to mention the v4 format.

* nd/doc-index-format:
  update-index: list supported idx versions and their features
  read-cache.c: use INDEX_FORMAT_{LB,UB} in verify_hdr()
  index-format.txt: mention of v4 is missing in some places
2013-03-19 12:15:14 -07:00
Karsten Blees 2092678cd5 name-hash.c: fix endless loop with core.ignorecase=true
With core.ignorecase=true, name-hash.c builds a case insensitive index of
all tracked directories. Currently, the existing cache entry structures are
added multiple times to the same hashtable (with different name lengths and
hash codes). However, there's only one dir_next pointer, which gets
completely messed up in case of hash collisions. In the worst case, this
causes an endless loop if ce == ce->dir_next (see t7062).

Use a separate hashtable and separate structures for the directory index
so that each directory entry has its own next pointer. Use reference
counting to track which directory entry contains files.

There are only slight changes to the name-hash.c API:
- new free_name_hash() used by read_cache.c::discard_index()
- remove_name_hash() takes an additional index_state parameter
- index_name_exists() for a directory (trailing '/') may return a cache
  entry that has been removed (CE_UNHASHED). This is not a problem as the
  return value is only used to check if the directory exists (dir.c) or to
  normalize casing of directory names (read-cache.c).

Getting rid of cache_entry.dir_next reduces memory consumption, especially
with core.ignorecase=false (which doesn't use that member at all).

With core.ignorecase=true, building the directory index is slightly faster
as we add / check the parent directory first (instead of going through all
directory levels for each file in the index). E.g. with WebKit (~200k
files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms
to 130ms.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-27 23:29:04 -08:00
Nguyễn Thái Ngọc Duy b82a7b5bbc read-cache.c: use INDEX_FORMAT_{LB,UB} in verify_hdr()
9d22778 (read-cache.c: write prefix-compressed names in the index -
2012-04-04) defined these. Interestingly, they were not used by
read-cache.c, or anywhere in that patch. They were used in
builtin/update-index.c later for checking supported index
versions. Use them here too.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-22 12:48:41 -08:00
Robin Rosenberg c08e4d5b5c Enable minimal stat checking
Specifically the fields uid, gid, ctime, ino and dev are set to zero
by JGit. Other implementations, eg. Git in cygwin are allegedly also
somewhat incompatible with Git For Windows and on *nix platforms
the resolution of the timestamps may differ.

Any stat checking by git will then need to check content, which may
be very slow, particularly on Windows. Since mtime and size
is typically enough we should allow the user to tell git to avoid
checking these fields if they are set to zero in the index.

This change introduces a core.checkstat config option where the
the user can select to check all fields (default), or just size
and the whole second part of mtime (minimal).

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-22 09:33:16 -08:00
Junio C Hamano 357e9c69c9 read-cache.c: mark a private file-scope symbol as static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-15 22:58:21 -07:00
Junio C Hamano 3b753148b6 Merge branch 'jk/maint-null-in-trees'
We do not want a link to 0{40} object stored anywhere in our objects.

* jk/maint-null-in-trees:
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
2012-08-27 11:54:28 -07:00
Junio C Hamano d0ae7e2e71 Merge branch 'nd/index-errno'
Assignments to errno before calling system functions that used to
matter in the old code were left behind after the code structure
changed sufficiently to make them useless.

* nd/index-errno:
  read_index_from: remove bogus errno assignments
2012-08-22 11:51:42 -07:00
Nguyễn Thái Ngọc Duy 57d84f8d93 read_index_from: remove bogus errno assignments
These assignments comes from the very first commit e83c516 (Initial
revision of "git", the information manager from hell - 2005-04-07).
Back then we did not die() when errors happened so correct errno was
required.

Since 5d1a5c0 ([PATCH] Better error reporting for "git status" -
2005-10-01), read_index_from() learned to die rather than just return
-1 and these assignments became irrelevant. Remove them.

While at it, move die_errno() next to xmmap() call because it's the
mmap's error code that we care about. Otherwise if close(fd); fails,
it could overwrite mmap's errno.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-06 10:01:21 -07:00
Jeff King 4337b5856f do not write null sha1s to on-disk index
We should never need to write the null sha1 into an index
entry (short of the 1 in 2^160 chance that somebody actually
has content that hashes to it). If we attempt to do so, it
is much more likely that it is a bug, since we use the null
sha1 as a sentinel value to mean "not valid".

The presence of null sha1s in the index (which can come
from, among other things, "update-index --cacheinfo", or by
reading a corrupted tree) can cause problems for later
readers, because they cannot distinguish the literal null
sha1 from its use a sentinel value.  For example, "git
diff-files" on such an entry would make it appear as if it
is stat-dirty, and until recently, the diff code assumed
such an entry meant that we should be diffing a working tree
file rather than a blob.

Ideally, we would stop such entries from entering even our
in-core index. However, we do sometimes legitimately add
entries with null sha1s in order to represent these sentinel
situations; simply forbidding them in add_index_entry breaks
a lot of the existing code. However, we can at least make
sure that our in-core sentinel representation never makes it
to disk.

To be thorough, we will test an attempt to add both a blob
and a submodule entry. In the former case, we might run into
problems anyway because we will be missing the blob object.
But in the latter case, we do not enforce connectivity
across gitlink entries, making this our only point of
enforcement. The current implementation does not care which
type of entry we are seeing, but testing both cases helps
future-proof the test suite in case that changes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-29 15:13:36 -07:00
Junio C Hamano 30ea575876 Merge branch 'tg/ce-namelen-field'
Split lower bits of ce_flags field and creates a new ce_namelen
field in the in-core index structure.

* tg/ce-namelen-field:
  Strip namelen out of ce_flags into a ce_namelen field
2012-07-23 20:55:21 -07:00
Junio C Hamano 8fc824f397 Merge branch 'tg/maint-cache-name-compare'
Even though the index can record pathnames longer than 1<<12 bytes,
in some places we were not comparing them in full, potentially
replacing index entries instead of adding.

* tg/maint-cache-name-compare:
  cache_name_compare(): do not truncate while comparing paths
2012-07-15 21:40:18 -07:00
Thomas Gummerer b60e188c51 Strip namelen out of ce_flags into a ce_namelen field
Strip the name length from the ce_flags field and move it
into its own ce_namelen field in struct cache_entry. This
will both give us a tiny bit of a performance enhancement
when working with long pathnames and is a refactoring for
more readability of the code.

It enhances readability, by making it more clear what
is a flag, and where the length is stored and make it clear
which functions use stages in comparisions and which only
use the length.

It also makes CE_NAMEMASK private, so that users don't
mistakenly write the name length in the flags.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-11 09:42:45 -07:00
Junio C Hamano 01388518c3 Merge branch 'tg/maint-cache-name-compare' into tg/ce-namelen-field
* tg/maint-cache-name-compare:
  cache_name_compare(): do not truncate while comparing paths
2012-07-11 09:40:25 -07:00
Junio C Hamano d5f53338ab cache_name_compare(): do not truncate while comparing paths
We failed to use ce_namelen() equivalent and instead only compared
up to the CE_NAMEMASK bytes by mistake.  Adding an overlong path
that shares the same common prefix as an existing entry in the index
did not add a new entry, but instead replaced the existing one, as
the result.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-11 09:25:56 -07:00
Thomas Gummerer 68c4f6a577 Replace strlen() with ce_namelen()
Replace strlen(ce->name) with ce_namelen() in a couple
of places which gives us some additional bits of
performance.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08 19:49:34 -07:00
Junio C Hamano d4a5d872c0 Merge branch 'jc/index-v4'
Trivially shrinks the on-disk size of the index file to save both I/O and
checksum overhead.

The topic should give a solid base to build on further updates, with the
code refactoring in its earlier parts, and the backward compatibility
mechanism in its later parts.

* jc/index-v4:
  index-v4: document the entry format
  unpack-trees: preserve the index file version of original
  update-index: upgrade/downgrade on-disk index version
  read-cache.c: write prefix-compressed names in the index
  read-cache.c: read prefix-compressed names in index on-disk version v4
  read-cache.c: move code to copy incore to ondisk cache to a helper function
  read-cache.c: move code to copy ondisk to incore cache to a helper function
  read-cache.c: report the header version we do not understand
  read-cache.c: make create_from_disk() report number of bytes it consumed
  read-cache.c: allow unaligned mapping of the index file
  cache.h: hide on-disk index details
  varint: make it available outside the context of pack
2012-05-02 13:51:13 -07:00
Junio C Hamano 9d227781b6 read-cache.c: write prefix-compressed names in the index
Teach the code to write the index in the v4 on-disk format.

Record the format version of the on-disk index we read from in the
index_state, and use the format when writing the new index out.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-04 09:57:49 -07:00
Junio C Hamano 6c9cd161d9 read-cache.c: read prefix-compressed names in index on-disk version v4
Because the entries are sorted by path, adjacent entries in the index tend
to share the leading components of them, and it makes sense to only store
the differences in later entries.  In the v4 on-disk format of the index,
each on-disk cache entry stores the number of bytes to be stripped from
the end of the previous name, and the bytes to append to the result, to
come up with its name.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:46 -07:00
Junio C Hamano f136f7bfe8 read-cache.c: move code to copy incore to ondisk cache to a helper function
This makes the change in a later patch look less scary.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:46 -07:00
Junio C Hamano 3fc22b5331 read-cache.c: move code to copy ondisk to incore cache to a helper function
This makes the change in a later patch look less scary.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:46 -07:00
Junio C Hamano 0136bac9b8 read-cache.c: report the header version we do not understand
Instead of just saying "bad index version", report the value we read
from the disk.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Junio C Hamano 936f53d055 read-cache.c: make create_from_disk() report number of bytes it consumed
The function is the one that is reading from the data stream. It only is
natural to make it responsible for reporting this number, not the caller.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Junio C Hamano d60c49c2d7 read-cache.c: allow unaligned mapping of the index file
Both the on-disk format v2 and v3 pads the "name" field to the multiple of
eight to make sure that various quantities in network long/short type can
be accessed with ntohl/ntohs without having to worry about alignment, but
this forces us to waste disk I/O bandwidth.

Introduce ntoh_s()/ntoh_l() macros that the callers can use as if they were
the regular ntohs()/ntohl() on a field that may not be aligned correctly.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Junio C Hamano db3b313c84 cache.h: hide on-disk index details
The on-disk format of the index file is a detail whose implementation is
neatly encapsulated in read-cache.c; there is no need to expose it to the
general public that include the cache.h header file.

Also add a prominent mark to read-cache.c to delineate the parts that deal
with the index file I/O routines from the remainder of the file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Jeff King f8582cad8d make is_empty_blob_sha1 available everywhere
The read-cache implementation defines this static function,
but it is a generally useful concept in git. Let's give
the empty blob the same treatment as the empty tree,
providing both hex and binary forms of the sha1.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-23 13:52:13 -07:00
Junio C Hamano 3d1f148c33 refresh_index: do not show unmerged path that is outside pathspec
When running "git add --refresh <pathspec>", we incorrectly showed the
path that is unmerged even if it is outside the specified pathspec, even
though we did honor pathspec and refreshed only the paths that matched.

Note that this cange does not affect "git update-index --refresh"; for
hysterical raisins, it does not take a pathspec (it takes real paths) and
more importantly itss command line options are parsed and executed one by
one as they are encountered, so "git update-index --refresh foo" means
"first refresh the index, and then update the entry 'foo' by hashing the
contents in file 'foo'", not "refresh only entry 'foo'".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-17 10:11:05 -08:00
Junio C Hamano ef87690b27 Merge branch 'rs/allocate-cache-entry-individually'
* rs/allocate-cache-entry-individually:
  cache.h: put single NUL at end of struct cache_entry
  read-cache.c: allocate index entries individually

Conflicts:
	read-cache.c
2011-12-09 13:36:56 -08:00
Jeff King 73b7eae60c refresh_index: make porcelain output more specific
If you have a deleted file and a porcelain refreshes the
cache, we print:

  Unstaged changes after reset:
  M	file

This is technically correct, in that the file is modified,
but it's friendlier to the user if we further differentiate
the case of a deleted file (especially because this output
looks a lot like "diff --name-status", which would also make
the distinction).

Similarly, we can distinguish typechanges ("T") and
intent-to-add files ("A"), both of which appear as just "M"
in the current output.

The plumbing output for all cases remains "needs update" for
historical compatibility.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 11:55:58 -08:00
Jeff King 4bd4e73093 refresh_index: rename format variables
When refreshing the index, for modified (or unmerged) files we will print
"needs update" (or "needs merge") for plumbing, or line similar to the
output from "diff --name-status" for porcelain.

The variables holding which type of message to show are named after the
plumbing messages. However, as we begin to differentiate more cases at the
porcelain level (with the plumbing message staying the same), that naming
scheme will become awkward.

Instead, name the variables after which case we found (modified or
unmerged), not what we will output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 11:55:05 -08:00
Jeff King d05e697010 read-cache: let refresh_cache_ent pass up changed flags
This will enable refresh_cache to differentiate more cases
of modification (such as typechange) when telling the user
what isn't fresh.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 11:53:46 -08:00
René Scharfe debed2a629 read-cache.c: allocate index entries individually
The code to estimate the in-memory size of the index based on its on-disk
representation is subtly wrong for certain architecture-dependent struct
layouts.  Instead of fixing it, replace the code to keep the index entries
in a single large block of memory and allocate each entry separately
instead.  This is both simpler and more flexible, as individual entries
can now be freed.  Actually using that added flexibility is left for a
later patch.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-26 15:25:59 -07:00
René Scharfe 8f41c07f90 read-cache.c: fix index memory allocation
estimate_cache_size() tries to guess how much memory is needed for the
in-memory representation of an index file.  It does that by using the
file size, the number of entries and the difference of the sizes of the
on-disk and in-memory structs -- without having to check the length of
the name of each entry, which varies for each entry, but their sums are
the same no matter the representation.

Except there can be a difference.  First of all, the size is really
calculated by ce_size and ondisk_ce_size based on offsetof(..., name),
not sizeof, which can be different.  And entries are padded with 1 to 8
NULs at the end (after the variable name) to make their total length a
multiple of eight.

So in order to allocate enough memory to hold the index, change the
delta calculation to be based on offsetof(..., name) and round up to
the next multiple of eight.

On a 32-bit Linux, this delta was used before:

	sizeof(struct cache_entry)        == 72
	sizeof(struct ondisk_cache_entry) == 64
	                                    ---
	                                      8

The actual difference for an entry with a filename length of one was,
however (find the definitions are in cache.h):

	offsetof(struct cache_entry, name)        == 72
	offsetof(struct ondisk_cache_entry, name) == 62

	ce_size        == (72 + 1 + 8) & ~7 == 80
	ondisk_ce_size == (62 + 1 + 8) & ~7 == 64
	                                      ---
	                                       16

So eight bytes less had been allocated for such entries.  The new
formula yields the correct delta:

	(72 - 62 + 7) & ~7 == 16

Reported-by: John Hsing <tsyj2007@gmail.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-26 14:35:16 -07:00
Junio C Hamano 1952e102b7 Merge branch 'maint'
* maint:
  whitespace: have SP on both sides of an assignment "="
  update-ref: whitespace fix
2011-08-25 16:00:07 -07:00
Junio C Hamano cd2b8ae983 whitespace: have SP on both sides of an assignment "="
I've deliberately excluded the borrowed code in compat/nedmalloc
directory.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-25 14:47:07 -07:00
Junio C Hamano 033c2dc436 Merge branch 'ef/maint-win-verify-path'
* ef/maint-win-verify-path:
  verify_dotfile(): do not assume '/' is the path seperator
  verify_path(): simplify check at the directory boundary
  verify_path: consider dos drive prefix
  real_path: do not assume '/' is the path seperator
  A Windows path starting with a backslash is absolute
2011-06-29 17:09:17 -07:00
Theo Niessink e0f530ff8a verify_dotfile(): do not assume '/' is the path seperator
verify_dotfile() currently assumes that the path seperator is '/', but on
Windows it can also be '\\', so use is_dir_sep() instead.

Signed-off-by: Theo Niessink <theo@taletn.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-08 16:34:38 -07:00
Junio C Hamano 3bdf09c7f5 verify_path(): simplify check at the directory boundary
We simply want to say "At a directory boundary, be careful with a name
that begins with a dot, forbid a name that ends with the boundary
character or has duplicated bounadry characters".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-07 12:22:51 -07:00
Erik Faye-Lund 56948cb6aa verify_path: consider dos drive prefix
If someone manage to create a repo with a 'C:' entry in the
root-tree, files can be written outside of the working-dir. This
opens up a can-of-worms of exploits.

Fix it by explicitly checking for a dos drive prefix when verifying
a paht. While we're at it, make sure that paths beginning with '\' is
considered absolute as well.

Noticed-by: Theo Niessink <theo@taletn.com>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-27 10:59:18 -07:00
Junio C Hamano c4ce46fc7a index_fd(): turn write_object and format_check arguments into one flag
The "format_check" parameter tucked after the existing parameters is too
ugly an afterthought to live in any reasonable API.

Combine it with the other boolean parameter "write_object" into a single
"flags" parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 11:58:19 -07:00
Junio C Hamano 44ec754dc7 Merge branch 'jc/index-update-if-able' into maint
* jc/index-update-if-able:
  update $GIT_INDEX_FILE when there are racily clean entries
  diff/status: refactor opportunistic index update
2011-04-03 12:33:05 -07:00
Junio C Hamano 149971badc Merge branch 'jc/index-update-if-able'
* jc/index-update-if-able:
  update $GIT_INDEX_FILE when there are racily clean entries
  diff/status: refactor opportunistic index update
2011-03-26 20:13:16 -07:00
Junio C Hamano 483fbe2b7c update $GIT_INDEX_FILE when there are racily clean entries
Traditional "opportunistic index update" done by read-only "diff" and
"status" was about updating cached lstat(2) information in the index for
the next round.  We missed another obvious optimization opportunity: when
there are racily clean entries that will cease to be racily clean by
updating $GIT_INDEX_FILE.  Detect that case and write $GIT_INDEX_FILE out
to give it a newer timestamp.

Noticed by Lasse Makholm by stracing "git status" in a fresh checkout and
counting the number of open(2) calls.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-21 14:49:46 -07:00
Junio C Hamano ccdc4ec304 diff/status: refactor opportunistic index update
When we had to refresh the index internally before running diff or status,
we opportunistically updated the $GIT_INDEX_FILE so that later invocation
of git can use the lstat(2) we already did in this invocation.

Make them share a helper function to do so.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-21 12:43:10 -07:00
Junio C Hamano fc7ae9c156 Merge branch 'nd/hash-object-sanity'
* nd/hash-object-sanity:
  Make hash-object more robust against malformed objects

Conflicts:
	cache.h
2011-02-27 21:58:30 -08:00
Junio C Hamano d5c87a802d Merge branch 'nd/struct-pathspec'
* nd/struct-pathspec: (22 commits)
  t6004: add pathspec globbing test for log family
  t7810: overlapping pathspecs and depth limit
  grep: drop pathspec_matches() in favor of tree_entry_interesting()
  grep: use writable strbuf from caller for grep_tree()
  grep: use match_pathspec_depth() for cache/worktree grepping
  grep: convert to use struct pathspec
  Convert ce_path_match() to use match_pathspec_depth()
  Convert ce_path_match() to use struct pathspec
  struct rev_info: convert prune_data to struct pathspec
  pathspec: add match_pathspec_depth()
  tree_entry_interesting(): optimize wildcard matching when base is matched
  tree_entry_interesting(): support wildcard matching
  tree_entry_interesting(): fix depth limit with overlapping pathspecs
  tree_entry_interesting(): support depth limit
  tree_entry_interesting(): refactor into separate smaller functions
  diff-tree: convert base+baselen to writable strbuf
  glossary: define pathspec
  Move tree_entry_interesting() to tree-walk.c and export it
  tree_entry_interesting(): remove dependency on struct diff_options
  Convert struct diff_options to use struct pathspec
  ...
2011-02-27 21:17:36 -08:00
Jonathan Nieder 046613c546 update-index --refresh --porcelain: add missing const
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-22 16:51:21 -08:00
Nguyễn Thái Ngọc Duy c879daa237 Make hash-object more robust against malformed objects
Commits, trees and tags have structure. Don't let users feed git
with malformed ones. Sooner or later git will die() when
encountering them.

Note that this patch does not check semantics. A tree that points
to non-existent objects is perfectly OK (and should be so, users
may choose to add commit first, then its associated tree for example).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:25 -08:00
Nguyễn Thái Ngọc Duy 898bbd9fb4 Convert ce_path_match() to use match_pathspec_depth()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:30 -08:00
Nguyễn Thái Ngọc Duy eb9cb55b94 Convert ce_path_match() to use struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:30 -08:00
Junio C Hamano 5e738ae820 Merge branch 'jj/icase-directory'
* jj/icase-directory:
  Support case folding in git fast-import when core.ignorecase=true
  Support case folding for git add when core.ignorecase=true
  Add case insensitivity support when using git ls-files
  Add case insensitivity support for directories when using git status
  Case insensitivity support for .gitignore via core.ignorecase
  Add string comparison functions that respect the ignore_case variable.
  Makefile & configure: add a NO_FNMATCH_CASEFOLD flag
  Makefile & configure: add a NO_FNMATCH flag

Conflicts:
	Makefile
	config.mak.in
	configure.ac
	fast-import.c
2010-12-03 16:10:34 -08:00
Joshua Jensen dc1ae70487 Support case folding for git add when core.ignorecase=true
When MyDir/ABC/filea.txt is added to Git, the disk directory MyDir/ABC/
is renamed to mydir/aBc/, and then mydir/aBc/fileb.txt is added, the
index will contain MyDir/ABC/filea.txt and mydir/aBc/fileb.txt. Although
the earlier portions of this patch series account for those differences
in case, this patch makes the pathing consistent by folding the case of
newly added files against the first file added with that path.

In read-cache.c's add_to_index(), the index_name_exists() support used
for git status's case insensitive directory lookups is used to find the
proper directory case according to what the user already checked in.
That is, MyDir/ABC/'s case is used to alter the stored path for
fileb.txt to MyDir/ABC/fileb.txt (instead of mydir/aBc/fileb.txt).

This is especially important when cloning a repository to a case
sensitive file system. MyDir/ABC/ and mydir/aBc/ exist in the same
directory on a Windows machine, but on Linux, the files exist in two
separate directories. The update to add_to_index(), in effect, treats a
Windows file system as case sensitive by making path case consistent.

Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06 11:19:59 -07:00
Jonathan Nieder 59efba64ac core: Stop leaking ondisk_cache_entrys
Noticed with valgrind.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-11 09:57:43 -07:00
Shawn O. Pearce b659b49bb0 Correct spelling of 'REUC' extension
The new dircache extension CACHE_EXT_RESOLVE_UNDO, whose value is
0x52455543, is actually the ASCII sequence 'REUC', not the ASCII
sequence 'REUN'.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-02 09:54:34 -08:00
Junio C Hamano 125fd98434 Make ce_uptodate() trustworthy again
The rule has always been that a cache entry that is ce_uptodate(ce)
means that we already have checked the work tree entity and we know
there is no change in the work tree compared to the index, and nobody
should have to double check.  Note that false ce_uptodate(ce) does not
mean it is known to be dirty---it only means we don't know if it is
clean.

There are a few codepaths (refresh-index and preload-index are among
them) that mark a cache entry as up-to-date based solely on the return
value from ie_match_stat(); this function uses lstat() to see if the
work tree entity has been touched, and for a submodule entry, if its
HEAD points at the same commit as the commit recorded in the index of
the superproject (a submodule that is not even cloned is considered
clean).

A submodule is no longer considered unmodified merely because its HEAD
matches the index of the superproject these days, in order to prevent
people from forgetting to commit in the submodule and updating the
superproject index with the new submodule commit, before commiting the
state in the superproject.  However, the patch to do so didn't update
the codepath that marks cache entries up-to-date based on the updated
definition and instead worked it around by saying "we don't trust the
return value of ce_uptodate() for submodules."

This makes ce_uptodate() trustworthy again by not marking submodule
entries up-to-date.

The next step _could_ be to introduce a few "in-core" flag bits to
cache_entry structure to record "this entry is _known_ to be dirty",
call is_submodule_modified() from ie_match_stat(), and use these new
bits to avoid running this rather expensive check more than once, but
that can be a separate patch.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-24 00:15:29 -08:00
Linus Torvalds fb7d3f32b2 Remove diff machinery dependency from read-cache
Exal Sibeaz pointed out that some git files are way too big, and that
add_files_to_cache() brings in all the diff machinery to any git binary
that needs the basic git SHA1 object operations from read-cache.c. Which
is pretty much all of them.

It's doubly silly, since add_files_to_cache() is only used by builtin
programs (add, checkout and commit), so it's fairly easily fixed by just
moving the thing to builtin-add.c, and avoiding the dependency entirely.

I initially argued to Exal that it would probably be best to try to depend
on smart compilers and linkers, but after spending some time trying to
make -ffunction-sections work and giving up, I think Exal was right, and
the fix is to just do some trivial cleanups like this.

This trivial cleanup results in pretty stunning file size differences.
The diff machinery really is mostly used by just the builtin programs, and
you have things like these trivial before-and-after numbers:

  -rwxr-xr-x 1 torvalds torvalds 1727420 2010-01-21 10:53 git-hash-object
  -rwxrwxr-x 1 torvalds torvalds  940265 2010-01-21 11:16 git-hash-object

Now, I'm not saying that 940kB is good either, but that's mostly all the
debug information - you can see the real code with 'size':

   text	   data	    bss	    dec	    hex	filename
 418675	   3920	 127408	 550003	  86473	git-hash-object (before)
 230650	   2288	 111728	 344666	  5425a	git-hash-object (after)

ie we have a nice 24% size reduction from this trivial cleanup.

It's not just that one file either. I get:

	[torvalds@nehalem git]$ du -s /home/torvalds/libexec/git-core
	45640	/home/torvalds/libexec/git-core (before)
	33508	/home/torvalds/libexec/git-core (after)

so we're talking 12MB of diskspace here.

(Of course, stripping all the binaries brings the 33MB down to 9MB, so the
whole debug information thing is still the bulk of it all, but that's a
separate issue entirely)

Now, I'm sure there are other things we should do, and changing our
compiler flags from -O2 to -Os would bring the text size down by an
additional almost 20%, but this thing Exal pointed out seems to be some
good low-hanging fruit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-21 17:05:13 -08:00
Junio C Hamano 6751e0471d Merge branch 'jc/cache-unmerge'
* jc/cache-unmerge:
  rerere forget path: forget recorded resolution
  rerere: refactor rerere logic to make it independent from I/O
  rerere: remove silly 1024-byte line limit
  resolve-undo: teach "update-index --unresolve" to use resolve-undo info
  resolve-undo: "checkout -m path" uses resolve-undo information
  resolve-undo: allow plumbing to clear the information
  resolve-undo: basic tests
  resolve-undo: record resolved conflicts in a new index extension section
  builtin-merge.c: use standard active_cache macros

Conflicts:
	builtin-ls-files.c
	builtin-merge.c
	builtin-rerere.c
2010-01-20 14:46:35 -08:00
Junio C Hamano 56eb8b43eb Merge branch 'jc/symbol-static'
* jc/symbol-static:
  date.c: mark file-local function static
  Replace parse_blob() with an explanatory comment
  symlinks.c: remove unused functions
  object.c: remove unused functions
  strbuf.c: remove unused function
  sha1_file.c: remove unused function
  mailmap.c: remove unused function
  utf8.c: mark file-local function static
  submodule.c: mark file-local function static
  quote.c: mark file-local function static
  remote-curl.c: mark file-local function static
  read-cache.c: mark file-local functions static
  parse-options.c: mark file-local function static
  entry.c: mark file-local function static
  http.c: mark file-local functions static
  pretty.c: mark file-local function static
  builtin-rev-list.c: mark file-local function static
  bisect.c: mark file-local function static
2010-01-20 14:37:25 -08:00
Junio C Hamano dc96c5ee70 Merge branch 'cc/reset-more'
* cc/reset-more:
  t7111: check that reset options work as described in the tables
  Documentation: reset: add some missing tables
  Fix bit assignment for CE_CONFLICTED
  "reset --merge": fix unmerged case
  reset: use "unpack_trees()" directly instead of "git read-tree"
  reset: add a few tests for "git reset --merge"
  Documentation: reset: add some tables to describe the different options
  reset: improve mixed reset error message when in a bare repo
2010-01-13 11:58:56 -08:00
Junio C Hamano 73d66323ac Merge branch 'nd/sparse'
* nd/sparse: (25 commits)
  t7002: test for not using external grep on skip-worktree paths
  t7002: set test prerequisite "external-grep" if supported
  grep: do not do external grep on skip-worktree entries
  commit: correctly respect skip-worktree bit
  ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
  tests: rename duplicate t1009
  sparse checkout: inhibit empty worktree
  Add tests for sparse checkout
  read-tree: add --no-sparse-checkout to disable sparse checkout support
  unpack-trees(): ignore worktree check outside checkout area
  unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index
  unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout
  unpack-trees.c: generalize verify_* functions
  unpack-trees(): add CE_WT_REMOVE to remove on worktree alone
  Introduce "sparse checkout"
  dir.c: export excluded_1() and add_excludes_from_file_1()
  excluded_1(): support exclude files in index
  unpack-trees(): carry skip-worktree bit over in merged_entry()
  Read .gitignore from index if it is skip-worktree
  Avoid writing to buffer in add_excludes_from_file_1()
  ...

Conflicts:
	.gitignore
	Documentation/config.txt
	Documentation/git-update-index.txt
	Makefile
	entry.c
	t/t7002-grep.sh
2010-01-13 11:58:34 -08:00
Junio C Hamano 87b29e5a5a read-cache.c: mark file-local functions static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12 01:06:08 -08:00
Junio C Hamano e11d7b5969 "reset --merge": fix unmerged case
Commit 9e8ecea (Add 'merge' mode to 'git reset', 2008-12-01) disallowed
"git reset --merge" when there was unmerged entries.  But it wished if
unmerged entries were reset as if --hard (instead of --merge) has been
used.  This makes sense because all "mergy" operations makes sure that
any path involved in the merge does not have local modifications before
starting, so resetting such a path away won't lose any information.

The previous commit changed the behavior of --merge to accept resetting
unmerged entries if they are reset to a different state than HEAD, but it
did not reset the changes in the work tree, leaving the conflict markers
in the resulting file in the work tree.

Fix it by doing three things:

 - Update the documentation to match the wish of original "reset --merge"
   better, namely, "An unmerged entry is a sign that the path didn't have
   any local modification and can be safely resetted to whatever the new
   HEAD records";

 - Update read_index_unmerged(), which reads the index file into the cache
   while dropping any higher-stage entries down to stage #0, not to copy
   the object name from the higher stage entry.  The code used to take the
   object name from the a stage entry ("base" if you happened to have
   stage #1, or "ours" if both sides added, etc.), which essentially meant
   that you are getting random results depending on what the merge did.

   The _only_ reason we want to keep a previously unmerged entry in the
   index at stage #0 is so that we don't forget the fact that we have
   corresponding file in the work tree in order to be able to remove it
   when the tree we are resetting to does not have the path.  In order to
   differentiate such an entry from ordinary cache entry, the cache entry
   added by read_index_unmerged() is marked as CE_CONFLICTED.

 - Update merged_entry() and deleted_entry() so that they pay attention to
   cache entries marked as CE_CONFLICTED.  They are previously unmerged
   entries, and the files in the work tree that correspond to them are
   resetted away by oneway_merge() to the version from the tree we are
   resetting to.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-03 16:01:05 -08:00
Junio C Hamano 0622f79d8e Merge branch 'nf/maint-fix-index-ext-len-on-be64' into maint
* nf/maint-fix-index-ext-len-on-be64:
  read_index(): fix reading extension size on BE 64-bit archs
2009-12-27 10:42:00 -08:00
Nathaniel W Filardo 07cc8ecac0 read_index(): fix reading extension size on BE 64-bit archs
On big endian platforms with 8-byte unsigned long, the code reads the
size of the index extension section (which is a 4-byte network byte
order integer) incorrectly.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-27 10:41:48 -08:00
Junio C Hamano cfc5789ada resolve-undo: record resolved conflicts in a new index extension section
When resolving a conflict using "git add" to create a stage #0 entry, or
"git rm" to remove entries at higher stages, remove_index_entry_at()
function is eventually called to remove unmerged (i.e. higher stage)
entries from the index.  Introduce a "resolve_undo_info" structure and
keep track of the removed cache entries, and save it in a new index
extension section in the index_state.

Operations like "read-tree -m", "merge", "checkout [-m] <branch>" and
"reset" are signs that recorded information in the index is no longer
necessary.  The data is removed from the index extension when operations
start; they may leave conflicted entries in the index, and later user
actions like "git add" will record their conflicted states afresh.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-25 17:10:10 -08:00
Nguyễn Thái Ngọc Duy 56cac48c35 ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
Previously CE_MATCH_IGNORE_VALID flag is used by both valid and
skip-worktree bits. While the two bits have similar behaviour, sharing
this flag means "git update-index --really-refresh" will ignore
skip-worktree while it should not. Instead another flag is
introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only
applies to valid bit.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-14 14:03:58 -08:00
Nguyễn Thái Ngọc Duy b4d1690df1 Teach Git to respect skip-worktree bit (reading part)
grep: turn on --cached for files that is marked skip-worktree
ls-files: do not check for deleted file that is marked skip-worktree
update-index: ignore update request if it's skip-worktree, while still allows removing
diff*: skip worktree version

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:13:32 -07:00
Matthieu Moy 3deffc52d8 reset: make the reminder output consistent with "checkout"
git reset without argument displays a summary of the local modification,
like this:

    $ git reset
    Makefile: locally modified

Some people have problems with this; they look like an error message.

This patch makes its output mimic how "git checkout $another_branch"
reports the paths with local modifications.  "git add --refresh --verbose"
is changed in the same way.

It also adds a header to make it clear that the output is informative,
and not an error.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
2009-08-21 21:19:35 -07:00
Matthieu Moy 43673fddd3 Rename REFRESH_SAY_CHANGED to REFRESH_IN_PORCELAIN.
The change in the output is going to become more general than just saying
"changed", so let's make the variable name more general too.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-21 20:45:40 -07:00
Thomas Rast 0721c314a5 Use die_errno() instead of die() when checking syscalls
Lots of die() calls did not actually report the kind of error, which
can leave the user confused as to the real problem.  Use die_errno()
where we check a system/library call that sets errno on failure, or
one of the following that wrap such calls:

  Function              Passes on error from
  --------              --------------------
  odb_pack_keep         open
  read_ancestry         fopen
  read_in_full          xread
  strbuf_read           xread
  strbuf_read_file      open or strbuf_read_file
  strbuf_readlink       readlink
  write_in_full         xwrite

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Thomas Rast d824cbba02 Convert existing die(..., strerror(errno)) to die_errno()
Change calls to die(..., strerror(errno)) to use the new die_errno().

In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Kjetil Barvik 5bcf109cdf checkout bugfix: use stat.mtime instead of stat.ctime in two places
Commit e1afca4fd "write_index(): update index_state->timestamp after
flushing to disk" on 2009-02-23 used stat.ctime to record the
timestamp of the index-file.  This is wrong, so fix this and use the
correct stat.mtime timestamp instead.

Commit 110c46a909 "Not all systems use st_[cm]tim field for ns
resolution file timestamp" on 2009-03-08, has a similar bug for the
builtin-fetch-pack.c file.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-15 12:56:40 -07:00
Junio C Hamano 110c46a909 Not all systems use st_[cm]tim field for ns resolution file timestamp
Some codepaths do not still use the ST_[CM]TIME_NSEC() pair of macros
introduced by the previous commit but assumes all systems use st_mtim
and st_ctim fields in "struct stat" to record nanosecond resolution part
of the file timestamps.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-08 14:04:39 -07:00
Kjetil Barvik c06ff4908b Record ns-timestamps if possible, but do not use it without USE_NSEC
Traditionally, the lack of USE_NSEC meant "do not record nor use the
nanosecond resolution part of the file timestamps".  To avoid problems on
filesystems that lose the ns part when the metadata is flushed to the disk
and then later read back in, disabling USE_NSEC has been a good idea in
general.

If you are on a filesystem without such an issue, it does not hurt to read
and store them in the cached stat data in the index entries even if your
git is compiled without USE_NSEC.  The index left with such a version of
git can be read by git compiled with USE_NSEC and it can make use of the
nanosecond part to optimize the check to see if the path on the filesystem
hsa been modified since we last looked at.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 20:25:16 -08:00
Kjetil Barvik e1afca4fd3 write_index(): update index_state->timestamp after flushing to disk
Since this timestamp is used to check for racy-clean files, it is
important to keep it uptodate.

For the 'git checkout' command without the '-q' option, this make a
huge difference.  Before, each and every file which was updated, was
racy-clean after the call to unpack_trees() and write_index() but
before the GIT process ended.

And because of the call to show_local_changes() in builtin-checkout.c,
we ended up reading those files back into memory, doing a SHA1 to
check if the files was really different from the index.  And, of
course, no file was different.

With this fix, 'git checkout' without the '-q' option should now be
almost as fast as with the '-q' option, but not quite, as we still do
some few lstat(2) calls more without the '-q' option.

Below is some average numbers for 10 checkout's to v2.6.27 and 10 to
v2.6.25 of the Linux kernel, to show the difference:

before (git version 1.6.2.rc1.256.g58a87):
 7.860 user  2.427 sys  19.465 real  52.8% CPU  faults: 0 major 95331 minor
after:
 6.184 user  2.160 sys  17.619 real  47.4% CPU  faults: 0 major 38994 minor

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-23 18:04:20 -08:00
Kjetil Barvik fba2f38a2c make USE_NSEC work as expected
Since the filesystem ext4 is now defined as stable in Linux v2.6.28,
and ext4 supports nanonsecond resolution timestamps natively, it is
time to make USE_NSEC work as expected.

This will make racy git situations less likely to happen.  For 'git
checkout' this means it will be less likely that we have to open, read
the contents of the file into RAM, and check if file is really
modified or not.  The result sould be a litle less used CPU time, less
pagefaults and a litle faster program, at least for 'git checkout'.

Since the number of possible racy git situations would increase when
disks gets faster, this patch would be more and more helpfull as times
go by.  For a fast Solid State Disk, this patch should be helpfull.

Note that, when file operations starts to take less than 1 nanosecond,
one would again start to get more racy git situations.

For more info on racy git, see Documentation/technical/racy-git.txt
For more info on ext4, see http://kernelnewbies.org/Ext4

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-19 21:39:48 -08:00
Kjetil Barvik 36419c8ee4 check_updates(): effective removal of cache entries marked CE_REMOVE
Below is oprofile output from GIT command 'git chekcout -q my-v2.6.25'
(move from tag v2.6.27 to tag v2.6.25 of the Linux kernel):

CPU: Core 2, speed 1999.95 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
                         mask of 0x00 (Unhalted core cycles) count 20000
Counted INST_RETIRED_ANY_P events (number of instructions retired) with a
                           unit mask of 0x00 (No unit mask) count 20000
CPU_CLK_UNHALT...|INST_RETIRED:2...|
  samples|      %|  samples|      %|
------------------------------------
   409247 100.000    342878 100.000 git
        CPU_CLK_UNHALT...|INST_RETIRED:2...|
          samples|      %|  samples|      %|
        ------------------------------------
           260476 63.6476    257843 75.1996 libz.so.1.2.3
           100876 24.6492     64378 18.7758 kernel-2.6.28.4_2.vmlinux
            30850  7.5382      7874  2.2964 libc-2.9.so
            14775  3.6103      8390  2.4469 git
             2020  0.4936      4325  1.2614 libcrypto.so.0.9.8
              191  0.0467        32  0.0093 libpthread-2.9.so
               58  0.0142        36  0.0105 ld-2.9.so
                1 2.4e-04         0       0 libldap-2.3.so.0.2.31

Detail list of the top 20 function entries (libz counted in one blob):

CPU_CLK_UNHALTED  INST_RETIRED_ANY_P
samples  %        samples  %        image name               symbol name
260476   63.6862  257843   75.2725  libz.so.1.2.3            /lib/libz.so.1.2.3
16587     4.0555  3636      1.0615  libc-2.9.so              memcpy
7710      1.8851  277       0.0809  libc-2.9.so              memmove
3679      0.8995  1108      0.3235  kernel-2.6.28.4_2.vmlinux d_validate
3546      0.8670  2607      0.7611  kernel-2.6.28.4_2.vmlinux __getblk
3174      0.7760  1813      0.5293  libc-2.9.so              _int_malloc
2396      0.5858  3681      1.0746  kernel-2.6.28.4_2.vmlinux copy_to_user
2270      0.5550  2528      0.7380  kernel-2.6.28.4_2.vmlinux __link_path_walk
2205      0.5391  1797      0.5246  kernel-2.6.28.4_2.vmlinux ext4_mark_iloc_dirty
2103      0.5142  1203      0.3512  kernel-2.6.28.4_2.vmlinux find_first_zero_bit
2077      0.5078  997       0.2911  kernel-2.6.28.4_2.vmlinux do_get_write_access
2070      0.5061  514       0.1501  git                      cache_name_compare
2043      0.4995  1501      0.4382  kernel-2.6.28.4_2.vmlinux rcu_irq_exit
2022      0.4944  1732      0.5056  kernel-2.6.28.4_2.vmlinux __ext4_get_inode_loc
2020      0.4939  4325      1.2626  libcrypto.so.0.9.8       /usr/lib/libcrypto.so.0.9.8
1965      0.4804  1384      0.4040  git                      patch_delta
1708      0.4176  984       0.2873  kernel-2.6.28.4_2.vmlinux rcu_sched_grace_period
1682      0.4112  727       0.2122  kernel-2.6.28.4_2.vmlinux sysfs_slab_alias
1659      0.4056  290       0.0847  git                      find_pack_entry_one
1480      0.3619  1307      0.3816  kernel-2.6.28.4_2.vmlinux ext4_writepage_trans_blocks

Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles
per instruction, and compared to the total cycles spent inside the
source code of GIT for this command, all the memmove() calls
translates to (7710 * 100) / 14775 = 52.2% of this.

Retesting with a GIT program compiled for gcov usage, I found out that
the memmove() calls came from remove_index_entry_at() in read-cache.c,
where we have:

        memmove(istate->cache + pos,
                istate->cache + pos + 1,
                (istate->cache_nr - pos) * sizeof(struct cache_entry *));

remove_index_entry_at() is called 4902 times from check_updates() in
unpack-trees.c, and each time called we move each cache_entry pointers
(from the removed one) one step to the left.

Since we have 28828 entries in the cache this time, and if we on
average move half of them each time, we in total move approximately
4902 * 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if
each pointer is 8 bytes (64 bit).

OK, is seems that the function check_updates() is called 28 times, so
the estimated guess above had been more correct if check_updates() had
been called only once, but the point is: we get lots of bytes moved.

To fix this, and use an O(N) algorithm instead, where N is the number
of cache_entries, we delete/remove all entries in one loop through all
entries.

From a retest, the new remove_marked_cache_entries() from the patch
below, ended up with the following output line from oprofile:

46        0.0105  15        0.0041  git                      remove_marked_cache_entries

If we can trust the numbers from oprofile in this case, we saved
approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077
seconds CPU time with this fix for this particular test.  And notice
that now the CPU did only 46 / 15 = 3.1 cycles/instruction.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-18 17:11:21 -08:00
Junio C Hamano 4cc8d6c62d add -u: do not fail to resolve a path as deleted
After you resolve a conflicted merge to remove the path, "git add -u"
failed to record the removal.  Instead it errored out by saying that the
removed path is not found in the work tree, but that is what the user
already knows, and the wanted to record the removal as the resolution,
so the error does not make sense.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-28 17:29:33 -08:00
Linus Torvalds a60272b38e Make 'ce_compare_link()' use the new 'strbuf_readlink()'
This simplifies the code, and also makes ce_compare_link now able to
handle filesystems with odd 'st_size' return values for symlinks.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-17 13:36:34 -08:00
Junio C Hamano 63e8dc5b14 read-cache.c: typofix in comment
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-07 19:08:23 -08:00
Junio C Hamano 331fcb598e git add --intent-to-add: do not let an empty blob be committed by accident
Writing a tree out of an index with an "intent to add" entry is blocked.
This implies that you cannot "git commit" from such a state; however you
can still do "git commit -a" or "git commit $that_path".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-30 17:59:19 -08:00
Junio C Hamano 388b2acd6e git add --intent-to-add: fix removal of cached emptiness
This uses the extended index flag mechanism introduced earlier to mark
the entries added to the index via "git add -N" with CE_INTENT_TO_ADD.

The logic to detect an "intent to add" entry for the purpose of allowing
"git rm --cached $path" is tightened to check not just for a staged empty
blob, but with the CE_INTENT_TO_ADD bit.  This protects an empty blob that
was explicitly added and then modified in the work tree from being dropped
with this sequence:

	$ >empty
	$ git add empty
	$ echo "non empty" >empty
	$ git rm --cached empty

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-28 19:58:24 -08:00
Junio C Hamano fe60dff744 Merge branch 'nd/narrow' (early part) into jc/add-i-t-a
* 'nd/narrow' (early part):
  Extend index to save more flags
2008-11-28 17:22:35 -08:00
Junio C Hamano 6cd3729eae Merge branch 'maint'
* maint:
  Start 1.6.0.5 cycle
  Fix pack.packSizeLimit and --max-pack-size handling
  checkout: Fix "initial checkout" detection
  Remove the period after the git-check-attr summary

Conflicts:
	RelNotes
2008-11-12 15:03:57 -08:00
Junio C Hamano fa7b3c2f75 checkout: Fix "initial checkout" detection
Earlier commit 5521883 (checkout: do not lose staged removal, 2008-09-07)
tightened the rule to prevent switching branches from losing local
changes, so that staged removal of paths can be protected, while
attempting to keep a loophole to still allow a special case of switching
out of an un-checked-out state.

However, the loophole was made a bit too tight, and did not allow
switching from one branch (in an un-checked-out state) to check out
another branch.

The change to builtin-checkout.c in this commit loosens it to allow this,
by not insisting the original commit and the new commit to be the same.

It also introduces a new function, is_index_unborn (and an associated
macro, is_cache_unborn), to check if the repository is truly in an
un-checked-out state more reliably, by making sure that $GIT_INDEX_FILE
did not exist when populating the in-core index structure.  A few places
the earlier commit 5521883 added the check for the initial checkout
condition are updated to use this function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 14:16:50 -08:00
Junio C Hamano 86e67a088c Merge branch 'jk/maint-ls-files-other' into maint
* jk/maint-ls-files-other:
  refactor handling of "other" files in ls-files and status
2008-11-02 13:37:13 -08:00
Jeff King f55527f802 rm: loosen safety valve for empty files
If a file is different between the working tree copy, the index, and the
HEAD, then we do not allow it to be deleted without --force.

However, this is overly tight in the face of "git add --intent-to-add":

  $ git add --intent-to-add file
  $ : oops, I don't actually want to stage that yet
  $ git rm --cached file
  error: 'empty' has staged content different from both the
  file and the HEAD (use -f to force removal)
  $ git rm -f --cached file

Unfortunately, there is currently no way to distinguish between an empty
file that has been added and an "intent to add" file. The ideal behavior
would be to disallow the former while allowing the latter.

This patch loosens the safety valve to allow the deletion only if we are
deleting the cached entry and the cached content is empty.  This covers
the intent-to-add situation, and assumes there is little harm in not
protecting users who have legitimately added an empty file.  In many
cases, the file will still be empty, in which case the safety valve does
not trigger anyway (since the content remains untouched in the working
tree). Otherwise, we do remove the fact that no content was staged, but
given that the content is by definition empty, it is not terribly
difficult for a user to recreate it.

However, we still document the desired behavior in the form of two
tests. One checks the correct removal of an intent-to-add file. The other
checks that we still disallow removal of empty files, but is marked as
expect_failure to indicate this compromise. If the intent-to-add feature
is ever extended to differentiate between normal empty files and
intent-to-add files, then the safety valve can be re-tightened.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-22 17:16:07 -07:00
Junio C Hamano d67dd17b33 Merge branch 'jk/fix-ls-files-other'
* jk/fix-ls-files-other:
  refactor handling of "other" files in ls-files and status
2008-10-21 17:57:56 -07:00
Junio C Hamano 500ac7f42e Merge branch 'jc/maint-reset-remove-unmerged-new'
* jc/maint-reset-remove-unmerged-new:
  reset --hard/read-tree --reset -u: remove unmerged new paths
2008-10-21 13:48:41 -07:00
Junio C Hamano d1a43f2aa4 reset --hard/read-tree --reset -u: remove unmerged new paths
When aborting a failed merge that has brought in a new path using "git
reset --hard" or "git read-tree --reset -u", we used to first forget about
the new path (via read_cache_unmerged) and then matched the working tree
to what is recorded in the index, thus ending up leaving the new path in
the work tree.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-18 10:00:59 -07:00
Junio C Hamano e845e16ee6 Merge branch 'jk/maint-ls-files-other' into jk/fix-ls-files-other
* jk/maint-ls-files-other:
  refactor handling of "other" files in ls-files and status

Conflicts:
	read-cache.c
2008-10-17 13:03:52 -07:00
Jeff King 98fa473887 refactor handling of "other" files in ls-files and status
When the "git status" display code was originally converted
to C, we copied the code from ls-files to discover whether a
pathname returned by read_directory was an "other", or
untracked, file.

Much later, 5698454e updated the code in ls-files to handle
some new cases caused by gitlinks.  This left the code in
wt-status.c broken: it would display submodule directories
as untracked directories. Nobody noticed until now, however,
because unless status.showUntrackedFiles was set to "all",
submodule directories were not actually reported by
read_directory. So the bug was only triggered in the
presence of a submodule _and_ this config option.

This patch pulls the ls-files code into a new function,
cache_name_is_other, and uses it in both places. This should
leave the ls-files functionality the same and fix the bug
in status.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-17 12:46:59 -07:00
Nguyễn Thái Ngọc Duy 06aaaa0bf7 Extend index to save more flags
The on-disk format of index only saves 16 bit flags, nearly all have
been used. The last bit (CE_EXTENDED) is used to for future extension.

This patch extends index entry format to save more flags in future.
The new entry format will be used when CE_EXTENDED bit is 1.

Because older implementation may not understand CE_EXTENDED bit and
misread the new format, if there is any extended entry in index, index
header version will turn 3, which makes it incompatible for older git.
If there is none, header version will return to 2 again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-12 13:21:54 -07:00
Brandon Casey f285a2d7ed Replace calls to strbuf_init(&foo, 0) with STRBUF_INIT initializer
Many call sites use strbuf_init(&foo, 0) to initialize local
strbuf variable "foo" which has not been accessed since its
declaration. These can be replaced with a static initialization
using the STRBUF_INIT macro which is just as readable, saves a
function call, and takes up fewer lines.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-12 12:36:19 -07:00
Dmitry Potapov 7e7abea96b print an error message for invalid path
If verification of path failed, it is always better to print an
error message saying this than relying on the caller function to
print a meaningful error message (especially when the callee already
prints error message for another situation).

Because the callers of add_index_entry_with_check() did not print
any error message, it resulted that the user would not notice the
problem when checkout of an invalid path failed.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-12 12:36:19 -07:00
Shawn O. Pearce a3c76f2858 Merge branch 'jc/add-ita'
* jc/add-ita:
  git-add --intent-to-add (-N)
2008-10-09 10:21:25 -07:00
Nicolas Pitre 9126f0091f fix openssl headers conflicting with custom SHA1 implementations
On ARM I have the following compilation errors:

    CC fast-import.o
In file included from cache.h:8,
                 from builtin.h:6,
                 from fast-import.c:142:
arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
/usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
/usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
/usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
/usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
make: *** [fast-import.o] Error 1

This is because openssl header files are always included in
git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not
set, which somehow brings in <openssl/sha1.h> clashing with the custom
ARM version.  Compilation of git is probably broken on PPC too for the
same reason.

Turns out that the only file requiring openssl/ssl.h and openssl/err.h
is imap-send.c.  But only moving those problematic includes there
doesn't solve the issue as it also includes cache.h which brings in the
conflicting local SHA1 header file.

As suggested by Jeff King, the best solution is to rename our references
to SHA1 functions and structure to something git specific, and define those
according to the implementation used.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-02 18:06:56 -07:00
Junio C Hamano a3fcc0562c Merge branch 'jc/maint-name-hash-clear' into maint
* jc/maint-name-hash-clear:
  discard_cache: reset lazy name_hash bit
2008-09-18 19:53:06 -07:00
Junio C Hamano 27551baa3e Merge branch 'jc/maint-name-hash-clear'
* jc/maint-name-hash-clear:
  discard_cache: reset lazy name_hash bit
2008-09-16 00:47:52 -07:00
Junio C Hamano 394258190c git-add --intent-to-add (-N)
This adds "--intent-to-add" option to "git add".  This is to let the
system know that you will tell it the final contents to be staged later,
iow, just be aware of the presense of the path with the type of the blob
for now.  It is implemented by staging an empty blob as the content.

With this sequence:

    $ git reset --hard
    $ edit newfile
    $ git add -N newfile
    $ edit newfile oldfile
    $ git diff

the diff will show all changes relative to the current commit.  Then you
can do:

    $ git commit -a ;# commit everything

or

    $ git commit oldfile ;# only oldfile, newfile not yet added

to pretend you are working with an index-free system like CVS.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-31 16:22:05 -07:00
Junio C Hamano b46f7e54fc Merge branch 'jc/add-addremove'
* jc/add-addremove:
  builtin-add.c: optimize -A option and "git add ."
  builtin-add.c: restructure the code for maintainability
2008-08-27 16:39:57 -07:00
Junio C Hamano d6096f17d2 Merge branch 'maint'
* maint:
  unpack_trees(): protect the handcrafted in-core index from read_cache()
  git-p4: Fix one-liner in p4_write_pipe function.
  Completion: add missing '=' for 'diff --diff-filter'
  Fix 'git help help'
2008-08-23 18:28:37 -07:00
Junio C Hamano 913e0e99b6 unpack_trees(): protect the handcrafted in-core index from read_cache()
unpack_trees() rebuilds the in-core index from scratch by allocating a new
structure and finishing it off by copying the built one to the final
index.

The resulting in-core index is Ok for most use, but read_cache() does not
recognize it as such.  The function is meant to be no-op if you already
have loaded the index, until you call discard_cache().

This change the way read_cache() detects an already initialized in-core
index, by introducing an extra bit, and marks the handcrafted in-core
index as initialized, to avoid this problem.

A better fix in the longer term would be to change the read_cache() API so
that it will always discard and re-read from the on-disk index to avoid
confusion.  But there are higher level API that have relied on the current
semantics, and they and their users all need to get converted, which is
outside the scope of 'maint' track.

An example of such a higher level API is write_cache_as_tree(), which is
used by git-write-tree as well as later Porcelains like git-merge, revert
and cherry-pick.  In the longer term, we should remove read_cache() from
there and add one to cmd_write_tree(); other callers expect that the
in-core index they prepared is what gets written as a tree so no other
change is necessary for this particular codepath.

The original version of this patch marked the index by pointing an
otherwise wasted malloc'ed memory with o->result.alloc, but this version
uses Linus's idea to use a new "initialized" bit, which is conceptually
much cleaner.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-23 18:09:27 -07:00
Junio C Hamano 64ca23afda discard_cache: reset lazy name_hash bit
We forgot to reset name_hash_initialized bit when discarding the in-core index.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-23 13:05:10 -07:00
Junio C Hamano 16ce2e4c8f index: future proof for "extended" index entries
We do not have any more bits in the on-disk index flags word, but we would
need to have more in the future.  Use the last remaining bits as a signal
to tell us that the index entry we are looking at is an extended one.

Since we do not understand the extended format yet, we will just error out
when we see it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-17 00:22:45 -07:00
Junio C Hamano c70115b4b1 Teach gitlinks to ie_modified() and ce_modified_check_fs()
The ie_modified() function is the workhorse for refresh_cache_entry(),
i.e. checking if an index entry that is stat-dirty actually has changes.

After running quicker check to compare cached stat information with
results from the latest lstat(2) to answer "has modification" early, the
code goes on to check if there really is a change by comparing the staged
data with what is on the filesystem by asking ce_modified_check_fs().
However, this function always said "no change" for any gitlinks that has a
directory at the corresponding path.  This made ie_modified() to miss
actual changes in the subproject.

The patch fixes this first by modifying an existing short-circuit logic
before calling the ce_modified_check_fs() function.  It knows that for any
filesystem entity to which ie_match_stat() says its data has changed, if
its cached size is nonzero then the contents cannot match, which is a
correct optimization only for blob objects.  We teach gitlink objects to
this special case, as we already know that any gitlink that
ie_match_stat() says is modified is indeed modified at this point in the
codepath.

With the above change, we could leave ce_modified_check_fs() broken, but
it also futureproofs the code by teaching it to use ce_compare_gitlink(),
instead of assuming (incorrectly) that any directory is unchanged.

Originally noticed by Alex Riesen on Cygwin.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-30 00:09:22 -07:00
Alex Riesen 1ce4790bf5 Make use of stat.ctime configurable
A new configuration variable 'core.trustctime' is introduced to
allow ignoring st_ctime information when checking if paths
in the working tree has changed, because there are situations where
it produces too much false positives.  Like when file system crawlers
keep changing it when scanning and using the ctime for marking scanned
files.

The default is to notice ctime changes.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-28 23:26:25 -07:00
Petr Baudis 81dc2307d0 git-mv: Keep moved index entries inact
The rewrite of git-mv from a shell script to a builtin was perhaps
a little too straightforward: the git add and git rm queues were
emulated directly, which resulted in a rather complicated code and
caused an inconsistent behaviour when moving dirty index entries;
git mv would update the entry based on working tree state,
except in case of overwrites, where the new entry would still have
sha1 of the old file.

This patch introduces rename_index_entry_at() into the index toolkit,
which will rename an entry while removing any entries the new entry
might render duplicate. This is then used in git mv instead
of all the file queues, resulting in a major simplification
of the code and an inevitable change in git mv -n output format.

Also the code used to refuse renaming overwriting symlink with a regular
file and vice versa; there is no need for that.

A few new tests have been added to the testsuite to reflect this change.

Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-27 15:05:19 -07:00
Junio C Hamano 041aee31be builtin-add.c: restructure the code for maintainability
A private function add_files_to_cache() in builtin-add.c was borrowed by
checkout and commit re-implementors without getting properly refactored to
more library-ish place.  This does the refactoring.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-25 21:14:21 -07:00
Junio C Hamano d14e7407b3 "needs update" considered harmful
"git update-index --refresh", "git reset" and "git add --refresh" have
reported paths that have local modifications as "needs update" since the
beginning of git.

Although this is logically correct in that you need to update the index at
that path before you can commit that change, it is now becoming more and
more clear, especially with the continuous push for user friendliness
since 1.5.0 series, that the message is suboptimal.  After all, the change
may be something the user might want to get rid of, and "updating" would
be absolutely a wrong thing to do if that is the case.

I prepared two alternatives to solve this.  Both aim to reword the message
to more neutral "locally modified".

This patch is a more intrusive variant that changes the message for only
Porcelain commands ("add" and "reset") while keeping the plumbing
"update-index" intact.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-20 17:21:32 -07:00
Junio C Hamano 3bf0dd1f4e read-cache.c: typofix
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-16 18:48:58 -07:00
Miklos Vajna e46bbcf6e8 Move read_cache_unmerged() to read-cache.c
builtin-read-tree has a read_cache_unmerged() which is useful for other
builtins, for example builtin-merge uses it as well. Move it to
read-cache.c to avoid code duplication.

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-30 22:45:51 -07:00
Junio C Hamano 159e639e5b Merge branch 'lt/racy-empty'
* lt/racy-empty:
  racy-git: an empty blob has a fixed object name
2008-06-22 14:34:20 -07:00
Linus Torvalds f49c2c22fe racy-git: an empty blob has a fixed object name
We use size=0 as the magic token to say the entry is known to be racily
clean, but a sequence that does:

 - update the path with a non-empty blob and write the index;
 - update an unrelated path and write the index -- this smudges
   the above entry;
 - truncate the path to size zero.

would make both the size field for the path in the index and the size on
the filesystem zero.  We should not mistake it as a clean index entry.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-19 14:14:45 -07:00
Marius Storm-Olsen aa9349d449 Add shortcut in refresh_cache_ent() for marked entries.
When a cache entry has been marked as CE_VALID, the user has
promised us that any change in the work tree does not matter.
Just mark the entry as up-to-date, and continue.

Signed-off-by: Marius Storm-Olsen <marius@trolltech.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-31 14:18:20 -07:00
Junio C Hamano 0166592495 Merge branch 'jc/add-n-u'
* jc/add-n-u:
  Make git add -n and git -u -n output consistent
  "git-add -n -u" should not add but just report

Conflicts:

	builtin-add.c
	builtin-mv.c
	cache.h
	read-cache.c
2008-05-25 14:03:50 -07:00
Junio C Hamano 7e83003029 Merge branch 'js/ignore-submodule'
* js/ignore-submodule:
  Ignore dirty submodule states during rebase and stash
  Teach update-index about --ignore-submodules
  diff options: Introduce --ignore-submodules
2008-05-25 13:37:08 -07:00
Junio C Hamano 38ed1d89f7 "git-add -n -u" should not add but just report
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-21 12:04:41 -07:00
Johannes Schindelin 5fdeacb0ca Teach update-index about --ignore-submodules
Like with the diff machinery, update-index should sometimes just
ignore submodules (e.g. to determine a clean state before a rebase).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-15 16:12:43 -07:00
Alex Riesen 960b8ad1b1 Make the exit code of add_file_to_index actually useful
Update the programs which used the function (as add_file_to_cache).

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-12 20:54:46 -07:00
Linus Torvalds d177cab048 Avoid some unnecessary lstat() calls
The commit sequence used to do

	if (file_exists(p->path))
		add_file_to_cache(p->path, 0);

where both "file_exists()" and "add_file_to_cache()" needed to do a
lstat() on the path to do their work.

This cuts down 'lstat()' calls for the partial commit case by two
for each path we know about (because we do this twice per path).

Just move the lstat() to the caller instead (that's all that
"file_exists()" really does), and pass the stat information down to the
add_to_cache() function.

This essentially makes 'add_to_index()' the core function that adds a path
to the index, getting the index pointer, the pathname and the stat
information as arguments. There are then shorthand helper functions that
use this core function:

 - 'add_to_cache()' is just 'add_to_index()' with the default index

 - 'add_file_to_cache/index()' is the same, but does the lstat() call
   itself, so you can pass just the pathname if you don't already have the
   stat information available.

So old users of the 'add_file_to_xyzzy()' are essentially left unchanged,
and this just exposes the more generic helper function that can take
existing stat information into account.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-10 18:16:30 -07:00
Junio C Hamano 2855e70ad1 Merge branch 'py/diff-submodule'
* py/diff-submodule:
  is_racy_timestamp(): do not check timestamp for gitlinks
  diff-lib.c: rename check_work_tree_entity()
  diff: a submodule not checked out is not modified
  Add t7506 to test submodule related functions for git-status
  t4027: test diff for submodule with empty directory
2008-05-10 18:16:25 -07:00
Junio C Hamano 380a742679 Merge branch 'lt/case-insensitive'
* lt/case-insensitive:
  Make git-add behave more sensibly in a case-insensitive environment
  When adding files to the index, add support for case-independent matches
  Make unpack-tree update removed files before any updated files
  Make branch merging aware of underlying case-insensitive filsystems
  Add 'core.ignorecase' option
  Make hash_name_lookup able to do case-independent lookups
  Make "index_name_exists()" return the cache_entry it found
  Move name hashing functions into a file of its own
  Make unpack_trees_options bit flags actual bitfields
2008-05-10 18:14:28 -07:00
Junio C Hamano 050288d52d is_racy_timestamp(): do not check timestamp for gitlinks
Because we do not even check the timestamp to determie if a gitlink
is up to date or not, triggering the racy-timestamp check for gitlinks
does not make sense.

This fixes the recently added test in t7506.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-04 17:41:27 -07:00
Junio C Hamano e06c43c795 write_index(): optimize ce_smudge_racily_clean_entry() calls with CE_UPTODATE
When writing the index out, we need to check the work tree again to see if
an entry whose timestamp indicates that it could be "racily clean", in
order to smudge it if it is stat-clean but with modified contents.

However, we can skip this step for entries marked with CE_UPTODATE,
which are known to be the really clean (i.e. the one we already have
checked when we prepared the index).  This will reduce lstat(2) calls
necessary in git-status.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-12 19:42:17 -07:00