Commit graph

1180 commits

Author SHA1 Message Date
Christian Couder de7b5d6218 sha1_object_info_extended(): add an "unsigned flags" parameter
This parameter is not used yet, but it will be used to tell
sha1_object_info_extended() if it should perform object
replacement or not.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 11:53:48 -08:00
Christian Couder bf93eea0f6 sha1_file.c: add lookup_replace_object_extended() to pass flags
Currently, there is only one caller to lookup_replace_object()
that can benefit from passing it some flags, but we expect
that there could be more.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 11:53:48 -08:00
Christian Couder ffe68cf9ac rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT
The READ_SHA1_FILE_REPLACE flag is more related to using the
lookup_replace_object() function rather than the
read_sha1_file() function.

We also need such a flag to be used with sha1_object_info()
instead of read_sha1_file().

The name LOOKUP_REPLACE_OBJECT is therefore better for this
flag.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 11:53:48 -08:00
Nguyễn Thái Ngọc Duy 069c053222 add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
This may be needed when a hook is run after a new shallow pack is
received, but .git/shallow is not settled yet. A temporary shallow
file to plug all loose ends should be used instead. GIT_SHALLOW_FILE
is overriden by --shallow-file.

--shallow-file does not work in this case because the hook may spawn
many git subprocesses and the launch commands do not have
--shallow-file as it's a recent addition.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:17 -08:00
Nguyễn Thái Ngọc Duy 58babfffde shallow.c: the 8 steps to select new commits for .git/shallow
Suppose a fetch or push is requested between two shallow repositories
(with no history deepening or shortening). A pack that contains
necessary objects is transferred over together with .git/shallow of
the sender. The receiver has to determine whether it needs to update
.git/shallow if new refs needs new shallow comits.

The rule here is avoid updating .git/shallow by default. But we don't
want to waste the received pack. If the pack contains two refs, one
needs new shallow commits installed in .git/shallow and one does not,
we keep the latter and reject/warn about the former.

Even if .git/shallow update is allowed, we only add shallow commits
strictly necessary for the former ref (remember the sender can send
more shallow commits than necessary) and pay attention not to
accidentally cut the receiver history short (no history shortening is
asked for)

So the steps to figure out what ref need what new shallow commits are:

1. Split the sender shallow commit list into "ours" and "theirs" list
   by has_sha1_file. Those that exist in current repo in "ours", the
   remaining in "theirs".

2. Check the receiver .git/shallow, remove from "ours" the ones that
   also exist in .git/shallow.

3. Fetch the new pack. Either install or unpack it.

4. Do has_sha1_file on "theirs" list again. Drop the ones that fail
   has_sha1_file. Obviously the new pack does not need them.

5. If the pack is kept, remove from "ours" the ones that do not exist
   in the new pack.

6. Walk the new refs to answer the question "what shallow commits,
   both ours and theirs, are required in .git/shallow in order to add
   this ref?". Shallow commits not associated to any refs are removed
   from their respective list.

7. (*) Check reachability (from the current refs) of all remaining
   commits in "ours". Those reachable are removed. We do not want to
   cut any part of our (reachable) history. We only check up
   commits. True reachability test is done by
   check_everything_connected() at the end as usual.

8. Combine the final "ours" and "theirs" and add them all to
   .git/shallow. Install new refs. The case where some hook rejects
   some refs on a push is explained in more detail in the push
   patches.

Of these steps, #6 and #7 are expensive. Both require walking through
some commits, or in the worst case all commits. And we rather avoid
them in at least common case, where the transferred pack does not
contain any shallow commits that the sender advertises. Let's look at
each scenario:

1) the sender has longer history than the receiver

   All shallow commits from the sender will be put into "theirs" list
   at step 1 because none of them exists in current repo. In the
   common case, "theirs" becomes empty at step 4 and exit early.

2) the sender has shorter history than the receiver

   All shallow commits from the sender are likely in "ours" list at
   step 1. In the common case, if the new pack is kept, we could empty
   "ours" and exit early at step 5.

   If the pack is not kept, we hit the expensive step 6 then exit
   after "ours" is emptied. There'll be only a handful of objects to
   walk in fast-forward case. If it's forced update, we may need to
   walk to the bottom.

3) the sender has same .git/shallow as the receiver

   This is similar to case 2 except that "ours" should be emptied at
   step 2 and exit early.

A fetch after "clone --depth=X" is case 1. A fetch after "clone" (from
a shallow repo) is case 3. Luckily they're cheap for the common case.

A push from "clone --depth=X" falls into case 2, which is expensive.
Some more work may be done at the sender/client side to avoid more
work on the server side: if the transferred pack does not contain any
shallow commits, send-pack should not send any shallow commits to the
receive-pack, effectively turning it into a normal push and avoid all
steps.

This patch implements all steps except #3, already handled by
fetch-pack and receive-pack, #6 and #7, which has their own patch due
to their size.

(*) in previous versions step 7 was put before step 3. I reorder it so
    that the common case that keeps the pack does not need to walk
    commits at all. In future if we implement faster commit
    reachability check (maybe with the help of pack bitmaps or commit
    cache), step 7 could become cheap and be moved up before 6 again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:16 -08:00
Karsten Blees efc684245b remove old hash.[ch] implementation
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:25 -08:00
Karsten Blees 419a597f64 name-hash.c: remove cache entries instead of marking them CE_UNHASHED
The new hashmap implementation supports remove, so really remove unused
cache entries from the name hashmap instead of just marking them.

The CE_UNHASHED flag and CE_STATE_MASK are no longer needed.

Keep the CE_HASHED flag to prevent adding entries twice.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:24 -08:00
Karsten Blees 8b013788a1 name-hash.c: use new hash map implementation for cache entries
Note: the "ce->next = NULL;" in unpack-trees.c::do_add_entry can safely be
removed, as ce->next (now ce->ent.next) is always properly initialized in
name-hash.c::hash_index_entry.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:24 -08:00
Karsten Blees e05881a457 name-hash.c: use new hash map implementation for directories
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:23 -08:00
Junio C Hamano 9196a2f8bd Merge branch 'jc/upload-pack-send-symref' into maint
One long-standing flaw in the pack transfer protocol used by "git
clone" was that there was no way to tell the other end which branch
"HEAD" points at, and the receiving end needed to guess.  A new
capability has been defined in the pack protocol to convey this
information so that cloning from a repository with more than one
branches pointing at the same commit where the HEAD is at now
reliably sets the initial branch in the resulting repository.

* jc/upload-pack-send-symref:
  t5570: Update for clone-progress-to-stderr branch
  t5570: Update for symref capability
  clone: test the new HEAD detection logic
  connect: annotate refs with their symref information in get_remote_head()
  connect.c: make parse_feature_value() static
  upload-pack: send non-HEAD symbolic refs
  upload-pack: send symbolic ref information as capability
  upload-pack.c: do not pass confusing cb_data to mark_our_ref()
  t5505: fix "set-head --auto with ambiguous HEAD" test
2013-11-08 11:38:00 -08:00
Junio C Hamano e0fd1e3841 Merge branch 'sb/refs-code-cleanup'
* sb/refs-code-cleanup:
  cache: remove unused function 'have_git_dir'
  refs: remove unused function invalidate_ref_cache
2013-11-01 07:38:58 -07:00
Junio C Hamano c02e1e4a07 Merge branch 'nd/lift-path-max'
* nd/lift-path-max:
  checkout_entry(): clarify the use of topath[] parameter
  entry.c: convert checkout_entry to use strbuf
2013-10-30 12:10:56 -07:00
Junio C Hamano e22c1c7f19 Merge branch 'jx/relative-path-regression-fix'
* jx/relative-path-regression-fix:
  Use simpler relative_path when set_git_dir
  relative_path should honor dos-drive-prefix
  test: use unambigous leading path (/foo) for MSYS
2013-10-28 10:42:30 -07:00
Junio C Hamano da212eabec Merge branch 'jk/format-patch-from' into maint
"format-patch --from=<whom>" forgot to omit unnecessary in-body from
line, i.e. when <whom> is the same as the real author.

* jk/format-patch-from:
  format-patch: print in-body "From" only when needed
2013-10-28 10:18:43 -07:00
Stefan Beller 84471a1213 cache: remove unused function 'have_git_dir'
This function was added in d2b0708 (2008-09-27, add have_git_dir()
function) as a preparation for adbc0b6 (2008-09-30, cygwin: Use native
Win32 API for stat).

However the second referenced commit was reverted in f66450a (2013-06-22,
cygwin: Remove the Win32 l/stat() implementation), so we don't need to
expose this wrapper function any more as a public API.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-28 08:56:06 -07:00
Vicent Marti ec73f5807c sha1_file: export git_open_noatime
The `git_open_noatime` helper can be of general interest for other
consumers of git's different on-disk formats.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 15:44:52 -07:00
Junio C Hamano af2a651d2e checkout_entry(): clarify the use of topath[] parameter
The said function has this signature:

	extern int checkout_entry(struct cache_entry *ce,
				  const struct checkout *state,
				  char *topath);

At first glance, it might appear that the caller of checkout_entry()
can specify to which path the contents are written out by the last
parameter, and it is tempting to add "const" in front of its type.

In reality, however, topath[] is to point at a buffer to store the
temporary path generated by the callchain originating from this
function, and the temporary path is always short, much shorter than
the buffer prepared by its only caller in builtin/checkout-index.c.

Document the code a bit to clarify so that future callers know how
to use the function better.

Noticed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 14:59:39 -07:00
Junio C Hamano 046180ad9d Merge branch 'jk/format-patch-from'
"format-patch --from=<whom>" forgot to omit unnecessary in-body
from line, i.e. when <whom> is the same as the real author.

* jk/format-patch-from:
  format-patch: print in-body "From" only when needed
2013-10-17 15:55:18 -07:00
Junio C Hamano d6a58b7773 Merge branch 'es/name-hash-no-trailing-slash-in-dirs'
Clean up the internal of the name-hash mechanism used to work
around case insensitivity on some filesystems to cleanly fix a
long-standing API glitch where the caller of cache_name_exists()
that ask about a directory with a counted string was required to
have '/' at one location past the end of the string.

* es/name-hash-no-trailing-slash-in-dirs:
  dir: revert work-around for retired dangerous behavior
  name-hash: stop storing trailing '/' on paths in index_state.dir_hash
  employ new explicit "exists in index?" API
  name-hash: refactor polymorphic index_name_exists()
2013-10-17 15:55:16 -07:00
Jiang Xin 41894ae3a3 Use simpler relative_path when set_git_dir
Using a relative_path as git_dir first appears in v1.5.6-1-g044bbbc.
It will make git_dir shorter only if git_dir is inside work_tree,
and this will increase performance. But my last refactor effort on
relative_path function (commit v1.8.3-rc2-12-ge02ca72) changed that.
Always use relative_path as git_dir may bring troubles like
$gmane/234434.

Because new relative_path is a combination of original relative_path
from path.c and original path_relative from quote.c, so in order to
restore the origin implementation, save the original relative_path
as remove_leading_path, and call it in setup.c.

Suggested-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-10-14 07:00:33 -07:00
Junio C Hamano f406140baa Merge branch 'fc/at-head'
Instead of typing four capital letters "HEAD", you can say "@" now,
e.g. "git log @".

* fc/at-head:
  Add new @ shortcut for HEAD
  sha1-name: pass len argument to interpret_branch_name()
2013-09-20 12:38:10 -07:00
Jeff King 662cc30cd0 format-patch: print in-body "From" only when needed
Commit a908047 taught format-patch the "--from" option,
which places the author ident into an in-body from header,
and uses the committer ident in the rfc822 from header.  The
documentation claims that it will omit the in-body header
when it is the same as the rfc822 header, but the code never
implemented that behavior.

This patch completes the feature by comparing the two idents
and doing nothing when they are the same (this is the same
as simply omitting the in-body header, as the two are by
definition indistinguishable in this case). This makes it
reasonable to turn on "--from" all the time (if it matches
your particular workflow), rather than only using it when
exporting other people's patches.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-20 11:09:51 -07:00
Junio C Hamano 5d54cffc36 connect.c: make parse_feature_value() static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 21:52:06 -07:00
Eric Sunshine db5360f3f4 name-hash: refactor polymorphic index_name_exists()
Depending upon the absence or presence of a trailing '/' on the incoming
pathname, index_name_exists() checks either if a file is present in the
index or if a directory is represented within the index. Each caller
explicitly chooses the mode of operation by adding or removing a
trailing '/' before invoking index_name_exists().

Since these two modes of operations are disjoint and have no code in
common (one searches index_state.name_hash; the other dir_hash), they
can be represented more naturally as distinct functions: one to search
for a file, and one for a directory.

Splitting index searching into two functions relieves callers of the
artificial burden of having to add or remove a slash to select the mode
of operation; instead they just call the desired function. A subsequent
patch will take advantage of this benefit in order to eliminate the
requirement that the incoming pathname for a directory search must have
a trailing slash.

(In order to avoid disturbing in-flight topics, index_name_exists() is
retained as a thin wrapper dispatching either to index_dir_exists() or
index_file_exists().)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 10:07:13 -07:00
Junio C Hamano c7c377d83f Merge branch 'jk/config-int-range-check'
"git config" did not provide a way to set or access numbers larger
than a native "int" on the platform; it now provides 64-bit signed
integers on all platforms.

* jk/config-int-range-check:
  git-config: always treat --int as 64-bit internally
  config: make numeric parsing errors more clear
  config: set errno in numeric git_parse_* functions
  config: properly range-check integer values
  config: factor out integer parsing from range checks
2013-09-12 14:41:00 -07:00
Junio C Hamano b0d974d6d9 Merge branch 'tg/index-struct-sizes'
The code that reads from a region that mmaps an on-disk index
assumed that "int"/"short" are always 32/16 bits.

* tg/index-struct-sizes:
  read-cache: use fixed width integer types
2013-09-09 14:50:38 -07:00
Junio C Hamano b02f5aeda6 Merge branch 'jl/submodule-mv'
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-09-09 14:36:15 -07:00
Junio C Hamano 2233ad4534 Merge branch 'jc/push-cas'
Allow a safer "rewind of the remote tip" push than blind "--force",
by requiring that the overwritten remote ref to be unchanged since
the new history to replace it was prepared.

The machinery is more or less ready.  The "--force" option is again
the big red button to override any safety, thanks to J6t's sanity
(the original round allowed --lockref to defeat --force).

The logic to choose the default implemented here is fragile
(e.g. "git fetch" after seeing a failure will update the
remote-tracking branch and will make the next "push" pass,
defeating the safety pretty easily).  It is suitable only for the
simplest workflows, and it may hurt users more than it helps them.

* jc/push-cas:
  push: teach --force-with-lease to smart-http transport
  send-pack: fix parsing of --force-with-lease option
  t5540/5541: smart-http does not support "--force-with-lease"
  t5533: test "push --force-with-lease"
  push --force-with-lease: tie it all together
  push --force-with-lease: implement logic to populate old_sha1_expect[]
  remote.c: add command line option parser for "--force-with-lease"
  builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN
  cache.h: move remote/connect API out of it
2013-09-09 14:30:29 -07:00
Jeff King 0016024277 git-config: always treat --int as 64-bit internally
When you run "git config --int", the maximum size of integer
you get depends on how git was compiled, and what it
considers to be an "int".

This is almost useful, because your scripts calling "git
config" will behave similarly to git internally. But relying
on this is dubious; you have to actually know how git treats
each value internally (e.g., int versus unsigned long),
which is not documented and is subject to change. And even
if you know it is "unsigned long", we do not have a
git-config option to match that behavior.

Furthermore, you may simply be asking git to store a value
on your behalf (e.g., configuration for a hook). In that
case, the relevant range check has nothing at all to do with
git, but rather with whatever scripting tools you are using
(and git has no way of knowing what the appropriate range is
there).

Not only is the range check useless, but it is actively
harmful, as there is no way at all for scripts to look
at config variables with large values. For instance, one
cannot reliably get the value of pack.packSizeLimit via
git-config. On an LP64 system, git happily uses a 64-bit
"unsigned long" internally to represent the value, but the
script cannot read any value over 2G.

Ideally, the "--int" option would simply represent an
arbitrarily large integer. For practical purposes, however,
a 64-bit integer is large enough, and is much easier to
implement (and if somebody overflows it, we will still
notice the problem, and not simply return garbage).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-09 11:12:29 -07:00
Felipe Contreras cf99a761d3 sha1-name: pass len argument to interpret_branch_name()
This is useful to make sure we don't step outside the boundaries of what
we are interpreting at the moment. For example while interpreting
foobar@{u}~1, the job of interpret_branch_name() ends right before ~1,
but there's no way to figure that out inside the function, unless the
len argument is passed.

So let's do that.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-03 11:33:00 -07:00
Thomas Gummerer 7800c1ebcc read-cache: use fixed width integer types
Use the fixed width integer types uint16_t and uint32_t for on-disk
structures; unsigned short and unsigned int do not have a guaranteed
size.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-20 12:29:42 -07:00
Junio C Hamano d50cb7569c Merge branch 'ob/typofixes'
* ob/typofixes:
  many small typofixes
2013-08-01 12:01:01 -07:00
Ondřej Bílka 98e023dea4 many small typofixes
Signed-off-by: Ondřej Bílka <neleai@seznam.cz>
Reviewed-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-29 12:32:25 -07:00
Junio C Hamano 356df9bd8d Merge branch 'jk/cat-file-batch-optim'
If somebody wants to only know on-disk footprint of an object
without having to know its type or payload size, we can bypass a
lot of code to cheaply learn it.

* jk/cat-file-batch-optim:
  Fix some sparse warnings
  sha1_object_info_extended: pass object_info to helpers
  sha1_object_info_extended: make type calculation optional
  packed_object_info: make type lookup optional
  packed_object_info: hoist delta type resolution to helper
  sha1_loose_object_info: make type lookup optional
  sha1_object_info_extended: rename "status" to "type"
  cat-file: disable object/refname ambiguity check for batch mode
2013-07-24 19:21:21 -07:00
Junio C Hamano 988f98f61f Merge branch 'jx/clean-interactive'
Add "interactive" mode to "git clean".

The early part to refactor relative path related helper functions
looked sensible.

* jx/clean-interactive:
  test: run testcases with POSIX absolute paths on Windows
  test: add t7301 for git-clean--interactive
  git-clean: add documentation for interactive git-clean
  git-clean: add ask each interactive action
  git-clean: add select by numbers interactive action
  git-clean: add filter by pattern interactive action
  git-clean: use a git-add-interactive compatible UI
  git-clean: add colors to interactive git-clean
  git-clean: show items of del_list in columns
  git-clean: add support for -i/--interactive
  git-clean: refactor git-clean into two phases
  write_name{_quoted_relative,}(): remove redundant parameters
  quote_path_relative(): remove redundant parameter
  quote.c: substitute path_relative with relative_path
  path.c: refactor relative_path(), not only strip prefix
  test: add test cases for relative_path
2013-07-22 11:24:11 -07:00
Junio C Hamano c714f9fd8a Merge branch 'hv/config-from-blob'
Allow configuration data to be read from in-tree blob objects,
which would help working in a bare repository and submodule
updates.

* hv/config-from-blob:
  do not die when error in config parsing of buf occurs
  teach config --blob option to parse config from database
  config: make parsing stack struct independent from actual data source
  config: drop cf validity check in get_next_char()
  config: factor out config file stack management
2013-07-22 11:24:09 -07:00
Junio C Hamano d3aeb31dc4 Merge branch 'nd/const-struct-cache-entry'
* nd/const-struct-cache-entry:
  Convert "struct cache_entry *" to "const ..." wherever possible
2013-07-22 11:24:01 -07:00
Junio C Hamano cb29dfde48 Merge branch 'tr/protect-low-3-fds'
When "git" is spawned in such a way that any of the low 3 file
descriptors is closed, our first open() may yield file descriptor 2,
and writing error message to it would screw things up in a big way.

* tr/protect-low-3-fds:
  git: ensure 0/1/2 are open in main()
  daemon/shell: refactor redirection of 0/1/2 from /dev/null
2013-07-22 11:23:35 -07:00
Junio C Hamano 802f878b86 Merge branch 'jk/in-pack-size-measurement'
"git cat-file --batch-check=<format>" is added, primarily to allow
on-disk footprint of objects in packfiles (often they are a lot
smaller than their true size, when expressed as deltas) to be
reported.

* jk/in-pack-size-measurement:
  pack-revindex: radix-sort the revindex
  pack-revindex: use unsigned to store number of objects
  cat-file: split --batch input lines on whitespace
  cat-file: add %(objectsize:disk) format atom
  cat-file: add --batch-check=<format>
  cat-file: refactor --batch option parsing
  cat-file: teach --batch to stream blob objects
  t1006: modernize output comparisons
  teach sha1_object_info_extended a "disk_size" query
  zero-initialize object_info structs
2013-07-18 12:59:41 -07:00
Thomas Rast 1d999ddd1d daemon/shell: refactor redirection of 0/1/2 from /dev/null
Both daemon.c and shell.c contain logic to open FDs 0/1/2 from
/dev/null if they are not already open.  Move the function in daemon.c
to setup.c and use it in shell.c, too.

While there, remove a 'not' that inverted the meaning of the comment.
The point is indeed to *avoid* messing up.

Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-17 12:50:34 -07:00
Nguyễn Thái Ngọc Duy 93d9353716 parse_pathspec: accept :(icase)path syntax
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 12:14:38 -07:00
Nguyễn Thái Ngọc Duy bd30c2e484 pathspec: support :(glob) syntax
:(glob)path differs from plain pathspec that it uses wildmatch with
WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The
difference lies in how '*' (and '**') is processed.

With the introduction of :(glob) and :(literal) and their global
options --[no]glob-pathspecs, the user can:

 - make everything literal by default via --noglob-pathspecs
   --literal-pathspecs cannot be used for this purpose as it
   disables _all_ pathspec magic.

 - individually turn on globbing with :(glob)

 - make everything globbing by default via --glob-pathspecs

 - individually turn off globbing with :(literal)

The implication behind this is, there is no way to gain the default
matching behavior (i.e. fnmatch without FNM_PATHNAME). You either get
new globbing or literal. The old fnmatch behavior is considered
deprecated and discouraged to use.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:10 -07:00
Nguyễn Thái Ngọc Duy 645a29c40a parse_pathspec: make sure the prefix part is wildcard-free
Prepending prefix to pathspec is a trick to workaround the fact that
commands can be executed in a subdirectory, but all git commands run
at worktree's root. The prefix part should always be treated as
literal string. Make it so.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00
Nguyễn Thái Ngọc Duy 3efe8e4381 convert add_files_to_cache to take struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:08 -07:00
Nguyễn Thái Ngọc Duy 9b2d61499b convert refresh_index to take struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:08 -07:00
Nguyễn Thái Ngọc Duy 17ddc66e70 convert report_path_error to take struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:08 -07:00
Nguyễn Thái Ngọc Duy 5ab2a2dabd convert read_cache_preload() to take struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:08 -07:00
Nguyễn Thái Ngọc Duy 64acde94ef move struct pathspec and related functions to pathspec.[ch]
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:06 -07:00
Jeff King 5b0864070e sha1_object_info_extended: make type calculation optional
Each caller of sha1_object_info_extended sets up an
object_info struct to tell the function which elements of
the object it wants to get. Until now, getting the type of
the object has always been required (and it is returned via
the return type rather than a pointer in object_info).

This can involve actually opening a loose object file to
determine its type, or following delta chains to determine a
packed file's base type. These effects produce a measurable
slow-down when doing a "cat-file --batch-check" that does
not include %(objecttype).

This patch adds a "typep" query to struct object_info, so
that it can be optionally queried just like size and
disk_size. As a result, the return type of the function is
no longer the object type, but rather 0/-1 for success/error.

As there are only three callers total, we just fix up each
caller rather than keep a compatibility wrapper:

  1. The simpler sha1_object_info wrapper continues to
     always ask for and return the type field.

  2. The istream_source function wants to know the type, and
     so always asks for it.

  3. The cat-file batch code asks for the type only when
     %(objecttype) is part of the format string.

On linux.git, the best-of-five for running:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectsize:disk)'

on a fully packed repository goes from:

  real    0m8.680s
  user    0m8.160s
  sys     0m0.512s

to:

  real    0m7.205s
  user    0m6.580s
  sys     0m0.608s

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:16:36 -07:00
Jeff King 25fba78d36 cat-file: disable object/refname ambiguity check for batch mode
A common use of "cat-file --batch-check" is to feed a list
of objects from "rev-list --objects" or a similar command.
In this instance, all of our input objects are 40-byte sha1
ids. However, cat-file has always allowed arbitrary revision
specifiers, and feeds the result to get_sha1().

Fortunately, get_sha1() recognizes a 40-byte sha1 before
doing any hard work trying to look up refs, meaning this
scenario should end up spending very little time converting
the input into an object sha1. However, since 798c35f
(get_sha1: warn about full or short object names that look
like refs, 2013-05-29), when we encounter this case, we
spend the extra effort to do a refname lookup anyway, just
to print a warning. This is further exacerbated by ca91993
(get_packed_ref_cache: reload packed-refs file when it
changes, 2013-06-20), which makes individual ref lookup more
expensive by requiring a stat() of the packed-refs file for
each missing ref.

With no patches, this is the time it takes to run:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectname)' <objects

on the linux.git repository:

  real    1m13.494s
  user    0m25.924s
  sys     0m47.532s

If we revert ca91993, the packed-refs up-to-date check, it
gets a little better:

  real    0m54.697s
  user    0m21.692s
  sys     0m32.916s

but we are still spending quite a bit of time on ref lookup
(and we would not want to revert that patch, anyway, which
has correctness issues).  If we revert 798c35f, disabling
the warning entirely, we get a much more reasonable time:

  real    0m7.452s
  user    0m6.836s
  sys     0m0.608s

This patch does the moral equivalent of this final case (and
gets similar speedups). We introduce a global flag that
callers of get_sha1() can use to avoid paying the price for
the warning.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:09:56 -07:00
Heiko Voigt 1bc888193e teach config --blob option to parse config from database
This can be used to read configuration values directly from git's
database. For example it is useful for reading to be checked out
.gitmodules files directly from the database.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:34:57 -07:00
Nguyễn Thái Ngọc Duy 9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Junio C Hamano 47a5918536 cache.h: move remote/connect API out of it
The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me.  This structure is not
about the local ref used by sha1-name API to name local objects.

It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values.  It belongs to "remote.h" together
with "struct refspec".

While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-08 14:34:24 -07:00
Jeff King 161f00e708 teach sha1_object_info_extended a "disk_size" query
Using sha1_object_info_extended, a caller can find out the
type of an object, its size, and information about where it
is stored. In addition to the object's "true" size, it can
also be useful to know the size that the object takes on
disk (e.g., to generate statistics about which refs consume
space).

This patch adds a "disk_sizep" field to "struct object_info",
and fills it in during sha1_object_info_extended if it is
non-NULL.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-07 10:53:22 -07:00
Junio C Hamano 079424a2cf Merge branch 'mh/ref-races'
"git pack-refs" that races with new ref creation or deletion have
been susceptible to lossage of refs under right conditions, which
has been tightened up.

* mh/ref-races:
  for_each_ref: load all loose refs before packed refs
  get_packed_ref_cache: reload packed-refs file when it changes
  add a stat_validity struct
  Extract a struct stat_data from cache_entry
  packed_ref_cache: increment refcount when locked
  do_for_each_entry(): increment the packed refs cache refcount
  refs: manage lifetime of packed refs cache via reference counting
  refs: implement simple transactions for the packed-refs file
  refs: wrap the packed refs cache in a level of indirection
  pack_refs(): split creation of packed refs and entry writing
  repack_without_ref(): split list curation and entry writing
2013-06-30 15:40:05 -07:00
Jiang Xin e02ca72f70 path.c: refactor relative_path(), not only strip prefix
Original design of relative_path() is simple, just strip the prefix
(*base) from the absolute path (*abs).

In most cases, we need a real relative path, such as: ../foo,
../../bar.  That's why there is another reimplementation
(path_relative()) in quote.c.

Borrow some codes from path_relative() in quote.c to refactor
relative_path() in path.c, so that it could return real relative
path, and user can reuse this function without reimplementing
his/her own.  The function path_relative() in quote.c will be
substituted, and I would use the new relative_path() function when
implementing the interactive git-clean later.

Different results for relative_path() before and after this refactor:

    abs path  base path  relative (original)  relative (refactor)
    ========  =========  ===================  ===================
    /a/b      /a/b       .                    ./
    /a/b/     /a/b       .                    ./
    /a        /a/b/      /a                   ../
    /         /a/b/      /                    ../../
    /a/c      /a/b/      /a/c                 ../c
    /x/y      /a/b/      /x/y                 ../../x/y

    a/b/      a/b/       .                    ./
    a/b/      a/b        .                    ./
    a         a/b        a                    ../
    x/y       a/b/       x/y                  ../../x/y
    a/c       a/b        a/c                  ../c

    (empty)   (null)     (empty)              ./
    (empty)   (empty)    (empty)              ./
    (empty)   /a/b       (empty)              ./
    (null)    (null)     (null)               ./
    (null)    (empty)    (null)               ./
    (null)    /a/b       (segfault)           ./

You may notice that return value "." has been changed to "./".
It is because:

 * Function quote_path_relative() in quote.c will show the relative
   path as "./" if abs(in) and base(prefix) are the same.

 * Function relative_path() is called only once (in setup.c), and
   it will be OK for the return value as "./" instead of ".".

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-26 09:59:00 -07:00
Junio C Hamano 8f0c843aab Merge branch 'nd/traces'
* nd/traces:
  git.txt: document GIT_TRACE_PACKET
  core: use env variable instead of config var to turn on logging pack access
2013-06-20 16:02:28 -07:00
Michael Haggerty 3861253224 add a stat_validity struct
It can sometimes be useful to know whether a path in the
filesystem has been updated without going to the work of
opening and re-reading its content. We trust the stat()
information on disk already to handle index updates, and we
can use the same trick here.

This patch introduces a "stat_validity" struct which
encapsulates the concept of checking the stat-freshness of a
file. It is implemented on top of "struct stat_data" to
reuse the logic about which stat entries to trust for a
particular platform, but hides the complexity behind two
simple functions: check and update.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-20 15:50:17 -07:00
Michael Haggerty c21d39d7c7 Extract a struct stat_data from cache_entry
Add public functions fill_stat_data() and match_stat_data() to work
with it.  This infrastructure will later be used to check the validity
of other types of file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-20 15:50:17 -07:00
Junio C Hamano dd261b1727 Merge branch 'rs/unpack-trees-plug-leak'
* rs/unpack-trees-plug-leak:
  unpack-trees: free cache_entry array members for merges
  diff-lib, read-tree, unpack-trees: mark cache_entry array paramters const
  diff-lib, read-tree, unpack-trees: mark cache_entry pointers const
  unpack-trees: create working copy of merge entry in merged_entry
  unpack-trees: factor out dup_entry
  read-cache: mark cache_entry pointers const
  cache: mark cache_entry pointers const
2013-06-11 13:30:05 -07:00
Nguyễn Thái Ngọc Duy b12ca9631f core: use env variable instead of config var to turn on logging pack access
5f44324 (core: log offset pack data accesses happened - 2011-07-06)
provides a way to observe pack access patterns via a config
switch. Setting an environment variable looks more obvious than a
config var, especially when you just need to _observe_, and more
inline with other tracing knobs we have.

Document it as it may be useful for remote troubleshooting.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-09 16:07:50 -07:00
Junio C Hamano db400949b3 Merge branch 'jk/fetch-always-update-tracking'
"git fetch origin master" unlike "git fetch origin" or "git fetch"
did not update "refs/remotes/origin/master"; this was an early
design decision to keep the update of remote tracking branches
predictable, but in practice it turns out that people find it more
convenient to opportunisticly update them whenever we have a chance,
and we have been updating them when we run "git push" which already
breaks the original "predictability" anyway.

Now such a fetch does update refs/remotes/origin/master.

* jk/fetch-always-update-tracking:
  fetch: don't try to update unfetched tracking refs
  fetch: opportunistically update tracking refs
  refactor "ref->merge" flag
  fetch/pull doc: untangle meaning of bare <ref>
  t5510: start tracking-ref tests from a known state
2013-06-02 15:57:26 -07:00
René Scharfe 21a6b9fa42 read-cache: mark cache_entry pointers const
ie_match_stat and ie_modified only derefence their struct cache_entry
pointers for reading.  Add const to the parameter declaration here and
do the same for the static helper function used by them, as it's the
same there as well.  This allows callers to pass in const pointers.

Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02 15:31:12 -07:00
René Scharfe 20d142b48c cache: mark cache_entry pointers const
Add const for pointers that are only dereferenced for reading by the
inline functions copy_cache_entry and ce_mode_from_stat.  This allows
callers to pass in const pointers.

Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02 15:31:12 -07:00
Junio C Hamano 3e1e7624aa Merge branch 'jc/prune-all'
We used the approxidate() parser for "--expire=<timestamp>" options
of various commands, but it is better to treat --expire=all and
--expire=now a bit more specially than using the current timestamp.
Update "git gc" and "git reflog" with a new parsing function for
expiry dates.

* jc/prune-all:
  prune: introduce OPT_EXPIRY_DATE() and use it
  api-parse-options.txt: document "no-" for non-boolean options
  git-gc.txt, git-reflog.txt: document new expiry options
  date.c: add parse_expiry_date()
2013-05-29 14:23:04 -07:00
Jeff King 900f2814b8 refactor "ref->merge" flag
Each "struct ref" has a boolean flag that is set by the
fetch code to determine whether the ref should be marked as
"not-for-merge" or not when we write it out to FETCH_HEAD.

It would be useful to turn this boolean into a tri-state,
with the third state meaning "do not bother writing it out
to FETCH_HEAD at all". That would let us add extra refs to
the set of refs to be stored (e.g., to store copies of
things we fetched) without impacting FETCH_HEAD.

This patch turns it into an enum that covers the tri-state
case, and hopefully makes the code more explicit and easier
to read.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-12 15:23:48 -07:00
Junio C Hamano 808d3d717e git add: -u/-A now affects the entire working tree
As promised in 0fa2eb530f (add: warn when -u or -A is used without
pathspec, 2013-01-28), in Git 2.0, "git add -u/-A" that is run
without pathspec in a subdirectory updates all updated paths in the
entire working tree, not just the current directory and its
subdirectories.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-26 16:09:21 -07:00
Junio C Hamano 4b35b007a6 Merge branch 'lf/read-blob-data-from-index'
Reduce duplicated code between convert.c and attr.c.

* lf/read-blob-data-from-index:
  convert.c: remove duplicate code
  read_blob_data_from_index(): optionally return the size of blob data
  attr.c: extract read_index_data() as read_blob_data_from_index()
2013-04-21 18:39:45 -07:00
Junio C Hamano de91daf5e6 Merge branch 'jn/add-2.0-u-A-sans-pathspec' (early part)
In Git 2.0, "git add -u" and "git add -A" without any pathspec will
update the index for all paths, including those outside the current
directory, making it more consistent with "commit -a".  To help the
migration pain, a warning is issued when the differences between the
current behaviour and the upcoming behaviour matters, i.e. when the
user has local changes outside the current directory.

* 'jn/add-2.0-u-A-sans-pathspec' (early part):
  add -A: only show pathless 'add -A' warning when changes exist outside cwd
  add -u: only show pathless 'add -u' warning when changes exist outside cwd
  add: make warn_pathless_add() a no-op after first call
  add: add a blank line at the end of pathless 'add [-u|-A]' warning
  add: make pathless 'add [-u|-A]' warning a file-global function
2013-04-19 13:37:36 -07:00
Junio C Hamano 3d27b9b005 date.c: add parse_expiry_date()
"git reflog --expire=all" tries to expire reflog entries up to the
current second, because the approxidate() parser gives the current
timestamp for anything it does not understand (and it does not know
what time "all" means).  When the user tells us to expire "all" (or
set the expiration time to "now"), the user wants to remove all the
reflog entries (no reflog entry should record future time).

Just set it to ULONG_MAX and to let everything that is older that
timestamp expire.

While at it, allow "now" to be treated the same way for callers that
parse expiry date timestamp with this function.  Also use an error
reporting version of approxidate() to report misspelled date.  When
the user says e.g. "--expire=mnoday" to delete entries two days or
older on Wednesday, we wouldn't want the "unknown, default to now"
logic to kick in.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 16:03:56 -07:00
Lukas Fleischer ff36682505 read_blob_data_from_index(): optionally return the size of blob data
This allows for optionally getting the size of the returned data and
will be used in a follow-up patch.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:51:47 -07:00
Lukas Fleischer 29fb37b272 attr.c: extract read_index_data() as read_blob_data_from_index()
Extract the read_index_data() function from attr.c and move it to
read-cache.c; rename it to read_blob_data_from_index() and update
the function signature of it to align better with index/cache API
functions.

This allows for reusing the function in convert.c later.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:49:11 -07:00
Junio C Hamano e65cdde454 Merge branch 'tb/shared-perm'
Simplifies adjust_shared_perm() implementation.

* tb/shared-perm:
  path.c: optimize adjust_shared_perm()
  path.c: simplify adjust_shared_perm()
2013-04-07 14:33:11 -07:00
Torsten Bögershausen 3a429d3b8d path.c: simplify adjust_shared_perm()
All calls to set_shared_perm() use mode == 0, so simplify the
function.

Because all callers use the macro adjust_shared_perm(path) from
cache.h to call this function, convert it to a proper function,
losing set_shared_perm().

Since path.c has much more functions than just mkpath() these days,
drop the stale comment about it.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-05 12:37:55 -07:00
Jonathan Nieder 71c7b0538f add -u: only show pathless 'add -u' warning when changes exist outside cwd
A common workflow in large projects is to chdir into a subdirectory of
interest and only do work there:

	cd src
	vi foo.c
	make test
	git add -u
	git commit

The upcoming change to 'git add -u' behavior would not affect such a
workflow: when the only changes present are in the current directory,
'git add -u' will add all changes, and whether that happens via an
implicit "." or implicit ":/" parameter is an unimportant
implementation detail.

The warning about use of 'git add -u' with no pathspec is annoying
because it seemingly serves no purpose in this case.  So suppress the
warning unless there are changes outside the cwd that are not being
added.

A previous version of this patch ran two I/O-intensive diff-files
passes: one to find changes outside the cwd, and another to find
changes to add to the index within the cwd.  This version runs one
full-tree diff and decides for each change whether to add it or warn
and suppress it in update_callback.  As a result, even on very large
repositories "git add -u" will not be significantly slower than the
future default behavior ("git add -u :/"), and the slowdown relative
to "git add -u ." should be a useful clue to users of such
repositories to get into the habit of explicitly passing '.'.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-03 11:34:22 -07:00
Junio C Hamano 97fefaf6d3 Merge branch 'nd/checkout-paths-reduce-match-pathspec-calls'
Consolidate repeated pathspec matches on the same paths, while
fixing a bug in "git checkout dir/" code started from an unmerged
index.

* nd/checkout-paths-reduce-match-pathspec-calls:
  checkout: avoid unnecessary match_pathspec calls
2013-04-03 09:34:00 -07:00
Junio C Hamano eeecf39397 Merge branch 'jk/alias-in-bare' into maint
An aliased command spawned from a bare repository that does not say
it is bare with "core.bare = yes" is treated as non-bare by mistake.

* jk/alias-in-bare:
  setup: suppress implicit "." work-tree for bare repos
  environment: add GIT_PREFIX to local_repo_env
  cache.h: drop LOCAL_REPO_ENV_SIZE
2013-04-03 09:25:41 -07:00
Junio C Hamano 92e0d91632 Sync with 1.8.1 maintenance track
* maint-1.8.1:
  Start preparing for 1.8.1.6
  git-tag(1): we tag HEAD by default
  Fix revision walk for commits with the same dates
  t2003: work around path mangling issue on Windows
  pack-refs: add fully-peeled trait
  pack-refs: write peeled entry for non-tags
  use parse_object_or_die instead of die("bad object")
  avoid segfaults on parse_object failure
  entry: fix filter lookup
  t2003: modernize style
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-03 09:18:01 -07:00
Junio C Hamano c81e2c61b3 Merge branch 'kb/name-hash' into maint-1.8.1
* kb/name-hash:
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-03 08:44:54 -07:00
Junio C Hamano c044bed8f0 Merge branch 'kb/name-hash'
The code to keep track of what directory names are known to Git on
platforms with case insensitive filesystems can get confused upon
a hash collision between these pathnames and looped forever.

* kb/name-hash:
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-01 08:59:53 -07:00
Junio C Hamano e013bdab0f Merge branch 'jk/pkt-line-cleanup'
Clean up pkt-line API, implementation and its callers to make them
more robust.

* jk/pkt-line-cleanup:
  do not use GIT_TRACE_PACKET=3 in tests
  remote-curl: always parse incoming refs
  remote-curl: move ref-parsing code up in file
  remote-curl: pass buffer straight to get_remote_heads
  teach get_remote_heads to read from a memory buffer
  pkt-line: share buffer/descriptor reading implementation
  pkt-line: provide a LARGE_PACKET_MAX static buffer
  pkt-line: move LARGE_PACKET_MAX definition from sideband
  pkt-line: teach packet_read_line to chomp newlines
  pkt-line: provide a generic reading function with options
  pkt-line: drop safe_write function
  pkt-line: move a misplaced comment
  write_or_die: raise SIGPIPE when we get EPIPE
  upload-archive: use argv_array to store client arguments
  upload-archive: do not copy repo name
  send-pack: prefer prefixcmp over memcmp in receive_status
  fetch-pack: fix out-of-bounds buffer offset in get_ack
  upload-pack: remove packet debugging harness
  upload-pack: do not add duplicate objects to shallow list
  upload-pack: use get_sha1_hex to parse "shallow" lines
2013-04-01 08:59:37 -07:00
Junio C Hamano e96a3b3649 Merge branch 'rs/archive-zip-raw-compression'
* rs/archive-zip-raw-compression:
  archive-zip: use deflateInit2() to ask for raw compressed data
2013-03-27 09:28:53 -07:00
Nguyễn Thái Ngọc Duy e721c1544f checkout: avoid unnecessary match_pathspec calls
In checkout_paths() we do this

 - for all updated items, call match_pathspec
 - for all items, call match_pathspec (inside unmerge_cache)
 - for all items, call match_pathspec (for showing "path .. is unmerged)
 - for updated items, call match_pathspec and update paths

That's a lot of duplicate match_pathspec(s) and the function is not
exactly cheap to be called so many times, especially on large indexes.
This patch makes it call match_pathspec once per updated index entry,
save the result in ce_flags and reuse the results in the following
loops.

The changes in 0a1283b (checkout $tree $path: do not clobber local
changes in $path not in $tree - 2011-09-30) limit the affected paths
to ones we read from $tree. We do not do anything to other modified
entries in this case, so the "for all items" above could be modified
to "for all updated items". But..

The command's behavior now is modified slightly: unmerged entries that
match $path, but not updated by $tree, are now NOT touched.  Although
this should be considered a bug fix, not a regression. A new test is
added for this change.

And while at there, free ps_matched after use.

The following command is tested on webkit, 215k entries. The pattern
is chosen mainly to make match_pathspec sweat:

git checkout -- "*[a-zA-Z]*[a-zA-Z]*[a-zA-Z]*"

        before      after
real    0m3.493s    0m2.737s
user    0m2.239s    0m1.586s
sys     0m1.252s    0m1.151s

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 08:53:15 -07:00
Junio C Hamano fb3b7b1f95 Merge branch 'jk/alias-in-bare'
An aliased command spawned from a bare repository that does not say
it is bare with "core.bare = yes" is treated as non-bare by mistake.

* jk/alias-in-bare:
  setup: suppress implicit "." work-tree for bare repos
  environment: add GIT_PREFIX to local_repo_env
  cache.h: drop LOCAL_REPO_ENV_SIZE
2013-03-25 14:00:44 -07:00
Junio C Hamano f5715de54a Merge branch 'nd/count-garbage'
"git count-objects -v" did not count leftover temporary packfiles
and other kinds of garbage.

* nd/count-garbage:
  count-objects: report how much disk space taken by garbage files
  count-objects: report garbage files in pack directory too
  sha1_file: reorder code in prepare_packed_git_one()
  git-count-objects.txt: describe each line in -v output
2013-03-21 14:02:34 -07:00
Junio C Hamano e4e1c54990 Merge branch 'jc/fetch-raw-sha1'
Allows requests to fetch objects at any tip of refs (including
hidden ones).  It seems that there may be use cases even outside
Gerrit (e.g. $gmane/215701).

* jc/fetch-raw-sha1:
  fetch: fetch objects by their exact SHA-1 object names
  upload-pack: optionally allow fetching from the tips of hidden refs
  fetch: use struct ref to represent refs to be fetched
  parse_fetch_refspec(): clarify the codeflow a bit
2013-03-21 14:02:27 -07:00
René Scharfe c3c2e1a09b archive-zip: use deflateInit2() to ask for raw compressed data
We use the function git_deflate_init() -- which wraps the zlib function
deflateInit() -- to initialize compression of ZIP file entries.  This
results in compressed data prefixed with a two-bytes long header and
followed by a four-bytes trailer.  ZIP file entries consist of ZIP
headers and raw compressed data instead, so we remove the zlib wrapper
before writing the result.

We can ask zlib for the the raw compressed data without the unwanted
parts in the first place by using deflateInit2() and specifying a
negative number of bits to size the window.  For that purpose, factor
out the function do_git_deflate_init() and add git_deflate_init_raw(),
which wraps it.  Then use the latter in archive-zip.c and get rid of
the code that stripped the zlib header and trailer.

Also rename the helper function zlib_deflate() to zlib_deflate_raw()
to reflect the change.

Thus we avoid generating data that we throw away anyway, the code
becomes shorter and some magic constants are removed.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-16 22:07:02 -07:00
Jeff King 2cd83d10bb setup: suppress implicit "." work-tree for bare repos
If an explicit GIT_DIR is given without a working tree, we
implicitly assume that the current working directory should
be used as the working tree. E.g.,:

  GIT_DIR=/some/repo.git git status

would compare against the cwd.

Unfortunately, we fool this rule for sub-invocations of git
by setting GIT_DIR internally ourselves. For example:

  git init foo
  cd foo/.git
  git status ;# fails, as we expect
  git config alias.st status
  git status ;# does not fail, but should

What happens is that we run setup_git_directory when doing
alias lookup (since we need to see the config), set GIT_DIR
as a result, and then leave GIT_WORK_TREE blank (because we
do not have one). Then when we actually run the status
command, we do setup_git_directory again, which sees our
explicit GIT_DIR and uses the cwd as an implicit worktree.

It's tempting to argue that we should be suppressing that
second invocation of setup_git_directory, as it could use
the values we already found in memory. However, the problem
still exists for sub-processes (e.g., if "git status" were
an external command).

You can see another example with the "--bare" option, which
sets GIT_DIR explicitly. For example:

  git init foo
  cd foo/.git
  git status ;# fails
  git --bare status ;# does NOT fail

We need some way of telling sub-processes "even though
GIT_DIR is set, do not use cwd as an implicit working tree".
We could do it by putting a special token into
GIT_WORK_TREE, but the obvious choice (an empty string) has
some portability problems.

Instead, we add a new boolean variable, GIT_IMPLICIT_WORK_TREE,
which suppresses the use of cwd as a working tree when
GIT_DIR is set. We trigger the new variable when we know we
are in a bare setting.

The variable is left intentionally undocumented, as this is
an internal detail (for now, anyway). If somebody comes up
with a good alternate use for it, and once we are confident
we have shaken any bugs out of it, we can consider promoting
it further.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-08 14:02:40 -08:00
Jeff King a6f7f9a325 environment: add GIT_PREFIX to local_repo_env
The GIT_PREFIX variable is set based on our location within
the working tree. It should therefore be cleared whenever
GIT_WORK_TREE is cleared.

In practice, this doesn't cause any bugs, because none of
the sub-programs we invoke with local_repo_env cleared
actually care about GIT_PREFIX. But this is the right thing
to do, and future proofs us against that assumption changing.

While we're at it, let's define a GIT_PREFIX_ENVIRONMENT
macro; this avoids repetition of the string literal, which
can help catch any spelling mistakes in the code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-08 14:02:31 -08:00
Jeff King 2163e5dbb4 cache.h: drop LOCAL_REPO_ENV_SIZE
We keep a static array of variables that should be cleared
when invoking a sub-process on another repo. We statically
size the array with the LOCAL_REPO_ENV_SIZE macro so that
any readers do not have to count it themselves.

As it turns out, no readers actually use the macro, and it
creates a maintenance headache, as modifications to the
array need to happen in two places (one to add the new
element, and another to bump the size).

Since it's NULL-terminated, we can just drop the size macro
entirely. While we're at it, we'll clean up some comments
around it, and add a new mention of it at the top of the
list of environment variable macros. Even though
local_repo_env is right below that list, it's easy to miss,
and additions to that list should consider local_repo_env.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-08 07:55:54 -08:00
Karsten Blees 2092678cd5 name-hash.c: fix endless loop with core.ignorecase=true
With core.ignorecase=true, name-hash.c builds a case insensitive index of
all tracked directories. Currently, the existing cache entry structures are
added multiple times to the same hashtable (with different name lengths and
hash codes). However, there's only one dir_next pointer, which gets
completely messed up in case of hash collisions. In the worst case, this
causes an endless loop if ce == ce->dir_next (see t7062).

Use a separate hashtable and separate structures for the directory index
so that each directory entry has its own next pointer. Use reference
counting to track which directory entry contains files.

There are only slight changes to the name-hash.c API:
- new free_name_hash() used by read_cache.c::discard_index()
- remove_name_hash() takes an additional index_state parameter
- index_name_exists() for a directory (trailing '/') may return a cache
  entry that has been removed (CE_UNHASHED). This is not a problem as the
  return value is only used to check if the directory exists (dir.c) or to
  normalize casing of directory names (read-cache.c).

Getting rid of cache_entry.dir_next reduces memory consumption, especially
with core.ignorecase=false (which doesn't use that member at all).

With core.ignorecase=true, building the directory index is slightly faster
as we add / check the parent directory first (instead of going through all
directory levels for each file in the index). E.g. with WebKit (~200k
files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms
to 130ms.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-27 23:29:04 -08:00
Jeff King 85edf4f58b teach get_remote_heads to read from a memory buffer
Now that we can read packet data from memory as easily as a
descriptor, get_remote_heads can take either one as a
source. This will allow further refactoring in remote-curl.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 00:17:38 -08:00
Nguyễn Thái Ngọc Duy 543c5caa6c count-objects: report garbage files in pack directory too
prepare_packed_git_one() is modified to allow count-objects to hook a
report function to so we don't need to duplicate the pack searching
logic in count-objects.c. When report_pack_garbage is NULL, the
overhead is insignificant.

The garbage is reported with warning() instead of error() in packed
garbage case because it's not an error to have garbage. Loose garbage
is still reported as errors and will be converted to warnings later.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-15 08:13:13 -08:00
Junio C Hamano f2db854d24 fetch: use struct ref to represent refs to be fetched
Even though "git fetch" has full infrastructure to parse refspecs to
be fetched and match them against the list of refs to come up with
the final list of refs to be fetched, the list of refs that are
requested to be fetched were internally converted to a plain list of
strings at the transport layer and then passed to the underlying
fetch-pack driver.

Stop this conversion and instead pass around an array of refs.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-07 13:53:59 -08:00
Junio C Hamano 370855e967 Merge branch 'jc/push-reject-reasons'
Improve error and advice messages given locally when "git push"
refuses when it cannot compute fast-forwardness by separating these
cases from the normal "not a fast-forward; merge first and push
again" case.

* jc/push-reject-reasons:
  push: finishing touches to explain REJECT_ALREADY_EXISTS better
  push: introduce REJECT_FETCH_FIRST and REJECT_NEEDS_FORCE
  push: further simplify the logic to assign rejection reason
  push: further clean up fields of "struct ref"
2013-02-04 10:25:04 -08:00
Junio C Hamano 099ba556d0 Merge branch 'jk/config-parsing-cleanup'
Configuration parsing for tar.* configuration variables were
broken. Introduce a new config-keyname parser API to make the
callers much less error prone.

* jk/config-parsing-cleanup:
  reflog: use parse_config_key in config callback
  help: use parse_config_key for man config
  submodule: simplify memory handling in config parsing
  submodule: use parse_config_key when parsing config
  userdiff: drop parse_driver function
  convert some config callbacks to parse_config_key
  archive-tar: use parse_config_key when parsing config
  config: add helper function for parsing key names
2013-02-04 10:24:50 -08:00
Junio C Hamano 149a4211a4 Merge branch 'jc/custom-comment-char'
Allow a configuration variable core.commentchar to customize the
character used to comment out the hint lines in the edited text from
the default '#'.

* jc/custom-comment-char:
  Allow custom "comment char"
2013-02-04 10:23:49 -08:00
Junio C Hamano 070c57df42 Merge branch 'rr/minimal-stat'
Some reimplementations of Git does not write all the stat info back
to the index due to their implementation limitations (e.g. jgit
running on Java).  A configuration option can tell Git to ignore
changes to most of the stat fields and only pay attention to mtime
and size, which these implementations can reliably update.  This
avoids excessive revalidation of contents.

* rr/minimal-stat:
  Enable minimal stat checking
2013-01-30 08:53:02 -08:00
Junio C Hamano ce956fc48e Merge branch 'mh/ceiling' into maint
An element on GIT_CEILING_DIRECTORIES list that does not name the
real path to a directory (i.e. a symbolic link) could have caused
the GIT_DIR discovery logic to escape the ceiling.

* mh/ceiling:
  string_list_longest_prefix(): remove function
  setup_git_directory_gently_1(): resolve symlinks in ceiling paths
  longest_ancestor_length(): require prefix list entries to be normalized
  longest_ancestor_length(): take a string_list argument for prefixes
  longest_ancestor_length(): use string_list_split()
  Introduce new function real_path_if_valid()
  real_path_internal(): add comment explaining use of cwd
  Introduce new static function real_path_internal()
2013-01-28 11:07:18 -08:00
Junio C Hamano 75e5c0dc55 push: introduce REJECT_FETCH_FIRST and REJECT_NEEDS_FORCE
When we push to update an existing ref, if:

 * the object at the tip of the remote is not a commit; or
 * the object we are pushing is not a commit,

it won't be correct to suggest to fetch, integrate and push again,
as the old and new objects will not "merge".  We should explain that
the push must be forced when there is a non-committish object is
involved in such a case.

If we do not have the current object at the tip of the remote, we do
not even know that object, when fetched, is something that can be
merged.  In such a case, suggesting to pull first just like
non-fast-forward case may not be technically correct, but in
practice, most such failures are seen when you try to push your work
to a branch without knowing that somebody else already pushed to
update the same branch since you forked, so "pull first" would work
as a suggestion most of the time.  And if the object at the tip is
not a commit, "pull first" will fail, without making any permanent
damage.  As a side effect, it also makes the error message the user
will get during the next "push" attempt easier to understand, now
the user is aware that a non-commit object is involved.

In these cases, the current code already rejects such a push on the
client end, but we used the same error and advice messages as the
ones used when rejecting a non-fast-forward push, i.e. pull from
there and integrate before pushing again.

Introduce new rejection reasons and reword the messages
appropriately.

[jc: with help by Peff on message details]

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-24 14:37:23 -08:00