This patch activates the DO_MATCH_DIRECTORY code in m_p_i(), which
makes "git diff HEAD submodule/" and "git diff HEAD submodule" produce
the same output. Previously only the version without trailing slash
returns the difference (if any).
That's the effect of new ce_path_match(). dir_path_match() is not
executed by the new tests. And it should not introduce regressions.
Previously if path "dir/" is passed in with pathspec "dir/", they
obviously match. With new dir_path_match(), the path becomes
_directory_ "dir" vs pathspec "dir/", which is not executed by the old
code path in m_p_i(). The new code path is executed and produces the
same result.
The other case is pathspec "dir" and path "dir/" is now turned to
"dir" (with DO_MATCH_DIRECTORY). Still the same result before or after
the patch.
So why change? Because of the next patch about clean.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A long time ago, for some reason I was not happy with
match_pathspec(). I created a better version, match_pathspec_depth()
that was suppose to replace match_pathspec()
eventually. match_pathspec() has finally been gone since 6 months
ago. Use the shorter name for match_pathspec_depth().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps reduce the number of match_pathspec_depth() call sites and
show how m_p_d() is used. And it usage is:
- match against an index entry (ce_path_match or match_pathspec_depth
in ls-files)
- match against a dir_entry from read_directory (dir_path_match and
match_pathspec_depth in clean.c, which will be converted later)
- resolve-undo (rerere.c and ls-files.c)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Making a single preparation run for counting the lines will avoid memory
fragmentation. Also, fix the allocated memory size which was wrong
when sizeof(int *) != sizeof(int), and would have been too small
for sizeof(int *) < sizeof(int), admittedly unlikely.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are calling xrealloc on every single line, the least we can do
is get the right allocation size.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the origin pointers are "interned" and reference-counted, comparing
the pointers rather than the content is enough. The only uninterned
origins are cached values kept in commit->util, but same_suspect is not
called on them.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The semantics of this flag was changed in commit
e1111cef23 inline lookup_replace_object() calls
but wasn't renamed at the time to minimize code churn. Rename it now,
and add a comment explaining its use.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it clear that we don't use fnmatch() anymore.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently "git notes add -C $object" will read the raw bytes from $object,
and then copy those bytes into the note object, which is hardcoded to be
of type blob. This means that if the given $object is a non-blob (e.g.
tree or commit), the raw bytes from that object is copied into a blob
object. This is probably not useful, and certainly not what any sane
user would expect. So disallow it, by erroring out if the $object passed
to the -C option is not a blob.
The fix also applies to the -c option (in which the user is prompted to
edit/verify the note contents in a text editor), and also when -c/-C is
passed to "git notes append" (which appends the $object contents to an
existing note object). In both cases, passing a non-blob $object does not
make sense.
Also add a couple of tests demonstrating expected behavior.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patch extends git config --file interface to allow read config from
stdin.
Editing stdin or setting value in stdin is an error.
Include by absolute path is allowed in stdin config, but not by relative
path.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're going to have more options for config source.
Let's alter git_config_with_options() interface to accept struct with
all source options.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function will be reused to check for other conditions which prevent
write.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If 'src' already ends with a slash, then add_slash() will just return
it, meaning that 'free(src_with_slash)' is actually 'free(src)'. Since
we use 'src' later, this will result in use-after-free.
In fact, this cannot happen because 'src' comes from
internal_copy_pathspec() without the KEEP_TRAILING_SLASH flag, so any
trailing '/' will have been stripped; but static analysis tools are not
clever enough to realise this and so warn that 'src' could be used after
having been free'd. Fix this by checking that 'src_w_slash' is indeed
newly allocated memory.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refreshing index requires work tree. So we have two options: always
set up work tree (and refuse to reset if failing to do so), or make
refreshing index optional.
As refreshing index is not the main task, it makes more sense to make
it optional. This allows us to still work in a bare repository to update
what is in the index.
Reported-by: Patrick Palka <patrick@parcs.ath.cx>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge-base --octopus" used to leave cleaning up suboptimal
result to the caller, but now it does the clean-up itself.
* bm/merge-base-octopus-dedup:
merge-base --octopus: reduce the result from get_octopus_merge_bases()
merge-base: separate "--independent" codepath into its own helper
"git repack --max-pack-size=8g" stopped being parsed correctly when
the command was reimplemented in C.
* sb/repack-in-c:
repack: propagate pack-objects options as strings
repack: make parsed string options const-correct
repack: fix typo in max-pack-size option
`gc --auto` takes time and can block the user temporarily (but not any
less annoyingly). Make it run in background on systems that support
it. The only thing lost with running in background is printouts. But
gc output is not really interesting. You can keep it in foreground by
changing gc.autodetach.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lasse Makholm noticed that running "git check-attr" from a place
totally unrelated to $GIT_DIR and $GIT_WORK_TREE does not give
expected results. I think it is because the command does not say it
wants to call setup_work_tree().
We still need to support use cases where only a bare repository is
involved, so unconditionally requiring a working tree would not work
well. Instead, make a call only in a non-bare repository.
We may want to see if we want to do a similar fix in the opposite
direction to check-ignore. The command unconditionally requires a
working tree, but it should be usable in a bare repository just like
check-attr attempts to be.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --mixed is used, entries could be removed from index if the
target ref does not have them. When "reset" is used in preparation for
commit spliting (in a dirty worktree), it could be hard to track what
files to be added back. The new option --intent-to-add simplifies it
by marking all removed files intent-to-add.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
There is no reason to have a hardcoded upper limit of the number of
parents for an octopus merge, created via the graft mechanism, but
there was.
* js/lift-parent-count-limit:
Remove the line length limit for graft files
"git add -A" (no other arguments) in a totally empty working tree
used to emit an error.
* nd/add-empty-fix:
add: don't complain when adding empty project root
"git commit -v" appends the patch to the log message before
editing, and then removes the patch when the editor returned
control. However, the patch was not stripped correctly when the
first modified path was a submodule.
* jl/commit-v-strip-marker:
commit -v: strip diffs and submodule shortlogs from the commit message
Remote repository URL expressed in scp-style host:path notation are
parsed more carefully (e.g. "foo/bar:baz" is local, "[::1]:/~user" asks
to connect to user's home directory on host at address ::1.
* tb/clone-ssh-with-colon-for-port:
git_connect(): use common return point
connect.c: refactor url parsing
git_connect(): refactor the port handling for ssh
git fetch: support host:/~repo
t5500: add test cases for diag-url
git fetch-pack: add --diag-url
git_connect: factor out discovery of the protocol and its parts
git_connect: remove artificial limit of a remote command
t5601: add tests for ssh
t5601: remove clear_ssh, refactor setup_ssh_wrapper
"git fetch --depth=0" was a no-op, and was silently ignored.
Diagnose it as an error.
* nd/transport-positive-depth-only:
clone,fetch: catch non positive --depth option value
When a repo was fully repacked, and is repacked again, we may run
into the situation that "new" packfiles have the same name as
already existing ones (traditionally packfiles have been named after
the list of names of objects in them, so repacking all the objects
in a single pack would have produced a packfile with the same name).
The logic is to rename the existing ones into filename like
"old-XXX", create the new ones and then remove the "old-" ones.
When something went wrong in the middle, this sequence is rolled
back by renaming the "old-" files back.
The renaming into "old-" did not work as intended, because
file_exists() was done on "XXX", not "pack-XXX". Also when rolling
back the change, the code tried to rename "old-pack-XXX" but the
saved ones are named "old-XXX", so this couldn't have worked.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --prefix, --default, and --resolve-git-dir options to
git-rev-parse require an argument, but when given no argument,
the code uses the NULL read from argv[argc] without checking,
leading to a segfault.
Instead, check first and die() with an error message.
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nicolas Vigier <boklm@mars-attacks.org>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git repack --max-pack-size=8g" stopped being parsed correctly when
the command was reimplemented in C.
* sb/repack-in-c:
repack: propagate pack-objects options as strings
repack: make parsed string options const-correct
repack: fix typo in max-pack-size option
Code clean-up and protection against concurrent write access to the
ref namespace.
* mh/safe-create-leading-directories:
rename_tmp_log(): on SCLD_VANISHED, retry
rename_tmp_log(): limit the number of remote_empty_directories() attempts
rename_tmp_log(): handle a possible mkdir/rmdir race
rename_ref(): extract function rename_tmp_log()
remove_dir_recurse(): handle disappearing files and directories
remove_dir_recurse(): tighten condition for removing unreadable dir
lock_ref_sha1_basic(): if locking fails with ENOENT, retry
lock_ref_sha1_basic(): on SCLD_VANISHED, retry
safe_create_leading_directories(): add new error value SCLD_VANISHED
cmd_init_db(): when creating directories, handle errors conservatively
safe_create_leading_directories(): introduce enum for return values
safe_create_leading_directories(): always restore slash at end of loop
safe_create_leading_directories(): split on first of multiple slashes
safe_create_leading_directories(): rename local variable
safe_create_leading_directories(): add explicit "slash" pointer
safe_create_leading_directories(): reduce scope of local variable
safe_create_leading_directories(): fix format of "if" chaining
In the original shell version of git-repack, any options
destined for pack-objects were left as strings, and passed
as a whole. Since the C rewrite in commit a1bbc6c (repack:
rewrite the shell script in C, 2013-09-15), we now parse
these values to integers internally, then reformat the
integers when passing the option to pack-objects.
This has the advantage that we catch format errors earlier
(i.e., when repack is invoked, rather than when pack-objects
is invoked).
It has three disadvantages, though:
1. Our internal data types may not be the right size. In
the case of "--window-memory" and "--max-pack-size",
these are "unsigned long" in pack-objects, but we can
only represent a regular "int".
2. Our parsing routines might not be the same as those of
pack-objects. For the two options above, pack-objects
understands "100m" to mean "100 megabytes", but repack
does not.
3. We have to keep a sentinel value to know whether it is
worth passing the option along. In the case of
"--window-memory", we currently do not pass it if the
value is "0". But that is a meaningful value to
pack-objects, where it overrides any configured value.
We can fix all of these by simply passing the strings from
the user along to pack-objects verbatim. This does not
actually fix anything for "--depth" or "--window", but these
are converted, too, for consistency.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we use OPT_STRING to parse an option, we get back a
pointer into the argv array, which should be "const char *".
The compiler doesn't notice because it gets passed through a
"void *" in the option struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we see "--max-pack-size", we accidentally propagated
this to pack-objects as "--max_pack_size", which does not
work at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).
* nd/shallow-clone: (31 commits)
t5537: fix incorrect expectation in test case 10
shallow: remove unused code
send-pack.c: mark a file-local function static
git-clone.txt: remove shallow clone limitations
prune: clean .git/shallow after pruning objects
clone: use git protocol for cloning shallow repo locally
send-pack: support pushing from a shallow clone via http
receive-pack: support pushing to a shallow clone via http
smart-http: support shallow fetch/clone
remote-curl: pass ref SHA-1 to fetch-pack as well
send-pack: support pushing to a shallow clone
receive-pack: allow pushes that update .git/shallow
connected.c: add new variant that runs with --shallow-file
add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
receive/send-pack: support pushing from a shallow clone
receive-pack: reorder some code in unpack()
fetch: add --update-shallow to accept refs that update .git/shallow
upload-pack: make sure deepening preserves shallow roots
fetch: support fetching from a shallow repository
clone: support remote shallow repository
...
Our xwrite wrapper already deals with a few potential hazards, and
are as such more robust. Prefer it instead of write to get the
robustness benefits everywhere.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-and-improved-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A "gc" process running as a different user should be able to stop a
new "gc" process from starting.
* km/gc-eperm:
gc: notice gc processes run by other users
"git mv A B/", when B does not exist as a directory, should error
out, but it didn't.
* mm/mv-file-to-no-such-dir-with-slash:
mv: let 'git mv file no-such-dir/' error out on Windows, too
mv: let 'git mv file no-such-dir/' error out
"git rev-parse <revs> -- <paths>" did not implement the usual
disambiguation rules the commands in the "git log" family used in
the same way.
* jk/rev-parse-double-dashes:
rev-parse: be more careful with munging arguments
rev-parse: correctly diagnose revision errors before "--"
"git cat-file --batch=", an admittedly useless command, did not
behave very well.
* jk/cat-file-regression-fix:
cat-file: handle --batch format with missing type/size
cat-file: pass expand_data to print_object_or_die
The previous commit c57f628 (mv: let 'git mv file no-such-dir/' error out)
relies on that rename("file", "no-such-dir/") fails if the directory does not
exist (note the trailing slash). This does not work as expected on Windows:
This rename() call does not fail, but renames "file" to "no-such-dir" (not to
"no-such-dir/file"). Insert an explicit check for this case to force an error.
This changes the error message from
$ git mv file no-such-dir/
fatal: renaming 'file' failed: Not a directory
to
$ git mv file no-such-dir/
fatal: destination directory does not exist, source=file, destination=no-such-dir/
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git help $cmd" unnecessarily enumerated potential command names
from the filesystem, even when $cmd is known to be a built-in.
Ideas for further optimization, primarily by killing the use of
is_in_cmdlist(), were suggested in the discussion, but they can
come as follow-ups on top of this series.
* ss/builtin-cleanup:
builtin/help.c: speed up is_git_command() by checking for builtin commands first
builtin/help.c: call load_command_list() only when it is needed
git.c: consistently use the term "builtin" instead of "internal command"
There is no reason to have a hardcoded upper limit of the number of
parents for an octopus merge, created via the graft mechanism.
* js/lift-parent-count-limit:
Remove the line length limit for graft files
"git merge-base --octopus" used to leave cleaning up suboptimal
result to the caller, but now it does the clean-up itself.
* bm/merge-base-octopus-dedup:
merge-base --octopus: reduce the result from get_octopus_merge_bases()
merge-base: separate "--independent" codepath into its own helper
A "gc" process running as a different user should be able to stop a
new "gc" process from starting.
* km/gc-eperm:
gc: notice gc processes run by other users
Teach "cat-file --batch" to show delta-base object name for a
packed object that is represented as a delta.
* jk/oi-delta-base:
cat-file: provide %(deltabase) batch format
sha1_object_info_extended: provide delta base sha1s
"git add -A" (no other arguments) in a totally empty working tree
used to emit an error.
* nd/add-empty-fix:
add: don't complain when adding empty project root
Fetching 'frotz' branch with "git fetch", while having
'frotz/nitfol' remote-tracking branch from an earlier fetch, would
error out, primarily because the command has not been told to
remove anything on our side. In such a case, "git fetch --prune"
can be used to remove 'frotz/nitfol' to make room to fetch and
store 'frotz' remote-tracking branch.
* tm/fetch-prune:
fetch --prune: Run prune before fetching
fetch --prune: always print header url
A few places where we relied on a fixed length buffer to hold
pathnames in these two programs have been converted to use strbuf.
* mh/path-max:
builtin/prune.c: use strbuf to avoid having to worry about PATH_MAX
prune-packed: use strbuf to avoid having to worry about PATH_MAX
read_sha1_file() that is the workhorse to read the contents given
an object name honoured object replacements, but there is no
corresponding mechanism to sha1_object_info() that is used to
obtain the metainfo (e.g. type & size) about the object, leading
callers to weird inconsistencies.
* cc/replace-object-info:
replace info: rename 'full' to 'long' and clarify in-code symbols
Documentation/git-replace: describe --format option
builtin/replace: unset read_replace_refs
t6050: add tests for listing with --format
builtin/replace: teach listing using short, medium or full formats
sha1_file: perform object replacement in sha1_object_info_extended()
t6050: show that git cat-file --batch fails with replace objects
sha1_object_info_extended(): add an "unsigned flags" parameter
sha1_file.c: add lookup_replace_object_extended() to pass flags
replace_object: don't check read_replace_refs twice
rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT
Introduce "negative pathspec" magic, to allow "git log -- . ':!dir'" to
tell us "I am interested in everything but 'dir' directory".
* nd/negative-pathspec:
pathspec.c: support adding prefix magic to a pathspec with mnemonic magic
Support pathspec magic :(exclude) and its short form :!
glossary-content.txt: rephrase magic signature part
Since 2dce956 is_git_command() is a bit slow as it does file I/O in
the call to list_commands_in_dir(). Avoid the file I/O by adding an
early check for the builtin commands.
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This avoids list_commands_in_dir() being called when not needed which is
quite slow due to file I/O in order to list matching files in a directory.
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
safe_create_leading_directories_const() returns a non-zero value on
error. The old code at this calling site recognized a couple of
particular error values, and treated all other return values as
success. Instead, be more conservative: recognize the errors we are
interested in, but treat any other nonzero values as failures. This
is more robust in case somebody adds another possible return value
without telling us.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of returning magic integer values (which a couple of callers
go to the trouble of distinguishing), return values from an enum. Add
a docstring.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 58babfff ("shallow.c: the 8 steps to select new commits for
.git/shallow", 05-12-2013) added a function to implement step 5 of
the quoted eight steps, namely 'remove_nonexistent_ours_in_pack()'.
This function implements an optional optimization step in the new
shallow commit selection algorithm. However, this function has no
callers. (The commented out call sites would need to change, in
order to provide information required by the function.)
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we have a remote-tracking branch named "frotz/nitfol" from a
previous fetch, and the upstream now has a branch named "frotz",
fetch would fail to remove "frotz/nitfol" with a "git fetch --prune"
from the upstream. git would inform the user to use "git remote
prune" to fix the problem.
Change the way "fetch --prune" works by moving the pruning operation
before the fetching operation. This way, instead of warning the user
of a conflict, it autmatically fixes it.
Signed-off-by: Tom Miller <jackerran@gmail.com>
Tested-by: Thomas Rast <tr@thomasrast.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If "fetch --prune" is run with no new refs to fetch, but it has refs
to prune. Then, the header url is not printed as it would if there were
new refs to fetch.
Output before this patch:
$ git fetch --prune remote-with-no-new-refs
x [deleted] (none) -> origin/world
Output after this patch:
$ git fetch --prune remote-with-no-new-refs
From https://github.com/git/git
x [deleted] (none) -> origin/test
Signed-off-by: Tom Miller <jackerran@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 64a99eb4 git gc refuses to run without the --force option if
another gc process on the same repository is already running.
However, if the repository is shared and user A runs git gc on the
repository and while that gc is still running user B runs git gc on
the same repository the gc process run by user A will not be noticed
and the gc run by user B will go ahead and run.
The problem is that the kill(pid, 0) test fails with an EPERM error
since user B is not allowed to signal processes owned by user A
(unless user B is root).
Update the test to recognize an EPERM error as meaning the process
exists and another gc should not be run (unless --force is given).
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enum names SHORT/MEDIUM/FULL were too broad to be descriptive. And
they clashed with built-in symbols on platforms like Windows.
Clarify by giving them REPLACE_FORMAT_ prefix.
Rename 'full' format in "git replace --format=<name>" to 'long', to
match others (i.e. 'short' and 'medium').
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No code ever used this symbol since the command was introduced at
9f613ddd (Add git-for-each-ref: helper for language bindings,
2006-09-15).
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.
In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.
As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.
The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.
But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.
This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.
Here are perf results for p5310:
Test origin/master HEAD^ HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7%
5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5%
5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2%
5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9%
You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since `pack-objects` will write a `.bitmap` file next to the `.pack` and
`.idx` files, this commit teaches `git-repack` to consider the new
bitmap indexes (if they exist) when performing repack operations.
This implies moving old bitmap indexes out of the way if we are
repacking a repository that already has them, and moving the newly
generated bitmap indexes into the `objects/pack` directory, next to
their corresponding packfiles.
Since `git repack` is now capable of handling these `.bitmap` files,
a normal `git gc` run on a repository that has `pack.writebitmaps` set
to true in its config file will generate bitmap indexes as part of the
garbage collection process.
Alternatively, `git repack` can be called with the `-b` switch to
explicitly generate bitmap indexes if you are experimenting
and don't want them on all the time.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We ask pack-objects to pack to a set of temporary files, and
then rename them into place. Some files that pack-objects
creates may be optional (like a .bitmap file), in which case
we would not want to call rename(). We already call stat()
and make the chmod optional if the file cannot be accessed.
We could simply skip the rename step in this case, but that
would be a minor regression in noticing problems with
non-optional files (like the .pack and .idx files).
Instead, we can now annotate extensions as optional, and
skip them if they don't exist (and otherwise rely on
rename() to barf).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is slightly more verbose, but will let us annotate the
extensions with further options in future commits.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a static array of extensions, but hardcode the size
of the array in our loops. Let's pull out this magic number,
which will make it easier to change.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bitmap reachability index used to speed up the counting objects
phase during `pack-objects` can also be used to optimize a normal
rev-list if the only thing required are the SHA1s of the objects during
the list (i.e., not the path names at which trees and blobs were found).
Calling `git rev-list --objects --use-bitmap-index [committish]` will
perform an object iteration based on a bitmap result instead of actually
walking the object graph.
These are some example timings for `torvalds/linux` (warm cache,
best-of-five):
$ time git rev-list --objects master > /dev/null
real 0m34.191s
user 0m33.904s
sys 0m0.268s
$ time git rev-list --objects --use-bitmap-index master > /dev/null
real 0m1.041s
user 0m0.976s
sys 0m0.064s
Likewise, using `git rev-list --count --use-bitmap-index` will speed up
the counting operation by building the resulting bitmap and performing a
fast popcount (number of bits set on the bitmap) on the result.
Here are some sample timings of different ways to count commits in
`torvalds/linux`:
$ time git rev-list master | wc -l
399882
real 0m6.524s
user 0m6.060s
sys 0m3.284s
$ time git rev-list --count master
399882
real 0m4.318s
user 0m4.236s
sys 0m0.076s
$ time git rev-list --use-bitmap-index --count master
399882
real 0m0.217s
user 0m0.176s
sys 0m0.040s
This also respects negative refs, so you can use it to count
a slice of history:
$ time git rev-list --count v3.0..master
144843
real 0m1.971s
user 0m1.932s
sys 0m0.036s
$ time git rev-list --use-bitmap-index --count v3.0..master
real 0m0.280s
user 0m0.220s
sys 0m0.056s
Though note that the closer the endpoints, the less it helps. In the
traversal case, we have fewer commits to cross, so we take less time.
But the bitmap time is dominated by generating the pack revindex, which
is constant with respect to the refs given.
Note that you cannot yet get a fast --left-right count of a symmetric
difference (e.g., "--count --left-right master...topic"). The slow part
of that walk actually happens during the merge-base determination when
we parse "master...topic". Even though a count does not actually need to
know the real merge base (it only needs to take the symmetric difference
of the bitmaps), the revision code would require some refactoring to
handle this case.
Additionally, a `--test-bitmap` flag has been added that will perform
the same rev-list manually (i.e. using a normal revwalk) and using
bitmaps, and verify that the results are the same. This can be used to
exercise the bitmap code, and also to verify that the contents of the
.bitmap file are sane.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function actually does three things:
1. Check whether we've already added the object to our
packing list.
2. Check whether the object meets our criteria for adding.
3. Actually add the object to our packing list.
It's a little hard to see these three phases, because they
happen linearly in the rather long function. Instead, this
patch breaks them up into three separate helper functions.
The result is a little easier to follow, though it
unfortunately suffers from some optimization
interdependencies between the stages (e.g., during step 3 we
use the packing list index from step 1 and the packfile
information from step 2).
More importantly, though, the various parts can be
composed differently, as they will be in the next patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Scripts that use "merge-base --octopus" could do the reducing
themselves, but most of them are expected to want to get the reduced
results without having to do any work themselves.
Tests are taken from a message by Василий Макаров
<einmalfel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
We might want to vet the existing callers of the underlying
get_octopus_merge_bases() and find out if _all_ of them are doing
anything extra (like deduping) because the machinery can return
duplicate results. And if that is the case, then we may want to
move the dedupling down the callchain instead of having it here.
It piggybacks on an unrelated handle_octopus() function only because
there are some similarities between the way they need to preprocess
their input and output their result. There is nothing similar in
the true logic between these two operations.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support for grafts predates Git's strbuf, and hence it is understandable
that there was a hard-coded line length limit of 1023 characters (which
was chosen a bit awkwardly, given that it is *exactly* one byte short of
aligning with the 41 bytes occupied by a commit name and the following
space or new-line character).
While regular commit histories hardly win comprehensibility in general
if they merge more than twenty-two branches in one go, it is not Git's
business to limit grafts in such a way.
In this particular developer's case, the use case that requires
substantially longer graft lines to be supported is the visualization of
the commits' order implied by their changes: commits are considered to
have an implicit relationship iff exchanging them in an interactive
rebase would result in merge conflicts.
Thusly implied branches tend to be very shallow in general, and the
resulting thicket of implied branches is usually very wide; It is
actually quite common that *most* of the commits in a topic branch have
not even one implied parent, so that a final merge commit has about as
many implied parents as there are commits in said branch.
[jc: squashed in tests by Jonathan]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff ../else/where/A ../else/where/B" when ../else/where is
clearly outside the repository, and "git diff --no-index A B", do
not have to look at the index at all, but we used to read the index
unconditionally.
* tg/diff-no-index-refactor:
diff: avoid some nesting
diff: add test for --no-index executed outside repo
diff: don't read index when --no-index is given
diff: move no-index detection to builtin/diff.c
"git cat-file --batch=", an admittedly useless command, did not
behave very well.
* jk/cat-file-regression-fix:
cat-file: handle --batch format with missing type/size
cat-file: pass expand_data to print_object_or_die
"git rev-parse <revs> -- <paths>" did not implement the usual
disambiguation rules the commands in the "git log" family used in
the same way.
* jk/rev-parse-double-dashes:
rev-parse: be more careful with munging arguments
rev-parse: correctly diagnose revision errors before "--"
Make "git push origin master" update the same ref that would be
updated by our 'master' when "git push origin" (no refspecs) is run
while the 'master' branch is checked out, which makes "git push"
more symmetric to "git fetch" and more usable for the triangular
workflow.
* jc/push-refmap:
push: also use "upstream" mapping when pushing a single ref
push: use remote.$name.push as a refmap
builtin/push.c: use strbuf instead of manual allocation
It can be useful for debugging or analysis to see which
objects are stored as delta bases on top of others. This
information is available by running `git verify-pack`, but
that is extremely expensive (and is harder than necessary to
parse).
Instead, let's make it available as a cat-file query format,
which makes it fast and simple to get the bases for a subset
of the objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sha1write function returns an int, but it will always be
"0". The failure-prone parts of the function happen in the
"flush" callback, which cannot pass an error back to us. So
we just end up calling die() during the flush.
Let's just drop the return value altogether, as it only
confuses callers into thinking that it might be useful.
Only one call site actually checked the return value. We can
drop that check, since it just led to a die() anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This behavior was added in 07d7bed (add: don't complain when adding
empty project root - 2009-04-28) then broken by 84b8b5d (remove
match_pathspec() in favor of match_pathspec_depth() -
2013-07-14). Reinstate it.
Noticed-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at it, rename prune_tmp_object(), which used to be a helper to
remove temporary files that were created to become loose object
files, to prune_tmp_file(), as the function is also used to remove
any random cruft whose name begins with tmp_ directly in .git/object
or .git/object/pack directories these days.
Noticed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Be more careful when parsing remote repository URL given in the
scp-style host:path notation.
* tb/clone-ssh-with-colon-for-port:
git_connect(): use common return point
connect.c: refactor url parsing
git_connect(): refactor the port handling for ssh
git fetch: support host:/~repo
t5500: add test cases for diag-url
git fetch-pack: add --diag-url
git_connect: factor out discovery of the protocol and its parts
git_connect: remove artificial limit of a remote command
t5601: add tests for ssh
t5601: remove clear_ssh, refactor setup_ssh_wrapper
"git fetch --depth=0" was a no-op, and was silently
ignored. Diagnose it as an error.
* nd/transport-positive-depth-only:
clone,fetch: catch non positive --depth option value
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.
* cc/starts-n-ends-with:
replace {pre,suf}fixcmp() with {starts,ends}_with()
strbuf: introduce starts_with() and ends_with()
builtin/remote: remove postfixcmp() and use suffixcmp() instead
environment: normalize use of prefixcmp() by removing " != 0"
"git commit -v" appends the patch to the log message before
editing, and then removes the patch when the editor returned
control. However, the patch was not stripped correctly when the
first modified path was a submodule.
* jl/commit-v-strip-marker:
commit -v: strip diffs and submodule shortlogs from the commit message