development/git - HydraGit

mirror of https://github.com/git/git synced 2024-09-19 08:21:36 +00:00

Author	SHA1	Message	Date
Junio C Hamano	8746e30541	Merge branch 'ah/pack-objects-usage-strings' Usage string fix. * ah/pack-objects-usage-strings: pack-objects: place angle brackets around placeholders in usage strings	2015-09-01 16:31:12 -07:00
Alex Henrie	b8c1d27577	pack-objects: place angle brackets around placeholders in usage strings Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-28 11:59:10 -07:00
Charles Bailey	2a514ed805	parse-options: move unsigned long option parsing out of pack-objects.c The unsigned long option parsing (including 'k'/'m'/'g' suffix parsing) is more widely applicable. Add support for OPT_MAGNITUDE to parse-options.h and change pack-objects.c use this support. The error behavior on parse errors follows that of OPT_INTEGER. The name of the option that failed to parse is reported with a brief message describing the expect format for the option argument and then the full usage message for the command invoked. This differs from the previous behavior for OPT_ULONG used in pack-objects for --max-pack-size and --window-memory which used to display the value supplied in the error message and did not display the full usage message. Signed-off-by: Charles Bailey <cbailey32@bloomberg.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 15:07:21 -07:00
Michael Haggerty	d155254c73	builtin/pack-objects: rewrite to take an object_id argument Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-25 12:19:29 -07:00
Michael Haggerty	2b2a5be394	each_ref_fn: change to take an object_id parameter Change typedef each_ref_fn to take a "const struct object_id oid" parameter instead of "const unsigned char sha1". To aid this transition, implement an adapter that can be used to wrap old-style functions matching the old typedef, which is now called "each_ref_sha1_fn"), and make such functions callable via the new interface. This requires the old function and its cb_data to be wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be used as the cb_data for an adapter function, each_ref_fn_adapter(). This is an enormous diff, but most of it consists of simple, mechanical changes to the sites that call any of the "for_each_ref" family of functions. Subsequent to this change, the call sites can be rewritten one by one to use the new interface. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-25 12:19:27 -07:00
Junio C Hamano	cedeffeee0	Merge branch 'jk/sha1-file-reduce-useless-warnings' * jk/sha1-file-reduce-useless-warnings: sha1_file: squelch "packfile cannot be accessed" warnings	2015-05-11 14:23:41 -07:00
Jeff King	319b678a7b	sha1_file: squelch "packfile cannot be accessed" warnings When we find an object in a packfile index, we make sure we can still open the packfile itself (or that it is already open), as it might have been deleted by a simultaneous repack. If we can't access the packfile, we print a warning for the user and tell the caller that we don't have the object (we can then look in other packfiles, or find a loose version, before giving up). The warning we print to the user isn't really accomplishing anything, and it is potentially confusing to users. In the normal case, it is complete noise; we find the object elsewhere, and the user does not have to care that we racily saw a packfile index that became stale. It didn't affect the operation at all. A possibly more interesting case is when we later can't find the object, and report failure to the user. In this case the warning could be considered a clue toward that ultimate failure. But it's not really a useful clue in practice. We wouldn't even print it consistently (since we are racing with another process, we might not even see the .idx file, or we might win the race and open the packfile, completing the operation). This patch drops the warning entirely (not only from the fill_pack_entry site, but also from an identical use in pack-objects). If we did find the warning interesting in the error case, we could stuff it away and reveal it to the user when we later die() due to the broken object. But that complexity just isn't worth it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-03-30 21:47:39 -07:00
Junio C Hamano	6902c4da58	Merge branch 'rs/deflate-init-cleanup' Code simplification. * rs/deflate-init-cleanup: zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}	2015-03-17 16:01:26 -07:00
René Scharfe	9a6f1287fb	zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw} Clear the git_zstream variable at the start of git_deflate_init() etc. so that callers don't have to do that. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-03-05 15:46:03 -08:00
brian m. carlson	2dacf26d09	pack-objects: use --objects-edge-aggressive for shallow repos When fetching into or pushing from a shallow repository, we want to aggressively mark edges as uninteresting, since this decreases the pack size. However, aggressively marking edges can negatively affect performance on large non-shallow repositories with lots of refs. Teach pack-objects a --shallow option to indicate that we're pushing from or fetching into a shallow repository. Use --objects-edge-aggressive only for shallow repositories and otherwise use --objects-edge, which performs better in the general case. Update the callers to pass the --shallow option when they are dealing with a shallow repository. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-29 09:58:25 -08:00
brian m. carlson	1684c1b219	rev-list: add an option to mark fewer edges as uninteresting In commit `fbd4a70` (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16), we marked an increasing number of edges uninteresting. This change, and the subsequent change to make this conditional on --objects-edge, are used by --thin to make much smaller packs for shallow clones. Unfortunately, they cause a significant performance regression when pushing non-shallow clones with lots of refs (23.322 seconds vs. 4.785 seconds with 22400 refs). Add an option to git rev-list, --objects-edge-aggressive, that preserves this more aggressive behavior, while leaving --objects-edge to provide more performant behavior. Preserve the current behavior for the moment by using the aggressive option. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-29 09:57:55 -08:00
Junio C Hamano	d70e331c0e	Merge branch 'jk/prune-mtime' Tighten the logic to decide that an unreachable cruft is sufficiently old by covering corner cases such as an ancient object becoming reachable and then going unreachable again, in which case its retention period should be prolonged. * jk/prune-mtime: (28 commits) drop add_object_array_with_mode revision: remove definition of unused 'add_object' function pack-objects: double-check options before discarding objects repack: pack objects mentioned by the index pack-objects: use argv_array reachable: use revision machinery's --indexed-objects code rev-list: add --indexed-objects option rev-list: document --reflog option t5516: test pushing a tag of an otherwise unreferenced blob traverse_commit_list: support pending blobs/trees with paths make add_object_array_with_context interface more sane write_sha1_file: freshen existing objects pack-objects: match prune logic for discarding objects pack-objects: refactor unpack-unreachable expiration check prune: keep objects reachable from recent objects sha1_file: add for_each iterators for loose and packed objects count-objects: use for_each_loose_file_in_objdir count-objects: do not use xsize_t when counting object size prune-packed: use for_each_loose_file_in_objdir reachable: mark index blobs as SEEN ...	2014-10-29 10:07:56 -07:00
Junio C Hamano	e4da4fbe0e	Merge branch 'eb/no-pthreads' Allow us build with NO_PTHREADS=NoThanks compilation option. * eb/no-pthreads: Handle atexit list internaly for unthreaded builds pack-objects: set number of threads before checking and warning index-pack: fix compilation with NO_PTHREADS	2014-10-24 14:59:10 -07:00
Junio C Hamano	26a22d8d00	Merge branch 'jk/pack-objects-no-bitmap-when-splitting' Splitting pack-objects output into multiple packs is incompatible with the use of reachability bitmap. * jk/pack-objects-no-bitmap-when-splitting: pack-objects: turn off bitmaps when we split packs	2014-10-24 14:56:10 -07:00
Jeff King	2113471478	pack-objects: turn off bitmaps when we split packs If a pack.packSizeLimit is set, we may split the pack data across multiple packfiles. This means we cannot generate .bitmap files, as they require that all of the reachable objects are in the same pack. We check that condition when we are generating the list of objects to pack (and disable bitmaps if we are not packing everything), but we forgot to update it when we notice that we needed to split (which doesn't happen until the actual write phase). The resulting bitmaps are quite bogus (they mention entries that do not exist in the pack!) and can cause a fetch or push to send insufficient objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:08:38 -07:00
Jeff King	b1e757f363	pack-objects: double-check options before discarding objects When we are given an expiration time like --unpack-unreachable=2.weeks.ago, we avoid writing out old, unreachable loose objects entirely, under the assumption that running "prune" would simply delete them immediately anyway. However, this is only valid if we computed the same set of reachable objects as prune would. In practice, this is the case, because only git-repack uses the --unpack-unreachable option with an expiration, and it always feeds as many objects into the pack as possible. But we can double-check at runtime just to be sure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:07:07 -07:00
Jeff King	c90f9e13ab	repack: pack objects mentioned by the index When we pack all objects, we use only the objects reachable from references and reflogs. This misses any objects which are reachable from the index, but not yet referenced. By itself this isn't a big deal; the objects can remain loose until they are actually used in a commit. However, it does create a problem when we drop packed but unreachable objects. We try to optimize out the writing of objects that we will immediately prune, which means we must follow the same rules as prune in determining what is reachable. And prune uses the index for this purpose. This is rather uncommon in practice, as objects in the index would not usually have been packed in the first place. But it could happen in a sequence like: 1. You make a commit on a branch that references blob X. 2. You repack, moving X into the pack. 3. You delete the branch (and its reflog), so that X is unreferenced. 4. You "git add" blob X so that it is now referenced only by the index. 5. You repack again with git-gc. The pack-objects we invoke will see that X is neither referenced nor recent and not bother loosening it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:07:07 -07:00
Jeff King	edfbb2aa53	pack-objects: use argv_array This saves us from having to bump the rp_av count when we add new traversal options. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:07:07 -07:00
Jeff King	abcb86553d	pack-objects: match prune logic for discarding objects A recent commit taught git-prune to keep non-recent objects that are reachable from recent ones. However, pack-objects, when loosening unreachable objects, tries to optimize out the write in the case that the object will be immediately pruned. It now gets this wrong, since its rule does not reflect the new prune code (and this can be seen by running t6501 with a strategically placed repack). Let's teach pack-objects similar logic. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:43 -07:00
Jeff King	d0d46abc16	pack-objects: refactor unpack-unreachable expiration check When we are loosening unreachable packed objects, we do not bother to process objects that would simply be pruned immediately anyway. The "would be pruned" check is a simple comparison, but is about to get more complicated. Let's pull it out into a separate function. Note that this is slightly less efficient than the original, which avoided even opening old packs, since no object in them could pass the current check, which cares only about the pack mtime. But the new rules will depend on the exact object, so we need to perform the check even for old packs. Note also that we fix a minor buglet when the pack mtime is exactly the same as the expiration time. The prune code considers that worth pruning, whereas our check here considered it worth keeping. This wasn't a big deal. Besides being unlikely to happen, the result was simply that the object was loosened and then pruned, missing the optimization. Still, we can easily fix it while we are here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:42 -07:00
Junio C Hamano	0c45d258ec	pack-objects: set number of threads before checking and warning Under NO_PTHREADS build, we warn when delta_search_threads is not set to 1, because that is the only sensible value on a single threaded build. However, the auto detection that kicks in when that variable is set to 0 (e.g. there is no configuration variable or command line option, or an explicit --threads=0 is given from the command line to override the pack.threads configuration to force auto-detection) was not done before the condition to issue this warning was tested. Move the auto-detection code and place it at an appropriate spot. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-13 12:53:46 -07:00
René Scharfe	2756ca4347	use REALLOC_ARRAY for changing the allocation size of arrays Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-18 09:13:42 -07:00
Junio C Hamano	a3d54f9a1f	Merge branch 'jk/pack-shallow-always-without-bitmap' Reachability bitmaps do not work with shallow operations. Fixes regression in 2.0. * jk/pack-shallow-always-without-bitmap: pack-objects: turn off bitmaps when we see --shallow lines	2014-08-26 11:16:25 -07:00
Jeff King	f7f91086a3	pack-objects: turn off bitmaps when we see --shallow lines Reachability bitmaps do not work with shallow operations, because they cache a view of the object reachability that represents the true objects. Whereas a shallow repository (or a shallow operation in a repository) is inherently cutting off the object graph with a graft. We explicitly disallow the use of bitmaps in shallow repositories by checking is_repository_shallow(), and we should continue to do that. However, we also want to disallow bitmaps when we are serving a fetch to a shallow client, since we momentarily take on their grafted view of the world. It used to be enough to call is_repository_shallow at the start of pack-objects. Upload-pack wrote the other side's shallow state to a temporary file and pointed the whole pack-objects process at this state with "git --shallow-file", and from the perspective of pack-objects, we really were in a shallow repo. But since `b790e0f` (upload-pack: send shallow info over stdin to pack-objects, 2014-03-11), we do it differently: we send --shallow lines to pack-objects over stdin, and it registers them itself. This means that our is_repository_shallow check is way too early (we have not been told about the shallowness yet), and that it is insufficient (calling is_repository_shallow is not enough, as the shallow grafts we register do not change its return value). Instead, we can just turn off bitmaps explicitly when we see these lines. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-08-12 12:17:19 -07:00
Junio C Hamano	25f3119000	Merge branch 'jk/repack-pack-writebitmaps-config' * jk/repack-pack-writebitmaps-config: t7700: drop explicit --no-pack-kept-objects from .keep test repack: introduce repack.writeBitmaps config option repack: simplify handling of --write-bitmap-index pack-objects: stop respecting pack.writebitmaps	2014-06-25 12:23:19 -07:00
Jeff King	15a906c5e9	pack-objects: stop respecting pack.writebitmaps The handling of the pack.writebitmaps config option originally happened in pack-objects, which is quite low-level. It would make more sense for drivers of pack-objects to read the config, and then manipulate pack-objects with command-line options. Recently, repack learned to do so, making the low-level read of pack.writebitmaps redundant here. Other callers, like upload-pack, would not generally want to write bitmaps anyway. This could be considered a regression for somebody who is driving pack-objects themselves outside of repack and expects the config option to be used. However, such users seem rather unlikely given how new the bitmap code is (and the fact that they would basically be reimplementing repack in the first place). Note that we do not do anything with pack.writeBitmapHashCache here. That option is not about "do we write bimaps", but rather "when we are writing bitmaps, how do we do it?". You would want that to kick in anytime you decide to write them, similar to how pack.indexVersion is used. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-10 14:01:53 -07:00
Junio C Hamano	967f8c9184	Merge branch 'jk/pack-bitmap' * jk/pack-bitmap: pack-objects: do not reuse packfiles without --delta-base-offset add `ignore_missing_links` mode to revwalk	2014-04-08 12:00:33 -07:00
Junio C Hamano	d59c12d7ad	Merge branch 'jl/nor-or-nand-and' Eradicate mistaken use of "nor" (that is, essentially "nor" used not in "neither A nor B" ;-)) from in-code comments, command output strings, and documentations. * jl/nor-or-nand-and: code and test: fix misuses of "nor" comments: fix misuses of "nor" contrib: fix misuses of "nor" Documentation: fix misuses of "nor"	2014-04-08 12:00:28 -07:00
Jeff King	69e4b3426a	pack-objects: do not reuse packfiles without --delta-base-offset When we are sending a packfile to a remote, we currently try to reuse a whole chunk of packfile without bothering to look at the individual objects. This can make things like initial clones much lighter on the server, as we can just dump the packfile bytes. However, it's possible that the other side cannot read our packfile verbatim. For example, we may have objects stored as OFS_DELTA, but the client is an antique version of git that only understands REF_DELTA. We negotiate this capability over the fetch protocol. A normal pack-objects run will convert OFS_DELTA into REF_DELTA on the fly, but the "reuse pack" code path never even looks at the objects. This patch disables packfile reuse if the other side is missing any capabilities that we might have used in the on-disk pack. Right now the only one is OFS_DELTA, but we may need to expand in the future (e.g., if packv4 introduces new object types). We could be more thorough and only disable reuse in this case when we actually have an OFS_DELTA to send, but: 1. We almost always will have one, since we prefer OFS_DELTA to REF_DELTA when possible. So this case would almost never come up. 2. Looking through the objects defeats the purpose of the optimization, which is to do as little work as possible to get the bytes to the remote. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-04-04 15:29:44 -07:00
Justin Lebar	01689909eb	comments: fix misuses of "nor" Signed-off-by: Justin Lebar <jlebar@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-31 15:29:27 -07:00
Junio C Hamano	0ddcc9cfba	Merge branch 'jk/pack-bitmap-progress' The progress output while repacking and transferring objects showed an apparent large silence while writing the objects out of existing packfiles, when the reachability bitmap was in use. * jk/pack-bitmap-progress: pack-objects: show reused packfile objects in "Counting objects" pack-objects: show progress for reused packfiles	2014-03-28 13:50:56 -07:00
Junio C Hamano	e2450e1245	Merge branch 'jk/pack-bitmap' Instead of dying when asked to (re)pack with the reachability bitmap when a bitmap cannot be built, just (re)pack without producing a bitmap in such a case, with a warning. * jk/pack-bitmap: pack-objects: turn off bitmaps when skipping objects	2014-03-28 13:50:50 -07:00
Junio C Hamano	1ddb4d7e5e	Merge branch 'nd/upload-pack-shallow' Serving objects from a shallow repository needs to write a temporary file to be used, but the serving upload-pack may not have write access to the repository which is meant to be read-only. Instead feed these temporary shallow bounds from the standard input of pack-objects so that we do not have to use a temporary file. * nd/upload-pack-shallow: upload-pack: send shallow info over stdin to pack-objects	2014-03-21 12:49:08 -07:00
Junio C Hamano	f4eec8ce05	Merge branch 'sh/finish-tmp-packfile' * sh/finish-tmp-packfile: finish_tmp_packfile():use strbuf for pathname construction	2014-03-18 13:50:24 -07:00
Junio C Hamano	fe9122a352	Merge branch 'dd/use-alloc-grow' Replace open-coded reallocation with ALLOC_GROW() macro. * dd/use-alloc-grow: sha1_file.c: use ALLOC_GROW() in pretend_sha1_file() read-cache.c: use ALLOC_GROW() in add_index_entry() builtin/mktree.c: use ALLOC_GROW() in append_to_tree() attr.c: use ALLOC_GROW() in handle_attr_line() dir.c: use ALLOC_GROW() in create_simplify() reflog-walk.c: use ALLOC_GROW() replace_object.c: use ALLOC_GROW() in register_replace_object() patch-ids.c: use ALLOC_GROW() in add_commit() diffcore-rename.c: use ALLOC_GROW() diff.c: use ALLOC_GROW() commit.c: use ALLOC_GROW() in register_commit_graft() cache-tree.c: use ALLOC_GROW() in find_subtree() bundle.c: use ALLOC_GROW() in add_to_ref_list() builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()	2014-03-18 13:50:21 -07:00
Jeff King	373c67da1d	pack-objects: turn off bitmaps when skipping objects The pack bitmap format requires that we have a single bit for each object in the pack, and that each object's bitmap represents its complete set of reachable objects. Therefore we have no way to represent the bitmap of an object which references objects outside the pack. We notice this problem while generating the bitmaps, as we try to find the offset of a particular object and realize that we do not have it. In this case we die, and neither the bitmap nor the pack is generated. This is correct, but perhaps a little unfriendly. If you have bitmaps turned on in the config, many repacks will fail which would otherwise succeed. E.g., incremental repacks, repacks with "-l" when you have alternates, ".keep" files. Instead, this patch notices early that we are omitting some objects from the pack and turns off bitmaps (with a warning). Note that this is not strictly correct, as it's possible that the object being omitted is not reachable from any other object in the pack. In practice, this is almost never the case, and there are two advantages to doing it this way: 1. The code is much simpler, as we do not have to cleanly abort the bitmap-generation process midway through. 2. We do not waste time partially generating bitmaps only to find out that some object deep in the history is not being packed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-17 15:02:39 -07:00
Jeff King	78d2214eb4	pack-objects: show reused packfile objects in "Counting objects" When we are sending a pack for push or fetch, we may reuse a chunk of packfile without even parsing it. The progress meter then looks like this: Reusing existing pack: 3440489, done. Counting objects: 3, done. The first line shows that we are reusing a large chunk of objects, and then we further count any objects not included in the reused portion with an actual traversal. These are all implementation details that the user does not need to care about. Instead, we can show the reused objects in the normal "counting..." progress meter (which will simply go much faster than normal), and then continue to add to it as we traverse. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-17 15:01:27 -07:00
Jeff King	657673f125	pack-objects: show progress for reused packfiles When the "--all-progress" option is in effect, pack-objects shows a progress report for the "writing" phase. If the repository has bitmaps and we are reusing a packfile, the user sees no progress update until the whole packfile is sent. Since this is typically the bulk of what is being written, it can look like git hangs during this phase, even though the transfer is proceeding. This generally only happens with "git push" from a repository with bitmaps. We do not use "--all-progress" for fetch (since the result is going to index-pack on the client, which takes care of progress reporting). And for regular repacks to disk, we do not reuse packfiles. We already have the progress meter setup during write_reused_pack; we just need to call display_progress whiel we are writing out the pack. The progress meter is attached to our output descriptor, so it automatically handles the throughput measurements. However, we need to update the object count as we go, since that is what feeds the percentage we show. We aren't actually parsing the packfile as we send it, so we have no idea how many objects we have sent; we only know that at the end of N bytes, we will have sent M objects. So we cheat a little and assume each object is M/N bytes (i.e., the mean of the objects we are sending). While this isn't strictly true, it actually produces a more pleasing progress meter for the user, as it moves smoothly and predictably (and nobody really cares about the object count; they care about the percentage, and the object count is a proxy for that). One alternative would be to actually show two progress meters: one for the reused pack, and one for the rest of the objects. That would more closely reflect the data we have (the first would be measured in bytes, and the second measured in objects). But it would also be more complex and annoying to the user; rather than seeing one progress meter counting up to 100%, they would finish one meter, then start another one at zero. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-17 15:01:25 -07:00
Junio C Hamano	56e2874a81	Merge branch 'sh/write-pack-file-warning-message-fix' A warning from "git pack-objects" were generated by referring to an incorrect variable when forming the filename that we had trouble with. * sh/write-pack-file-warning-message-fix: write_pack_file: use correct variable in diagnostic	2014-03-14 14:27:17 -07:00
Junio C Hamano	3e30cb0fbf	Merge branch 'mh/replace-refs-variable-rename' * mh/replace-refs-variable-rename: Document some functions defined in object.c Add docstrings for lookup_replace_object() and do_lookup_replace_object() rename read_replace_refs to check_replace_refs	2014-03-14 14:27:06 -07:00
Junio C Hamano	0963008cbf	Merge branch 'nd/i18n-progress' Mark the progress indicators from various time-consuming commands for i18n/l10n. * nd/i18n-progress: i18n: mark all progress lines for translation	2014-03-14 14:26:31 -07:00
Nguyễn Thái Ngọc Duy	b790e0f67c	upload-pack: send shallow info over stdin to pack-objects Before `cdab485` (upload-pack: delegate rev walking in shallow fetch to pack-objects - 2013-08-16) upload-pack does not write to the source repository. `cdab485` starts to write $GIT_DIR/shallow_XXXXXX if it's a shallow fetch, so the source repo must be writable. git:// servers do not need write access to repos and usually don't have it, which means `cdab485` breaks shallow clone over git:// Instead of using a temporary file as the media for shallow points, we can send them over stdin to pack-objects as well. Prepend shallow SHA-1 with --shallow so pack-objects knows what is what. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-11 13:32:10 -07:00
Dmitry S. Dolzhenko	25e1940709	builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path() Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 14:44:11 -08:00
Sun He	5889271114	finish_tmp_packfile():use strbuf for pathname construction The old version fixes a maximum length on the buffer, which could be a problem if one is not certain of the length of get_object_directory(). Using strbuf can avoid the protential bug. Helped-by: Michael Haggerty <mhagger@alum.mit.edu> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Sun He <sunheehnus@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 12:15:10 -08:00
Junio C Hamano	2156a98045	Merge branch 'sh/write-pack-file-warning-message-fix' into sh/finish-tmp-packfile * sh/write-pack-file-warning-message-fix: write_pack_file: use correct variable in diagnostic	2014-03-03 12:13:20 -08:00
Sun He	0eea5a6e91	write_pack_file: use correct variable in diagnostic 'pack_tmp_name' is the subject of the utime() check, so report it in the warning, not the uninitialized 'tmpname' Signed-off-by: Sun He <sunheehnus@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 10:43:40 -08:00
Junio C Hamano	0f9e62e084	Merge branch 'jk/pack-bitmap' Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history. * jk/pack-bitmap: (26 commits) ewah: unconditionally ntohll ewah data ewah: support platforms that require aligned reads read-cache: use get_be32 instead of hand-rolled ntoh_l block-sha1: factor out get_be and put_be wrappers do not discard revindex when re-preparing packfiles pack-bitmap: implement optional name_hash cache t/perf: add tests for pack bitmaps t: add basic bitmap functionality tests count-objects: recognize .bitmap in garbage-checking repack: consider bitmaps when performing repacks repack: handle optional files created by pack-objects repack: turn exts array into array-of-struct repack: stop using magic number for ARRAY_SIZE(exts) pack-objects: implement bitmap writing rev-list: add bitmap mode to speed up object lists pack-objects: use bitmaps when packing objects pack-objects: split add_object_entry pack-bitmap: add support for bitmap indexes documentation: add documentation for the bitmap format ewah: compressed bitmap implementation ...	2014-02-27 14:01:48 -08:00
Nguyễn Thái Ngọc Duy	754dbc43f0	i18n: mark all progress lines for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 09:08:37 -08:00
Michael Haggerty	afc711b8e1	rename read_replace_refs to check_replace_refs The semantics of this flag was changed in commit `e1111cef23` inline lookup_replace_object() calls but wasn't renamed at the time to minimize code churn. Rename it now, and add a comment explaining its use. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-20 14:16:55 -08:00
Vicent Marti	ae4f07fbcc	pack-bitmap: implement optional name_hash cache When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found. In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression. As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects. The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them. But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective. This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression. Here are perf results for p5310: Test origin/master HEAD^ HEAD ------------------------------------------------------------------------------------------------- 5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7% 5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5% 5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2% 5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9% You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Vicent Marti	7cc8f97108	pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Vicent Marti	6b8fda2db1	pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Jeff King	ce2bc42456	pack-objects: split add_object_entry This function actually does three things: 1. Check whether we've already added the object to our packing list. 2. Check whether the object meets our criteria for adding. 3. Actually add the object to our packing list. It's a little hard to see these three phases, because they happen linearly in the rather long function. Instead, this patch breaks them up into three separate helper functions. The result is a little easier to follow, though it unfortunately suffers from some optimization interdependencies between the stages (e.g., during step 3 we use the packing list index from step 1 and the packfile information from step 2). More importantly, though, the various parts can be composed differently, as they will be in the next patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Jeff King	9af270e8c2	do not pretend sha1write returns errors The sha1write function returns an int, but it will always be "0". The failure-prone parts of the function happen in the "flush" callback, which cannot pass an error back to us. So we just end up calling die() during the flush. Let's just drop the return value altogether, as it only confuses callers into thinking that it might be useful. Only one call site actually checked the return value. We can drop that check, since it just led to a die() anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-26 11:50:20 -08:00
Christian Couder	5955654823	replace {pre,suf}fixcmp() with {starts,ends}_with() Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c \| grep -v strbuf\\.c \| xargs perl -pi -e ' s\|!prefixcmp\(\|starts_with\(\|g; s\|prefixcmp\(\|!starts_with\(\|g; s\|!suffixcmp\(\|ends_with\(\|g; s\|suffixcmp\(\|!ends_with\(\|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-05 14:13:21 -08:00
Vicent Marti	68fb36eb92	pack-objects: factor out name_hash As the pack-objects system grows beyond the single pack-objects.c file, more parts (like the soon-to-exist bitmap code) will need to compute hashes for matching deltas. Factor out name_hash to make it available to other files. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-24 15:44:52 -07:00
Vicent Marti	2834bc27c1	pack-objects: refactor the packing list The hash table that stores the packing list for a given `pack-objects` run was tightly coupled to the pack-objects code. In this commit, we refactor the hash table and the underlying storage array into a `packing_data` struct. The functionality for accessing and adding entries to the packing list is hence accessible from other parts of Git besides the `pack-objects` builtin. This refactoring is a requirement for further patches in this series that will require accessing the commit packing list from outside of `pack-objects`. The hash table implementation has been minimally altered: we now use table sizes which are always a power of two, to ensure a uniform index distribution in the array. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-24 15:44:48 -07:00
Junio C Hamano	eeb8e8373f	Merge branch 'jc/pack-objects' * jc/pack-objects: pack-objects: shrink struct object_entry	2013-10-23 13:21:26 -07:00
Junio C Hamano	238504b014	Merge branch 'nd/fetch-into-shallow' When there is no sufficient overlap between old and new history during a fetch into a shallow repository, we unnecessarily sent objects the sending side knows the receiving end has. * nd/fetch-into-shallow: Add testcase for needless objects during a shallow fetch list-objects: mark more commits as edges in mark_edges_uninteresting list-objects: reduce one argument in mark_edges_uninteresting upload-pack: delegate rev walking in shallow fetch to pack-objects shallow: add setup_temporary_shallow() shallow: only add shallow graft points to new shallow file move setup_alternate_shallow and write_shallow_commits to shallow.c	2013-09-20 12:25:32 -07:00
Nguyễn Thái Ngọc Duy	e76a5fb459	list-objects: reduce one argument in mark_edges_uninteresting mark_edges_uninteresting() is always called with this form mark_edges_uninteresting(revs->commits, revs, ...); Remove the first argument and let mark_edges_uninteresting figure that out by itself. It helps answer the question "are this commit list and revs related in any way?" when looking at mark_edges_uninteresting implementation. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-28 11:54:18 -07:00
Brandon Casey	7c3ecb3254	Don't close pack fd when free'ing pack windows Now that close_one_pack() has been introduced to handle file descriptor pressure, it is not strictly necessary to close the pack file descriptor in unuse_one_window() when we're under memory pressure. Jeff King provided a justification for leaving the pack file open: If you close packfile descriptors, you can run into racy situations where somebody else is repacking and deleting packs, and they go away while you are trying to access them. If you keep a descriptor open, you're fine; they last to the end of the process. If you don't, then they disappear from under you. For normal object access, this isn't that big a deal; we just rescan the packs and retry. But if you are packing yourself (e.g., because you are a pack-objects started by upload-pack for a clone or fetch), it's much harder to recover (and we print some warnings). Let's do so (or uh, not do so). Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-02 09:27:26 -07:00
Junio C Hamano	63cdcfa40f	pack-objects: shrink struct object_entry Turn some boolean fields into bitfields and use uint32_t for name hash. This shrinks the size of the structure from 128 bytes to 120 bytes. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-04 15:23:35 -08:00
Jeff King	315ea32f1b	Merge branch 'jk/peel-ref' Speeds up "git upload-pack" (what is invoked by "git fetch" on the other side of the connection) by reducing the cost to advertise the branches and tags that are available in the repository. * jk/peel-ref: upload-pack: use peel_ref for ref advertisements peel_ref: check object type before loading peel_ref: do not return a null sha1 peel_ref: use faster deref_tag_noverify	2012-10-25 06:42:27 -04:00
Jeff King	e6dbffa67b	peel_ref: do not return a null sha1 The idea of the peel_ref function is to dereference tag objects recursively until we hit a non-tag, and return the sha1. Conceptually, it should return 0 if it is successful (and fill in the sha1), or -1 if there was nothing to peel. However, the current behavior is much more confusing. For a regular loose ref, the behavior is as described above. But there is an optimization to reuse the peeled-ref value for a ref that came from a packed-refs file. If we have such a ref, we return its peeled value, even if that peeled value is null (indicating that we know the ref definitely does _not_ peel). It might seem like such information is useful to the caller, who would then know not to bother loading and trying to peel the object. Except that they should not bother loading and trying to peel the object _anyway_, because that fallback is already handled by peel_ref. In other words, the whole point of calling this function is that it handles those details internally, and you either get a sha1, or you know that it is not peel-able. This patch catches the null sha1 case internally and converts it into a -1 return value (i.e., there is nothing to peel). This simplifies callers, which do not need to bother checking themselves. Two callers are worth noting: - in pack-objects, a comment indicates that there is a difference between non-peelable tags and unannotated tags. But that is not the case (before or after this patch). Whether you get a null sha1 has to do with internal details of how peel_ref operated. - in show-ref, if peel_ref returns a failure, the caller tries to decide whether to try peeling manually based on whether the REF_ISPACKED flag is set. But this doesn't make any sense. If the flag is set, that does not necessarily mean the ref came from a packed-refs file with the "peeled" extension. But it doesn't matter, because even if it didn't, there's no point in trying to peel it ourselves, as peel_ref would already have done so. In other words, the fallback peeling is guaranteed to fail. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-10-04 20:34:28 -07:00
Nguyễn Thái Ngọc Duy	4c6881204b	i18n: pack-objects: mark parseopt strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-08-20 12:23:18 -07:00
Junio C Hamano	0958a24d73	Merge branch 'jc/sha1-name-more' Teaches the object name parser things like a "git describe" output is always a commit object, "A" in "git log A" must be a committish, and "A" and "B" in "git log A...B" both must be committish, etc., to prolong the lifetime of abbreviated object names. * jc/sha1-name-more: (27 commits) t1512: match the "other" object names t1512: ignore whitespaces in wc -l output rev-parse --disambiguate=<prefix> rev-parse: A and B in "rev-parse A..B" refer to committish reset: the command takes committish commit-tree: the command wants a tree and commits apply: --build-fake-ancestor expects blobs sha1_name.c: add support for disambiguating other types revision.c: the "log" family, except for "show", takes committish revision.c: allow handle_revision_arg() to take other flags sha1_name.c: introduce get_sha1_committish() sha1_name.c: teach lookup context to get_sha1_with_context() sha1_name.c: many short names can only be committish sha1_name.c: get_sha1_1() takes lookup flags sha1_name.c: get_describe_name() by definition groks only commits sha1_name.c: teach get_short_sha1() a commit-only option sha1_name.c: allow get_short_sha1() to take other flags get_sha1(): fix error status regression sha1_name.c: restructure disambiguation of short names sha1_name.c: correct misnamed "canonical" and "res" ...	2012-07-22 12:55:07 -07:00
Junio C Hamano	8e676e8ba5	revision.c: allow handle_revision_arg() to take other flags The existing "cant_be_filename" that tells the function that the caller knows the arg is not a path (hence it does not have to be checked for absense of the file whose name matches it) is made into a bit in the flag word. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-09 16:42:22 -07:00
Nguyễn Thái Ngọc Duy	cf2ba13ac6	pack-objects: use streaming interface for reading large loose blobs git usually streams large blobs directly to packs. But there are cases where git can create large loose blobs (unpack-objects or hash-object over pipe). Or they can come from other git implementations. core.bigfilethreshold can also be lowered down and introduce a new wave of large loose blobs. Use streaming interface to read/compress/write these blobs in one go. Fall back to normal way if somehow streaming interface cannot be used. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-29 10:50:56 -07:00
Nguyễn Thái Ngọc Duy	c9018b0305	pack-objects: refactor write_object() into helper functions The function first decides if we want to copy data taken from existing pack verbatim or we want to encode the data ourselves for the packfile we are creating and then carries out the decision. Separate the latter phase into two helper functions, one for the case the data is reused, the other for the case the data is produced anew. A little twist is that it can later turn out that we cannot reuse the data after we initially decide to do so; in such a case, the "reuse" helper makes a call to "generate" helper. It is easier to follow than the current fallback code that uses "goto" inside a single large function. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-18 14:22:15 -07:00
Nguyễn Thái Ngọc Duy	754980d023	pack-objects, streaming: turn "xx >= big_file_threshold" to ".. > .." This is because all other places do "xx > big_file_threshold" Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-18 14:21:19 -07:00
Jeff King	7e52f5660e	gc: do not explode objects which will be immediately pruned When we pack everything into one big pack with "git repack -Ad", any unreferenced objects in to-be-deleted packs are exploded into loose objects, with the intent that they will be examined and possibly cleaned up by the next run of "git prune". Since the exploded objects will receive the mtime of the pack from which they come, if the source pack is old, those loose objects will end up pruned immediately. In that case, it is much more efficient to skip the exploding step entirely for these objects. This patch teaches pack-objects to receive the expiration information and avoid writing these objects out. It also teaches "git gc" to pass the value of gc.pruneexpire to repack (which in turn learns to pass it along to pack-objects) so that this optimization happens automatically during "git gc" and "git gc --auto". Signed-off-by: Jeff King <peff@peff.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-11 11:09:49 -07:00
Michał Kiedrowicz	2b34e486bc	pack-objects: Fix compilation with NO_PTHREDS It looks like commit `99fb6e04` (pack-objects: convert to use parse_options(), 2012-02-01) moved the #ifdef NO_PTHREDS around but hasn't noticed that the 'arg' variable no longer is available. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Acked-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-26 17:46:00 -08:00
Nguyễn Thái Ngọc Duy	99fb6e04cb	pack-objects: convert to use parse_options() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:05:00 -08:00
Nguyễn Thái Ngọc Duy	3a2ec52e99	pack-objects: remove bogus comment The comment was introduced in `b5d97e6` (pack-objects: run rev-list equivalent internally. - 2006-09-04), stating that git pack-objects [options] base-name <refs...> is acceptable and refs should be passed into rev-list. But that's not true. All arguments after base-name are ignored. Remove the comment and reject this syntax (i.e. no more arguments after base name) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:04:11 -08:00
Nguyễn Thái Ngọc Duy	6a301345a5	pack-objects: do not accept "--index-version=version," Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:03:46 -08:00
Junio C Hamano	c4a01a3cbb	Merge branch 'maint' * maint: Update draft release notes to 1.7.8.4 Update draft release notes to 1.7.7.6 Update draft release notes to 1.7.6.6 thin-pack: try harder to use preferred base objects as base	2012-01-12 23:33:39 -08:00
Junio C Hamano	5a6a939481	Merge branch 'maint-1.7.7' into maint * maint-1.7.7: Update draft release notes to 1.7.7.6 Update draft release notes to 1.7.6.6 thin-pack: try harder to use preferred base objects as base	2012-01-12 23:31:46 -08:00
Junio C Hamano	901c907d83	Merge branch 'maint-1.7.6' into maint-1.7.7 * maint-1.7.6: Update draft release notes to 1.7.6.6 thin-pack: try harder to use preferred base objects as base	2012-01-12 23:31:05 -08:00
Jeff King	15f07e061e	thin-pack: try harder to use preferred base objects as base When creating a pack using objects that reside in existing packs, we try to avoid recomputing futile delta between an object (trg) and a candidate for its base object (src) if they are stored in the same packfile, and trg is not recorded as a delta already. This heuristics makes sense because it is likely that we tried to express trg as a delta based on src but it did not produce a good delta when we created the existing pack. As the pack heuristics prefer producing delta to remove data, and Linus's law dictates that the size of a file grows over time, we tend to record the newest version of the file as inflated, and older ones as delta against it. When creating a thin-pack to transfer recent history, it is likely that we will try to send an object that is recorded in full, as it is newer. But the heuristics to avoid recomputing futile delta effectively forbids us from attempting to express such an object as a delta based on another object. Sending an object in full is often more expensive than sending a suboptimal delta based on other objects, and it is even more so if we could use an object we know the receiving end already has (i.e. preferred base object) as the delta base. Tweak the recomputation avoidance logic, so that we do not punt on computing delta against a preferred base object. The effect of this change can be seen on two simulated upload-pack workloads. The first is based on 44 reflog entries from my git.git origin/master reflog, and represents the packs that kernel.org sent me git updates for the past month or two. The second workload represents much larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to v1.2.0, and so on. The table below shows the average generated pack size and the average CPU time consumed for each dataset, both before and after the patch: dataset \| reflog \| tags --------------------------------- before \| 53358 \| 2750977 size after \| 32398 \| 2668479 change \| -39% \| -3% --------------------------------- before \| 0.18 \| 1.12 CPU after \| 0.18 \| 1.15 change \| +0% \| +3% This patch makes a much bigger difference for packs with a shorter slice of history (since its effect is seen at the boundaries of the pack) though it has some benefit even for larger packs. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-01-12 23:06:20 -08:00
Junio C Hamano	48b303675a	Merge branch 'jc/stream-to-pack' * jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h	2011-12-16 22:33:40 -08:00
Junio C Hamano	2e8722fc9e	Merge branch 'jc/maint-pack-object-cycle' into maint * jc/maint-pack-object-cycle: pack-object: tolerate broken packs that have duplicated objects Conflicts: builtin/pack-objects.c	2011-12-13 22:04:50 -08:00
Junio C Hamano	df6246ed78	Merge branch 'nd/misc-cleanups' into maint * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-13 22:02:51 -08:00
Junio C Hamano	cddec4f8ae	Merge branch 'jc/maint-pack-object-cycle' * jc/maint-pack-object-cycle: pack-object: tolerate broken packs that have duplicated objects Conflicts: builtin/pack-objects.c	2011-12-05 15:19:34 -08:00
Junio C Hamano	62cdb6b23a	Merge branch 'nd/misc-cleanups' * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-05 15:10:20 -08:00
Junio C Hamano	568508e765	bulk-checkin: replace fast-import based implementation This extends the earlier approach to stream a large file directly from the filesystem to its own packfile, and allows "git add" to send large files directly into a single pack. Older code used to spawn fast-import, but the new bulk-checkin API replaces it. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-01 11:46:09 -08:00
Junio C Hamano	f63c79dbc8	pack-object: tolerate broken packs that have duplicated objects When --reuse-delta is in effect (which is the default), and an existing pack in the repository has the same object registered twice (e.g. one copy in a non-delta format and the other copy in a delta against some other object), an attempt to repack the repository can result in a cyclic delta dependency, causing write_one() function to infinitely recurse into itself. Detect such a case and break the loopy dependency by writing out an object that is involved in such a loop in the non-delta format. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-16 22:06:08 -08:00
Junio C Hamano	84a9ea90e1	Merge branch 'dm/pack-objects-update' * dm/pack-objects-update: pack-objects: don't traverse objects unnecessarily pack-objects: rewrite add_descendants_to_write_order() iteratively pack-objects: use unsigned int for counter and offset values pack-objects: mark add_to_write_order() as inline	2011-11-01 15:20:07 -07:00
Junio C Hamano	0e990530ae	finish_tmp_packfile(): a helper function Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c. This changes the order of finishing multi-pack generation slightly. The code used to - adjust shared perm of temporary packfile - rename temporary packfile to the final name - update mtime of the packfile under the final name - adjust shared perm of temporary idxfile - rename temporary idxfile to the final name but because the helper does not want to do the mtime thing, the updated code does that step first and then all the rest. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-28 12:34:09 -07:00
Junio C Hamano	cdf9db3c83	create_tmp_packfile(): a helper function Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-28 11:52:14 -07:00
Junio C Hamano	c0ad465725	write_pack_header(): a helper function Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-28 11:40:48 -07:00
Nguyễn Thái Ngọc Duy	0de1633783	tree-walk.c: do not leak internal structure in tree_entry_len() tree_entry_len() does not simply take two random arguments and return a tree length. The two pointers must point to a tree item structure, or struct name_entry. Passing random pointers will return incorrect value. Force callers to pass struct name_entry instead of two pointers (with hope that they don't manually construct struct name_entry themselves) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-27 11:08:26 -07:00
Junio C Hamano	2070950633	Merge branch 'jk/maint-pack-objects-compete-with-delete' * jk/maint-pack-objects-compete-with-delete: downgrade "packfile cannot be accessed" errors to warnings pack-objects: protect against disappearing packs	2011-10-21 16:04:33 -07:00
Dan McGee	38d4debb6d	pack-objects: don't traverse objects unnecessarily This brings back some of the performance lost in optimizing recency order inside pack objects. We were doing extreme amounts of object re-traversal: for the 2.14 million objects in the Linux kernel repository, we were calling add_to_write_order() over 1.03 billion times (a 0.2% hit rate, making 99.8% of of these calls extraneous). Two optimizations take place here- we can start our objects array iteration from a known point where we left off before we started trying to find our tags, and we don't need to do the deep dives required by add_family_to_write_order() if the object has already been marked as filled. These two optimizations bring some pretty spectacular results via `perf stat`: task-clock: 83373 ms --> 43800 ms (50% faster) cycles: 221,633,461,676 --> 116,307,209,986 (47% fewer) instructions: 149,299,179,939 --> 122,998,800,184 (18% fewer) Helped-by: Ramsay Jones (format string fix in "die" message) Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-20 17:17:49 -07:00
Dan McGee	f380872f0a	pack-objects: rewrite add_descendants_to_write_order() iteratively This removes the need to call this function recursively, shinking the code size slightly and netting a small performance increase. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-18 00:16:32 -07:00
Dan McGee	92bef1a14a	pack-objects: use unsigned int for counter and offset values This is done in some of the new pack layout code introduced in commit `1b4bb16b9e`. This more closely matches the nr_objects global that is unsigned that these variables are based off of and bounded by. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-18 00:16:32 -07:00
Dan McGee	be12681896	pack-objects: mark add_to_write_order() as inline This function is a whole 26 bytes when compiled on x86_64, but is currently invoked over 1.037 billion times when running pack-objects on the Linux kernel git repository. This is hitting the point where micro-optimizations do make a difference, and inlining it only increases the object file size by 38 bytes. As reported by perf, this dropped task-clock from 84183 to 83373 ms, and total cycles from 223.5 billion to 221.6 billion. Not astronomical, but worth getting for adding one word. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-18 00:16:31 -07:00
Jeff King	58a6a9cc43	downgrade "packfile cannot be accessed" errors to warnings These can happen if another process simultaneously prunes a pack. But that is not usually an error condition, because a properly-running prune should have repacked the object into a new pack. So we will notice that the pack has disappeared unexpectedly, print a message, try other packs (possibly after re-scanning the list of packs), and find it in the new pack. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:43:09 -07:00
Jeff King	4c08018204	pack-objects: protect against disappearing packs It's possible that while pack-objects is running, a simultaneously running prune process might delete a pack that we are interested in. Because we load the pack indices early on, we know that the pack contains our item, but by the time we try to open and map it, it is gone. Since `c715f78`, we already protect against this in the normal object access code path, but pack-objects accesses the packs at a lower level. In the normal access path, we call find_pack_entry, which will call find_pack_entry_one on each pack index, which does the actual lookup. If it gets a hit, we will actually open and verify the validity of the matching packfile (using c715f78's is_pack_valid). If we can't open it, we'll issue a warning and pretend that we didn't find it, causing us to go on to the next pack (or on to loose objects). Furthermore, we will cache the descriptor to the opened packfile. Which means that later, when we actually try to access the object, we are likely to still have that packfile opened, and won't care if it has been unlinked from the filesystem. Notice the "likely" above. If there is another pack access in the interim, and we run out of descriptors, we could close the pack. And then a later attempt to access the closed pack could fail (we'll try to re-open it, of course, but it may have been deleted). In practice, this doesn't happen because we tend to look up items and then access them immediately. Pack-objects does not follow this code path. Instead, it accesses the packs at a much lower level, using find_pack_entry_one directly. This means we skip the is_pack_valid check, and may end up with the name of a packfile, but no open descriptor. We can add the same is_pack_valid check here. Unfortunately, the access patterns of pack-objects are not quite as nice for keeping lookup and object access together. We look up each object as we find out about it, and the only later when writing the packfile do we necessarily access it. Which means that the opened packfile may be closed in the interim. In practice, however, adding this check still has value, for three reasons. 1. If you have a reasonable number of packs and/or a reasonable file descriptor limit, you can keep all of your packs open simultaneously. If this is the case, then the race is impossible to trigger. 2. Even if you can't keep all packs open at once, you may end up keeping the deleted one open (i.e., you may get lucky). 3. The race window is shortened. You may notice early that the pack is gone, and not try to access it. Triggering the problem without this check means deleting the pack any time after we read the list of index files, but before we access the looked-up objects. Triggering it with this check means deleting the pack means deleting the pack after we do a lookup (and successfully access the packfile), but before we access the object. Which is a smaller window. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:42:37 -07:00
Junio C Hamano	2e2e7e9dd0	Merge branch 'jc/fetch-verify' * jc/fetch-verify: fetch: verify we have everything we need before updating our ref rev-list --verify-object list-objects: pass callback data to show_objects()	2011-10-05 12:36:20 -07:00
Junio C Hamano	4947367267	list-objects: pass callback data to show_objects() The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-09-01 15:46:12 -07:00
Junio C Hamano	324b6b1678	Merge branch 'mh/check-attr-relative' * mh/check-attr-relative: (29 commits) test-path-utils: Add subcommand "prefix_path" test-path-utils: Add subcommand "absolute_path" git-check-attr: Normalize paths git-check-attr: Demonstrate problems with relative paths git-check-attr: Demonstrate problems with unnormalized paths git-check-attr: test that no output is written to stderr Rename git_checkattr() to git_check_attr() git-check-attr: Fix command-line handling to match docs git-check-attr: Drive two tests using the same raw data git-check-attr: Add an --all option to show all attributes git-check-attr: Error out if no pathnames are specified git-check-attr: Process command-line args more systematically git-check-attr: Handle each error separately git-check-attr: Extract a function error_with_usage() git-check-attr: Introduce a new variable git-check-attr: Extract a function output_attr() Allow querying all attributes on a file Remove redundant check Remove redundant call to bootstrap_attr_stack() Extract a function collect_all_attrs() ...	2011-08-17 17:36:22 -07:00
Junio C Hamano	96790ca029	Merge branch 'jc/pack-order-tweak' * jc/pack-order-tweak: pack-objects: optimize "recency order" core: log offset pack data accesses happened	2011-08-05 14:54:57 -07:00
Michael Haggerty	d932f4eb9f	Rename git_checkattr() to git_check_attr() Suggested by: Junio Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-04 15:53:21 -07:00
Junio C Hamano	d907bf8ef3	Merge branch 'jc/index-pack' * jc/index-pack: verify-pack: use index-pack --verify index-pack: show histogram when emulating "verify-pack -v" index-pack: start learning to emulate "verify-pack -v" index-pack: a miniscule refactor index-pack --verify: read anomalous offsets from v2 idx file write_idx_file: need_large_offset() helper function index-pack: --verify write_idx_file: introduce a struct to hold idx customization options index-pack: group the delta-base array entries also by type Conflicts: builtin/verify-pack.c cache.h sha1_file.c	2011-07-19 09:54:51 -07:00
Junio C Hamano	1b4bb16b9e	pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) \| git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-08 10:03:24 -07:00
Junio C Hamano	ef49a7a012	zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:52:15 -07:00
Junio C Hamano	225a6f1068	zlib: wrap deflateBound() too Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:18:17 -07:00
Junio C Hamano	55bb5c9147	zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:10:29 -07:00
Junio C Hamano	15366280c2	Teach core.bigfilethreashold to pack-objects The pack-objects command should take notice of the object file and refrain from attempting to delta large ones, to be consistent with the fast-import command. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-04-05 20:25:49 -07:00
Junio C Hamano	ebcfb3791a	write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-27 23:29:03 -08:00
Junio C Hamano	b361888dd5	thread-utils.h: simplify the inclusion All files that include this header file use the same four line incantation: #ifndef NO_PTHREADS #include <pthread.h> #include "thread-utils.h" #endif Move the responsibility for that gymnastics to the header file from the files that include it. This approach makes it easier to later declare new services that are related to threading in thread-utils.h and have them available to all the threading code. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-12-10 12:58:06 -08:00
Junio C Hamano	39f04dbaac	Merge branch 'jn/thinner-wrapper' * jn/thinner-wrapper: Remove pack file handling dependency from wrapper.o pack-objects: mark file-local variable static wrapper: give zlib wrappers their own translation unit strbuf: move strbuf_branchname to sha1_name.c path helpers: move git_mkstemp* to wrapper.c wrapper: move odb_* to environment.c wrapper: move xmmap() to sha1_file.c	2010-12-03 16:13:06 -08:00
Jonathan Nieder	bc9b21755e	pack-objects: mark file-local variable static old_try_to_free_routine is not meant for use from other files. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-10 11:08:04 -08:00
Nicolas Pitre	71064a956b	make pack-objects a bit more resilient to repo corruption Right now, packing valid objects could fail when creating a thin pack simply because a pack edge object used as a preferred base is corrupted. Since preferred base objects are not strictly needed to produce a valid pack, let's not consider the inability to read them as a fatal error. Delta compression may well be attempted against other objects in the search window. To avoid warning storms (we are in the inner loop of the delta search window) a warning is emitted only on the first occurrence. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-22 14:59:58 -07:00
Štěpán Němec	884220653f	Put a space between `<' and argument in pack-objects usage string This makes it cosistent with other places (including the git-pack-objects(1) manpage itself) and avoids possible confusion (I, for one, mistook `<object-list' for a `<object-list>' typo at first when preparing this series). Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-08 12:31:08 -07:00
Štěpán Němec	0adda9362a	Use parentheses and `...' where appropriate Remove some stray usage of other bracket types and asterisks for the same purpose. Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-08 12:31:07 -07:00
Štěpán Němec	62b4698e55	Use angles for placeholders consistently Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-08 12:29:52 -07:00
Erik Faye-Lund	c03c83152d	do not depend on signed integer overflow Signed integer overflow is not defined in C, so do not depend on it. This fixes a problem with GCC 4.4.0 and -O3 where the optimizer would consider "consumed_bytes > consumed_bytes + bytes" as a constant expression, and never execute the die()-call. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-06 11:10:07 -07:00
Johannes Schindelin	8695353147	Fix typo in pack-objects' usage Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-09-30 12:22:02 -07:00
Junio C Hamano	44e08b003d	Merge branch 'js/try-to-free-stackable' * js/try-to-free-stackable: Do not call release_pack_memory in malloc wrappers when GIT_TRACE is used Have set_try_to_free_routine return the previous routine	2010-06-13 11:21:21 -07:00
Junio C Hamano	ea5f75a64a	Merge branch 'np/malloc-threading' * np/malloc-threading: Thread-safe xmalloc and xrealloc needs a recursive mutex Make xmalloc and xrealloc thread-safe	2010-05-21 04:02:16 -07:00
Junio C Hamano	2e0e8b68e3	Merge branch 'lt/deepen-builtin-source' * lt/deepen-builtin-source: Move 'builtin-*' into a 'builtin/' subdirectory Conflicts: Makefile	2010-03-10 15:25:18 -08:00
Linus Torvalds	81b50f3ce4	Move 'builtin-' into a 'builtin/' subdirectory This shrinks the top-level directory a bit, and makes it much more pleasant to use auto-completion on the thing. Instead of [torvalds@nehalem git]$ em buil<tab> Display all 180 possibilities? (y or n) [torvalds@nehalem git]$ em builtin-sh builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o [torvalds@nehalem git]$ em builtin-shor<tab> builtin-shortlog.c builtin-shortlog.o [torvalds@nehalem git]$ em builtin-shortlog.c you get [torvalds@nehalem git]$ em buil<tab> [type] builtin/ builtin.h [torvalds@nehalem git]$ em builtin [auto-completes to] [torvalds@nehalem git]$ em builtin/sh<tab> [type] shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o [torvalds@nehalem git]$ em builtin/sho [auto-completes to] [torvalds@nehalem git]$ em builtin/shor<tab> [type] shortlog.c shortlog.o [torvalds@nehalem git]$ em builtin/shortlog.c which doesn't seem all that different, but not having that annoying break in "Display all 180 possibilities?" is quite a relief. NOTE! If you do this in a clean tree (no object files etc), or using an editor that has auto-completion rules that ignores '.o' files, you won't see that annoying 'Display all 180 possibilities?' message - it will just show the choices instead. I think bash has some cut-off around 100 choices or something. So the reason I see this is that I'm using an odd editory, and thus don't have the rules to cut down on auto-completion. But you can simulate that by using 'ls' instead, or something similar. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-22 14:29:41 -08:00

... 7 8 9 10 11

523 commits