development/git - HydraGit

mirror of https://github.com/git/git synced 2024-09-13 21:34:42 +00:00

Author	SHA1	Message	Date
Jeff King	b039718d92	drop support for "experimental" loose objects In git v1.4.3, we introduced a new loose object format that encoded some object information outside of the zlib stream. Ultimately the format was dropped in v1.5.3, but we kept the reading side around to help people migrate objects. Each time we open a loose object, we use a heuristic to check whether it is in the normal loose format, or the experimental one. This heuristic is robust in the face of valid data, but it tends to treat corrupted or garbage data as an experimental object. With the regular format, we would notice quickly that zlib's crc does not check out and complain. With the experimental object, we are likely to extract a nonsensical object size and try to allocate a huge buffer, resulting in xmalloc calling "die". This latter behavior is much worse, for two reasons. One, git reports an allocation error when the real error is corruption. And two, the program dies unconditionally, so you cannot even run fsck (which would otherwise ignore the broken object and keep going). We could try to improve the heuristic to err on the side of normal objects in the face of corruption, but there is really little point. The experimental format is long-dead, and was never enabled by default to begin with. We can instead simply remove it. The only affected repository would be one that explicitly set core.legacyheaders in 2007, and then never repacked in the intervening 6 years. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-11-21 11:43:42 -08:00
Junio C Hamano	4ef8d1dd03	sha1_loose_object_info(): do not return success on missing object Since `052fe5ea` (sha1_loose_object_info: make type lookup optional, 2013-07-12), sha1_loose_object_info() returns happily without checking if the object in question exists, which is not what the caller sha1_object_info_extended() expects; the caller does not even bother checking the existence of the object itself. Noticed-by: Sven Brauch <svenbrauch@googlemail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-11-06 11:03:33 -08:00
Junio C Hamano	cfd10568b0	Sync with v1.8.4.2	2013-10-28 10:51:53 -07:00
Johan Herland	b2476a60bd	sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs There are cases (e.g. when running concurrent fetches in a repo) where multiple Git processes concurrently attempt to create loose objects within the same objects/XX/ dir. The creation of the loose object files is (AFAICS) safe from races, but the creation of the objects/XX/ dir in which the loose objects reside is unsafe, for example: Two concurrent fetches - A and B. As part of its fetch, A needs to store 12aaaaa as a loose object. B, on the other hand, needs to store 12bbbbb as a loose object. The objects/12 directory does not already exist. Concurrently, both A and B determine that they need to create the objects/12 directory (because their first call to git_mkstemp_mode() within create_tmpfile() fails with ENOENT). One of them - let's say A - executes the following mkdir() call before the other. This first call returns success, and A moves on. When B gets around to calling mkdir(), it fails with EEXIST, because A won the race. The mkdir() error causes B to return -1 from create_tmpfile(), which propagates all the way, resulting in the fetch failing with:
    error: unable to create temporary file: File exists
    fatal: failed to write object
    fatal: unpack-objects failed
Although it's hard to add a testcase reproducing this issue, it's easy to provoke if we insert a sleep after the
    if (mkdir(buffer, 0777) || adjust_shared_perm(buffer))
        return -1;
block, and then run two concurrent "git fetch"es against the same repo. The fix is to simply handle mkdir() failing with EEXIST as a success. If EEXIST is somehow returned for the wrong reasons (because the relevant objects/XX is not a directory, or is otherwise unsuitable for object storage), the following call to adjust_shared_perm(), or ultimately the retried call to git_mkstemp_mode() will fail, and we end up returning error from create_tmpfile() in any case.
Note that there are still cases where two users with unsuitable umasks in a shared repo can end up in two races where one user first wins the mkdir() race to create an objects/XX/ directory, and then the other user wins the adjust_shared_perm() race to chmod() that directory, but fails because it is (transiently, until the first user completes its chmod()) unwriteable to the other user. However, (an equivalent of) this race also exists before this patch, and is made no worse by this patch. Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-28 09:50:34 -07:00
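The fix described in the row above can be sketched in isolation. This is a minimal illustration, not the code from create_tmpfile(): the helper name `make_object_dir` is hypothetical, and the permission adjustment step is left out.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not git's create_tmpfile()): create a directory
 * and treat "already exists" as success, so the loser of a mkdir() race
 * carries on instead of failing.
 */
static int make_object_dir(const char *path)
{
	if (mkdir(path, 0777) && errno != EEXIST)
		return -1;	/* a real failure, e.g. ENOENT or EACCES */
	/*
	 * If EEXIST came back because `path` is not a usable directory,
	 * the caller's next step (creating a file inside it) fails anyway.
	 */
	return 0;
}
```

Calling the helper twice on the same path succeeds both times, which is the behaviour two racing processes need.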
Christian Couder	3fc0dca9ce	sha1_file: move comment about return value where it belongs Commit `5b0864070` (sha1_object_info_extended: make type calculation optional, Jul 12 2013) changed the return value of the sha1_object_info_extended function to 0/-1 for success/error. Previously this function returned the object type for success or -1 for error. But unfortunately the above commit forgot to change or move the comment above this function that says "returns enum object_type or negative". To fix this inconsistency, let's move the comment above the sha1_object_info function where it is still true. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-28 09:07:01 -07:00
Vicent Marti	ec73f5807c	sha1_file: export `git_open_noatime` The `git_open_noatime` helper can be of general interest for other consumers of git's different on-disk formats. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-24 15:44:52 -07:00
Jonathan Nieder	87bcf148d7	Merge branch 'nd/unpack-entry-optim-in-pack-objects' * nd/unpack-entry-optim-in-pack-objects: pack-objects: no crc check when the cached version is used	2013-09-24 23:29:55 -07:00
Junio C Hamano	5ff9f2351a	Merge branch 'jk/has-sha1-file-retry-packed' When an object is not found after checking the packfiles and then loose object directory, read_sha1_file() re-checks the packfiles to prevent racing with a concurrent repacker; teach the same logic to has_sha1_file(). * jk/has-sha1-file-retry-packed: has_sha1_file: re-check pack directory before giving up	2013-09-17 11:41:35 -07:00
Nguyễn Thái Ngọc Duy	77965f8b29	pack-objects: no crc check when the cached version is used Current code makes pack-objects always do check_pack_crc() in unpack_entry() even if right after that we find out there's a cached version and pack access is not needed. Swap two code blocks, search for cached version first, then check crc. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-09-13 11:28:33 -07:00
Junio C Hamano	04fbba0119	Merge branch 'bc/unuse-packfile' Handle memory pressure and file descriptor pressure separately when deciding to release pack windows to honor resource limits. * bc/unuse-packfile: Don't close pack fd when free'ing pack windows sha1_file: introduce close_one_pack() to close packs on fd pressure	2013-09-04 12:30:21 -07:00
Jeff King	45e8a74873	has_sha1_file: re-check pack directory before giving up When we read a sha1 file, we first look for a packed version, then a loose version, and then re-check the pack directory again before concluding that we cannot find it. This lets us handle a process that is writing to the repository simultaneously (e.g., receive-pack writing a new pack followed by a ref update, or git-repack packing existing loose objects into a new pack). However, we do not do the same trick with has_sha1_file; we only check the packed objects once, followed by loose objects. This means that we might incorrectly report that we do not have an object, even though we could find it if we simply re-checked the pack directory. By itself, this is usually not a big deal. The other process is running simultaneously, so we may run has_sha1_file before it writes, anyway. It is a race whether we see the object or not. However, we may also see other things the writing process has done (like updating refs); and in that case, we must be able to also see the new objects. For example, imagine we are doing a for_each_ref iteration, and somebody simultaneously pushes. Receive-pack may write the pack and update a ref after we have examined the objects/pack directory, but before the iteration gets to the updated ref. When we do finally see the updated ref, for_each_ref will call has_sha1_file to check whether the ref is broken. If has_sha1_file returns the wrong answer, we erroneously will think that the ref is broken. For a normal iteration without DO_FOR_EACH_INCLUDE_BROKEN, this means that the caller does not see the ref at all (neither the old nor the new value). So not only will we fail to see the new value of the ref (which is acceptable, since we are running simultaneously with the writer, and we might well read the ref before the writer commits its write), but we will not see the old value either. 
For programs that act on reachability like pack-objects or prune, this can cause data loss, as we may see the objects referenced by the original ref value as dangling (and either omit them from the pack, or delete them via prune). There's no test included here, because the success case is two processes running simultaneously forever. But you can replicate the issue with:
    # base.sh
    # run this in one terminal; it creates and pushes
    # repeatedly to a repository
    git init parent &&
    (cd parent &&
     # create a base commit that will trigger us looking at
     # the objects/pack directory before we hit the updated ref
     echo content >file &&
     git add file &&
     git commit -m base &&
     # set the unpack limit abnormally low, which
     # lets us simulate full-size pushes using tiny ones
     git config receive.unpackLimit 1
    ) &&
    git clone parent child &&
    cd child &&
    n=0 &&
    while true; do
      echo $n >file && git add file && git commit -m $n &&
      git push origin HEAD:refs/remotes/child/master &&
      n=$(($n + 1))
    done

    # fsck.sh
    # now run this simultaneously in another terminal; it
    # repeatedly fscks, looking for us to consider the
    # newly-pushed ref broken. We cannot use for-each-ref
    # here, as it uses DO_FOR_EACH_INCLUDE_BROKEN, which
    # skips the has_sha1_file check (and if it wants
    # more information on the object, it will actually read
    # the object, which does the proper two-step lookup)
    cd parent &&
    while true; do
      broken=`git fsck 2>&1 | grep remotes/child`
      if test -n "$broken"; then
        echo $broken
        exit 1
      fi
    done
Without this patch, the fsck loop fails within a few seconds (and almost instantly if the test repository actually has a large number of refs). With it, the two can run indefinitely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-30 14:53:45 -07:00
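The lookup order discussed in this commit (packed, then loose, then packed again after a re-scan) can be modelled with a toy store. Everything here is hypothetical: `struct toy_store` and both functions are invented for illustration and only mimic the control flow, not git's data structures.

```c
#include <stddef.h>

/* Toy model: packs are numbered; we only "know" the first known_packs. */
struct toy_store {
	int known_packs;	/* packs seen at the last directory scan */
	int disk_packs;		/* packs actually present on disk now */
	int object_pack;	/* pack holding the object, or -1 if none */
	int object_loose;	/* non-zero if a loose copy exists */
};

static int in_known_packs(const struct toy_store *s)
{
	return s->object_pack >= 0 && s->object_pack < s->known_packs;
}

/* Old behaviour: check packs once, then loose objects, then give up. */
static int has_object_once(const struct toy_store *s)
{
	return in_known_packs(s) || s->object_loose;
}

/* New behaviour: re-scan the pack directory before reporting "missing". */
static int has_object_retry(struct toy_store *s)
{
	if (has_object_once(s))
		return 1;
	s->known_packs = s->disk_packs;	/* a writer may have added a pack */
	return in_known_packs(s);
}
```

With an object that lives only in a pack written after the last scan, `has_object_once()` reports it missing while `has_object_retry()` finds it.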
Brandon Casey	7c3ecb3254	Don't close pack fd when free'ing pack windows Now that close_one_pack() has been introduced to handle file descriptor pressure, it is not strictly necessary to close the pack file descriptor in unuse_one_window() when we're under memory pressure. Jeff King provided a justification for leaving the pack file open: If you close packfile descriptors, you can run into racy situations where somebody else is repacking and deleting packs, and they go away while you are trying to access them. If you keep a descriptor open, you're fine; they last to the end of the process. If you don't, then they disappear from under you. For normal object access, this isn't that big a deal; we just rescan the packs and retry. But if you are packing yourself (e.g., because you are a pack-objects started by upload-pack for a clone or fetch), it's much harder to recover (and we print some warnings). Let's do so (or uh, not do so). Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-02 09:27:26 -07:00
Brandon Casey	88d0db5557	sha1_file: introduce close_one_pack() to close packs on fd pressure When the number of open packs exceeds pack_max_fds, unuse_one_window() is called repeatedly to attempt to release the least-recently-used pack windows, which, as a side-effect, will also close a pack file after closing its last open window. If a pack file has been opened, but no windows have been allocated into it, it will never be selected by unuse_one_window() and hence its file descriptor will not be closed. When this happens, git may exceed the number of file descriptors permitted by the system. This latter situation can occur in show-ref or receive-pack during ref advertisement. During ref advertisement, receive-pack will iterate over every ref in the repository and advertise it to the client after ensuring that the ref exists in the local repository. If the ref is located inside a pack, then the pack is opened to ensure that it exists, but since the object is not actually read from the pack, no mmap windows are allocated. When the number of open packs exceeds pack_max_fds, unuse_one_window() will not be able to find any windows to free and will not be able to close any packs. Once the per-process file descriptor limit is exceeded, receive-pack will produce a warning, not an error, for each pack it cannot open, and will then most likely fail with an error to spawn rev-list or index-pack like:
    error: cannot create standard input pipe for rev-list: Too many open files
    error: Could not run 'git rev-list'
This may also occur during upload-pack when refs are packed (in the packed-refs file) and the number of packs that must be opened to verify that these packed refs exist exceeds the file descriptor limit. If the refs are loose, then upload-pack will read each ref from the object database (if the object is in a pack, allocating one or more mmap windows for it) in order to peel tags and advertise the underlying object.
But when the refs are packed and peeled, upload-pack will use the peeled sha1 in the packed-refs file and will not need to read from the pack files, so no mmap windows will be allocated and just like with receive-pack, unuse_one_window() will never select these opened packs to close. When we have file descriptor pressure, we just need to find an open pack to close. We can leave the existing mmap windows open. If additional windows need to be mapped into the pack file, it will be reopened when necessary. If the pack file has been rewritten in the mean time, open_packed_git_1() should notice when it compares the file size or the pack's sha1 checksum to what was previously read from the pack index, and reject it. Let's introduce a new function close_one_pack() designed specifically for this purpose to search for and close the least-recently-used pack, where LRU is defined as (in order of preference):
    * pack with oldest mtime and no allocated mmap windows
    * pack with the least-recently-used windows, i.e. the pack with the oldest most-recently-used window, where none of the windows are in use
    * pack with the least-recently-used windows
Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-02 08:53:54 -07:00
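One way to read the order of preference above is as a (rank, age) comparison over the open packs. The sketch below is an assumption about how that ordering could be expressed, not the implementation of close_one_pack(); `struct toy_pack` and all function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical summary of one pack's state; not git's struct packed_git. */
struct toy_pack {
	int fd;			/* -1 when the pack file is closed */
	long mtime;		/* modification time of the pack file */
	int nr_windows;		/* mmap windows allocated into the pack */
	int nr_inuse_windows;	/* windows currently in use */
	long newest_window_use;	/* stamp of its most-recently-used window */
};

/* Lower rank is closed first, following the stated order of preference. */
static int close_rank(const struct toy_pack *p)
{
	if (!p->nr_windows)
		return 0;
	return p->nr_inuse_windows ? 2 : 1;
}

/* Within a rank, the smallest (oldest) stamp wins. */
static long close_age(const struct toy_pack *p)
{
	return p->nr_windows ? p->newest_window_use : p->mtime;
}

/* Return the index of the open pack to close, or -1 if none is open. */
static int pick_pack_to_close(const struct toy_pack *packs, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct toy_pack *p = &packs[i];
		if (p->fd < 0)
			continue;
		if (best < 0 ||
		    close_rank(p) < close_rank(&packs[best]) ||
		    (close_rank(p) == close_rank(&packs[best]) &&
		     close_age(p) < close_age(&packs[best])))
			best = (int)i;
	}
	return best;
}
```

Only the file descriptor of the chosen pack would be closed; existing mmap windows can stay mapped, as the commit message notes.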
Junio C Hamano	356df9bd8d	Merge branch 'jk/cat-file-batch-optim' If somebody wants to only know on-disk footprint of an object without having to know its type or payload size, we can bypass a lot of code to cheaply learn it. * jk/cat-file-batch-optim: Fix some sparse warnings sha1_object_info_extended: pass object_info to helpers sha1_object_info_extended: make type calculation optional packed_object_info: make type lookup optional packed_object_info: hoist delta type resolution to helper sha1_loose_object_info: make type lookup optional sha1_object_info_extended: rename "status" to "type" cat-file: disable object/refname ambiguity check for batch mode	2013-07-24 19:21:21 -07:00
Ramsay Jones	d099b7173d	Fix some sparse warnings Sparse issues some "Using plain integer as NULL pointer" warnings. Each warning relates to the use of an '{0}' initialiser expression in the declaration of an 'struct object_info'. The first field of this structure has pointer type. Thus, in order to suppress these warnings, we replace the initialiser expression with '{NULL}'. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-18 16:43:47 -07:00
Junio C Hamano	802f878b86	Merge branch 'jk/in-pack-size-measurement' "git cat-file --batch-check=<format>" is added, primarily to allow on-disk footprint of objects in packfiles (often they are a lot smaller than their true size, when expressed as deltas) to be reported. * jk/in-pack-size-measurement: pack-revindex: radix-sort the revindex pack-revindex: use unsigned to store number of objects cat-file: split --batch input lines on whitespace cat-file: add %(objectsize:disk) format atom cat-file: add --batch-check=<format> cat-file: refactor --batch option parsing cat-file: teach --batch to stream blob objects t1006: modernize output comparisons teach sha1_object_info_extended a "disk_size" query zero-initialize object_info structs	2013-07-18 12:59:41 -07:00
Jeff King	23c339c0f2	sha1_object_info_extended: pass object_info to helpers We take in a "struct object_info" which contains pointers to storage for items the caller cares about. But then rather than pass the whole object to the low-level loose/packed helper functions, we pass the individual pointers. Let's pass the whole struct instead, which will make adding more items later easier. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:29:27 -07:00
Jeff King	5b0864070e	sha1_object_info_extended: make type calculation optional Each caller of sha1_object_info_extended sets up an object_info struct to tell the function which elements of the object it wants to get. Until now, getting the type of the object has always been required (and it is returned via the return type rather than a pointer in object_info). This can involve actually opening a loose object file to determine its type, or following delta chains to determine a packed file's base type. These effects produce a measurable slow-down when doing a "cat-file --batch-check" that does not include %(objecttype). This patch adds a "typep" query to struct object_info, so that it can be optionally queried just like size and disk_size. As a result, the return type of the function is no longer the object type, but rather 0/-1 for success/error. As there are only three callers total, we just fix up each caller rather than keep a compatibility wrapper:
    1. The simpler sha1_object_info wrapper continues to always ask for and return the type field.
    2. The istream_source function wants to know the type, and so always asks for it.
    3. The cat-file batch code asks for the type only when %(objecttype) is part of the format string.
On linux.git, the best-of-five for running:
    $ git rev-list --objects --all >objects
    $ time git cat-file --batch-check='%(objectsize:disk)'
on a fully packed repository goes from:
    real 0m8.680s
    user 0m8.160s
    sys 0m0.512s
to:
    real 0m7.205s
    user 0m6.580s
    sys 0m0.608s
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:16:36 -07:00
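The "pointer as query" convention this commit extends can be shown with a small stand-in. The struct and function below are hypothetical and only borrow the field names typep, sizep and disk_sizep from the message; the constant values and the lookup counter are made up so the skipped work is observable.

```c
#include <stddef.h>

enum toy_type { TOY_BAD = -1, TOY_BLOB = 3 };

/* Hypothetical stand-in for struct object_info: NULL means "not asked". */
struct toy_object_info {
	enum toy_type *typep;
	unsigned long *sizep;
	unsigned long *disk_sizep;
};

/* Counts how often the (pretend) expensive type lookup actually ran. */
static int toy_type_lookups;

/* Returns 0 on success and -1 on error; the type travels via typep. */
static int toy_object_info_extended(int exists, struct toy_object_info *oi)
{
	if (!exists)
		return -1;
	if (oi->disk_sizep)
		*oi->disk_sizep = 120;	/* cheap to answer */
	if (oi->sizep)
		*oi->sizep = 4096;
	if (oi->typep) {		/* only pay for this when asked */
		toy_type_lookups++;
		*oi->typep = TOY_BLOB;
	}
	return 0;
}
```

A caller that zero-initializes the struct and sets only `disk_sizep` never triggers the type lookup, which is the saving the timings above measure.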
Jeff King	412916ee13	packed_object_info: make type lookup optional Currently, packed_object_info can save some work by not calculating the size or disk_size of the object if the caller is not interested. However, it always calculates the true object type, whether the caller cares or not, and only optionally returns the easy-to-get "representation type". Let's swap these types. The function will now return the representation type (or OBJ_BAD on failure), and will only optionally fill in the true type. There should be no behavior change yet, as the only caller, sha1_object_info_extended, will always feed it a type pointer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:14:06 -07:00
Jeff King	90191d37ab	packed_object_info: hoist delta type resolution to helper To calculate the type of a packed object, we must walk down its delta chain until we hit a true base object with a real type. Most of the code in packed_object_info is for handling this case. Let's hoist it out into a separate helper function, which will make it easier to make the type-lookup optional in the future (and keep our indentation level sane). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:13:23 -07:00
Jeff King	052fe5eaca	sha1_loose_object_info: make type lookup optional Until recently, the only items to request from sha1_object_info_extended were type and size. This meant that we always had to open a loose object file to determine one or the other. But with the addition of the disk_size query, it's possible that we can fulfill the query without even opening the object file at all. However, since the function interface always returns the type, we have no way of knowing whether the caller cares about it or not. This patch only modified sha1_loose_object_info to make type lookup optional using an out-parameter, similar to the way the size is handled (and the return value is "0" or "-1" for success or error, respectively). There should be no functional change yet, though, as sha1_object_info_extended, the only caller, will always ask for a type. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:10:04 -07:00
Jeff King	f2f57e31f6	sha1_object_info_extended: rename "status" to "type" The value we get from each low-level object_info function (e.g., loose, packed) is actually the object type (or -1 for error). Let's explicitly call it "type", which will make further refactorings easier to read. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:10:03 -07:00
Jeff King	161f00e708	teach sha1_object_info_extended a "disk_size" query Using sha1_object_info_extended, a caller can find out the type of an object, its size, and information about where it is stored. In addition to the object's "true" size, it can also be useful to know the size that the object takes on disk (e.g., to generate statistics about which refs consume space). This patch adds a "disk_sizep" field to "struct object_info", and fills it in during sha1_object_info_extended if it is non-NULL. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-07 10:53:22 -07:00
Jeff King	7c07385d90	zero-initialize object_info structs The sha1_object_info_extended function expects the caller to provide a "struct object_info" which contains pointers to "query" items that will be filled in. The purpose of providing pointers rather than storing the response directly in the struct is so that callers can choose not to incur the expense in finding particular fields that they do not care about. Right now the only query item is "sizep", and all callers set it explicitly to choose whether or not to query it; they can then leave the rest of the struct uninitialized. However, as we add new query items, each caller will have to be updated to explicitly turn off the new ones (by setting them to NULL). Instead, let's teach each caller to zero-initialize the struct, so that they do not have to learn about each new query item added. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-07 10:50:13 -07:00
Junio C Hamano	ee64e345b1	Merge branch 'jk/unpack-entry-fallback-to-another' * jk/unpack-entry-fallback-to-another: unpack_entry: do not die when we fail to apply a delta t5303: drop "count=1" from corruption dd	2013-06-23 14:53:20 -07:00
Junio C Hamano	8f0c843aab	Merge branch 'nd/traces' * nd/traces: git.txt: document GIT_TRACE_PACKET core: use env variable instead of config var to turn on logging pack access	2013-06-20 16:02:28 -07:00
Jeff King	1ee886c1f0	unpack_entry: do not die when we fail to apply a delta When we try to load an object from disk and fail, our general strategy is to see if we can get it from somewhere else (e.g., a loose object). That lets users fix corruption problems by copying known-good versions of objects into the object database. We already handle the case where we were not able to read the delta from disk. However, when we find that the delta we read does not apply, we simply die. This case is harder to trigger, as corruption in the delta data itself would trigger a crc error from zlib. However, a corruption that pointed us at the wrong delta base might cause it. We can do the same "fail and try to find the object elsewhere" trick instead of dying. This not only gives us a chance to recover, but also puts us on code paths that will alert the user to the problem (with the current message, they do not even know which sha1 caused the problem). Note that unlike some other pack corruptions, we do not recover automatically from this case when doing a repack. There is nothing apparently wrong with the delta, as it points to a valid, accessible object, and we realize the error only when the resulting size does not match up. And in theory, one could even have a case where the corrupted size is the same, and the problem would only be noticed by recomputing the sha1. We can get around this by recomputing the deltas with --no-reuse-delta, which our test does (and this is probably good advice for anyone recovering from pack corruption). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-14 14:56:09 -07:00
Junio C Hamano	cf6de2968c	Merge branch 'tr/sha1-file-silence-loose-object-info-under-prune-race' * tr/sha1-file-silence-loose-object-info-under-prune-race: sha1_file: silence sha1_loose_object_info	2013-06-11 13:31:19 -07:00
Nguyễn Thái Ngọc Duy	b12ca9631f	core: use env variable instead of config var to turn on logging pack access `5f44324` (core: log offset pack data accesses happened - 2011-07-06) provides a way to observe pack access patterns via a config switch. Setting an environment variable looks more obvious than a config var, especially when you just need to _observe_, and more in line with other tracing knobs we have. Document it as it may be useful for remote troubleshooting. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-09 16:07:50 -07:00
Thomas Rast	dbea72a8c0	sha1_file: silence sha1_loose_object_info sha1_object_info() returns -1 (OBJ_BAD) if it cannot find the object for some reason, which suggests that it wants the _caller_ to report this error. However, part of its work happens in sha1_loose_object_info, which _does_ report errors itself. This is doubly strange because:
    * packed_object_info(), which is the other half of the duo, does _not_ report this.
    * In the event that an object is packed and pruned while sha1_object_info_extended() goes looking for it, we would erroneously show the error -- even though the code of the latter function purports to handle this case gracefully.
    * A caller might invoke sha1_object_info() to find the type of an object even if that object is not known to exist.
Silence this error. The others remain untouched as a corrupt object is a much more grave error than it merely being absent. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-03 12:51:53 -07:00
Felipe Contreras	4b8f772ce4	sha1_file: trivial style cleanup Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-03 10:14:48 -07:00
Junio C Hamano	7c2e8fc684	Merge branch 'tr/unpack-entry-use-after-free-fix' * tr/unpack-entry-use-after-free-fix: unpack_entry: avoid freeing objects in base cache	2013-05-03 15:18:04 -07:00
Thomas Rast	756a042600	unpack_entry: avoid freeing objects in base cache In the !delta_data error path of unpack_entry(), we run free(base). This became a window for use-after-free() in `abe601b` (sha1_file: remove recursion in unpack_entry, 2013-03-27), as follows: Before `abe601b`, we got the 'base' from cache_or_unpack_entry(..., 0); keep_cache=0 tells it to also remove that entry. So the 'base' is at this point not cached, and freeing it in the error path is the right thing. After `abe601b`, the structure changed: we use a three-phase approach where phase 1 finds the innermost base or a base that is already in the cache. In phase 3 we therefore know that all bases we unpack are not part of the delta cache yet. (Observe that we pop from the cache in phase 1, so this is also true for the very first base.) So we make no further attempts to look up the bases in the cache, and just call add_delta_base_cache() on every base object we have assembled. But the !delta_data error path remained unchanged, and now calls free() on a base that has already been entered in the cache. This means that there is a use-after-free if we later use the same base again. So remove that free(); we are still going to use that data. Reported-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-30 15:43:48 -07:00
Junio C Hamano	193e28f050	Merge branch 'tr/packed-object-info-wo-recursion' Attempts to reduce the stack footprint of sha1_object_info() and unpack_entry() codepaths. * tr/packed-object-info-wo-recursion: sha1_file: remove recursion in unpack_entry Refactor parts of in_delta_base_cache/cache_or_unpack_entry sha1_file: remove recursion in packed_object_info	2013-04-18 11:46:23 -07:00
Junio C Hamano	b9c78e9723	Merge branch 'jk/check-corrupt-objects-carefully' Have the streaming interface and other codepaths more carefully examine for corrupt objects. * jk/check-corrupt-objects-carefully: clone: leave repo in place after checkout errors clone: run check_everything_connected clone: die on errors from unpack_trees add tests for cloning corrupted repositories streaming_write_entry: propagate streaming errors add test for streaming corrupt blobs avoid infinite loop in read_istream_loose read_istream_filtered: propagate read error from upstream check_sha1_signature: check return value from read_istream stream_blob_to_fd: detect errors reading from stream	2013-04-03 09:34:29 -07:00
Junio C Hamano	37ba4c61d0	Merge branch 'sw/safe-create-leading-dir-race' * sw/safe-create-leading-dir-race: safe_create_leading_directories: fix race that could give a false negative	2013-04-02 15:09:48 -07:00
Jeff King	f54fac5378	check_sha1_signature: check return value from read_istream It's possible for read_istream to return an error, in which case we just end up in an infinite loop (aside from EOF, we do not even look at the result, but just feed it straight into our running hash). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-27 13:46:55 -07:00
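The failure mode in this commit is a read loop that only tests for EOF: if the reader keeps returning an error, the loop never sees zero and never ends. The sketch below shows the corrected loop shape with a hypothetical reader callback and a byte sum in place of SHA-1; none of these names exist in git.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical reader: >0 bytes read, 0 at EOF, <0 on error. */
typedef ssize_t (*toy_read_fn)(void *ctx, char *buf, size_t len);

struct toy_mem { const char *data; size_t len, pos; };

/* Reads from an in-memory buffer. */
static ssize_t toy_mem_read(void *ctx, char *buf, size_t len)
{
	struct toy_mem *m = ctx;
	size_t left = m->len - m->pos;

	if (len > left)
		len = left;
	memcpy(buf, m->data + m->pos, len);
	m->pos += len;
	return (ssize_t)len;
}

/* Always fails, like a stream over a corrupt object. */
static ssize_t toy_broken_read(void *ctx, char *buf, size_t len)
{
	(void)ctx; (void)buf; (void)len;
	return -1;
}

/* Sum all bytes of the stream; return -1 if the reader reports an error. */
static int toy_checksum_stream(toy_read_fn read_fn, void *ctx,
			       unsigned long *sum)
{
	char buf[64];

	*sum = 0;
	for (;;) {
		ssize_t n = read_fn(ctx, buf, sizeof(buf));
		ssize_t i;

		if (n < 0)
			return -1;	/* without this check we would spin */
		if (!n)
			break;		/* EOF */
		for (i = 0; i < n; i++)
			*sum += (unsigned char)buf[i];
	}
	return 0;
}
```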
Thomas Rast	abe601bba5	sha1_file: remove recursion in unpack_entry Similar to the recursion in packed_object_info(), this leads to problems on stack-space-constrained systems in the presence of long delta chains. We proceed in three phases:
    1. Dig through the delta chain, saving each delta object's offsets and size on an ad-hoc stack.
    2. Unpack the base object at the bottom.
    3. Unpack and apply the deltas from the stack.
Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-27 13:25:16 -07:00
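The three phases can be tried out on a toy model where a "delta" simply adds a number to its base. This is a sketch under loud assumptions: `struct toy_entry`, `toy_unpack()` and the initial stack capacity of 64 are all invented for illustration, and real deltas are of course not additions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy pack entry: a full object, or a delta against another entry. */
struct toy_entry {
	int base;	/* index of the delta base, or -1 for a full object */
	long value;	/* full content, or the amount this delta adds */
};

/* Resolve entry idx without recursion; returns 0 on success, -1 on OOM. */
static int toy_unpack(const struct toy_entry *pack, int idx, long *out)
{
	int small[64], *stack = small;	/* explicit stack, grown on demand */
	size_t nr = 0, alloc = 64;
	long result;

	/* Phase 1: walk down the chain, remembering each delta. */
	while (pack[idx].base >= 0) {
		if (nr == alloc) {
			int *grown = malloc(2 * alloc * sizeof(*grown));
			if (!grown) {
				if (stack != small)
					free(stack);
				return -1;
			}
			memcpy(grown, stack, nr * sizeof(*grown));
			if (stack != small)
				free(stack);
			stack = grown;
			alloc *= 2;
		}
		stack[nr++] = idx;
		idx = pack[idx].base;
	}

	/* Phase 2: unpack the base object at the bottom. */
	result = pack[idx].value;

	/* Phase 3: apply the deltas, innermost first. */
	while (nr)
		result += pack[stack[--nr]].value;

	if (stack != small)
		free(stack);
	*out = result;
	return 0;
}
```

Because the chain is held in a heap-growable array rather than in call frames, its length no longer affects the depth of the C stack.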
Thomas Rast	84dd81c126	Refactor parts of in_delta_base_cache/cache_or_unpack_entry The delta base cache lookup and test were shared. Refactor them; we'll need both parts again. Also, we'll use the clearing routine later. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-27 13:24:43 -07:00
Steven Walter	928734d993	safe_create_leading_directories: fix race that could give a false negative If two processes are racing to create the same directory tree, they will both see that the directory doesn't exist, both try to mkdir(), and one of them will fail. This is okay, as we only care that the directory gets created. So, we add a check for EEXIST from mkdir, and continue when the directory exists, taking the same codepath as the case where the earlier stat() succeeds and finds a directory. Signed-off-by: Steven Walter <stevenrwalter@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-26 21:07:42 -07:00
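A minimal sketch of the EEXIST handling described in 928734d993, assuming a POSIX system. The helper name `ensure_dir` and the single-level scope are illustrative; the real function walks every leading component of a path.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 0 if `path` is a directory when we return, -1 otherwise. */
int ensure_dir(const char *path)
{
	struct stat st;

	if (!mkdir(path, 0777))
		return 0; /* we created it */
	if (errno != EEXIST)
		return -1; /* a real failure, e.g. a missing parent */
	/*
	 * Someone else created it first (or it was already there).
	 * That is fine as long as what exists is a directory.
	 */
	if (stat(path, &st) || !S_ISDIR(st.st_mode))
		return -1;
	return 0;
}
```

Two processes calling `ensure_dir()` on the same path both succeed, whichever one wins the mkdir() race.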
Thomas Rast	790d96c023	sha1_file: remove recursion in packed_object_info packed_object_info() and packed_delta_info() were mutually recursive. The former would handle ordinary types and defer deltas to the latter; the latter would use the former to resolve the delta base. This arrangement, however, leads to trouble with threaded index-pack and long delta chains on platforms where thread stacks are small, as happened on OS X (512kB thread stacks by default) with the chromium repo. The task of the two functions is not all that hard to describe without any recursion, however. It proceeds in three steps: - determine the representation type and size, based on the outermost object (delta or not) - follow through the delta chain, if any - determine the object type from what is found at the end of the delta chain The only complication stems from the error recovery. If parsing fails at any step, we want to mark that object (within the pack) as bad and try getting the corresponding SHA1 from elsewhere. If that also fails, we want to repeat this process back up the delta chain until we find a reasonable solution or conclude that there is no way to reconstruct the object. (This is conveniently checked by t5303.) To achieve that within the pack, we keep track of the entire delta chain in a stack. When things go sour, we process that stack from the top, marking entries as bad and attempting to re-resolve by sha1. To avoid excessive malloc(), the stack starts out with a small stack-allocated array. The choice of 64 is based on the default of pack.depth, which is 50, in the hope that it covers "most" delta chains without any need for malloc(). It's much harder to make the actual re-resolving by sha1 nonrecursive, so we skip that. If you can't afford that recursion, your corruption problems are more serious than your stack size problems. 
Reported-by: Stefan Zager <szager@google.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-25 15:48:18 -07:00
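The explicit-stack approach shared by abe601bba5 and 790d96c023 can be sketched on a toy model. Everything here is an assumption made for illustration: `struct toy_entry` replaces real pack entries, and "applying a delta" is reduced to adding a number. Only the shape (walk down, resolve the base, unwind the stack, start with a 64-slot on-stack array) follows the commit messages.

```c
#include <stdlib.h>
#include <string.h>

#define INLINE_STACK 64 /* the commit picks 64 because pack.depth defaults to 50 */

/*
 * Toy "object": base < 0 means a full object holding `value`;
 * otherwise it is a delta that adds `value` to the entry at index `base`.
 */
struct toy_entry {
	int base;
	long value;
};

/* Resolve entry `idx` without recursion. Returns 0, or -1 if malloc fails. */
int toy_resolve(const struct toy_entry *tab, int idx, long *out)
{
	int small[INLINE_STACK];
	int *stack = small;
	size_t nr = 0, alloc = INLINE_STACK;
	long result;

	/* Phase 1: dig through the chain, remembering each delta. */
	while (tab[idx].base >= 0) {
		if (nr == alloc) {
			size_t new_alloc = alloc * 2;
			int *grown = malloc(new_alloc * sizeof(*grown));

			if (!grown) {
				if (stack != small)
					free(stack);
				return -1;
			}
			memcpy(grown, stack, nr * sizeof(*grown));
			if (stack != small)
				free(stack);
			stack = grown;
			alloc = new_alloc;
		}
		stack[nr++] = idx;
		idx = tab[idx].base;
	}

	/* Phase 2: "unpack" the base object at the bottom. */
	result = tab[idx].value;

	/* Phase 3: apply deltas, the one nearest the base first. */
	while (nr)
		result += tab[stack[--nr]].value;

	if (stack != small)
		free(stack);
	*out = result;
	return 0;
}
```

Chains of up to 64 deltas never touch malloc(); longer chains pay for one heap allocation per doubling instead of one stack frame per delta.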
Nguyễn Thái Ngọc Duy	543c5caa6c	count-objects: report garbage files in pack directory too prepare_packed_git_one() is modified to allow count-objects to hook a report function into it, so we don't need to duplicate the pack searching logic in count-objects.c. When report_pack_garbage is NULL, the overhead is insignificant. The garbage is reported with warning() instead of error() in the packed garbage case because it's not an error to have garbage. Loose garbage is still reported as errors and will be converted to warnings later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-15 08:13:13 -08:00
Nguyễn Thái Ngọc Duy	d90906a902	sha1_file: reorder code in prepare_packed_git_one() The current loop does while (...) { if (it is not an .idx file) continue; process .idx file; } and is reordered to while (...) { if (it is an .idx file) { process .idx file; } } This makes it easier to add new extension file processing. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-13 07:42:05 -08:00
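The reordered loop from d90906a902 might look roughly like the sketch below. The names `struct pack_scan`, `has_suffix` and `scan_pack_names` are hypothetical, and the "processing" is reduced to counting; the point is only that each extension gets its own branch, which leaves room for a final "anything else" branch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tallies for one scan of a pack directory. */
struct pack_scan {
	size_t idx_files;
	size_t garbage;
};

/* Return 1 if `name` ends with `suffix`, 0 otherwise. */
int has_suffix(const char *name, const char *suffix)
{
	size_t nlen = strlen(name), slen = strlen(suffix);

	return nlen >= slen && !strcmp(name + nlen - slen, suffix);
}

void scan_pack_names(const char *const *names, size_t nr, struct pack_scan *out)
{
	size_t i;

	out->idx_files = 0;
	out->garbage = 0;
	for (i = 0; i < nr; i++) {
		if (has_suffix(names[i], ".idx")) {
			out->idx_files++; /* stands in for "process .idx file" */
		} else if (has_suffix(names[i], ".pack")) {
			; /* known companion file, nothing to report */
		} else {
			out->garbage++; /* candidate for a garbage report */
		}
	}
}
```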
Michael Haggerty	c595016402	link_alt_odb_entries(): take (char *, len) rather than two pointers Change link_alt_odb_entries() to take the length of the "alt" parameter rather than a pointer to the end of the "alt" string. This is the more common calling convention and simplifies the code a tiny bit. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>	2012-11-08 12:06:53 -05:00
Michael Haggerty	6eac50d827	link_alt_odb_entries(): use string_list_split_in_place() Change link_alt_odb_entry() to take a NUL-terminated string instead of (char *, len). Use string_list_split_in_place() rather than inline code in link_alt_odb_entries(). This approach saves some code and also avoids the (probably harmless) error of passing a non-NUL-terminated string to is_absolute_path(). Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>	2012-11-08 12:06:53 -05:00
Joachim Schmitz	a0788266d3	sha1_file.c: introduce get_max_fd_limit() helper Not all platforms have getrlimit(), but there are other ways to see the maximum number of files that a process can have open. If getrlimit() is unavailable, fall back to sysconf(_SC_OPEN_MAX) if available, and use OPEN_MAX from <limits.h>. Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-08-24 09:46:01 -07:00
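A sketch of the fallback order described in a0788266d3. It assumes a POSIX build environment where `<sys/resource.h>` exists and uses the feature macros themselves to choose a branch; the function name `max_fd_limit` and the final fallback value of 1 are illustrative, not taken from git.

```c
#include <limits.h>
#include <unistd.h>
#include <sys/resource.h>

/* Best-effort upper bound on simultaneously open file descriptors. */
long max_fd_limit(void)
{
#ifdef RLIMIT_NOFILE
	{
		struct rlimit lim;

		/* Preferred source: the soft limit, unless it is "unlimited". */
		if (!getrlimit(RLIMIT_NOFILE, &lim) &&
		    lim.rlim_cur != RLIM_INFINITY)
			return (long)lim.rlim_cur;
	}
#endif
#ifdef _SC_OPEN_MAX
	{
		/* Second choice: ask sysconf(); -1 means "indeterminate". */
		long open_max = sysconf(_SC_OPEN_MAX);

		if (open_max > 0)
			return open_max;
	}
#endif
#ifdef OPEN_MAX
	return OPEN_MAX; /* compile-time constant from <limits.h> */
#else
	return 1; /* nothing is known; stay conservative */
#endif
}
```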
Junio C Hamano	fbea95ce10	Merge branch 'hv/link-alt-odb-entry' The code to avoid mistaken attempt to add the object directory itself as its own alternate could read beyond end of a string while comparison. * hv/link-alt-odb-entry: link_alt_odb_entry: fix read over array bounds reported by valgrind	2012-07-30 12:55:01 -07:00
Heiko Voigt	cb2912c324	link_alt_odb_entry: fix read over array bounds reported by valgrind pfxlen can be longer than the path in objdir when relative_base contains the path to git's object directory. Here we are interested in checking if ent->base[] (the part that corresponds to .git/objects) is the same string as objdir, and the code NUL-terminated ent->base[] to LEADING PATH\0XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\0 in preparation for this "duplicate check" step (before we return from the function, the first NUL is turned into '/' so that we can fill XX when probing for loose objects). All we need to do is to compare the string with the path to our object directory. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-29 18:02:51 -07:00
Junio C Hamano	4809ff858b	Merge branch 'hv/submodule-alt-odb' When peeking into object stores of submodules, the code forgot that they might borrow objects from alternate object stores on their own. By Heiko Voigt * hv/submodule-alt-odb: teach add_submodule_odb() to look for alternates	2012-05-23 13:35:06 -07:00
Heiko Voigt	5e73633dbf	teach add_submodule_odb() to look for alternates Since we allow linking other object databases when loading a submodule's database, we should also load possible alternates. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-14 11:56:42 -07:00
Pete Wyckoff	5eaeda70de	remove blank filename in error message When write_loose_object() finds that it is unable to create a temporary file, it complains, for instance: unable to create temporary sha1 filename : Too many open files That extra space was supposed to be the name of the file, and will be an empty string if the git_mkstemps_mode() fails. The name of the temporary file is unimportant; delete it. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-30 15:45:54 -07:00
Pete Wyckoff	82247e9bd5	remove superfluous newlines in error messages The error handling routines add a newline. Remove the duplicate ones in error messages. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-30 15:45:51 -07:00
Nguyễn Thái Ngọc Duy	090ea12671	parse_object: avoid putting whole blob in core Traditionally, all the callers of check_sha1_signature() first called read_sha1_file() to prepare the whole object data in core, and called this function. The function is used to revalidate what we read from the object database actually matches the object name we used to ask for the data from the object database. Update the API to allow callers to pass NULL as the object data, and have the function read and hash the object data using streaming API to recompute the object name, without having to hold everything in core at the same time. This is most useful in parse_object() that parses a blob object, because this caller does not have to keep the actual blob data around in memory after a "struct blob" is returned. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-03-07 09:07:38 -08:00
Junio C Hamano	a09a0c2709	Merge branch 'jk/maint-avoid-streaming-filtered-contents' into maint * jk/maint-avoid-streaming-filtered-contents: do not stream large files to pack when filters are in use teach dry-run convert_to_git not to require a src buffer teach convert_to_git a "dry run" mode	2012-03-04 22:16:40 -08:00
Junio C Hamano	31e3d834b3	Merge branch 'jk/maint-avoid-streaming-filtered-contents' * jk/maint-avoid-streaming-filtered-contents: do not stream large files to pack when filters are in use teach dry-run convert_to_git not to require a src buffer teach convert_to_git a "dry run" mode	2012-02-26 23:05:38 -08:00
Jeff King	4f22b1015d	do not stream large files to pack when filters are in use Because git's object format requires us to specify the number of bytes in the object in its header, we must know the size before streaming a blob into the object database. This is not a problem when adding a regular file, as we can get the size from stat(). However, when filters are in use (such as autocrlf, or the ident, filter, or eol gitattributes), we have no idea what the ultimate size will be. The current code just punts on the whole issue and ignores filter configuration entirely for files larger than core.bigfilethreshold. This can generate confusing results if you use filters for large binary files, as the filter will suddenly stop working as the file goes over a certain size. Rather than try to handle unknown input sizes with streaming, this patch just turns off the streaming optimization when filters are in use. This has a slight performance regression in a very specific case: if you have autocrlf on, but no gitattributes, a large binary file will avoid the streaming code path because we don't know beforehand whether it will need conversion or not. But if you are handling large binary files, you should be marking them as such via attributes (or at least not using autocrlf, and instead marking your text files as such). And the flip side is that if you have a large _non_-binary file, there is a correctness improvement; before we did not apply the conversion at all. The first half of the new t1051 script covers these failures on input. The second half tests the matching output code paths. These already work correctly, and do not need any adjustment. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-24 14:18:20 -08:00
Junio C Hamano	f3ccea8dd4	Merge branch 'nd/find-pack-entry-recent-cache-invalidation' into maint * nd/find-pack-entry-recent-cache-invalidation: find_pack_entry(): do not keep packed_git pointer locally sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()	2012-02-21 14:56:36 -08:00
Junio C Hamano	c6a4e3f7a7	Merge branch 'mm/empty-loose-error-message' into maint * mm/empty-loose-error-message: fsck: give accurate error message on empty loose object files	2012-02-16 14:00:25 -08:00
Junio C Hamano	dd5253b4bd	Merge branch 'nd/find-pack-entry-recent-cache-invalidation' * nd/find-pack-entry-recent-cache-invalidation: find_pack_entry(): do not keep packed_git pointer locally sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()	2012-02-12 22:43:03 -08:00
Junio C Hamano	8c18a6f3fa	Merge branch 'mm/empty-loose-error-message' * mm/empty-loose-error-message: fsck: give accurate error message on empty loose object files	2012-02-12 22:42:02 -08:00
Matthieu Moy	33e42de0d2	fsck: give accurate error message on empty loose object files Since `3ba7a06552` (A loose object is not corrupt if it cannot be read due to EMFILE), "git fsck" on a repository with an empty loose object file complains with the error message fatal: failed to read object <sha1>: Invalid argument This comes from a failure of mmap on this empty file, which sets errno to EINVAL. Instead of calling xmmap on empty file, we display a clean error message ourselves, and return a NULL pointer. The new message is error: object file .git/objects/09/<rest-of-sha1> is empty fatal: loose object <sha1> (stored in .git/objects/09/<rest-of-sha1>) is corrupt The second line was already there before the regression in `3ba7a06552`, and the first is an additional message, that should help diagnosing the problem for the user. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-06 11:05:36 -08:00
Nguyễn Thái Ngọc Duy	c01f51cc75	find_pack_entry(): do not keep packed_git pointer locally Commit `f7c22cc` (always start looking up objects in the last used pack first - 2007-05-30) introduced a static packed_git* pointer as an optimization. The kept pointer however may become invalid if free_pack_by_name() happens to free that particular pack. Current code base does not access packs after calling free_pack_by_name() so it should not be a problem. Anyway, move the pointer out so that free_pack_by_name() can reset it to avoid running into trouble in the future. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 14:12:42 -08:00
Nguyễn Thái Ngọc Duy	95099731bf	sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry() The new helper function implements the logic to find the offset for the object in one pack and fill a pack_entry structure. The next patch will restructure the loop and will call the helper from two places. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 14:12:41 -08:00
Ævar Arnfjörð Bjarmason	ab1900a36e	Appease Sun Studio by renaming "tmpfile" On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-21 10:21:04 -08:00
Junio C Hamano	48b303675a	Merge branch 'jc/stream-to-pack' * jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h	2011-12-16 22:33:40 -08:00
Junio C Hamano	df6246ed78	Merge branch 'nd/misc-cleanups' into maint * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-13 22:02:51 -08:00
Junio C Hamano	62cdb6b23a	Merge branch 'nd/misc-cleanups' * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-05 15:10:20 -08:00
Junio C Hamano	568508e765	bulk-checkin: replace fast-import based implementation This extends the earlier approach to stream a large file directly from the filesystem to its own packfile, and allows "git add" to send large files directly into a single pack. Older code used to spawn fast-import, but the new bulk-checkin API replaces it. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-01 11:46:09 -08:00
Ramkumar Ramachandra	5e12e78e52	sha1_file: don't mix enum with int Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-15 16:09:20 -08:00
Junio C Hamano	ea4f9685cb	unpack_object_header_buffer(): clear the size field upon error The callers do not use the returned size when the function says it did not use any bytes and sets the type to OBJ_BAD, so this should not matter in practice, but it is a good code hygiene anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-27 11:42:57 -07:00
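For context on ea4f9685cb, here is a minimal sketch of a decoder for the variable-length per-object header used in packfiles, written from the commonly documented layout (first byte: continuation bit, 3-bit type, low 4 size bits; each following byte adds 7 more size bits). The function name `parse_object_header` and the use of -1 in place of OBJ_BAD are assumptions for this sketch; it is not git's implementation.

```c
#include <stddef.h>

/*
 * Decode a pack object header from buf[0..len).
 * Returns the number of bytes consumed, or 0 on error, in which case
 * *type is set to -1 and *sizep is cleared so callers never see junk.
 */
unsigned long parse_object_header(const unsigned char *buf, unsigned long len,
				  int *type, unsigned long *sizep)
{
	unsigned long size, c, used = 0;
	unsigned int shift;

	if (!len)
		goto bad;
	c = buf[used++];
	*type = (int)((c >> 4) & 7);
	size = c & 15;
	shift = 4;
	while (c & 0x80) {
		/* ran out of input, or size would overflow unsigned long */
		if (len <= used || sizeof(unsigned long) * 8 <= shift)
			goto bad;
		c = buf[used++];
		size += (c & 0x7f) << shift;
		shift += 7;
	}
	*sizep = size;
	return used;
bad:
	*type = -1;
	*sizep = 0;
	return 0;
}
```

Clearing `*sizep` on the error path is the hygiene point of the commit: a caller that forgets to check the return value at least reads a zero rather than a partially accumulated size.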
Junio C Hamano	2070950633	Merge branch 'jk/maint-pack-objects-compete-with-delete' * jk/maint-pack-objects-compete-with-delete: downgrade "packfile cannot be accessed" errors to warnings pack-objects: protect against disappearing packs	2011-10-21 16:04:33 -07:00
Jeff King	58a6a9cc43	downgrade "packfile cannot be accessed" errors to warnings These can happen if another process simultaneously prunes a pack. But that is not usually an error condition, because a properly-running prune should have repacked the object into a new pack. So we will notice that the pack has disappeared unexpectedly, print a message, try other packs (possibly after re-scanning the list of packs), and find it in the new pack. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:43:09 -07:00
Jeff King	4c08018204	pack-objects: protect against disappearing packs It's possible that while pack-objects is running, a simultaneously running prune process might delete a pack that we are interested in. Because we load the pack indices early on, we know that the pack contains our item, but by the time we try to open and map it, it is gone. Since `c715f78`, we already protect against this in the normal object access code path, but pack-objects accesses the packs at a lower level. In the normal access path, we call find_pack_entry, which will call find_pack_entry_one on each pack index, which does the actual lookup. If it gets a hit, we will actually open and verify the validity of the matching packfile (using c715f78's is_pack_valid). If we can't open it, we'll issue a warning and pretend that we didn't find it, causing us to go on to the next pack (or on to loose objects). Furthermore, we will cache the descriptor to the opened packfile. Which means that later, when we actually try to access the object, we are likely to still have that packfile opened, and won't care if it has been unlinked from the filesystem. Notice the "likely" above. If there is another pack access in the interim, and we run out of descriptors, we could close the pack. And then a later attempt to access the closed pack could fail (we'll try to re-open it, of course, but it may have been deleted). In practice, this doesn't happen because we tend to look up items and then access them immediately. Pack-objects does not follow this code path. Instead, it accesses the packs at a much lower level, using find_pack_entry_one directly. This means we skip the is_pack_valid check, and may end up with the name of a packfile, but no open descriptor. We can add the same is_pack_valid check here. Unfortunately, the access patterns of pack-objects are not quite as nice for keeping lookup and object access together. 
We look up each object as we find out about it, and then only later when writing the packfile do we necessarily access it. Which means that the opened packfile may be closed in the interim. In practice, however, adding this check still has value, for three reasons. 1. If you have a reasonable number of packs and/or a reasonable file descriptor limit, you can keep all of your packs open simultaneously. If this is the case, then the race is impossible to trigger. 2. Even if you can't keep all packs open at once, you may end up keeping the deleted one open (i.e., you may get lucky). 3. The race window is shortened. You may notice early that the pack is gone, and not try to access it. Triggering the problem without this check means deleting the pack any time after we read the list of index files, but before we access the looked-up objects. Triggering it with this check means deleting the pack after we do a lookup (and successfully access the packfile), but before we access the object. Which is a smaller window. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:42:37 -07:00
Junio C Hamano	e99f8c6dcf	Merge branch 'wh/normalize-alt-odb-path' * wh/normalize-alt-odb-path: sha1_file: normalize alt_odb path before comparing and storing	2011-10-05 12:36:22 -07:00
Hui Wang	5bdf0a8468	sha1_file: normalize alt_odb path before comparing and storing When it needs to compare and add an alt object path to the alt_odb_list, we normalize this path first since comparing normalized path is easy to get correct result. Use strbuf to replace some string operations, since it is cleaner and safer. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Hui Wang <Hui.Wang@windriver.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-09-07 11:47:43 -07:00
Junio C Hamano	2478bd8318	Merge branch 'jc/maint-clone-alternates' * jc/maint-clone-alternates: clone: clone from a repository with relative alternates clone: allow more than one --reference Conflicts: builtin/clone.c	2011-08-28 21:19:21 -07:00
Junio C Hamano	6fcb384869	Merge branch 'rt/zlib-smaller-window' * rt/zlib-smaller-window: test: consolidate definition of $LF Tolerate zlib deflation with window size < 32Kb	2011-08-23 15:40:33 -07:00
Junio C Hamano	e6baf4a1ae	clone: clone from a repository with relative alternates Cloning from a local repository blindly copies or hardlinks all the files under objects/ hierarchy. This results in two issues: - If the repository cloned has an "objects/info/alternates" file, and the command line of clone specifies --reference, the ones specified on the command line get overwritten by the copy from the original repository. - An entry in an "objects/info/alternates" file can specify the object stores it borrows objects from as a path relative to the "objects/" directory. When cloning a repository with such an alternates file, if the new repository is not sitting next to the original repository, such relative paths need to be adjusted so that they can be used in the new repository. This updates add_to_alternates_file() to take the path to the alternate object store, including the "/objects" part at the end (earlier, it was taking the path to $GIT_DIR and was adding "/objects" itself), as it is technically possible to specify in objects/info/alternates file the path of a directory whose name does not end with "/objects". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-23 09:56:14 -07:00
Roberto Tyley	7f684a2aff	Tolerate zlib deflation with window size < 32Kb Git currently reports loose objects as 'corrupt' if they've been deflated using a window size less than 32Kb, because the experimental_loose_object() function doesn't recognise the header byte as a zlib header. This patch makes the function tolerant of all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice its accuracy in distinguishing the standard loose-object format from the experimental (now abandoned) format. On memory constrained systems zlib may use a much smaller window size - working on Agit, I found that Android uses a 4KB window; giving a header byte of 0x48, not 0x78. Consequently all loose objects generated appear 'corrupt', which is why Agit is a read-only Git client at this time - I don't want my client to generate Git repos that other clients treat as broken :( This patch makes Git tolerant of different deflate settings - it might appear that it changes experimental_loose_object() to the point where it could incorrectly identify the experimental format as the standard one, but the two criteria (bitmask & checksum) can only give a false result for an experimental object where both of the following are true: 1) object size is exactly 8 bytes when uncompressed (bitmask) 2) ([single-byte in-pack git type&size header] * 256 + [1st byte of the following zlib header]) % 31 = 0 (checksum) As it happens, for all possible combinations of valid object type (1-4) and window bits (0-7), the only time when the checksum will be divisible by 31 is for 0x1838 - ie object type 1, a Commit - which, due to the fields all Commit objects must contain, could never be as small as 8 bytes in size. Given this, the combination of the two criteria (bitmask & checksum) always correctly determines the buffer format, and is more tolerant than the previous version. The alternative to this patch is simply removing support for the experimental format, which I am also totally cool with. 
References: Android uses a 4KB window for deflation: http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53 Code snippet searching for false positives with the zlib checksum: https://gist.github.com/1118177 Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-11 13:02:47 -07:00
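The two criteria named in 7f684a2aff (bitmask and checksum) correspond to the zlib stream header rules in RFC 1950: the low nibble of the first byte must be 8 (deflate), the window-size field in the high nibble must be at most 7, and the two header bytes read as a big-endian number must be divisible by 31. A minimal sketch, with the hypothetical name `looks_like_zlib_header`; it is not a copy of experimental_loose_object().

```c
/* Return 1 if the two bytes form a plausible zlib (RFC 1950) header. */
int looks_like_zlib_header(unsigned char cmf, unsigned char flg)
{
	unsigned int word = ((unsigned int)cmf << 8) | flg;

	/*
	 * Bitmask: low nibble 8 means "deflate"; a clear top bit keeps
	 * the window-bits field in the range 0-7 (256 bytes to 32KB).
	 */
	if ((cmf & 0x8F) != 0x08)
		return 0;
	/* Checksum: the 16-bit header must be a multiple of 31. */
	return word % 31 == 0;
}
```

With this check both 0x78 0x9C (32KB window) and a 4KB-window header starting with 0x48 are accepted; the second byte 0x0D used in testing is just one value that satisfies the checksum. The pair 0x18 0x38 also passes, which is the single ambiguous case the commit message discusses.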
Junio C Hamano	96790ca029	Merge branch 'jc/pack-order-tweak' * jc/pack-order-tweak: pack-objects: optimize "recency order" core: log offset pack data accesses happened	2011-08-05 14:54:57 -07:00
Junio C Hamano	d48929e1c3	Merge branch 'jc/legacy-loose-object' into maint * jc/legacy-loose-object: sha1_file.c: "legacy" is really the current format	2011-08-01 14:43:58 -07:00
Junio C Hamano	d907bf8ef3	Merge branch 'jc/index-pack' * jc/index-pack: verify-pack: use index-pack --verify index-pack: show histogram when emulating "verify-pack -v" index-pack: start learning to emulate "verify-pack -v" index-pack: a miniscule refactor index-pack --verify: read anomalous offsets from v2 idx file write_idx_file: need_large_offset() helper function index-pack: --verify write_idx_file: introduce a struct to hold idx customization options index-pack: group the delta-base array entries also by type Conflicts: builtin/verify-pack.c cache.h sha1_file.c	2011-07-19 09:54:51 -07:00
Junio C Hamano	eb4f4076aa	Merge branch 'jc/zlib-wrap' * jc/zlib-wrap: zlib: allow feeding more than 4GB in one go zlib: zlib can only process 4GB at a time zlib: wrap deflateBound() too zlib: wrap deflate side of the API zlib: wrap inflateInit2 used to accept only for gzip format zlib: wrap remaining calls to direct inflate/inflateEnd zlib wrapper: refactor error message formatter Conflicts: sha1_file.c	2011-07-19 09:33:04 -07:00
Junio C Hamano	5f2e448370	Merge branch 'jc/legacy-loose-object' * jc/legacy-loose-object: sha1_file.c: "legacy" is really the current format	2011-07-13 14:31:34 -07:00
Junio C Hamano	5f44324d88	core: log offset pack data accesses happened In a workload other than "git log" (without pathspec nor any option that causes us to inspect trees and blobs), the recency pack order is said to cause the access to jump around quite a bit. Add a hook to allow us to observe how bad it is. "git config core.logpackaccess /var/tmp/pal.txt" will give you the log in the specified file. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-06 19:09:29 -07:00
Junio C Hamano	ef49a7a012	zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:52:15 -07:00
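The internal looping that ef49a7a012 anticipates for later patches can be sketched without zlib itself. This assumes a hypothetical per-call cap `CHUNK_MAX` (1GB here, chosen only because it fits a 32-bit counter) and uses `unsigned int` where zlib would use uInt; the helper names are illustrative.

```c
/* Illustrative per-call cap; any value that fits a 32-bit counter works. */
#define CHUNK_MAX (1024UL * 1024UL * 1024UL)

/* How many bytes may be handed to a single zlib-style call. */
unsigned int chunk_cap(unsigned long len)
{
	return (unsigned int)(len < CHUNK_MAX ? len : CHUNK_MAX);
}

/* Count how many capped calls are needed to consume `total` bytes. */
unsigned long count_chunks(unsigned long total)
{
	unsigned long calls = 0;

	while (total) {
		/* a real wrapper would set avail_in here and call inflate/deflate */
		total -= chunk_cap(total);
		calls++;
	}
	return calls;
}
```

The wrapper keeps the true remaining size in `unsigned long` and only ever exposes a capped slice to the 32-bit field, so nothing is silently truncated.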
Junio C Hamano	55bb5c9147	zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:10:29 -07:00
Junio C Hamano	cc5c54e78b	sha1_file.c: "legacy" is really the current format Every time I look at the read-loose-object codepath, legacy_loose_object() function makes my brain go through mental contortion. When we were playing with the experimental loose object format, it may have made sense to call the traditional format "legacy", in the hope that the experimental one will some day replace it to become official, but it never happened. This renames the function (and negates its return value) to detect if we are looking at the experimental format, and moves the code around in its caller which used to do "if we are looking at legacy, do this special case, otherwise the normal case is this". The codepath to read from the loose objects in experimental format is the "unlikely" case. Someday after Git 2.0, we should drop the support of this format. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-08 16:39:33 -07:00
Junio C Hamano	3de89c9d42	verify-pack: use index-pack --verify This finally gets rid of the inefficient verify-pack implementation that walks objects in the packfile in their object name order and replaces it with a call to index-pack --verify. As a side effect, it also removes packed_object_info_detail() API which is rather expensive. As this changes the way errors are reported (verify-pack used to rely on the usual runtime error detection routine unpack_entry() to diagnose the CRC errors in an entry in the .idx file; index-pack --verify checks the whole .idx file in one go), update a test that expected the string "CRC" to appear in the error message. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-05 22:45:38 -07:00
Jim Meyering	23c7df6bdd	sha1_file: use the correct type (ssize_t, not size_t) for read-style function Using an unsigned type, we would fail to detect a read error and then proceed to try to write (size_t)-1 bytes. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-26 11:25:59 -07:00
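The bug class fixed by this commit can be illustrated with a small sketch. It is not the code from sha1_file.c; `copy_fd_sketch` is a hypothetical helper, and the point is only that the return value of a read-style call has to be held in a signed `ssize_t` so that the `-1` error return can be detected before the byte count is handed to `write()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from sha1_file.c): copy everything readable
 * from in_fd to out_fd. Returns 0 on success, -1 on a read or write error.
 */
int copy_fd_sketch(int in_fd, int out_fd)
{
	char buf[4096];

	for (;;) {
		/* Signed type: the -1 error return from read() stays negative. */
		ssize_t n = read(in_fd, buf, sizeof(buf));
		const char *p = buf;

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return 0; /* end of input */

		/*
		 * Had n been a size_t, the check above could never fire and
		 * a failed read would reach this loop as (size_t)-1 bytes.
		 */
		while (n > 0) {
			ssize_t w = write(out_fd, p, (size_t)n);

			if (w < 0 && errno == EINTR)
				continue;
			if (w <= 0)
				return -1;
			assert(w <= n); /* write() never reports more than requested */
			p += w;
			n -= w;
		}
	}
}
```

Calling it with an invalid descriptor, for example `copy_fd_sketch(-1, 1)`, should return -1 instead of attempting a huge write.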
Junio C Hamano	5cfe4256d9	Merge branch 'jc/bigfile' * jc/bigfile: Bigfile: teach "git add" to send a large file straight to a pack index_fd(): split into two helper functions index_fd(): turn write_object and format_check arguments into one flag	2011-05-25 16:23:26 -07:00
Junio C Hamano	f0270efd46	sha1_file.c: expose helpers to read loose objects Make map_sha1_file(), parse_sha1_header() and unpack_sha1_header() available to the streaming read API by exporting them via cache.h header file. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-20 23:16:53 -07:00
Junio C Hamano	f8c8abc5b7	unpack_object_header(): make it public This function is used to read and skip over the per-object header in a packfile. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-20 18:38:54 -07:00
Junio C Hamano	5266d369b2	sha1_object_info_extended(): hint about objects in delta-base cache An object found in the delta-base cache is not guaranteed to stay there, but we know it came from a pack and it is likely to give us a quick access if we read_sha1_file() it right now, which is a piece of useful information. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-20 18:38:50 -07:00
Junio C Hamano	61d7503da1	Merge branch 'jc/replacing' * jc/replacing: read_sha1_file(): allow selective bypassing of replacement mechanism inline lookup_replace_object() calls read_sha1_file(): get rid of read_sha1_file_repl() madness t6050: make sure we test not just commit replacement Declare lookup_replace_object() in cache.h, not in commit.h Conflicts: environment.c	2011-05-19 20:37:21 -07:00
Junio C Hamano	9a49059022	sha1_object_info_extended(): expose a bit more info The original interface for sha1_object_info() takes an object name and gives back a type and its size (the latter is given only when it was asked). The new interface wraps its implementation and exposes a bit more pieces of information that the interface used to discard, namely: - where the object is stored (loose? cached? packed?) - if packed, where in which packfile? Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * In the earlier round, this used u.pack.delta to record the length of the delta chain, but the caller is not necessarily interested in the length of the delta chain per-se, but may only want to know if it is a delta against another object or is stored as a deflated data. Calling packed_object_info_detail() involves walking the reverse index chain to compute the store size of the object and is unnecessarily expensive. We could resurrect the code if a new caller wants to know, but I doubt it.	2011-05-19 14:22:47 -07:00
Junio C Hamano	b9a62cbeb9	packed_object_info_detail(): do not return a string Instead return an integer that can be given to typename() if the caller wants a string, just like everybody else does. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-16 22:13:34 -07:00
Junio C Hamano	02071b27f1	Merge branches 'jc/convert', 'jc/bigfile' and 'jc/replacing' into jc/streaming * jc/convert: convert: make it harder to screw up adding a conversion attribute convert: make it safer to add conversion attributes convert: give saner names to crlf/eol variables, types and functions convert: rename the "eol" global variable to "core_eol" * jc/bigfile: Bigfile: teach "git add" to send a large file straight to a pack index_fd(): split into two helper functions index_fd(): turn write_object and format_check arguments into one flag * jc/replacing: read_sha1_file(): allow selective bypassing of replacement mechanism inline lookup_replace_object() calls read_sha1_file(): get rid of read_sha1_file_repl() madness t6050: make sure we test not just commit replacement Declare lookup_replace_object() in cache.h, not in commit.h	2011-05-15 16:30:13 -07:00
Junio C Hamano	f4e516834e	git_open_noatime(): drop unused parameter Since commit `c793430` (Limit file descriptors used by packs, 2011-02-28), the extra parameter added in `f2e872aa` (Work around EMFILE when there are too many pack files, 2010-11-01) is not used anymore. Remove it. Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn O. Pearce <spearce@spearce.org>	2011-05-15 15:24:52 -07:00
Junio C Hamano	ccf5ace0dc	sha1_file: typofix The number zero is spelled "zero", not "zer0". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-15 15:24:36 -07:00
Junio C Hamano	5bf29b9500	read_sha1_file(): allow selective bypassing of replacement mechanism The way "object replacement" mechanism was tucked to the read_sha1_file() interface was suboptimal in a couple of ways: - Callers that want it to die with useful diagnosis upon seeing a corrupt object does not have a way to say that they do not want any object replacement. - Callers who do not want it to die but want to handle the errors themselves are told to arrange to call read_object(), but the function does not use the replacement mechanism, and also it is a file scope static function that not many callers can call to begin with. This adds a read_sha1_file_extended() that takes a set of flags; the callers of read_sha1_file() passes a flag READ_SHA1_FILE_REPLACE to ask for object replacement mechanism to kick in. Later, we could add another flag bit to tell the function to return an error instead of dying and then remove the misguided "call read_object() yourself". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-15 15:23:34 -07:00
Junio C Hamano	4bbf5a2615	read_sha1_file(): get rid of read_sha1_file_repl() madness Most callers want to silently get a replacement object, and they do not care what the real name of the replacement object is. Worse yet, no sane interface to return the underlying object without replacement is provided. Remove the function and make only the few callers that want the name of the replacement object find it themselves. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-15 15:23:33 -07:00
Junio C Hamano	4dd1fbc7b1	Bigfile: teach "git add" to send a large file straight to a pack When adding a new content to the repository, we have always slurped the blob in its entirety in-core first, and computed the object name and compressed it into a loose object file. Handling large binary files (e.g. video and audio asset for games) has been problematic because of this design. At the middle level of "git add" callchain is an internal API index_fd() that takes an open file descriptor to read from the working tree file being added with its size. Teach it to call out to fast-import when adding a large blob. The write-out codepath in entry.c::write_entry() should be taught to stream, instead of reading everything in core. This should not be so hard to implement, especially if we limit ourselves only to loose object files and non-delta representation in packfiles. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-13 16:11:18 -07:00
Junio C Hamano	7b41e1e15b	index_fd(): split into two helper functions Split out the case where we do not know the size of the input (hence we read everything into a strbuf before doing anything) to index_pipe(), and the other case where we mmap or read the whole data to index_bulk(). Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-09 11:58:19 -07:00
Junio C Hamano	c4ce46fc7a	index_fd(): turn write_object and format_check arguments into one flag The "format_check" parameter tucked after the existing parameters is too ugly an afterthought to live in any reasonable API. Combine it with the other boolean parameter "write_object" into a single "flags" parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-09 11:58:19 -07:00
Jim Meyering	0353a0c4ec	remove doubled words, e.g., s/to to/to/, and fix related typos I found that some doubled words had snuck back into projects from which I'd already removed them, so now there's a "syntax-check" makefile rule in gnulib to help prevent recurrence. Running the command below spotted a few in git, too: git ls-files \| xargs perl -0777 -n \ -e 'while (/\b(then?\|[iao]n\|i[fst]\|but\|f?or\|at\|and\|[dt])\s+\1\b/gims)' \ -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \ -e 'print "$ARGV:$n:$v\n"}' Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-04-13 11:59:11 -07:00
Junio C Hamano	ad7bb2f68c	Merge branch 'jc/maint-rerere-in-workdir' * jc/maint-rerere-in-workdir: rerere: make sure it works even in a workdir attached to a young repository	2011-03-26 20:13:16 -07:00
Junio C Hamano	90a6464b4a	rerere: make sure it works even in a workdir attached to a young repository The git-new-workdir script in contrib/ makes a new work tree by sharing many subdirectories of the .git directory with the original repository. When rerere.enabled is set in the original repository, but the user has not encountered any conflicts yet, the original repository may not yet have .git/rr-cache directory. When rerere wants to run in a new work tree created from such a young original repository, it fails to mkdir(2) .git/rr-cache that is a symlink to a yet-to-be-created directory. There are three possible approaches to this: - A naive solution is not to create a symlink in the git-new-workdir script to a directory the original does not have (yet). This is not a solution, as we tend to lazily create subdirectories of .git/, and having rerere.enabled configuration set is a strong indication that the user _wants_ to have this lazy creation to happen; - We could always create .git/rr-cache upon repository creation. This is tempting but will not help people with existing repositories. - Detect this case by seeing that mkdir(2) failed with EEXIST, checking that the path is a symlink, and try running mkdir(2) on the link target. This patch solves the issue by doing the third one. Strictly speaking, this is incomplete. It does not attempt to handle relative symbolic link that points into the original repository, but this is good enough to help people who use contrib/workdir/git-new-workdir script. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-03-23 16:05:44 -07:00
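The third approach described in this commit message might look roughly like the following. This is a sketch under assumptions, not the patch itself: the function name `mkdir_through_symlink` and the fixed-size buffer are invented for illustration, and, like the commit, it does not handle relative link targets.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a directory at path. If path is a symlink
 * to a directory that does not exist yet, create the link target instead.
 * Returns 0 if a directory is reachable at path afterwards, -1 otherwise.
 */
int mkdir_through_symlink(const char *path, mode_t mode)
{
	struct stat st;
	char target[4096]; /* assumed large enough for this sketch */
	ssize_t len;

	assert(path != NULL);

	if (mkdir(path, mode) == 0)
		return 0;
	if (errno != EEXIST)
		return -1;

	/* Something is already there; a directory (or a link to one) is fine. */
	if (stat(path, &st) == 0)
		return S_ISDIR(st.st_mode) ? 0 : -1;

	/* stat() failed although the name exists: look for a dangling symlink. */
	if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
		return -1;

	len = readlink(path, target, sizeof(target) - 1);
	if (len < 0)
		return -1;
	target[len] = '\0';

	/* Relative targets are left unhandled, as in the commit message. */
	if (target[0] != '/')
		return -1;

	return mkdir(target, mode);
}
```

With `.git/rr-cache` as a symlink to a directory that has not been created yet, `mkdir_through_symlink(".git/rr-cache", 0777)` would create the directory the link points at, provided the link target is an absolute path.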
Junio C Hamano	3ed8868474	Merge branch 'jn/maint-c99-format' * jn/maint-c99-format: unbreak and eliminate NO_C99_FORMAT mktag: avoid %td in format string	2011-03-23 14:55:46 -07:00
Jonathan Nieder	28bd70d811	unbreak and eliminate NO_C99_FORMAT In the spirit of v1.5.0.2~21 (Check for PRIuMAX rather than NO_C99_FORMAT in fast-import.c, 2007-02-20), use PRIuMAX from git-compat-util.h on all platforms instead of C99-specific formats like %zu with dangerous fallbacks to %u or %lu. So now C99-challenged platforms can build git without provoking warnings or errors from printf, even if pointers do not have the same size as an int or long. The need for a fallback PRIuMAX is detected in git-compat-util.h with "#ifndef PRIuMAX". So while at it, simplify the Makefile and configure script by eliminating the NO_C99_FORMAT knob altogether. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-03-17 15:30:49 -07:00
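The portable formatting idiom this commit standardizes on can be shown in a few lines. The helper name `format_size` is hypothetical; the technique itself (cast to `uintmax_t`, print with `PRIuMAX`) is standard C99 `<inttypes.h>` usage and avoids relying on `%zu`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a size_t without using %zu.
 * Returns what snprintf() returns (the length of the formatted text).
 */
int format_size(char *out, size_t outlen, size_t value)
{
	assert(out != NULL);
	/* Widen to the largest unsigned type, then use its format macro. */
	return snprintf(out, outlen, "%" PRIuMAX, (uintmax_t)value);
}
```

Because the argument is widened explicitly, the format string and the argument type agree regardless of whether `size_t` has the same width as `int` or `long` on the platform.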
Junio C Hamano	674ef90904	Merge branch 'sp/maint-fd-limit' * sp/maint-fd-limit: sha1_file.c: Don't retain open fds on small packs mingw: add minimum getrlimit() compatibility stub Limit file descriptors used by packs	2011-03-15 14:22:23 -07:00
Shawn O. Pearce	d131b7afea	sha1_file.c: Don't retain open fds on small packs If a pack file is small enough that its entire contents fits within one mmap window, mmap the file and then immediately close its file descriptor. This reduces the number of file descriptors that are needed to read from repositories with many tiny pack files, such as one that has received 1000 pushes (and created 1000 small pack files) since its last repack. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-03-02 11:25:30 -08:00
Shawn O. Pearce	c7934306d1	Limit file descriptors used by packs Rather than using 'errno == EMFILE' after a failed open() call to indicate the process is out of file descriptors and an LRU pack window should be closed, place a hard upper limit on the number of open packs based on the actual rlimit of the process. By using a hard upper limit that is below the rlimit of the current process it is not necessary to check for EMFILE on every single fd-allocating system call. Instead reserving 25 file descriptors makes it safe to assume the system call won't fail due to being over the filedescriptor limit. Here 25 is chosen as a WAG, but considers 3 for stdin/stdout/stderr, and at least a few for other Git code to operate on temporary files. An additional 20 is reserved as it is not known what the C library needs to perform other services on Git's behalf, such as nsswitch or name resolution. This fixes a case where running `git gc --auto` in a repository with more than 1024 packs (but an rlimit of 1024 open fds) fails due to the temporary output file not being able to allocate a file descriptor. The output file is opened by pack-objects after object enumeration and delta compression are done, both of which have already opened all of the packs and fully populated the file descriptor table. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-28 13:08:31 -08:00
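A minimal sketch of the budgeting idea described above, assuming the reserve of 25 descriptors from the commit message. The function names and the floor of one open pack are assumptions of this sketch, not a copy of the code in sha1_file.c.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

#define RESERVED_FDS 25 /* 3 for stdio, a few for temp files, 20 for libc */

/*
 * Hypothetical helper: given the soft fd limit, return how many packs
 * may be kept open at once. Always allows at least one.
 */
unsigned int pack_fd_budget(unsigned long limit)
{
	if (limit <= RESERVED_FDS)
		return 1;
	limit -= RESERVED_FDS;
	return limit > UINT_MAX ? UINT_MAX : (unsigned int)limit;
}

/* Hypothetical helper: the same budget, derived from this process's rlimit. */
unsigned int pack_fd_budget_for_process(void)
{
	struct rlimit lim;
	unsigned int budget;

	if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
		return 1; /* be conservative if the limit is unknown */

	budget = pack_fd_budget((unsigned long)lim.rlim_cur);
	assert(budget >= 1);
	return budget;
}
```

With an rlimit of 1024, this yields a budget of 999 open packs, which leaves room for the temporary output file that `git gc --auto` failed to open in the scenario above.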
Junio C Hamano	fc7ae9c156	Merge branch 'nd/hash-object-sanity' * nd/hash-object-sanity: Make hash-object more robust against malformed objects Conflicts: cache.h	2011-02-27 21:58:30 -08:00
Jonathan Nieder	dab0d4108d	correct type of EMPTY_TREE_SHA1_BIN Functions such as hashcmp that expect a binary SHA-1 value take parameters of type "unsigned char *" to avoid accepting a textual SHA-1 passed by mistake. Unfortunately, this means passing the string literal EMPTY_TREE_SHA1_BIN requires an ugly cast. Tweak the definition of EMPTY_TREE_SHA1_BIN to produce a value of more convenient type. In the future the definition might change to extern const unsigned char empty_tree_sha1_bin[20]; #define EMPTY_TREE_SHA1_BIN empty_tree_sha1_bin Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-14 10:48:06 -08:00
Nguyễn Thái Ngọc Duy	c4d9986f5f	sha1_object_info: examine cached_object store too Cached object store was added in `d66b37b` (Add pretend_sha1_file() interface. - 2007-02-04) as a way to temporarily inject some objects to object store. But only read_sha1_file() knows about this store. While it will return an object from this store, sha1_object_info() will happily say "object not found". Teach sha1_object_info() about the cached store for consistency. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-07 15:05:48 -08:00
Nguyễn Thái Ngọc Duy	c597ba8010	sha1_file.c: move find_cached_object up so sha1_object_info can use it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-07 15:05:46 -08:00
Nguyễn Thái Ngọc Duy	c879daa237	Make hash-object more robust against malformed objects Commits, trees and tags have structure. Don't let users feed git with malformed ones. Sooner or later git will die() when encountering them. Note that this patch does not check semantics. A tree that points to non-existent objects is perfectly OK (and should be so, users may choose to add commit first, then its associated tree for example). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-07 15:05:25 -08:00
Björn Steinbrink	25f3af3f9d	Correctly report corrupted objects The errno check added in commit `3ba7a06` "A loose object is not corrupt if it cannot be read due to EMFILE" only checked for whether errno is not ENOENT and thus incorrectly treated "no error" as an error condition. Because of that, it never reached the code path that would report that the object is corrupted and instead caused funny errors like: fatal: failed to read object 333c4768ce595793fdab1ef3a036413e2a883853: Success So we have to extend the check to cover the case in which the object file was successfully read, but its contents are corrupted. Reported-by: Will Palmer <wmpalmer@gmail.com> Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-01-20 13:18:51 -08:00
Junio C Hamano	39f04dbaac	Merge branch 'jn/thinner-wrapper' * jn/thinner-wrapper: Remove pack file handling dependency from wrapper.o pack-objects: mark file-local variable static wrapper: give zlib wrappers their own translation unit strbuf: move strbuf_branchname to sha1_name.c path helpers: move git_mkstemp* to wrapper.c wrapper: move odb_* to environment.c wrapper: move xmmap() to sha1_file.c	2010-12-03 16:13:06 -08:00
Jonathan Nieder	e050029385	Remove pack file handling dependency from wrapper.o As v1.7.0-rc0~43 (slim down "git show-index", 2010-01-21) explains, use of xmalloc() brings in a dependency on zlib, the sha1 lib, and the rest of git's object file access machinery via try_to_free_pack_memory. That is overkill when xmalloc is just being used as a convenience wrapper to exit when no memory is available. So defer setting try_to_free_pack_memory as try_to_free_routine until the first packfile is opened in add_packed_git(). After this change, a simple program using xmalloc() and no other functions will not pull in any code from libgit.a aside from wrapper.o and usage.o. Improved-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-10 11:11:07 -08:00
Jonathan Nieder	58ecbd5ede	wrapper: move xmmap() to sha1_file.c wrapper.o depends on sha1_file.o for a number of reasons. One is release_pack_memory(). xmmap function calls mmap, discarding unused pack windows when necessary to relieve memory pressure. Simple git programs using wrapper.o as a friendly libc do not need this functionality. So move xmmap to sha1_file.o, where release_pack_memory() is. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-10 11:03:13 -08:00
Shawn O. Pearce	f2e872aa5e	Work around EMFILE when there are too many pack files When opening any files in the object database, release unused pack windows if the open(2) syscall fails due to EMFILE (too many open files in this process). This allows Git to degrade gracefully on a repository with thousands of pack files, and a commit stored in a loose object in the middle of the history. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-03 10:21:46 -07:00
Shawn O. Pearce	4865d2b662	Use git_open_noatime when accessing pack data This utility function avoids an unnecessary update of the access time for a loose object file. Just as the atime isn't useful on a loose object, it's not useful on the pack or the corresponding idx file. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-03 09:25:58 -07:00
Junio C Hamano	3ba7a06552	A loose object is not corrupt if it cannot be read due to EMFILE "git fsck" bails out with a claim that a loose object that cannot be read but exists on the filesystem to be corrupt, which is wrong when read_object() failed due to e.g. EMFILE. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-03 09:24:57 -07:00
Junio C Hamano	b6c4ceccb3	read_sha1_file(): report correct name of packfile with a corrupt object Clarify the error reporting logic by moving the normal codepath (i.e. we read the object we wanted to read correctly) up and return early. The logic to report the name of the packfile with a corrupt object, introduced by `e8b15e6` (sha1_file: Show the the type and path to corrupt objects, 2010-06-10), was totally bogus. The function that knows which bad object came from what packfile is has_packed_and_bad(); make it report which packfile the problem was found. "Corrupt" is already an adjective, e.g. an object is "corrupt"; we do not have to say "corrupted object". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-03 09:24:47 -07:00
Ævar Arnfjörð Bjarmason	e8b15e6156	sha1_file: Show the the type and path to corrupt objects Change the error message that's displayed when we encounter corrupt objects to be more specific. We now print the type (loose or packed) of corrupted objects, along with the full path to the file in question. Before: $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df fatal: object 909ef997367880aaf2133bafa1f1a71aa28e09df is corrupted After: $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df fatal: loose object 909ef997367880aaf2133bafa1f1a71aa28e09df (stored in .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df) is corrupted Knowing the path helps to quickly analyze what's wrong: $ file .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df: empty Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-07-14 15:35:12 -07:00
Junio C Hamano	e391fdfc69	Merge branch 'jk/maint-sha1-file-name-fix' * jk/maint-sha1-file-name-fix: remove over-eager caching in sha1_file_name	2010-06-13 11:22:00 -07:00
Jeff King	560fb6a183	remove over-eager caching in sha1_file_name This function takes a sha1 and produces a loose object filename. It caches the location of the object directory so that it can fill the sha1 information directly without allocating a new buffer (and in its original incarnation, without calling getenv(), though these days we cache that with the code in environment.c). This cached base directory can become stale, however, if in a single process git changes the location of the object directory (e.g., by running setup_work_tree, which will chdir to the new worktree). In most cases this isn't a problem, because we tend to set up the git repository location and do any chdir()s before actually looking up any objects, so the first lookup will cache the correct location. In the case of reset --hard, however, we do something like: 1. look up the commit object 2. notice we are doing --hard, run setup_work_tree 3. look up the tree object to reset Step (3) fails because our cache object directory value is bogus. This patch simply removes the caching. We use a static buffer instead of allocating one each time (the original version treated the malloc'd buffer as a static, so there is no change in calling semantics). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-25 09:21:28 -07:00
Junio C Hamano	035bf8d7c4	Merge branch 'sp/maint-dumb-http-pack-reidx' * sp/maint-dumb-http-pack-reidx: http.c::new_http_pack_request: do away with the temp variable filename http-fetch: Use temporary files for pack-*.idx until verified http-fetch: Use index-pack rather than verify-pack to check packs Allow parse_pack_index on temporary files Extract verify_pack_index for reuse from verify_pack Introduce close_pack_index to permit replacement http.c: Remove unnecessary strdup of sha1_to_hex result http.c: Don't store destination name in request structures http.c: Drop useless != NULL test in finish_http_pack_request http.c: Tiny refactoring of finish_http_pack_request t5550-http-fetch: Use subshell for repository operations http.c: Remove bad free of static block	2010-05-21 04:02:19 -07:00
Junio C Hamano	636e87d705	Merge branch 'maint' * maint: Documentation/gitdiffcore: fix order in pickaxe description Documentation: fix minor inconsistency Documentation: rebase -i ignores options passed to "git am" hash_object: correction for zero length file	2010-05-18 22:39:56 -07:00
Dmitry Potapov	08bda2085c	hash_object: correction for zero length file The check for whether size is zero was done after the check for size <= SMALL_FILE_SIZE; as a result, the zero-size case was never triggered. Instead, a zero-length file was treated like any other small file. This did not cause any problems, but if we have a special case for size equal to zero, it is better to make it work and avoid a redundant malloc(). Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-18 21:46:36 -07:00
Shawn O. Pearce	7b64469a36	Allow parse_pack_index on temporary files The easiest way to verify a pack index is to open it through the standard parse_pack_index function, permitting the header check to happen when the file is mapped. However, the dumb HTTP client needs to verify a pack index before its moved into its proper file name within the objects/pack directory, to prevent a corrupt index from being made available. So permit the caller to specify the exact path of the index file. For now we're still using the final destination name within the sole call site in http.c, but eventually we will start to parse the temporary path instead. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-04-19 17:56:17 -07:00
Shawn O. Pearce	fa5fc15d6e	Introduce close_pack_index to permit replacement By closing the pack index, a caller can later overwrite the index with an updated index file, possibly after converting from v1 to the v2 format. Because p->index_data is NULL after close, on the next access the index will be opened again and the other members will be updated with new data. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-04-19 17:56:08 -07:00
Jeff King	40d52ff77b	make commit_tree a library function Until now, this has been part of the commit-tree builtin. However, it is already used by other builtins (like commit, merge, and notes), and it would be useful to access it from library code. The check_valid helper has to come along, too, but is given a more library-ish name of "assert_sha1_type". Otherwise, the code is unchanged. There are still a few rough edges for a library function, like printing the utf8 warning to stderr, but we can address those if and when they come up as inappropriate. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-04-01 23:53:54 -07:00
Jeff King	c00e657df2	fix const-correctness of write_sha1_file These should take const buffers as input data, but zlib's next_in pointer is not const-correct. Let's fix it at the zlib level, though, so the cast happens in one obvious place. This should be safe, as a similar cast is used in zlib's example code for a const array. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-04-01 23:49:03 -07:00
Junio C Hamano	493e433277	Merge branch 'mm/mkstemps-mode-for-packfiles' into maint * mm/mkstemps-mode-for-packfiles: Use git_mkstemp_mode instead of plain mkstemp to create object files git_mkstemps_mode: don't set errno to EINVAL on exit. Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later. git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument. Move gitmkstemps to path.c Add a testcase for ACL with restrictive umask.	2010-03-08 00:36:00 -08:00
Junio C Hamano	c2b456b895	Merge branch 'nd/root-git' * nd/root-git: Add test for using Git at root of file system Support working directory located at root Move offset_1st_component() to path.c init-db, rev-parse --git-dir: do not append redundant slash make_absolute_path(): Do not append redundant slash Conflicts: setup.c sha1_file.c	2010-03-07 12:47:15 -08:00
Junio C Hamano	87912fd617	Merge branch 'mm/mkstemps-mode-for-packfiles' * mm/mkstemps-mode-for-packfiles: Use git_mkstemp_mode instead of plain mkstemp to create object files git_mkstemps_mode: don't set errno to EINVAL on exit. Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later. git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument. Move gitmkstemps to path.c Add a testcase for ACL with restrictive umask.	2010-03-07 12:47:14 -08:00
Junio C Hamano	780fc9a0a6	Merge branch 'dp/read-not-mmap-small-loose-object' into maint * dp/read-not-mmap-small-loose-object: hash-object: don't use mmap() for small files	2010-03-04 22:26:17 -08:00
Junio C Hamano	34c014d13e	Merge branch 'np/compress-loose-object-memsave' * np/compress-loose-object-memsave: sha1_file: be paranoid when creating loose objects sha1_file: don't malloc the whole compressed result when writing out objects	2010-03-02 12:44:09 -08:00
Matthieu Moy	5256b00631	Use git_mkstemp_mode instead of plain mkstemp to create object files We used to unnecessarily give the read permission to group and others, regardless of the umask, which isn't serious because the objects are still protected by their containing directory, but isn't necessary either. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-22 15:24:46 -08:00
Nicolas Pitre	748af44c63	sha1_file: be paranoid when creating loose objects We don't want the data being deflated and stored into loose objects to be different from what we expect. While the deflated data is protected by a CRC which is good enough for safe data retrieval operations, we still want to be doubly sure that the source data used at object creation time is still what we expected once that data has been deflated and its CRC32 computed. The most plausible data corruption may occur if the source file is modified while Git is deflating and writing it out in a loose object. Or Git itself could have a bug causing memory corruption. Or even bad RAM could cause trouble. So it is best to make sure everything is coherent and checksum protected from beginning to end. To do so we compute the SHA1 of the data being deflated _after_ the deflate operation has consumed that data, and make sure it matches with the expected SHA1. This way we can rely on the CRC32 checked by the inflate operation to provide a good indication that the data is still coherent with its SHA1 hash. One pathological case we ignore is when the data is modified before (or during) deflate call, but changed back before it is hashed. There is some overhead of course. Using 'git add' on a set of large files: Before: real 0m25.210s user 0m23.783s sys 0m1.408s After: real 0m26.537s user 0m25.175s sys 0m1.358s The overhead is around 5% for full data coherency guarantee. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-21 22:33:25 -08:00
Dmitry Potapov	ea68b0ce9f	hash-object: don't use mmap() for small files Using read() instead of mmap() can give a 39% speedup for 1 KB files and a 1% speedup for 1 MB files. For larger files, it is better to use mmap(), because the difference between the two is not significant, and when there is not enough memory, mmap() performs much better, because it avoids swapping. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-21 11:39:10 -08:00
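The size-based choice between read() and mmap() can be sketched as follows. This is an illustration only: the threshold value, the callback type, and the names `with_file_contents` and `count_bytes` are assumptions made for the example, not the code in sha1_file.c. The zero-length case is checked first, so it never reaches the small-file branch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SMALL_FILE_SIZE (32 * 1024) /* assumed threshold for this sketch */

typedef void (*consume_fn)(const void *buf, size_t len, void *ctx);

/* Example consumer: adds the number of bytes seen to a size_t counter. */
void count_bytes(const void *buf, size_t len, void *ctx)
{
	(void)buf;
	*(size_t *)ctx += len;
}

/*
 * Hypothetical helper: hand the first `size` bytes of fd to fn.
 * Small inputs are read() into a heap buffer; larger ones are mmap()ed.
 * Returns 0 on success, -1 on error.
 */
int with_file_contents(int fd, size_t size, consume_fn fn, void *ctx)
{
	void *map;

	assert(fn != NULL);

	if (!size) {
		fn("", 0, ctx); /* special case: no allocation at all */
		return 0;
	}
	if (size <= SMALL_FILE_SIZE) {
		char *buf = malloc(size);
		size_t got = 0;

		if (!buf)
			return -1;
		while (got < size) {
			ssize_t n = read(fd, buf + got, size - got);

			if (n < 0 && errno == EINTR)
				continue;
			if (n <= 0) { /* error or unexpected end of file */
				free(buf);
				return -1;
			}
			got += (size_t)n;
		}
		fn(buf, size, ctx);
		free(buf);
		return 0;
	}

	map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		return -1;
	fn(map, size, ctx);
	munmap(map, size);
	return 0;
}
```

A caller would typically fstat() the descriptor first and pass `st.st_size` as `size`; a hashing routine could take the place of `count_bytes`.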
Nicolas Pitre	9892bebafe	sha1_file: don't malloc the whole compressed result when writing out objects There is no real advantage to malloc the whole output buffer and deflate the data in a single pass when writing loose objects. That is like only 1% faster while using more memory, especially with large files where memory usage is far more. It is best to deflate and write the data out in small chunks reusing the same memory instead. For example, using 'git add' on a few large files averaging 40 MB ... Before: 21.45user 1.10system 0:22.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+828040outputs (0major+142640minor)pagefaults 0swaps After: 21.50user 1.25system 0:22.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+828040outputs (0major+104408minor)pagefaults 0swaps While the runtime stayed relatively the same, the number of minor page faults went down significantly. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-21 11:36:23 -08:00
Nguyễn Thái Ngọc Duy	4bb43de259	Move offset_1st_component() to path.c The implementation is also lightly modified to use is_dir_sep() instead of hardcoding '/'. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-16 08:54:34 -08:00
Junio C Hamano	a0075d9e6a	Merge branch 'il/maint-xmallocz' * il/maint-xmallocz: Fix integer overflow in unpack_compressed_entry() Fix integer overflow in unpack_sha1_rest() Fix integer overflow in patch_delta() Add xmallocz()	2010-01-27 14:56:38 -08:00
Ilari Liusvaara	4ab07e4d10	Fix integer overflow in unpack_compressed_entry() Signed-off-by: Ilari Liusvaara <ilari.liusvaara@elisanet.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-01-26 13:00:16 -08:00
Ilari Liusvaara	3aee68aa68	Fix integer overflow in unpack_sha1_rest() [jc: later NUL termination by the caller becomes unnecessary] Signed-off-by: Ilari Liusvaara <ilari.liusvaara@elisanet.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-01-26 13:00:10 -08:00
Linus Torvalds	a5031214c4	slim down "git show-index" As the documentation says, this is primarily for debugging, and in the longer term we should rename it to test-show-index or something. In the meantime, just avoid xmalloc (which slurps in the rest of git), and separating out the trivial hex functions into "hex.o". This results in [torvalds@nehalem git]$ size git-show-index text data bss dec hex filename 222818 2276 112688 337782 52776 git-show-index (before) 5696 624 1264 7584 1da0 git-show-index (after) which is a whole lot better. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-01-21 20:03:45 -08:00
Junio C Hamano	356521ab22	sha1_file.c: remove unused function has_pack_file() is not used anywhere. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-01-12 01:06:09 -08:00
Junio C Hamano	39eea7bdd9	Fix incorrect error check while reading deflated pack data The loop in get_size_from_delta() feeds a deflated delta data from the pack stream _until_ we get inflated result of 20 bytes[*] or we reach the end of stream. Side note. This magic number 20 does not have anything to do with the size of the hash we use, but comes from `1a3b55c` (reduce delta head inflated size, 2006-10-18). The loop reads like this: do { in = use_pack(); stream.next_in = in; st = git_inflate(&stream, Z_FINISH); curpos += stream.next_in - in; } while ((st == Z_OK || st == Z_BUF_ERROR) && stream.total_out < sizeof(delta_head)); This git_inflate() can return: - Z_STREAM_END, if use_pack() fed it enough input and the delta itself was smaller than 20 bytes; - Z_OK, when some progress has been made; - Z_BUF_ERROR, if no progress is possible, because we either ran out of input (due to corrupt pack), or we ran out of output before we saw the end of the stream. The fix `b3118bd` (sha1_file: Fix infinite loop when pack is corrupted, 2009-10-14) attempted was against a corruption that appears to be a valid stream that produces a result larger than the output buffer, but we are not even trying to read the stream to the end in this loop. If avail_out becomes zero, total_out will be the same as sizeof(delta_head) so the loop will terminate without the "fix". There is no fix from `b3118bd` needed for this loop, in other words. The loop in unpack_compressed_entry() is quite a different story. It feeds a deflated stream (either delta or base) and allows the stream to produce output up to what we expect but no more. do { in = use_pack(); stream.next_in = in; st = git_inflate(&stream, Z_FINISH); curpos += stream.next_in - in; } while (st == Z_OK || st == Z_BUF_ERROR) This _does_ risk falling into an endless iteration, as we can exhaust avail_out if the length we expect is smaller than what the stream wants to produce (due to pack corruption). 
In such a case, avail_out will become zero and inflate() will return Z_BUF_ERROR, while avail_in may (or may not) be zero. But this is not a right fix: do { in = use_pack(); stream.next_in = in; st = git_inflate(&stream, Z_FINISH); + if (st == Z_BUF_ERROR && (stream.avail_in || !stream.avail_out) + break; /* wants more input??? */ curpos += stream.next_in - in; } while (st == Z_OK || st == Z_BUF_ERROR) as Z_BUF_ERROR from inflate() may be telling us that avail_in has also run out before reading the end of stream marker. In such a case, both avail_in and avail_out would be zero, and the loop should iterate to allow the end of stream marker to be seen by inflate from the input stream. The right fix for this loop is likely to be to increment the initial avail_out by one (we allocate one extra byte to terminate it with NUL anyway, so there is no risk to overrun the buffer), and break out if we see that avail_out has become zero, in order to detect that the stream wants to produce more than what we expect. After the loop, we have a check that exactly tests this condition: if ((st != Z_STREAM_END) || stream.total_out != size) { free(buffer); return NULL; } So here is a patch (without my previous botched attempts) to fix this issue. The first hunk reverts the corresponding hunk from `b3118bd`, and the second hunk is the same fix proposed earlier. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-10-21 23:19:47 -07:00
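The "one extra byte of avail_out" idea described in 39eea7bdd9 can be sketched without zlib. The block below is only an illustration under stated assumptions: `toy_stream`, `toy_inflate()` and `unpack_exact()` are hypothetical names that do not exist in git, and the toy decoder merely copies bytes where zlib would inflate them. What it shows is the control flow: offer `size + 1` bytes of output room, stop as soon as that room is exhausted, and let the final check reject any stream that did not produce exactly `size` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for z_stream; not part of git or zlib. */
struct toy_stream {
	const unsigned char *src;  /* pending "compressed" input */
	size_t src_left;
	unsigned char *next_out;   /* where output goes next */
	size_t avail_out;          /* room left in the output buffer */
	size_t total_out;          /* bytes produced so far */
};

enum { TOY_OK, TOY_STREAM_END, TOY_BUF_ERROR };

/* Toy decoder: copies as much as fits, then reports end or "buffer full". */
static int toy_inflate(struct toy_stream *s)
{
	size_t n = s->src_left < s->avail_out ? s->src_left : s->avail_out;

	memcpy(s->next_out, s->src, n);
	s->src += n;
	s->src_left -= n;
	s->next_out += n;
	s->avail_out -= n;
	s->total_out += n;
	return s->src_left ? TOY_BUF_ERROR : TOY_STREAM_END;
}

/*
 * Returns 0 if the stream yields exactly `size` bytes, -1 otherwise.
 * `buf` must have room for size + 1 bytes (the extra byte doubles as
 * the NUL terminator on success).
 */
int unpack_exact(const unsigned char *src, size_t src_len,
		 unsigned char *buf, size_t size)
{
	struct toy_stream s = { src, src_len, buf, size + 1, 0 };
	int st;

	do {
		st = toy_inflate(&s);
		if (!s.avail_out)
			break; /* the stream wants to produce more than expected */
	} while (st == TOY_OK || st == TOY_BUF_ERROR);

	if (st != TOY_STREAM_END || s.total_out != size)
		return -1;
	buf[size] = '\0';
	return 0;
}
```

In this sketch, dropping the `break` and offering only `size` bytes of room would make a too-long stream return `TOY_BUF_ERROR` forever, which is the endless iteration the commit message warns about.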
Shawn O. Pearce	b3118bdc91	sha1_file: Fix infinite loop when pack is corrupted Some types of corruption to a pack may confuse the deflate stream which stores an object. In Andy's reported case a 36 byte region of the pack was overwritten, leading to what appeared to be a valid deflate stream that was trying to produce a result larger than our allocated output buffer could accept. Z_BUF_ERROR is returned from inflate() if either the input buffer needs more input bytes, or the output buffer has run out of space. Previously we only considered the former case, as it meant we needed to move the stream's input buffer to the next window in the pack. We now abort the loop if inflate() returns Z_BUF_ERROR without consuming the entire input buffer it was given, or has filled the entire output buffer but has not yet returned Z_STREAM_END. Either state is a clear indicator that this loop is not working as expected, and should not continue. This problem cannot occur with loose objects as we open the entire loose object as a single buffer and treat Z_BUF_ERROR as an error. Reported-by: Andy Isaacson <adi@hexapodia.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-10-14 13:39:37 -07:00
Junio C Hamano	f00ecbe42b	Merge branch 'cc/replace' * cc/replace: t6050: check pushing something based on a replaced commit Documentation: add documentation for "git replace" Add git-replace to .gitignore builtin-replace: use "usage_msg_opt" to give better error messages parse-options: add new function "usage_msg_opt" builtin-replace: teach "git replace" to actually replace Add new "git replace" command environment: add global variable to disable replacement mktag: call "check_sha1_signature" with the replacement sha1 replace_object: add a test case object: call "check_sha1_signature" with the replacement sha1 sha1_file: add a "read_sha1_file_repl" function replace_object: add mechanism to replace objects found in "refs/replace/" refs: add a "for_each_replace_ref" function	2009-08-21 18:47:53 -07:00
Pierre Habouzit	f630cfda88	refactor: use bitsizeof() instead of 8 * sizeof() Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-07-22 21:57:41 -07:00
Junio C Hamano	dd787c19c4	Merge branch 'tr/die_errno' * tr/die_errno: Use die_errno() instead of die() when checking syscalls Convert existing die(..., strerror(errno)) to die_errno() die_errno(): double % in strerror() output just in case Introduce die_errno() that appends strerror(errno) to die()	2009-07-06 09:39:46 -07:00
Thomas Rast	d824cbba02	Convert existing die(..., strerror(errno)) to die_errno() Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-06-27 11:14:53 -07:00
Linus Torvalds	48fb7deb5b	Fix big left-shifts of unsigned char Shifting 'unsigned char' or 'unsigned short' left can result in sign extension errors, since the C integer promotion rules mean that the unsigned char/short will get implicitly promoted to a signed 'int' due to the shift (or due to other operations). This normally doesn't matter, but if you shift things up sufficiently, it will now set the sign bit in 'int', and a subsequent cast to a bigger type (eg 'long' or 'unsigned long') will now sign-extend the value despite the original expression being unsigned. One example of this would be something like unsigned long size; unsigned char c; size += c << 24; where despite all the variables being unsigned, 'c << 24' ends up being a signed entity, and will get sign-extended when then doing the addition in an 'unsigned long' type. Since git uses 'unsigned char' pointers extensively, we actually have this bug in a couple of places. I may have missed some, but this is the result of looking at git grep '[^0-9 ][ ]<<[ ][a-z]' -- '*.c' '*.h' git grep '<<[ ]24' which catches at least the common byte cases (shifting variables by a variable amount, and shifting by 24 bits). I also grepped for just 'unsigned char' variables in general, and converted the ones that most obviously ended up getting implicitly cast immediately anyway (eg hash_name(), encode_85()). In addition to just avoiding 'unsigned char', this patch also tries to use a common idiom for the delta header size thing. We had three different variations on it: "& 0x7fUL" in one place (getting the sign extension right), and "& ~0x80" and "& 0x7f" in two other places (not getting it right). Apart from making them all just avoid using "unsigned char" at all, I also unified them to then use a simple "& 0x7f". I considered making a sparse extension which warns about doing implicit casts from unsigned types to signed types, but it gets rather complex very quickly, so this is just a hack. 
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-06-18 09:22:46 -07:00
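A minimal sketch of the promotion problem described in 48fb7deb5b, using fixed-width types so the result does not depend on the platform's `long`. The helper names `shift_byte_safe()` and `widen_signed()` are hypothetical and are not taken from git. The first widens the byte before shifting, so no signed `int` is ever involved. The second shows what widening does to a 32-bit value once it has become a signed `int` with its top bit set, which is the effect the commit message attributes to `c << 24`.

```c
#include <assert.h>
#include <stdint.h>

/* Safe idiom: convert to the wide unsigned type first, then shift. */
uint64_t shift_byte_safe(unsigned char c)
{
	return (uint64_t)c << 24;
}

/*
 * Converting a negative 32-bit signed value to a 64-bit unsigned type
 * sign-extends it: the upper 32 bits all become 1.
 */
uint64_t widen_signed(int32_t v)
{
	return (uint64_t)v;
}
```

With these definitions, `shift_byte_safe(0x80)` yields `0x80000000`, while `widen_signed(INT32_MIN)` yields `0xFFFFFFFF80000000`, the kind of value that would silently corrupt a size computation.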
Christian Couder	f5552aee39	sha1_file: add a "read_sha1_file_repl" function This new function will replace "read_sha1_file". This latter function becoming just a stub to call the former with a NULL "replacement" argument. This new function is needed because sometimes we need to use the replacement sha1. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-05-31 17:02:59 -07:00
Christian Couder	6809557029	replace_object: add mechanism to replace objects found in "refs/replace/" The code implementing this mechanism has been copied more-or-less from the commit graft code. This mechanism is used in "read_sha1_file". sha1 passed to this function that match a ref name in "refs/replace/" are replaced by the sha1 that has been read in the ref. We "die" if the replacement recursion depth is too high or if we can't read the replacement object. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-05-31 17:02:59 -07:00
Junio C Hamano	2c5942dbae	Merge branch 'ar/unlink-err' into maint * ar/unlink-err: print unlink(2) errno in copy_or_link_directory replace direct calls to unlink(2) with unlink_or_warn Introduce an unlink(2) wrapper which gives warning if unlink failed	2009-05-25 19:01:50 -07:00
Junio C Hamano	065b0702f7	Merge branch 'maint' * maint: grep: fix word-regexp colouring completion: use git rev-parse to detect bare repos Cope better with a _lot_ of packs for-each-ref: fix segfault in copy_email	2009-05-20 18:59:09 -07:00
Johannes Schindelin	fd73ccf279	Cope better with a _lot_ of packs You might end up with a situation where you have tons of pack files, e.g. when using hg2git. In this situation, all kinds of operations may end up with a "too many files open" error. Let's recover gracefully from that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Looks-right-to-me-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-05-20 18:23:06 -07:00
Junio C Hamano	36587681b4	Merge branch 'ar/unlink-err' * ar/unlink-err: print unlink(2) errno in copy_or_link_directory replace direct calls to unlink(2) with unlink_or_warn Introduce an unlink(2) wrapper which gives warning if unlink failed	2009-05-18 09:01:06 -07:00
Felipe Contreras	4b25d091ba	Fix a bunch of pointer declarations (codestyle) Essentially; s/type* /type */ as per the coding guidelines. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-05-01 15:17:31 -07:00
Alex Riesen	691f1a28bf	replace direct calls to unlink(2) with unlink_or_warn This helps to notice when something's going wrong, especially on systems which lock open files. I used the following criteria when selecting the code for replacement: - it was already printing a warning for the unlink failures - it is in a function which is already printing something or is called from such a function - it is in a static function, returning void and the function is only called from a builtin main function (cmd_) - it is in a function which handles emergency exit (signal handlers) - it is in a function which is obviously cleaning up the lockfiles Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-04-29 18:37:41 -07:00
Johannes Schindelin	348df16679	Rename core.unreliableHardlinks to core.createObject "Unreliable hardlinks" is a misleading description for what is happening. So rename it to something less misleading. Suggested by Linus Torvalds. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-04-29 16:50:07 -07:00
Johannes Schindelin	be66a6c43d	Add an option not to use link(src, dest) && unlink(src) when that is unreliable It seems that accessing NTFS partitions with ufsd (at least on my EeePC) has an unnerving bug: if you link() a file and unlink() it right away, the target of the link() will have the correct size, but consist of NULs. It seems as if the calls are simply not serialized correctly, as single-stepping through the function move_temp_to_file() works flawlessly. As ufsd is "Commertial software" (sic!), I cannot fix it, and have to work around it in Git. At the same time, it seems that this fixes msysGit issues 222 and 229 to assume that Windows cannot handle link() && unlink(). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-04-25 09:49:21 -07:00
Junio C Hamano	03a39a9184	Merge branch 'jc/shared-literally' * jc/shared-literally: t1301: loosen test for forced modes set_shared_perm(): sometimes we know what the final mode bits should look like move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath Move chmod(foo, 0444) into move_temp_to_file() "core.sharedrepository = 0mode" should set, not loosen	2009-04-06 00:42:52 -07:00
Junio C Hamano	3c91bf6805	Merge branch 'jc/maint-1.6.0-keep-pack' * jc/maint-1.6.0-keep-pack: pack-objects: don't loosen objects available in alternate or kept packs t7700: demonstrate repack flaw which may loosen objects unnecessarily Remove --kept-pack-only option and associated infrastructure pack-objects: only repack or loosen objects residing in "local" packs git-repack.sh: don't use --kept-pack-only option to pack-objects t7700-repack: add two new tests demonstrating repacking flaws Conflicts: t/t7700-repack.sh	2009-04-01 22:34:19 -07:00
Junio C Hamano	17e61b8288	set_shared_perm(): sometimes we know what the final mode bits should look like adjust_shared_perm() first obtains the mode bits from lstat(2), expecting to find what the result of applying user's umask is, and then tweaks it as necessary. When the file to be adjusted is created with mkstemp(3), however, the mode thusly obtained does not have anything to do with user's umask, and we would need to start from 0444 in such a case and there is no point running lstat(2) for such a path. This introduces a new API set_shared_perm() to bypass the lstat(2) and instead force setting the mode bits to the desired value directly. adjust_shared_perm() becomes a thin wrapper to the function. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-03-28 08:02:15 -07:00
Junio C Hamano	3be1f18e1b	move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath Now move_temp_to_file() is responsible for doing everything that is necessary to turn a tempfile in $GIT_DIR into its final form, it must make sure "Coda hack" codepath correctly makes the file read-only. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-03-28 08:01:21 -07:00
Johan Herland	fb8b193670	Move chmod(foo, 0444) into move_temp_to_file() When writing out a loose object or a pack (index), move_temp_to_file() is called to finalize the resulting file. These files (loose files and packs) should all have permission mode 0444 (modulo adjust_shared_perm()). Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite (or even forgetting to chmod() at all), do the chmod() call from within move_temp_to_file(). Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-03-27 22:10:58 -07:00
Junio C Hamano	5a688fe470	"core.sharedrepository = 0mode" should set, not loosen This fixes the behaviour of octal notation to how it is defined in the documentation, while keeping the traditional "loosen only" semantics intact for "group" and "everybody". Three main points of this patch are: - For an explicit octal notation, the internal shared_repository variable is set to a negative value, so that we can tell "group" (which is to "OR" in 0660) and 0660 (which is to "SET" to 0660); - git-init did not set shared_repository variable early enough to affect the initial creation of many files, notably copied templates and the configuration. We set it very early when a command-line option specifies a custom value. - Many codepaths create files inside $GIT_DIR by various ways that all involve mkstemp(), and then call move_temp_to_file() to rename it to its final destination. We can add adjust_shared_perm() call here; for the traditional "loosen-only", this would be a no-op for many codepaths because the mode is already loose enough, but with the new behaviour it makes a difference. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-03-27 21:51:04 -07:00
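The "OR in" versus "SET to" distinction from 5a688fe470 can be illustrated with a small sketch. It assumes only what the message states, namely that a negative value marks an explicit octal mode. `apply_shared()` is a hypothetical helper, and git's real permission logic handles more cases than this.

```c
#include <assert.h>

/*
 * shared > 0: traditional "loosen only" semantics, OR the bits in.
 * shared < 0: explicit octal notation, SET the permission bits to -shared
 *             while keeping the file-type bits above 0777 untouched.
 * shared == 0: leave the mode alone.
 */
unsigned int apply_shared(unsigned int mode, int shared)
{
	if (shared < 0)
		return (mode & ~0777u) | (unsigned int)(-shared);
	return mode | (unsigned int)shared;
}
```

For a file created as 0644, the "group" style value 0660 loosens the mode to 0664, whereas the explicit octal form (passed here as -0660) forces it to exactly 0660.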
Junio C Hamano	89fbda2425	Merge branch 'maint' * maint: Increase the size of the die/warning buffer to avoid truncation close_sha1_file(): make it easier to diagnose errors avoid possible overflow in delta size filtering computation	2009-03-24 19:45:57 -07:00
Junio C Hamano	b0de555410	Merge branch 'maint-1.6.1' into maint * maint-1.6.1: close_sha1_file(): make it easier to diagnose errors avoid possible overflow in delta size filtering computation	2009-03-24 15:31:21 -07:00
Junio C Hamano	2a5643da73	Merge branch 'maint-1.6.0' into maint-1.6.1 * maint-1.6.0: close_sha1_file(): make it easier to diagnose errors avoid possible overflow in delta size filtering computation	2009-03-24 15:31:15 -07:00
Linus Torvalds	e8bd78c3fc	close_sha1_file(): make it easier to diagnose errors A bug report with "unable to write sha1 file" made us realize that we do not have enough information to guess why close() is failing. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-03-24 14:39:20 -07:00
Brandon Casey	4d6acb7041	Remove --kept-pack-only option and associated infrastructure This option to pack-objects/rev-list was created to improve the -A and -a options of repack. It was found to be lacking in that it did not provide the ability to differentiate between local and non-local kept packs, and found to be unnecessary since objects residing in local kept packs can be filtered out by the --honor-pack-keep option. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-03-20 13:32:33 -07:00
Junio C Hamano	aec813062b	Merge branch 'jc/maint-1.6.0-keep-pack' * jc/maint-1.6.0-keep-pack: is_kept_pack(): final clean-up Simplify is_kept_pack() Consolidate ignore_packed logic more has_sha1_kept_pack(): take "struct rev_info" has_sha1_pack(): refactor "pretend these packs do not exist" interface git-repack: resist stray environment variable	2009-03-11 13:49:56 -07:00
Junio C Hamano	69e020ae00	is_kept_pack(): final clean-up Now is_kept_pack() is just a member lookup into a structure, we can write it as such. Also rewrite the sole caller of has_sha1_kept_pack() to switch on the criteria the callee uses (namely, revs->kept_pack_only) between calling has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not have to take a pointer to struct rev_info as an argument. This removes the header file dependency issue temporarily introduced by the earlier commit, so we revert changes associated to that as well. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-02-28 01:06:06 -08:00
Junio C Hamano	03a9683d22	Simplify is_kept_pack() This removes --unpacked=<packfile> parameter from the revision parser, and rewrites its use in git-repack to pass a single --kept-pack-only option instead. The new --kept-pack-only option means just that. When this option is given, is_kept_pack() that used to say "not on the --unpacked=<packfile> list" now says "the packfile has corresponding .keep file". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-02-28 01:06:06 -08:00
Junio C Hamano	386cb77210	Consolidate ignore_packed logic more This refactors three loops that check if a given packfile is on the ignore_packed list into a function is_kept_pack(). The function returns false for a pack on the list, and true for a pack not on the list, because this list is solely used by "git repack" to pass list of packfiles that do not have corresponding .keep files, i.e. a packfile not on the list is "kept". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-02-28 01:06:06 -08:00
Junio C Hamano	b8431b033f	has_sha1_kept_pack(): take "struct rev_info" Its "ignore_packed" parameter always comes from struct rev_info. This patch makes the function take a pointer to the surrounding structure, so that the refactoring in the next patch becomes easier to review. There is an unfortunate header file dependency and the easiest workaround is to temporarily move the function declaration from cache.h to revision.h; this will be moved back to cache.h once the function loses this "ignore_packed" parameter altogether in the later part of the series. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-02-28 01:06:06 -08:00
Junio C Hamano	cd673c1f17	has_sha1_pack(): refactor "pretend these packs do not exist" interface Most of the callers of this function except only one pass NULL to its last parameter, ignore_packed. Introduce has_sha1_kept_pack() function that has the function signature and the semantics of this function, and convert the sole caller that does not pass NULL to call this new function. All other callers and has_sha1_pack() lose the ignore_packed parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-02-28 01:06:06 -08:00
Felipe Contreras	a9d98a148d	sha1_file.c: fix typo it's != its Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-02-25 00:49:54 -08:00
Junio C Hamano	30aa4fb15f	Merge branch 'maint' * maint: Prepare for 1.6.1.4. Make repack less likely to corrupt repository fast-export: ensure we traverse commits in topological order Clear the delta base cache if a pack is rebuilt Conflicts: RelNotes	2009-02-11 18:47:30 -08:00
Junio C Hamano	7a134dbbc9	Merge branch 'maint-1.6.0' into maint * maint-1.6.0: Make repack less likely to corrupt repository fast-export: ensure we traverse commits in topological order Clear the delta base cache if a pack is rebuilt	2009-02-11 18:32:37 -08:00
Shawn O. Pearce	fa3a0c94dc	Clear the delta base cache if a pack is rebuilt There is some risk that re-opening a regenerated pack file with different offsets could leave stale entries within the delta base cache that could be matched up against other objects using the same "struct packed_git" and pack offset. Throwing away the entire delta base cache in this case is safer, as we don't have to worry about a recycled "struct packed_git" matching to the wrong base object, resulting in delta apply errors while unpacking an object. Suggested-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-02-11 10:25:24 -08:00
Junio C Hamano	fd8475d9fb	Merge branch 'maint' * maint: Clear the delta base cache during fast-import checkpoint	2009-02-10 21:30:45 -08:00
Junio C Hamano	9b27ea9518	Merge branch 'maint-1.6.0' into maint * maint-1.6.0: Clear the delta base cache during fast-import checkpoint	2009-02-10 15:32:26 -08:00
Shawn O. Pearce	3d20c636af	Clear the delta base cache during fast-import checkpoint Otherwise we may reuse the same memory address for a totally different "struct packed_git", and a previously cached object from the prior occupant might be returned when trying to unpack an object from the new pack. Found-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-02-10 15:30:59 -08:00
Junio C Hamano	141b6b83d7	Merge branch 'lt/maint-wrap-zlib' into maint * lt/maint-wrap-zlib: Wrap inflate and other zlib routines for better error reporting Conflicts: http-push.c http-walker.c sha1_file.c	2009-02-05 18:01:00 -08:00
Junio C Hamano	8c95d3c31b	Sync with 1.6.1.2 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-29 00:32:52 -08:00
Junio C Hamano	8561b522d7	Merge branch 'maint-1.6.0' into maint * maint-1.6.0: avoid 31-bit truncation in write_loose_object	2009-01-28 23:41:28 -08:00
Jeff King	915308b187	avoid 31-bit truncation in write_loose_object The size of the content we are adding may be larger than 2.1G (i.e., "git add gigantic-file"). Most of the code-path to do so uses size_t or unsigned long to record the size, but write_loose_object uses a signed int. On platforms where "int" is 32-bits (which includes x86_64 Linux platforms), we end up passing malloc a negative size. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-28 23:40:53 -08:00
Junio C Hamano	36dd939393	Merge branch 'lt/maint-wrap-zlib' * lt/maint-wrap-zlib: Wrap inflate and other zlib routines for better error reporting Conflicts: http-push.c http-walker.c sha1_file.c	2009-01-21 16:55:17 -08:00
Christian Couder	c2c5b27051	sha1_file: make "read_object" static This function is only used from "sha1_file.c". And as we want to add a "replace_object" hook in "read_sha1_file", we must not let people bypass the hook using something other than "read_sha1_file". Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-13 00:14:55 -08:00
Linus Torvalds	39c68542fc	Wrap inflate and other zlib routines for better error reporting R. Tyler Ballance reported a mysterious transient repository corruption; after much digging, it turns out that we were not catching and reporting memory allocation errors from some calls we make to zlib. This one _just_ wraps things; it doesn't do the "retry on low memory error" part, at least not yet. It is an independent issue from the reporting. Some of the errors are expected and passed back to the caller, but we die when zlib reports it failed to allocate memory for now. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-11 02:13:06 -08:00
Linus Torvalds	b760d3aa74	Make 'index_path()' use 'strbuf_readlink()' This makes us able to properly index symlinks even on filesystems where st_size doesn't match the true size of the link. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-12-17 13:36:34 -08:00

728 commits