development/git - HydraGit

mirror of https://github.com/git/git synced 2024-07-16 10:38:05 +00:00

Author	SHA1	Message	Date
Nguyễn Thái Ngọc Duy	7bc0dcaa61	sha1_file.c: move delayed getenv(altdb) back to setup_git_env() getenv() is supposed to work on the main repository only. This delayed getenv() code in sha1_file.c makes it more difficult to convert sha1_file.c to a generic object store that could be used by both submodule and main repositories. Move the getenv() back in setup_git_env() where other env vars are also fetched. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-05 11:14:03 -08:00
Junio C Hamano	9238941618	Merge branch 'cc/sha1-file-name' Code clean-up. * cc/sha1-file-name: sha1_file: improve sha1_file_name() perfs sha1_file: remove static strbuf from sha1_file_name()	2018-02-13 13:39:10 -08:00
Junio C Hamano	9bc89b17e3	Merge branch 'tb/crlf-conv-flags' Code clean-up. * tb/crlf-conv-flags: convert_to_git(): safe_crlf/checksafe becomes int conv_flags	2018-02-13 13:39:08 -08:00
Junio C Hamano	867622398f	Merge branch 'gs/retire-mru' Retire mru API as it does not give enough abstraction over underlying list API to be worth it. * gs/retire-mru: mru: Replace mru.[ch] with list.h implementation	2018-02-13 13:39:06 -08:00
Junio C Hamano	f3d618d2bf	Merge branch 'jh/fsck-promisors' In preparation for implementing narrow/partial clone, the machinery for checking object connectivity used by gc and fsck has been taught that a missing object is OK when it is referenced by a packfile specially marked as coming from trusted repository that promises to make them available on-demand and lazily. * jh/fsck-promisors: gc: do not repack promisor packfiles rev-list: support termination at promisor objects sha1_file: support lazily fetching missing objects introduce fetch-object: fetch one promisor object index-pack: refactor writing of .keep files fsck: support promisor objects as CLI argument fsck: support referenced promisor objects fsck: support refs pointing to promisor objects fsck: introduce partialclone extension extension.partialclone: introduce partial clone extension	2018-02-13 13:39:03 -08:00
Gargi Sharma	ec2dd32c70	mru: Replace mru.[ch] with list.h implementation Replace the custom calls to mru.[ch] with calls to list.h. This patch is the final step in removing the mru API completely and inlining the logic. This patch leads to significant code reduction and the mru API hence, is not a useful abstraction anymore. Signed-off-by: Gargi Sharma <gs051095@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-24 09:52:16 -08:00
Christian Couder	3449847168	sha1_file: improve sha1_file_name() perfs As sha1_file_name() could be performance sensitive, let's make it faster by using strbuf_addstr() and strbuf_addc() instead of strbuf_addf(). Helped-by: Derrick Stolee <stolee@gmail.com> Helped-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-19 13:21:49 -08:00
Christian Couder	ea6577303f	sha1_file: remove static strbuf from sha1_file_name() Using a static buffer in sha1_file_name() is error prone and the performance improvements it gives are not needed in many of the callers. So let's get rid of this static buffer and, if necessary or helpful, let's use one in the caller. Suggested-by: Jeff Hostetler <git@jeffhostetler.com> Helped-by: Kevin Daudt <me@ikke.info> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-17 12:21:32 -08:00
Torsten Bögershausen	8462ff43e4	convert_to_git(): safe_crlf/checksafe becomes int conv_flags When calling convert_to_git(), the checksafe parameter defined what should happen if the EOL conversion (CRLF --> LF --> CRLF) does not roundtrip cleanly. In addition, it also defined if line endings should be renormalized (CRLF --> LF) or kept as they are. checksafe was an safe_crlf enum with these values: SAFE_CRLF_FALSE: do nothing in case of EOL roundtrip errors SAFE_CRLF_FAIL: die in case of EOL roundtrip errors SAFE_CRLF_WARN: print a warning in case of EOL roundtrip errors SAFE_CRLF_RENORMALIZE: change CRLF to LF SAFE_CRLF_KEEP_CRLF: keep all line endings as they are In some cases the integer value 0 was passed as checksafe parameter instead of the correct enum value SAFE_CRLF_FALSE. That was no problem because SAFE_CRLF_FALSE is defined as 0. FALSE/FAIL/WARN are different from RENORMALIZE and KEEP_CRLF. Therefore, an enum is not ideal. Let's use a integer bit pattern instead and rename the parameter to conv_flags to make it more generically usable. This allows us to extend the bit pattern in a subsequent commit. Reported-By: Randall S. Becker <rsbecker@nexbridge.com> Helped-By: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-16 12:35:56 -08:00
Junio C Hamano	97e1f857fc	Merge branch 'ds/for-each-file-in-obj-micro-optim' The code to iterate over loose object files got optimized. * ds/for-each-file-in-obj-micro-optim: sha1_file: use strbuf_add() instead of strbuf_addf()	2017-12-13 13:28:57 -08:00
Junio C Hamano	721cc4314c	Merge branch 'bc/hash-algo' An infrastructure to define what hash function is used in Git is introduced, and an effort to plumb that throughout various codepaths has been started. * bc/hash-algo: repository: fix a sparse 'using integer as NULL pointer' warning Switch empty tree and blob lookups to use hash abstraction Integrate hash algorithm support with repo setup Add structure representing hash algorithm setup: expose enumerated repo info	2017-12-13 13:28:54 -08:00
Jonathan Tan	8b4c0103a9	sha1_file: support lazily fetching missing objects Teach sha1_file to fetch objects from the remote configured in extensions.partialclone whenever an object is requested but missing. The fetching of objects can be suppressed through a global variable. This is used by fsck and index-pack. However, by default, such fetching is not suppressed. This is meant as a temporary measure to ensure that all Git commands work in such a situation. Future patches will update some commands to either tolerate missing objects (without fetching them) or be more efficient in fetching them. In order to determine the code changes in sha1_file.c necessary, I investigated the following: (1) functions in sha1_file.c that take in a hash, without the user regarding how the object is stored (loose or packed) (2) functions in packfile.c (because I need to check callers that know about the loose/packed distinction and operate on both differently, and ensure that they can handle the concept of objects that are neither loose nor packed) (1) is handled by the modification to sha1_object_info_extended(). For (2), I looked at for_each_packed_object and others. For for_each_packed_object, the callers either already work or are fixed in this patch: - reachable - only to find recent objects - builtin/fsck - already knows about missing objects - builtin/cat-file - warning message added in this commit Callers of the other functions do not need to be changed: - parse_pack_index - http - indirectly from http_get_info_packs - find_pack_entry_one - this searches a single pack that is provided as an argument; the caller already knows (through other means) that the sought object is in a specific pack - find_sha1_pack - fast-import - appears to be an optimization to not store a file if it is already in a pack - http-walker - to search through a struct alt_base - http-push - to search through remote packs - has_sha1_pack - builtin/fsck - already knows about promisor objects - builtin/count-objects - informational purposes only (check if loose object is also packed) - builtin/prune-packed - check if object to be pruned is packed (if not, don't prune it) - revision - used to exclude packed objects if requested by user - diff - just for optimization Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-12-08 09:52:42 -08:00
Junio C Hamano	79bafd23a8	Merge branch 'jk/fewer-pack-rescan' Internaly we use 0{40} as a placeholder object name to signal the codepath that there is no such object (e.g. the fast-forward check while "git fetch" stores a new remote-tracking ref says "we know there is no 'old' thing pointed at by the ref, as we are creating it anew" by passing 0{40} for the 'old' side), and expect that a codepath to locate an in-core object to return NULL as a sign that the object does not exist. A look-up for an object that does not exist however is quite costly with a repository with large number of packfiles. This access pattern has been optimized. * jk/fewer-pack-rescan: sha1_file: fast-path null sha1 as a missing object everything_local: use "quick" object existence check p5551: add a script to test fetch pack-dir rescans t/perf/lib-pack: use fast-import checkpoint to create packs p5550: factor out nonsense-pack creation	2017-12-06 09:23:42 -08:00
Derrick Stolee	163ee5e635	sha1_file: use strbuf_add() instead of strbuf_addf() Replace use of strbuf_addf() with strbuf_add() when enumerating loose objects in for_each_file_in_obj_subdir(). Since we already check the length and hex-values of the string before consuming the path, we can prevent extra computation by using the lower- level method. One consumer of for_each_file_in_obj_subdir() is the abbreviation code. OID abbreviations use a cached list of loose objects (per object subdirectory) to make repeated queries fast, but there is significant cache load time when there are many loose objects. Most repositories do not have many loose objects before repacking, but in the GVFS case the repos can grow to have millions of loose objects. Profiling 'git log' performance in GitForWindows on a GVFS-enabled repo with ~2.5 million loose objects revealed 12% of the CPU time was spent in strbuf_addf(). Add a new performance test to p4211-line-log.sh that is more sensitive to this cache-loading. By limiting to 1000 commits, we more closely resemble user wait time when reading history into a pager. For a copy of the Linux repo with two ~512 MB packfiles and ~572K loose objects, running 'git log --oneline --parents --raw -1000' had the following performance: HEAD~1 HEAD ---------------------------------------- 7.70(7.15+0.54) 7.44(7.09+0.29) -3.4% Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-12-04 10:38:55 -08:00
Junio C Hamano	af6e0fe3a5	Merge branch 'tb/add-renormalize' "git add --renormalize ." is a new and safer way to record the fact that you are correcting the end-of-line convention and other "convert_to_git()" glitches in the in-repository data. * tb/add-renormalize: add: introduce "--renormalize"	2017-11-27 11:06:37 +09:00
Jeff King	87b5e236a1	sha1_file: fast-path null sha1 as a missing object In theory nobody should ever ask the low-level object code for a null sha1. It's used as a sentinel for "no such object" in lots of places, so leaking through to this level is a sign that the higher-level code is not being careful about its error-checking. In practice, though, quite a few code paths seem to rely on the null sha1 lookup failing as a way to quietly propagate non-existence (e.g., by feeding it to lookup_commit_reference_gently(), which then returns NULL). When this happens, we do two inefficient things: 1. We actually search for the null sha1 in packs and in the loose object directory. 2. When we fail to find it, we re-scan the pack directory in case a simultaneous repack happened to move it from loose to packed. This can be very expensive if you have a large number of packs. Only the second one actually causes noticeable performance problems, so we could treat them independently. But for the sake of simplicity (both of code and of reasoning about it), it makes sense to just declare that the null sha1 cannot be a real on-disk object, and looking it up will always return "no such object". There's no real loss of functionality to do so Its use as a sentinel value means that anybody who is unlucky enough to hit the 2^-160th chance of generating an object with that sha1 is already going to find the object largely unusable. In an ideal world, we'd simply fix all of the callers to notice the null sha1 and avoid passing it to us. But a simple experiment to catch this with a BUG() shows that there are a large number of code paths that do so. So in the meantime, let's fix the performance problem by taking a fast exit from the object lookup when we see a null sha1. p5551 shows off the improvement (when a fetched ref is new, the "old" sha1 is 0{40}, which ends up being passed for fast-forward checks, the status table abbreviations, etc): Test HEAD^ HEAD -------------------------------------------------------- 5551.4: fetch 5.51(5.03+0.48) 0.17(0.10+0.06) -96.9% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-22 10:50:11 +09:00
Junio C Hamano	5a80d1dd9c	Merge branch 'jk/info-alternates-fix' into maint We used to add an empty alternate object database to the system that does not help anything; it has been corrected. * jk/info-alternates-fix: link_alt_odb_entries: make empty input a noop	2017-11-21 14:05:31 +09:00
Torsten Bögershausen	9472935d81	add: introduce "--renormalize" Make it safer to normalize the line endings in a repository. Files that had been commited with CRLF will be commited with LF. The old way to normalize a repo was like this: # Make sure that there are not untracked files $ echo "* text=auto" >.gitattributes $ git read-tree --empty $ git add . $ git commit -m "Introduce end-of-line normalization" The user must make sure that there are no untracked files, otherwise they would have been added and tracked from now on. The new "add --renormalize" does not add untracked files: $ echo "* text=auto" >.gitattributes $ git add --renormalize . $ git commit -m "Introduce end-of-line normalization" Note that "git add --renormalize <pathspec>" is the short form for "git add -u --renormalize <pathspec>". While at it, document that the same renormalization may be needed, whenever a clean filter is added or changed. Helped-By: Junio C Hamano <gitster@pobox.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-17 10:31:05 +09:00
Junio C Hamano	26a45eac80	Merge branch 'jk/info-alternates-fix' We used to add an empty alternate object database to the system that does not help anything; it has been corrected. * jk/info-alternates-fix: link_alt_odb_entries: make empty input a noop	2017-11-15 12:14:36 +09:00
Jeff King	f28e36686a	link_alt_odb_entries: make empty input a noop If an empty string is passed to link_alt_odb_entries(), our loop finds no entries and we link nothing. But we still do some preparatory work to normalize the object directory path, even though we'll never look at the result. This triggers in basically every git process, since we feed the usually-empty ALTERNATE_DB_ENVIRONMENT to the function. Let's detect early that there's nothing to do and return. While we're at it, let's treat NULL the same as an empty string as a favor to our callers. That saves prepare_alt_odb() from having to cover this case. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-13 14:05:27 +09:00
brian m. carlson	f50e766b7b	Add structure representing hash algorithm Since in the future we want to support an additional hash algorithm, add a structure that represents a hash algorithm and all the data that must go along with it. Add a constant to allow easy enumeration of hash algorithms. Implement function typedefs to create an abstract API that can be used by any hash algorithm, and wrappers for the existing SHA1 functions that conform to this API. Expose a value for hex size as well as binary size. While one will always be twice the other, the two values are both used extremely commonly throughout the codebase and providing both leads to improved readability. Don't include an entry in the hash algorithm structure for the null object ID. As this value is all zeros, any suitably sized all-zero object ID can be used, and there's no need to store a given one on a per-hash basis. The current hash function transition plan envisions a time when we will accept input from the user that might be in SHA-1 or in the NewHash format. Since we cannot know which the user has provided, add a constant representing the unknown algorithm to allow us to indicate that we must look the correct value up. Provide dummy API functions that die in this case. Finally, include git-compat-util.h in hash.h so that the required types are available. This aids people using automated tools their editors. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-13 13:20:44 +09:00
Junio C Hamano	bde1370010	Merge branch 'rs/hex-to-bytes-cleanup' Code cleanup. * rs/hex-to-bytes-cleanup: sha1_file: use hex_to_bytes() http-push: use hex_to_bytes() notes: move hex_to_bytes() to hex.c and export it	2017-11-09 14:31:27 +09:00
Junio C Hamano	e7e456f500	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (25 commits) refs/files-backend: convert static functions to object_id refs: convert read_raw_ref backends to struct object_id refs: convert peel_object to struct object_id refs: convert resolve_ref_unsafe to struct object_id worktree: convert struct worktree to object_id refs: convert resolve_gitlink_ref to struct object_id Convert remaining callers of resolve_gitlink_ref to object_id sha1_file: convert index_path and index_fd to struct object_id refs: convert reflog_expire parameter to struct object_id refs: convert read_ref_at to struct object_id refs: convert peel_ref to struct object_id builtin/pack-objects: convert to struct object_id pack-bitmap: convert traverse_bitmap_commit_list to object_id refs: convert dwim_log to struct object_id builtin/reflog: convert remaining unsigned char uses to object_id refs: convert dwim_ref and expand_ref to struct object_id refs: convert read_ref and read_ref_full to object_id refs: convert resolve_refdup and refs_resolve_refdup to struct object_id Convert check_connected to use struct object_id refs: update ref transactions to use struct object_id ...	2017-11-06 14:24:27 +09:00
Junio C Hamano	0b646bcac9	Merge branch 'ma/lockfile-fixes' An earlier update made it possible to use an on-stack in-core lockfile structure (as opposed to having to deliberately leak an on-heap one). Many codepaths have been updated to take advantage of this new facility. * ma/lockfile-fixes: read_cache: roll back lock in `update_index_if_able()` read-cache: leave lock in right state in `write_locked_index()` read-cache: drop explicit `CLOSE_LOCK`-flag cache.h: document `write_locked_index()` apply: remove `newfd` from `struct apply_state` apply: move lockfile into `apply_state` cache-tree: simplify locking logic checkout-index: simplify locking logic tempfile: fix documentation on `delete_tempfile()` lockfile: fix documentation on `close_lock_file_gently()` treewide: prefer lockfiles on the stack sha1_file: do not leak `lock_file`	2017-11-06 13:11:21 +09:00
René Scharfe	62a24c8923	sha1_file: use hex_to_bytes() The path of a loose object contains its hash value encoded into two substrings of 2 and 38 hexadecimal digits separated by a slash. The first part is handed to for_each_file_in_obj_subdir() in decoded form as subdir_nr. The current code builds a full hexadecimal representation of the hash in a temporary buffer, then uses get_oid_hex() to decode it. Avoid the intermediate step by taking subdir_nr as-is and using hex_to_bytes() directly on the second substring. That's shorter and easier. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-01 10:35:40 +09:00
Junio C Hamano	95c1a79630	Merge branch 'jk/info-alternates-fix' into maint A regression fix for 2.11 that made the code to read the list of alternate object stores overrun the end of the string. * jk/info-alternates-fix: read_info_alternates: warn on non-trivial errors read_info_alternates: read contents into strbuf	2017-10-23 14:40:00 +09:00
Junio C Hamano	96c6bb566e	Merge branch 'jk/write-in-full-fix' into maint Many codepaths did not diagnose write failures correctly when disks go full, due to their misuse of write_in_full() helper function, which have been corrected. * jk/write-in-full-fix: read_pack_header: handle signed/unsigned comparison in read result config: flip return value of store_write_*() notes-merge: use ssize_t for write_in_full() return value pkt-line: check write_in_full() errors against "< 0" convert less-trivial versions of "write_in_full() != len" avoid "write_in_full(fd, buf, len) != len" pattern get-tar-commit-id: check write_in_full() return against 0 config: avoid "write_in_full(fd, buf, len) < len" pattern	2017-10-23 14:37:22 +09:00
Junio C Hamano	eeed979e6a	Merge branch 'jk/sha1-loose-object-info-fix' into maint Leakfix and futureproofing. * jk/sha1-loose-object-info-fix: sha1_loose_object_info: handle errors from unpack_sha1_rest	2017-10-18 14:19:14 +09:00
Junio C Hamano	7c9375db0e	Merge branch 'jk/drop-sha1-entry-pos' into maint Code clean-up. * jk/drop-sha1-entry-pos: sha1-lookup: remove sha1_entry_pos() from header file sha1_file: drop experimental GIT_USE_LOOKUP search	2017-10-18 14:19:06 +09:00
brian m. carlson	a98e6101f0	refs: convert resolve_gitlink_ref to struct object_id Convert the declaration and definition of resolve_gitlink_ref to use struct object_id and apply the following semantic patch: @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3.hash) + resolve_gitlink_ref(E1, E2, &E3) @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3->hash) + resolve_gitlink_ref(E1, E2, E3) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-16 11:05:51 +09:00
brian m. carlson	bcd2986473	sha1_file: convert index_path and index_fd to struct object_id Convert these two functions and the functions that underlie them to take pointers to struct object_id. This is a prerequisite to convert resolve_gitlink_ref. Fix a stray tab in the middle of the index_mem call in index_pipe by converting it to a space. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-16 11:05:51 +09:00
Junio C Hamano	40abbe4306	Merge branch 'jk/sha1-loose-object-info-fix' Leakfix and futureproofing. * jk/sha1-loose-object-info-fix: sha1_loose_object_info: handle errors from unpack_sha1_rest	2017-10-11 14:52:22 +09:00
Jeff King	b3ea7dd32d	sha1_loose_object_info: handle errors from unpack_sha1_rest When a caller of sha1_object_info_extended() sets the "contentp" field in object_info, we call unpack_sha1_rest() but do not check whether it signaled an error. This causes two problems: 1. We pass back NULL to the caller via the contentp field, but the function returns "0" for success. A caller might reasonably expect after a successful return that it can access contentp without a NULL check and segfault. As it happens, this is impossible to trigger in the current code. There is exactly one caller which uses contentp, read_object(). And the only thing it does after a successful call is to return the content pointer to its caller, using NULL as a sentinel for errors. So in effect it converts the success code from sha1_object_info_extended() back into an error! But this is still worth addressing avoid problems for future users of "contentp". 2. Callers of unpack_sha1_rest() are expected to close the zlib stream themselves on error. Which means that we're leaking the stream. The problem in (1) comes from from `c84a1f3ed4` (sha1_file: refactor read_object, 2017-06-21), which added the contentp field. Before that, we called unpack_sha1_rest() via unpack_sha1_file(), which directly used the NULL to signal an error. But note that the leak in (2) is actually older than that. The original unpack_sha1_file() directly returned the result of unpack_sha1_rest() to its caller, when it should have been closing the zlib stream itself on error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-06 13:04:41 +09:00
Martin Ågren	f132a127ee	sha1_file: do not leak `lock_file` There is no longer any need to allocate and leak a `struct lock_file`. Initialize it on the stack instead. Before this patch, we set `lock = NULL` to signal that we have already rolled back, and that we should not do any more work. We need to take another approach now that we cannot assign NULL. We could, e.g., use `is_lock_file_locked()`. But we already have another variable that we could use instead, `found`. Its scope is only too small. Bump `found` to the scope of the whole function and rearrange the "roll back or write?"-checks to a straightforward if-else on `found`. This also future-proves the code by making it obvious that we intend to take exactly one of these paths. Improved-by: Jeff King <peff@peff.net> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-06 10:07:10 +09:00
Junio C Hamano	cb1083ca23	Merge branch 'jk/read-in-full' Code clean-up to prevent future mistakes by copying and pasting code that checks the result of read_in_full() function. * jk/read-in-full: worktree: check the result of read_in_full() worktree: use xsize_t to access file size distinguish error versus short read from read_in_full() avoid looking at errno for short read_in_full() returns prefer "!=" when checking read_in_full() result notes-merge: drop dead zero-write code files-backend: prefer "0" for write_in_full() error check	2017-10-03 15:42:49 +09:00
Jeff King	90dca6710e	avoid looking at errno for short read_in_full() returns When a caller tries to read a particular set of bytes via read_in_full(), there are three possible outcomes: 1. An error, in which case -1 is returned and errno is set. 2. A short read, in which fewer bytes are returned and errno is unspecified (we never saw a read error, so we may have some random value from whatever syscall failed last). 3. The full read completed successfully. Many callers handle cases 1 and 2 together by just checking the result against the requested size. If their combined error path looks at errno (e.g., by calling die_errno), they may report a nonsense value. Let's fix these sites by having them distinguish between the two error cases. That avoids the random errno confusion, and lets us give more detailed error messages. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-27 15:45:24 +09:00
Junio C Hamano	f759c873a3	Merge branch 'jk/info-alternates-fix' A regression fix for 2.11 that made the code to read the list of alternate object stores overrun the end of the string. * jk/info-alternates-fix: read_info_alternates: warn on non-trivial errors read_info_alternates: read contents into strbuf	2017-09-25 15:24:09 +09:00
Junio C Hamano	c50424a6f0	Merge branch 'jk/write-in-full-fix' Many codepaths did not diagnose write failures correctly when disks go full, due to their misuse of write_in_full() helper function, which have been corrected. * jk/write-in-full-fix: read_pack_header: handle signed/unsigned comparison in read result config: flip return value of store_write_*() notes-merge: use ssize_t for write_in_full() return value pkt-line: check write_in_full() errors against "< 0" convert less-trivial versions of "write_in_full() != len" avoid "write_in_full(fd, buf, len) != len" pattern get-tar-commit-id: check write_in_full() return against 0 config: avoid "write_in_full(fd, buf, len) < len" pattern	2017-09-25 15:24:06 +09:00
Jeff King	f0f7bebef7	read_info_alternates: warn on non-trivial errors When we fail to open $GIT_DIR/info/alternates, we silently assume there are no alternates. This is the right thing to do for ENOENT, but not for other errors. A hard error is probably overkill here. If we fail to read an alternates file then either we'll complete our operation anyway, or we'll fail to find some needed object. Either way, a warning is good idea. And we already have a helper function to handle this pattern; let's just call warn_on_fopen_error(). Note that technically the errno from strbuf_read_file() might be from a read() error, not open(). But since read() would never return ENOENT or ENOTDIR, and since it produces a generic "unable to access" error, it's suitable for handling errors from either. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-20 11:33:29 +09:00
Junio C Hamano	0db625f5d6	Merge branch 'jk/info-alternates-fix-2.11' into jk/info-alternates-fix * jk/info-alternates-fix-2.11: read_info_alternates: read contents into strbuf	2017-09-20 11:33:06 +09:00
Jeff King	dc732bd5cb	read_info_alternates: read contents into strbuf This patch fixes a regression in v2.11.1 where we might read past the end of an mmap'd buffer. It was introduced in `cf3c635210`. The link_alt_odb_entries() function has always taken a ptr/len pair as input. Until `cf3c635210` (alternates: accept double-quoted paths, 2016-12-12), we made a copy of those bytes in a string. But after that commit, we switched to parsing the input left-to-right, and we ignore "len" totally, instead reading until we hit a NUL. This has mostly gone unnoticed for a few reasons: 1. All but one caller passes a NUL-terminated string, with "len" pointing to the NUL. 2. The remaining caller, read_info_alternates(), passes in an mmap'd file. Unless the file is an exact multiple of the page size, it will generally be followed by NUL padding to the end of the page, which just works. The easiest way to demonstrate the problem is to build with: make SANITIZE=address NO_MMAP=Nope test Any test which involves $GIT_DIR/info/alternates will fail, as the mmap emulation (correctly) does not add an extra NUL, and ASAN complains about reading past the end of the buffer. One solution would be to teach link_alt_odb_entries() to respect "len". But it's actually a bit tricky, since we depend on unquote_c_style() under the hood, and it has no ptr/len variant. We could also just make a NUL-terminated copy of the input bytes and operate on that. But since all but one caller already is passing a string, instead let's just fix that caller to provide NUL-terminated input in the first place, by swapping out mmap for strbuf_read_file(). There's no advantage to using mmap on the alternates file. It's not expected to be large (and anyway, we're copying its contents into an in-memory linked list). Nor is using git_open() buying us anything here, since we don't keep the descriptor open for a long period of time. Let's also drop the "len" parameter entirely from link_alt_odb_entries(), since it's completely ignored. That will avoid any new callers re-introducing a similar bug. Reported-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-20 11:32:04 +09:00
Junio C Hamano	d811ba1897	Merge branch 'rs/strbuf-leakfix' Many leaks of strbuf have been fixed. * rs/strbuf-leakfix: (34 commits) wt-status: release strbuf after use in wt_longstatus_print_tracking() wt-status: release strbuf after use in read_rebase_todolist() vcs-svn: release strbuf after use in end_revision() utf8: release strbuf on error return in strbuf_utf8_replace() userdiff: release strbuf after use in userdiff_get_textconv() transport-helper: release strbuf after use in process_connect_service() sequencer: release strbuf after use in save_head() shortlog: release strbuf after use in insert_one_record() sha1_file: release strbuf on error return in index_path() send-pack: release strbuf on error return in send_pack() remote: release strbuf after use in set_url() remote: release strbuf after use in migrate_file() remote: release strbuf after use in read_remote_branches() refs: release strbuf on error return in write_pseudoref() notes: release strbuf after use in notes_copy_from_stdin() merge: release strbuf after use in write_merge_heads() merge: release strbuf after use in save_state() mailinfo: release strbuf on error return in handle_boundary() mailinfo: release strbuf after use in handle_from() help: release strbuf on error return in exec_woman_emacs() ...	2017-09-19 10:47:57 +09:00
Jeff King	f48ecd38cb	read_pack_header: handle signed/unsigned comparison in read result The result of read_in_full() may be -1 if we saw an error. But in comparing it to a sizeof() result, that "-1" will be promoted to size_t. In fact, the largest possible size_t which is much bigger than our struct size. This means that our "< sizeof(header)" error check won't trigger. In practice, we'd go on to read uninitialized memory and compare it to the PACK signature, which is likely to fail. But we shouldn't get there. We can fix this by making a direct "!=" comparison to the requested size, rather than "<". This means that errors get lumped in with short reads, but that's sufficient for our purposes here. There's no PH_ERROR tp represent our case. And anyway, this function reads from pipes and network sockets. A network error may racily appear as EOF to us anyway if there's data left in the socket buffers. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-14 15:18:00 +09:00
Junio C Hamano	f04f860dfa	Merge branch 'sb/sha1-file-cleanup' into maint Code clean-up. * sb/sha1-file-cleanup: sha1_file: make read_info_alternates static	2017-09-10 17:03:04 +09:00
Junio C Hamano	c580ce194f	Merge branch 'rs/find-pack-entry-bisection' into maint Code clean-up. * rs/find-pack-entry-bisection: sha1_file: avoid comparison if no packed hash matches the first byte	2017-09-10 17:03:02 +09:00
Junio C Hamano	438776e3d4	Merge branch 'rs/unpack-entry-leakfix' into maint Memory leak in an error codepath has been plugged. * rs/unpack-entry-leakfix: sha1_file: release delta_stack on error in unpack_entry()	2017-09-10 17:02:53 +09:00
Rene Scharfe	ea8e029785	sha1_file: release strbuf on error return in index_path() strbuf_readlink() already frees the buffer for us on error. Clean up if write_sha1_file() fails as well instead of returning early. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-07 08:49:28 +09:00
Junio C Hamano	8b36f0b196	Merge branch 'po/read-graft-line' Conversion from uchar[20] to struct object_id continues; this is to ensure that we do not assume sizeof(struct object_id) is the same as the length of SHA-1 hash (or length of longest hash we support). * po/read-graft-line: commit: rewrite read_graft_line commit: allocate array using object_id size commit: replace the raw buffer with strbuf in read_graft_line sha1_file: fix definition of null_sha1	2017-09-06 13:11:25 +09:00
Junio C Hamano	eabdcd4ab4	Merge branch 'jt/packmigrate' Code movement to make it easier to hack later. * jt/packmigrate: (23 commits) pack: move for_each_packed_object() pack: move has_pack_index() pack: move has_sha1_pack() pack: move find_pack_entry() and make it global pack: move find_sha1_pack() pack: move find_pack_entry_one(), is_pack_valid() pack: move check_pack_index_ptr(), nth_packed_object_offset() pack: move nth_packed_object_{sha1,oid} pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry() pack: move unpack_object_header() pack: move get_size_from_delta() pack: move unpack_object_header_buffer() pack: move {,re}prepare_packed_git and approximate_object_count pack: move install_packed_git() pack: move add_packed_git() pack: move unuse_pack() pack: move use_pack() pack: move pack-closing functions pack: move release_pack_memory() pack: move open_pack_index(), parse_pack_index() ...	2017-08-26 22:55:09 -07:00
Junio C Hamano	6b8aa3294e	Merge branch 'po/object-id' * po/object-id: sha1_file: convert index_stream to struct object_id sha1_file: convert hash_sha1_file_literally to struct object_id sha1_file: convert index_fd to struct object_id sha1_file: convert index_path to struct object_id read-cache: convert to struct object_id builtin/hash-object: convert to struct object_id	2017-08-26 22:55:07 -07:00

1 2 3 4 5 ...

932 commits