development/git - HydraGit

mirror of https://github.com/git/git synced 2024-11-05 18:59:29 +00:00

Author	SHA1	Message	Date
Junio C Hamano	5ae057d9a8	Merge branch 'es/unpack-trees-oob-fix' into maint The code that tries to skip over the entries for the paths in a single directory using the cache-tree was not careful enough against corrupt index file. * es/unpack-trees-oob-fix: unpack-trees: watch for out-of-range index position	2020-02-14 12:42:28 -08:00
Junio C Hamano	b783391018	Merge branch 'jk/clang-sanitizer-fixes' C pedantry ;-) fix. * jk/clang-sanitizer-fixes: obstack: avoid computing offsets from NULL pointer xdiff: avoid computing non-zero offset from NULL pointer avoid computing zero offsets from NULL pointer merge-recursive: use subtraction to flip stage merge-recursive: silence -Wxor-used-as-pow warning	2020-02-12 12:41:36 -08:00
Derrick Stolee	f998a3f1e5	sparse-checkout: fix cone mode behavior mismatch The intention of the special "cone mode" in the sparse-checkout feature is to always match the same patterns that are matched by the same sparse-checkout file as when cone mode is disabled. When a file path is given to "git sparse-checkout set" in cone mode, then the cone mode improperly matches the file as a recursive path. When setting the skip-worktree bits, files were not expecting the MATCHED_RECURSIVE response, and hence these were left out of the matched cone. Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED and add a test that prevents regression. Reported-by: Finn Bryant <finnbryant@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 13:05:29 -08:00
Junio C Hamano	043426c8fd	Merge branch 'ds/sparse-cone' The code recently added in this release to move to the entry beyond the ones in the same directory in the index in the sparse-cone mode did not count the number of entries to skip over incorrectly, which has been corrected. * ds/sparse-cone: .mailmap: fix GGG authoship screwup unpack-trees: correctly compute result count	2020-01-30 14:17:08 -08:00
Jeff King	d20bc01a51	avoid computing zero offsets from NULL pointer The Undefined Behavior Sanitizer in clang-11 seems to have learned a new trick: it complains about computing offsets from a NULL pointer, even if that offset is 0. This causes numerous test failures. For example, from t1090: unpack-trees.c:1355:41: runtime error: applying zero offset to null pointer ... not ok 6 - in partial clone, sparse checkout only fetches needed blobs The code in question looks like this: struct cache_entry **cache_end = cache + nr; ... while (cache != cache_end) and we sometimes pass in a NULL and 0 for "cache" and "nr". This is conceptually fine, as "cache_end" would be equal to "cache" in this case, and we wouldn't enter the loop at all. But computing even a zero offset violates the C standard. And given the fact that UBSan is noticing this behavior, this might be a potential problem spot if the compiler starts making unexpected assumptions based on undefined behavior. So let's just avoid it, which is pretty easy. In some cases we can just switch to iterating with a numeric index (as we do in sequencer.c here). In other cases (like the cache_end one) the use of an end pointer is more natural; we can keep that by just explicitly checking for the NULL/0 case when assigning the end pointer. Note that there are two ways you can write this latter case, checking for the pointer: cache_end = cache ? cache + nr : cache; or the size: cache_end = nr ? cache + nr : cache; For the case of a NULL/0 ptr/len combo, they are equivalent. But writing it the second way (as this patch does) has the property that if somebody were to incorrectly pass a NULL pointer with a non-zero length, we'd continue to notice and segfault, rather than silently pretending the length was zero. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-28 23:12:48 -08:00
Junio C Hamano	36da2a8635	Merge branch 'es/unpack-trees-oob-fix' The code that tries to skip over the entries for the paths in a single directory using the cache-tree was not careful enough against corrupt index file. * es/unpack-trees-oob-fix: unpack-trees: watch for out-of-range index position	2020-01-22 15:07:31 -08:00
Junio C Hamano	a3648c02a2	Merge branch 'en/simplify-check-updates-in-unpack-trees' Code simplification. * en/simplify-check-updates-in-unpack-trees: unpack-trees: exit check_updates() early if updates are not wanted	2020-01-22 15:07:30 -08:00
Matheus Tavares	d7992421e1	submodule-config: add skip_if_read option to repo_read_gitmodules() Currently, submodule-config.c doesn't have an externally accessible function to read gitmodules only if it wasn't already read. But this exact behavior is internally implemented by gitmodules_read_check(), to perform a lazy load. Let's merge this function with repo_read_gitmodules() adding a 'skip_if_read' which allows both internal and external callers to access this functionality. This simplifies a little the code. The added option will also be used in the following patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Derrick Stolee via GitGitGadget	4c6c7971e0	unpack-trees: correctly compute result count The clear_ce_flags_dir() method processes the cache entries within a common directory. The returned int is the number of cache entries processed by that directory. When using the sparse-checkout feature in cone mode, we can skip the pattern matching for entries in the directories that are entirely included or entirely excluded. `eb42feca` (unpack-trees: hash less in cone mode, 2019-11-21) introduced this performance feature. The old mechanism relied on the counts returned by calling clear_ce_flags_1(), but the new mechanism calculated the number of rows by subtracting "cache_end" from "cache" to find the size of the range. However, the equation is wrong because it divides by sizeof(struct cache_entry ). This is not how pointer arithmetic works! A coverity build of Git for Windows in preparation for the 2.25.0 release found this issue with the warning, "Pointer differences, such as cache_end - cache, are automatically scaled down by the size (8 bytes) of the pointed-to type (struct cache_entry ). Most likely, the division by sizeof(struct cache_entry *) is extraneous and should be eliminated." This warning is correct. This leaves us with the question "how did this even work?" The problem that occurs with this incorrect pointer arithmetic is a performance-only bug, and a very slight one at that. Since the entry count returned by clear_ce_flags_dir() is reduced by a factor of 8, the loop in clear_ce_flags_1() will re-process entries from those directories. By inserting global counters into unpack-tree.c and tracing them with trace2_data_intmax() (in a private change, for testing), I was able to see count how many times the loop inside clear_ce_flags_1() processed an entry and how many times clear_ce_flags_dir() was called. Each of these are reduced by at least a factor of 8 with the current change. A factor larger than 8 happens when multiple levels of directories are repeated. Specifically, in the Linux kernel repo, the command git sparse-checkout set LICENSES restricts the working directory to only the files at root and in the LICENSES directory. Here are the measured counts: clear_ce_flags_1 loop blocks: Before: 11,520 After: 1,621 clear_ce_flags_dir calls: Before: 7,048 After: 606 While these are dramatic counts, the time spent in clear_ce_flags_1() is under one millisecond in each case, so the improvement is not measurable as an end-to-end time. Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-10 11:34:36 -08:00
Emily Shaffer	573117dfa5	unpack-trees: watch for out-of-range index position It's possible in a case where the index file contains a tree extension but no blobs within that tree exist for index_pos_by_traverse_info() to segfault. If the name_entry passed into index_pos_by_traverse_info() has no blobs inside, AND is alphabetically later than all blobs currently in the index file, index_pos_by_traverse_info() will segfault. For example, an index file which looks something like this: aaa#0 bbb/aaa#0 [Extensions] TREE: zzz In this example, 'index_name_pos(..., "zzz/", ...)' will return '-4', indicating that "zzz/" could be inserted at position 3. However, when the checks which ensure that the insertion position of "zzz/" look for a blob at that position beginning with "zzz/", the index cache is accessed out of range, causing a segfault. This kind of index state is not typically generated during user operations, and is in fact an edge case of the state being checked for in the conditional where it was added. However, since the entry for the BUG() line is ambiguous, tell some additional context to help Git developers debug the failure later. When we know the name of the dir we were trying to look up, it becomes possible to examine the index file in a hex util to determine what went wrong; the position gives a hint about where to start looking. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-08 09:31:18 -08:00
Elijah Newren	26f924d50e	unpack-trees: exit check_updates() early if updates are not wanted check_updates() has a lot of code that repeatedly checks whether o->update or o->dry_run are set. (Note that o->dry_run is a near-synonym for !o->update, but not quite as per commit `2c9078d05b` ("unpack-trees: add the dry_run flag to unpack_trees_options", 2011-05-25).) In fact, this function almost turns into a no-op whenever the condition !o->update \|\| o->dry_run is met. Simplify the code by checking this condition at the beginning of the function, and when it is true, do the few things that are relevant and return early. There are a few things that make the conversion not quite obvious: * The fact that check_updates() does not actually turn into a no-op when updates are not wanted may be slightly surprising. However, commit `33ecf7eb61` (Discard "deleted" cache entries after using them to update the working tree, 2008-02-07) put the discarding of unused cache entries in check_updates() so we still need to keep the call to remove_marked_cache_entries(). It's possible this call belongs in another function, but it is certainly needed as tests will fail if it is removed. * The original called remove_scheduled_dirs() unconditionally. Technically, commit `7847892716` (unlink_entry(): introduce schedule_dir_for_removal(), 2009-02-09) should have made that call conditional, but it didn't matter in practice because remove_scheduled_dirs() becomes a no-op when all the calls to unlink_entry() are skipped. As such, we do not need to call it. * When (o->dry_run && o->update), the original would have two calls to git_attr_set_direction() surrounding a bunch of skipped updates. These two calls to git_attr_set_direction() cancel each other out and thus can be omitted when o->dry_run is true just as they already are when !o->update. * The code would previously call setup_collided_checkout_detection() and report_collided_checkout() even when o->dry_run. However, this was just an expensive no-op because setup_collided_checkout_detection() merely cleared the CE_MATCHED flag for each cache entry, and report_collided_checkout() reported which ones had it set. Since a dry-run would skip all the checkout_entry() calls, CE_MATCHED would never get set and thus no collisions would be reported. Since we can't detect the collisions anyway without doing updates, skipping the collisions detection setup and reporting is an optimization. * The code previously would call get_progress() and display_progress() even when (!o->update \|\| o->dry_run). This served to show how long it took to skip all the updates, which is somewhat useless. Since we are skipping the updates, we can skip showing how long it takes to skip them. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-07 08:37:37 -08:00
Junio C Hamano	bd72a08d6c	Merge branch 'ds/sparse-cone' Management of sparsely checked-out working tree has gained a dedicated "sparse-checkout" command. * ds/sparse-cone: (21 commits) sparse-checkout: improve OS ls compatibility sparse-checkout: respect core.ignoreCase in cone mode sparse-checkout: check for dirty status sparse-checkout: update working directory in-process for 'init' sparse-checkout: cone mode should not interact with .gitignore sparse-checkout: write using lockfile sparse-checkout: use in-process update for disable subcommand sparse-checkout: update working directory in-process sparse-checkout: sanitize for nested folders unpack-trees: add progress to clear_ce_flags() unpack-trees: hash less in cone mode sparse-checkout: init and set in cone mode sparse-checkout: use hashmaps for cone patterns sparse-checkout: add 'cone' mode trace2: add region in clear_ce_flags sparse-checkout: create 'disable' subcommand sparse-checkout: add '--stdin' option to set subcommand sparse-checkout: 'set' subcommand clone: add --sparse mode sparse-checkout: create 'init' subcommand ...	2019-12-25 11:21:58 -08:00
Junio C Hamano	7034cd094b	Sync with Git 2.24.1	2019-12-09 22:17:55 -08:00
Johannes Schindelin	67af91c47a	Sync with 2.23.1 * maint-2.23: (44 commits) Git 2.23.1 Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters ...	2019-12-06 16:31:39 +01:00
Johannes Schindelin	7fd9fd94fb	Sync with 2.22.2 * maint-2.22: (43 commits) Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors ...	2019-12-06 16:31:30 +01:00
Johannes Schindelin	5421ddd8d0	Sync with 2.21.1 * maint-2.21: (42 commits) Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh ...	2019-12-06 16:31:23 +01:00
Johannes Schindelin	fc346cb292	Sync with 2.20.2 * maint-2.20: (36 commits) Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories ...	2019-12-06 16:31:12 +01:00
Johannes Schindelin	d851d94151	Sync with 2.19.3 * maint-2.19: (34 commits) Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams ...	2019-12-06 16:30:49 +01:00
Johannes Schindelin	7c9fbda6e2	Sync with 2.18.2 * maint-2.18: (33 commits) Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up ...	2019-12-06 16:30:38 +01:00
Johannes Schindelin	14af7ed5a9	Sync with 2.17.3 * maint-2.17: (32 commits) Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names ...	2019-12-06 16:29:15 +01:00
Johannes Schindelin	bdfef0492c	Sync with 2.16.6 * maint-2.16: (31 commits) Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses ...	2019-12-06 16:27:36 +01:00
Johannes Schindelin	9ac92fed5b	Sync with 2.15.4 * maint-2.15: (29 commits) Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment ...	2019-12-06 16:27:18 +01:00
Johannes Schindelin	d3ac8c3f27	Sync with 2.14.6 * maint-2.14: (28 commits) Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment test-path-utils: offer to run a protectNTFS/protectHFS benchmark ...	2019-12-06 16:26:55 +01:00
Junio C Hamano	473b431410	Merge branch 'us/unpack-trees-fsmonitor' Users of oneway_merge() (like "reset --hard") learned to take advantage of fsmonitor to avoid unnecessary lstat(2) calls. * us/unpack-trees-fsmonitor: unpack-trees: skip stat on fsmonitor-valid files	2019-12-05 12:52:48 -08:00
Johannes Schindelin	cc756edda6	unpack-trees: let merged_entry() pass through do_add_entry()'s errors A `git clone` will end with exit code 0 when `merged_entry()` returns a positive value during a call of `unpack_trees()` to `traverse_trees()`. The reason is that `unpack_trees()` will interpret a positive value not to be an error. The problem is, however, that `add_index_entry()` (which is called by `merged_entry()` can report an error, and we really should fail the entire clone in such a case. Let's fix this problem, in preparation for a Windows-specific patch disallowing `mkdir()` with directory names that contain a trailing space (which is illegal on NTFS): we want `git clone` to abort when a path cannot be checked out due to that condition. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2019-12-05 15:37:06 +01:00
Derrick Stolee	e091228e17	sparse-checkout: update working directory in-process The sparse-checkout builtin used 'git read-tree -mu HEAD' to update the skip-worktree bits in the index and to update the working directory. This extra process is overly complex, and prone to failure. It also requires that we write our changes to the sparse-checkout file before trying to update the index. Remove this extra process call by creating a direct call to unpack_trees() in the same way 'git read-tree -mu HEAD' does. In addition, provide an in-memory list of patterns so we can avoid reading from the sparse-checkout file. This allows us to test a proposed change to the file before writing to it. An earlier version of this patch included a bug when the 'set' command failed due to the "Sparse checkout leaves no entry on working directory" error. It would not rollback the index.lock file, so the replay of the old sparse-checkout specification would fail. A test in t1091 now covers that scenario. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-22 16:11:44 +09:00
Derrick Stolee	4dcd4def3c	unpack-trees: add progress to clear_ce_flags() When a large repository has many sparse-checkout patterns, the process for updating the skip-worktree bits can take long enough that a user gets confused why nothing is happening. Update the clear_ce_flags() method to write progress. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-22 16:11:44 +09:00
Derrick Stolee	eb42feca97	unpack-trees: hash less in cone mode The sparse-checkout feature in "cone mode" can use the fact that the recursive patterns are "connected" to the root via parent patterns to decide if a directory is entirely contained in the sparse-checkout or entirely removed. In these cases, we can skip hashing the paths within those directories and simply set the skipworktree bit to the correct value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-22 16:11:44 +09:00
Derrick Stolee	96cc8ab531	sparse-checkout: use hashmaps for cone patterns The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match. As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here: /$folder/ !/$folder/*/ This resulted in a sparse-checkout file sith 8,296 patterns. Running 'git read-tree -mu HEAD' on this file had the following performance: core.sparseCheckout=false: 0.21 s (0.00 s) core.sparseCheckout=true: 3.75 s (3.50 s) core.sparseCheckoutCone=true: 0.23 s (0.01 s) The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces. While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-22 16:11:44 +09:00
Jeff Hostetler	e6152e35ff	trace2: add region in clear_ce_flags When Git updates the working directory with the sparse-checkout feature enabled, the unpack_trees() method calls clear_ce_flags() to update the skip-wortree bits on the cache entries. This check can be expensive, depending on the patterns used. Add trace2 regions around the method, including some flag information, so we can get granular performance data during experiments. This data will be used to measure improvements to the pattern-matching algorithms for sparse-checkout. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-22 16:11:44 +09:00
Utsav Shah	679f2f9fdd	unpack-trees: skip stat on fsmonitor-valid files The index might be aware that a file hasn't modified via fsmonitor, but unpack-trees did not pay attention to it and checked via ie_match_stat which can be inefficient on certain filesystems. This significantly slows down commands that run oneway_merge, like checkout and reset --hard. This patch makes oneway_merge check whether a file is considered unchanged through fsmonitor and skips ie_match_stat on it. unpack-trees also now correctly copies over fsmonitor validity state from the source index. Finally, for correctness, we force a refresh of fsmonitor state in tweak_fsmonitor. After this change, commands like stash (that use reset --hard internally) go from 8s or more to ~2s on a 250k file repository on a mac. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Kevin Willford <Kevin.Willford@microsoft.com> Signed-off-by: Utsav Shah <utsav@dropbox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-21 12:48:18 +09:00
Elijah Newren	15beaaa3d1	Fix spelling errors in code comments Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-10 16:00:54 +09:00
René Scharfe	2fe44394c8	treewide: remove duplicate #include directives Found with "git grep '^#include ' '*.c' \| sort \| uniq -d". Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-04 08:16:00 +09:00
Junio C Hamano	9755f70fe6	Merge branch 'ds/include-exclude' The internal code originally invented for ".gitignore" processing got reshuffled and renamed to make it less tied to "excluding" and stress more that it is about "matching", as it has been reused for things like sparse checkout specification that want to check if a path is "included". * ds/include-exclude: unpack-trees: rename 'is_excluded_from_list()' treewide: rename 'exclude' methods to 'pattern' treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_' treewide: rename 'struct exclude_list' to 'struct pattern_list' treewide: rename 'struct exclude' to 'struct path_pattern'	2019-09-30 13:19:32 +09:00
Junio C Hamano	b9ac6c59b8	Merge branch 'cc/multi-promisor' Teach the lazy clone machinery that there can be more than one promisor remote and consult them in order when downloading missing objects on demand. * cc/multi-promisor: Move core_partial_clone_filter_default to promisor-remote.c Move repository_format_partial_clone to promisor-remote.c Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} remote: add promisor and partial clone config to the doc partial-clone: add multiple remotes in the doc t0410: test fetching from many promisor remotes builtin/fetch: remove unique promisor remote limitation promisor-remote: parse remote.*.partialclonefilter Use promisor_remote_get_direct() and has_promisor_remote() promisor-remote: use repository_format_partial_clone promisor-remote: add promisor_remote_reinit() promisor-remote: implement promisor_remote_get_direct() Add initial support for many promisor remotes fetch-object: make functions return an error code t0410: remove pipes after git commands	2019-09-18 11:50:09 -07:00
Derrick Stolee	468ce99b77	unpack-trees: rename 'is_excluded_from_list()' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! Now that this library is renamed to use 'struct pattern_list' and 'struct pattern', we can now rename the method used by the sparse-checkout feature to determine which paths should appear in the working directory. The method is_excluded_from_list() is only used by the sparse-checkout logic in unpack-trees and list-objects-filter. The confusing part is that it returned 1 for "excluded" (i.e. it matches the list of exclusions) but that really manes that the path matched the list of patterns for _inclusion_ in the working directory. Rename the method to be path_matches_pattern_list() and have it return an explicit 'enum pattern_match_result'. Here, the values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree with the previous integer values. This shift allows future consumers to better understand what the retur values mean, and provides more type checking for handling those values. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:05:12 -07:00
Derrick Stolee	65edd96aec	treewide: rename 'exclude' methods to 'pattern' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames several methods defined in dir.h to make more sense with the renamed 'struct exclude_list' to 'struct pattern_list' and 'struct exclude' to 'struct path_pattern': * last_exclude_matching() -> last_matching_pattern() * parse_exclude() -> parse_path_pattern() In addition, the word 'exclude' was replaced with 'pattern' in the methods below: * add_exclude_list() * add_excludes_from_file_to_list() * add_excludes_from_file() * add_excludes_from_blob_to_list() * add_exclude() * clear_exclude_list() A few methods with the word "exclude" remain. These will be handled seperately. In particular, the method "is_excluded()" is concretely about the .gitignore file relative to a specific directory. This is the important boundary between library and consumer: is_excluded() cares about .gitignore, but is_excluded() calls last_matching_pattern() to make that decision. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:05:12 -07:00
Derrick Stolee	caa3d55444	treewide: rename 'struct exclude_list' to 'struct pattern_list' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames 'struct exclude_list' to 'struct pattern_list' and renames several variables called 'el' to 'pl'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:05:11 -07:00
Junio C Hamano	1b01cdbf2e	Merge branch 'jk/tree-walk-overflow' Codepaths to walk tree objects have been audited for integer overflows and hardened. * jk/tree-walk-overflow: tree-walk: harden make_traverse_path() length computations tree-walk: add a strbuf wrapper for make_traverse_path() tree-walk: accept a raw length for traverse_path_len() tree-walk: use size_t consistently tree-walk: drop oid from traverse_info setup_traverse_info(): stop copying oid	2019-08-22 12:34:10 -07:00
Jeff King	5aa02f9868	tree-walk: harden make_traverse_path() length computations The make_traverse_path() function isn't very careful about checking its output buffer boundaries. In fact, it doesn't even _know_ the size of the buffer it's writing to, and just assumes that the caller used traverse_path_len() correctly. And even then we assume that our traverse_info.pathlen components are all correct, and just blindly write into the buffer. Let's improve this situation a bit: - have the caller pass in their allocated buffer length, which we'll check against our own computations - check for integer underflow as we do our backwards-insertion of pathnames into the buffer - check that we do not run out items in our list to traverse before we've filled the expected number of bytes None of these should be triggerable in practice (especially since our switch to size_t everywhere in a previous commit), but it doesn't hurt to check our assumptions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-01 13:06:52 -07:00
Jeff King	c43ab06259	tree-walk: add a strbuf wrapper for make_traverse_path() All but one of the callers of make_traverse_path() allocate a new heap buffer to store the path. Let's give them an easy way to write to a strbuf, which saves them from computing the length themselves (which is especially tricky when they want to add to the path). It will also make it easier for us to change the make_traverse_path() interface in a future patch to improve its bounds-checking. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-01 13:06:52 -07:00
Jeff King	b3b3cbcbf2	tree-walk: accept a raw length for traverse_path_len() We take a "struct name_entry", but only care about the length of the path name. Let's just take that length directly, making it easier to use the function from callers that sometimes do not have a name_entry at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-01 13:06:52 -07:00
Jeff King	37806080d7	tree-walk: use size_t consistently We store and manipulate the cumulative traverse_info.pathlen as an "int", which can overflow when we are fed ridiculously long pathnames (e.g., ones at the edge of 2GB or 4GB, even if the individual tree entry names are smaller than that). The results can be confusing, though after some prodding I was not able to use this integer overflow to cause an under-allocated buffer. Let's consistently use size_t to generate and store these, and make sure our addition doesn't overflow. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-01 13:06:40 -07:00
Jeff King	9055384710	tree-walk: drop oid from traverse_info As the previous commit shows, the presence of an oid in each level of the traverse_info is confusing and ultimately not necessary. Let's drop it to make it clear that it will not always be set (as well as convince us that it's unused, and let the compiler catch any merges with other branches that do add new uses). Since the oid is part of name_entry, we'll actually stop embedding a name_entry entirely, and instead just separately hold the pathname, its length, and the mode. This makes the resulting code slightly more verbose as we have to pass those elements around individually. But it also makes it more clear what each code path is going to use (and in most of the paths, we really only care about the pathname itself). A few of these conversions are noisier than they need to be, as they also take the opportunity to rename "len" to "namelen" for clarity (especially where we also have "pathlen" or "ce_len" alongside). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-31 13:34:25 -07:00
Junio C Hamano	1eb0a12ec3	Merge branch 'nd/tree-walk-with-repo' The tree-walk API learned to pass an in-core repository instance throughout more codepaths. * nd/tree-walk-with-repo: t7814: do not generate same commits in different repos Use the right 'struct repository' instead of the_repository match-trees.c: remove the_repo from shift_tree*() tree-walk.c: remove the_repo from get_tree_entry_follow_symlinks() tree-walk.c: remove the_repo from get_tree_entry() tree-walk.c: remove the_repo from fill_tree_descriptor() sha1-file.c: remove the_repo from read_object_with_reference()	2019-07-19 11:30:21 -07:00
Junio C Hamano	f496b064fc	Merge branch 'nd/switch-and-restore' Two new commands "git switch" and "git restore" are introduced to split "checking out a branch to work on advancing its history" and "checking out paths out of the index and/or a tree-ish to work on advancing the current history" out of the single "git checkout" command. * nd/switch-and-restore: (46 commits) completion: disable dwim on "git switch -d" switch: allow to switch in the middle of bisect t2027: use test_must_be_empty Declare both git-switch and git-restore experimental help: move git-diff and git-reset to different groups doc: promote "git restore" user-manual.txt: prefer 'merge --abort' over 'reset --hard' completion: support restore t: add tests for restore restore: support --patch restore: replace --force with --ignore-unmerged restore: default to --source=HEAD when only --staged is specified restore: reject invalid combinations with --staged restore: add --worktree and --staged checkout: factor out worktree checkout code restore: disable overlay mode by default restore: make pathspec mandatory restore: take tree-ish from --source option instead checkout: split part of it to new command 'restore' doc: promote "git switch" ...	2019-07-09 15:25:44 -07:00
Nguyễn Thái Ngọc Duy	5e57580733	tree-walk.c: remove the_repo from fill_tree_descriptor() While at there, clean up the_repo usage in builtin/merge-tree.c a tiny bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-27 12:45:17 -07:00
Christian Couder	b14ed5adaf	Use promisor_remote_get_direct() and has_promisor_remote() Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remote just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-25 14:05:37 -07:00
Junio C Hamano	32dc15dec1	Merge branch 'jt/batch-fetch-blobs-in-diff' While running "git diff" in a lazy clone, we can upfront know which missing blobs we will need, instead of waiting for the on-demand machinery to discover them one by one. Aim to achieve better performance by batching the request for these promised blobs. * jt/batch-fetch-blobs-in-diff: diff: batch fetching of missing blobs sha1-file: support OBJECT_INFO_FOR_PREFETCH	2019-04-25 16:41:19 +09:00
Junio C Hamano	4a3ed2bec6	Merge branch 'nd/checkout-m' "git checkout -m <other>" was about carrying the differences between HEAD and the working-tree files forward while checking out another branch, and ignored the differences between HEAD and the index. The command has been taught to abort when the index and the HEAD are different. * nd/checkout-m: checkout: prevent losing staged changes with --merge read-tree: add --quiet unpack-trees: rename "gently" flag to "quiet" unpack-trees: keep gently check inside add_rejected_path	2019-04-25 16:41:14 +09:00
Junio C Hamano	4284497396	Merge branch 'jk/unused-params-even-more' Code cleanup. * jk/unused-params-even-more: parse_opt_ref_sorting: always use with NONEG flag pretty: drop unused strbuf from parse_padding_placeholder() pretty: drop unused "type" parameter in needs_rfc2047_encoding() parse-options: drop unused ctx parameter from show_gitcomp() fetch_pack(): drop unused parameters report_path_error(): drop unused prefix parameter unpack-trees: drop unused error_type parameters unpack-trees: drop name_entry from traverse_by_cache_tree() test-date: drop unused "now" parameter from parse_dates() update-index: drop unused prefix_length parameter from do_reupdate() log: drop unused "len" from show_tagger() log: drop unused rev_info from early output revision: drop some unused "revs" parameters	2019-04-25 16:41:12 +09:00
Junio C Hamano	5795a75f9b	Merge branch 'bp/post-index-change-hook' A new hook "post-index-change" is called when the on-disk index file changes, which can help e.g. a virtualized working tree implementation. * bp/post-index-change-hook: read-cache: add post-index-change hook	2019-04-25 16:41:11 +09:00
Nguyễn Thái Ngọc Duy	328c6cb853	doc: promote "git switch" The new command "git switch" is added to avoid the confusion of one-command-do-all "git checkout" for new users. They are also helpful to avoid ambiguation context. For these reasons, promote it everywhere possible. This includes documentation, suggestions/advice from other commands... The "Checking out files" progress line in unpack-trees.c is also updated to "Updating files" to be neutral to both git-checkout and git-switch. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-02 13:57:00 +09:00
Jonathan Tan	0f4a4fb1c4	sha1-file: support OBJECT_INFO_FOR_PREFETCH Teach oid_object_info_extended() to support a new flag that inhibits fetching of missing objects. This is equivalent to setting fetch_is_missing to 0, calling oid_object_info_extended(), then setting fetch_if_missing to whatever it was before. Update unpack-trees.c to use this new flag instead of repeatedly setting fetch_if_missing. This new flag complicates things slightly in that there are now 2 ways to do the same thing. But this eliminates the need to repeatedly set a global variable, and more importantly, allows prefetching to be done in parallel (in the future); hence, this patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 15:47:15 +09:00
Nguyễn Thái Ngọc Duy	b165fac8c1	unpack-trees: rename "gently" flag to "quiet" The gently flag was added in `17e4642667` (Add flag to make unpack_trees() not print errors. - 2008-02-07) to suppress error messages. The name "gently" does not quite express that. Granted, being quiet is gentle but it could mean not performing some other actions. Rename the flag to "quiet" to be more on point. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-24 21:35:34 +09:00
Nguyễn Thái Ngọc Duy	191e9d2c2d	unpack-trees: keep gently check inside add_rejected_path This basically follows the footsteps of `6a143aa2b2` (checkout -m: attempt merge when deletion of path was staged - 2014-08-12) where there gently check is moved inside reject_merge() so that callers do not accidentally forget it. add_rejected_path() has the same usage pattern. All call sites check gently first, then decide to call add_rejected_path() if needed. Move the check inside. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-24 21:35:34 +09:00
Nguyễn Thái Ngọc Duy	ab5af825db	unpack-trees: fix oneway_merge accidentally carry over stage index Phillip found out that 'git checkout -f <branch>' does not restore conflict/unmerged files correctly. All tracked files should be taken from <branch> and all non-zero stages removed. Most of this is true, except that the final file could be in stage one instead of zero. "checkout -f" (among other commands) does this with one-way merge, which is supposed to take stat info from the index and everything else from the given tree. The add_entry(.., old, ...) call in oneway_merge() though will keep stage index from the index. This is normally not a problem if the entry from the index is normal (stage #0). But if there is a conflict, stage #0 does not exist and we'll get stage #1 entry as "old" variable, which gets recorded in the final index. Fix it by clearing stage mask. This bug probably comes from `b5b425074e` (git-read-tree: make one-way merge also honor the "update" flag, 2005-06-07). Before this commit, we may create the final ("dst") index entry from the one in index, but we do clear CE_STAGEMASK. I briefly checked two- and three-way merge functions. I think we don't have the same problem in those. Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-21 12:06:28 +09:00
Jeff King	df351c6e67	unpack-trees: drop unused error_type parameters The verify_clean_subdirectory() helper takes an error_type parameter from the caller, but doesn't actually use it. Instead, when it calls add_rejected_path() it passes NOT_UPTODATE_DIR, its own custom error type which is more specific than what the caller provides. Likewise for verify_clean_submodule(), which always passes WOULD_LOSE_SUBMODULE. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-20 18:34:09 +09:00
Jeff King	664296985c	unpack-trees: drop name_entry from traverse_by_cache_tree() We pull the names from the existing index rather than the tree entry, which is after all the point of this function. Let's drop the unused "names" parameter. Note that we leave the "nr_names" parameter, as it tells us how many trees we are traversing (and thus how many index stages to set up). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-20 18:34:09 +09:00
Junio C Hamano	7d0c1f4556	Merge branch 'tg/checkout-no-overlay' "git checkout --no-overlay" can be used to trigger a new mode of checking out paths out of the tree-ish, that allows paths that match the pathspec that are in the current index and working tree and are not in the tree-ish. * tg/checkout-no-overlay: revert "checkout: introduce checkout.overlayMode config" checkout: introduce checkout.overlayMode config checkout: introduce --{,no-}overlay option checkout: factor out mark_cache_entry_for_checkout function checkout: clarify comment read-cache: add invalidate parameter to remove_marked_cache_entries entry: support CE_WT_REMOVE flag in checkout_entry entry: factor out unlink_entry function move worktree tests to t24*	2019-03-07 09:59:51 +09:00
Ben Peart	1956ecd0ab	read-cache: add post-index-change hook Add a post-index-change hook that is invoked after the index is written in do_write_locked_index(). This hook is meant primarily for notification, and cannot affect the outcome of git commands that trigger the index write. The hook is passed a flag to indicate whether the working directory was updated or not and a flag indicating if a skip-worktree bit could have changed. These flags enable the hook to optimize its response to the index change notification. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-02-15 11:00:33 -08:00
Junio C Hamano	7589e63648	Merge branch 'nd/the-index-final' The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument	2019-02-06 22:05:23 -08:00
Junio C Hamano	371820d5f1	Merge branch 'bc/tree-walk-oid' The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use	2019-01-29 12:47:56 -08:00
Nguyễn Thái Ngọc Duy	f8adbec9fe	cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-24 11:55:06 -08:00
brian m. carlson	ea82b2a085	tree-walk: store object_id in a separate member When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-15 09:57:41 -08:00
Junio C Hamano	4084df42c2	Merge branch 'nd/checkout-noisy' "git checkout [<tree-ish>] path..." learned to report the number of paths that have been checked out of the index or the tree-ish, which gives it the same degree of noisy-ness as the case in which the command checks out a branch. * nd/checkout-noisy: t0027: squelch checkout path run outside test_expect_* block checkout: print something when checking out paths	2019-01-14 15:29:29 -08:00
Junio C Hamano	d6f05a435f	Merge branch 'nd/attr-pathspec-in-tree-walk' The traversal over tree objects has learned to honor ":(attr:label)" pathspec match, which has been implemented only for enumerating paths on the filesystem. * nd/attr-pathspec-in-tree-walk: tree-walk: support :(attr) matching dir.c: move, rename and export match_attrs() pathspec.h: clean up "extern" in function declarations tree-walk.c: make tree_entry_interesting() take an index tree.c: make read_tree() take 'struct repository '	2019-01-14 15:29:28 -08:00
Thomas Gummerer	6fdc205722	read-cache: add invalidate parameter to remove_marked_cache_entries When marking cache entries for removal, and later removing them all at once using 'remove_marked_cache_entries()', cache entries currently have to be invalidated manually in the cache tree and in the untracked cache. Add an invalidate flag to the function. With the flag set, the function will take care of invalidating the path in the cache tree and in the untracked cache. Note that the current callsites already do the invalidation properly in other places, so we're just passing 0 from there to keep the status quo. This will be useful in a subsequent commit. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-02 15:28:05 -08:00
Thomas Gummerer	b702dd12d5	entry: factor out unlink_entry function Factor out the 'unlink_entry()' function from unpack-trees.c to entry.c. It will be used in other places as well in subsequent steps. As it's no longer a static function, also move the documentation to the header file to make it more discoverable. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-02 15:28:05 -08:00
Nguyễn Thái Ngọc Duy	67022e0214	tree-walk.c: make tree_entry_interesting() take an index In order to support :(attr) when matching pathspec on a tree, tree_entry_interesting() needs to take an index (because git_check_attr() needs it). This is the preparation step for it. This also makes it clearer what index we fall back to when looking up attributes during an unpack-trees operation: the source index. This also fixes revs->pruning.repo initialization that should have been done in `2abf350385` (revision.c: remove implicit dependency on the_index - 2018-09-21). Without it, skip_uninteresting() will dereference a NULL pointer through this call chain get_revision(revs) get_revision_internal get_revision_1 try_to_simplify_commit rev_compare_tree diff_tree_oid(..., &revs->pruning) ll_diff_tree_oid diff_tree_paths ll_diff_tree skip_uninteresting Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-19 10:50:33 +09:00
Nguyễn Thái Ngọc Duy	0f086e6dca	checkout: print something when checking out paths One of the problems with "git checkout" is that it does so many different things and could confuse people specially when we fail to handle ambiguation correctly. One way to help with that is tell the user what sort of operation is actually carried out. When switching branches, we always print something unless --quiet, either - "HEAD is now at ..." - "Reset branch ..." - "Already on ..." - "Switched to and reset ..." - "Switched to a new branch ..." - "Switched to branch ..." Checking out paths however is silent. Print something so that if we got the user intention wrong, they won't waste too much time to find that out. For the remaining cases of checkout we now print either - "Checked out ... paths out of the index" - "Checked out ... paths out of <abbrev hash>" Since the purpose of printing this is to help disambiguate. Only do it when "--" is missing. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-14 15:10:35 +09:00
Nguyễn Thái Ngọc Duy	c207e9e1f6	cache-tree.c: remove the_repository references This case is more interesting than other boring "remove the_repo" commits because while we need access to the object database, we cannot simply use r->index because unpack-trees.c can operate on a temporary index, not $GIT_DIR/index. Ideally we should be able to pass an object database to lookup_tree() but that ship has sailed. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 14:50:06 +09:00
Carlo Marcelo Arenas Belón	07f967adb5	unpack-trees: avoid dead store for struct progress it is unconditionally initialized a few lines below Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-19 11:16:52 +09:00
Junio C Hamano	ee99ba7afb	Merge branch 'jt/lazy-object-fetch-fix' The code to backfill objects in lazily cloned repository did not work correctly, which has been corrected. * jt/lazy-object-fetch-fix: fetch-object: set exact_oid when fetching fetch-object: unify fetch_object[s] functions	2018-09-24 10:30:52 -07:00
Junio C Hamano	769af0fd9e	Merge branch 'jk/cocci' spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion	2018-09-17 13:53:57 -07:00
Junio C Hamano	7e794d0a3f	Merge branch 'nd/unpack-trees-with-cache-tree' The unpack_trees() API used in checking out a branch and merging walks one or more trees along with the index. When the cache-tree in the index tells us that we are walking a tree whose flattened contents is known (i.e. matches a span in the index), as linearly scanning a span in the index is much more efficient than having to open tree objects recursively and listing their entries, the walk can be optimized, which is done in this topic. * nd/unpack-trees-with-cache-tree: Document update for nd/unpack-trees-with-cache-tree cache-tree: verify valid cache-tree in the test suite unpack-trees: add missing cache invalidation unpack-trees: reuse (still valid) cache-tree from src_index unpack-trees: reduce malloc in cache-tree walk unpack-trees: optimize walking same trees with cache-tree unpack-trees: add performance tracing trace.h: support nested performance tracing	2018-09-17 13:53:53 -07:00
Junio C Hamano	c2407322b6	Merge branch 'nd/clone-case-smashing-warning' Running "git clone" against a project that contain two files with pathnames that differ only in cases on a case insensitive filesystem would result in one of the files lost because the underlying filesystem is incapable of holding both at the same time. An attempt is made to detect such a case and warn. * nd/clone-case-smashing-warning: clone: report duplicate entries on case-insensitive filesystems	2018-09-17 13:53:47 -07:00
Jonathan Tan	8708ca09a6	fetch-object: unify fetch_object[s] functions There are fetch_object() and fetch_objects() helpers in fetch-object.h; as the latter takes "struct oid_array", the former cannot be made into a thin wrapper around the latter without an extra allocation and set-up cost. Update fetch_objects() to take an array of "struct object_id" and number of elements in it as separate parameters, remove fetch_object(), and adjust all existing callers of these functions to use the new fetch_objects(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-13 13:56:19 -07:00
Jeff King	4a7e27e957	convert "oidcmp() == 0" to oideq() Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-29 11:32:49 -07:00
Nguyễn Thái Ngọc Duy	5f4436a721	Document update for nd/unpack-trees-with-cache-tree Fix an incorrect comment in the new code added in `b4da37380b` (unpack-trees: optimize walking same trees with cache-tree - 2018-08-18) and document about the new test variable that is enabled by default in test-lib.sh in `4592e6080f` (cache-tree: verify valid cache-tree in the test suite - 2018-08-18) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-27 12:26:17 -07:00
Nguyễn Thái Ngọc Duy	4592e6080f	cache-tree: verify valid cache-tree in the test suite This makes sure that cache-tree is consistent with the index. The main purpose is to catch potential problems by saving the index in unpack_trees() but the line in write_index() would also help spot missing invalidation in other code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-18 09:47:46 -07:00
Nguyễn Thái Ngọc Duy	5697ca9aa5	unpack-trees: add missing cache invalidation Any changes to the output index should be (confusingly) marked in the source index with invalidate_ce_path(). This is used to make sure we still have valid untracked cache and cache-tree extensions in the end. We do a pretty good job of invalidating except in two places. verify_clean_subdirectory() is part of verify_absent() and verify_absent_sparse(). The former is usually called by merged_entry() or directly in threeway_merge(). The latter is obviously used by sparse checkout. In these three call sites, only merged_entry() follows up with invalidate_ce_path(). The other two don't, but they should not trigger this ce removal because this is about D/F conflicts [1]. But let's be safe and invalidate_ce_path() here as well. The second place is keep_entry() which is also used by threeway_merge() to keep higher stage entries. In order to reuse cache-tree we need to invalidate these paths as well. It's not a problem in the past because whenever a higher stage entry is present, cache-tree will not be created [2]. Now we salvage cache-tree even when higher stage entries are present, we need more invalidation. [1] `c81935348b` (Fix switching to a branch with D/F when current branch has file D. - 2007-03-15) [2] This is probably too strict. We should be able to create and save cache-tree for the directories that do not have conflict entries in cache_tree_update(). And this becomes more important when cache-tree plays bigger role in terms of performance. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-18 09:47:46 -07:00
Nguyễn Thái Ngọc Duy	836ef2b69f	unpack-trees: reuse (still valid) cache-tree from src_index We do n-way merge by walking the source index and n trees at the same time and add merge results to a new temporary index called o->result. The merge result for any given path could be either - keep_entry(): same old index entry in o->src_index is reused - merged_entry(): either a new entry is added, or an existing one updated - deleted_entry(): one entry from o->src_index is removed For some reason [1] we keep making sure that the source index's cache-tree is still valid if used by o->result: for all those merged/deleted entries, we invalidate the same path in o->src_index, so only cache-trees covering the "keep_entry" parts remain good. Because of this, the cache-tree from o->src_index can be perfectly reused in o->result. And in fact we already rely on this logic to reuse untracked cache in `edf3b90553` (unpack-trees: preserve index extensions - 2017-05-08). Move the cache-tree to o->result before doing cache_tree_update() to reduce hashing cost. Since cache_tree_update() has risen up as one of the most expensive parts in unpack_trees() after the last few patches. This does help reduce unpack_trees() time significantly (on webkit.git): before after -------------------------------------------------------------------- 0.080394752 0.051258167 s: read cache .git/index 0.216010838 0.212106298 s: preload index 0.008534301 0.280521764 s: refresh index 0.251992198 0.218160442 s: traverse_trees 0.377031383 0.374948191 s: check_updates 0.372768105 0.037040114 s: cache_tree_update 1.045887251 0.672031609 s: unpack_trees 0.314983512 0.317456290 s: write index, changed mask = 2e 0.062572653 0.038382654 s: traverse_trees 0.000022544 0.000042731 s: check_updates 0.073795585 0.050930053 s: unpack_trees 0.073807557 0.051099735 s: diff-index 1.938191592 1.614241153 s: git command: git checkout - [1] I'm pretty sure the reason is an oversight in `34110cd4e3` (Make 'unpack_trees()' have a separate source and destination index - 2008-03-06). That patch aims to _not_ update the source index at all. The invalidation should have been done on o->result in that patch. But then there was no cache-tree on o->result even then so it's pointless to do so. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-18 09:47:46 -07:00
Nguyễn Thái Ngọc Duy	f1e11c6510	unpack-trees: reduce malloc in cache-tree walk This is a micro optimization that probably only shines on repos with deep directory structure. Instead of allocating and freeing a new cache_entry in every iteration, we reuse the last one and only update the parts that are new each iteration. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-18 09:47:46 -07:00
Nguyễn Thái Ngọc Duy	b4da37380b	unpack-trees: optimize walking same trees with cache-tree In order to merge one or many trees with the index, unpack-trees code walks multiple trees in parallel with the index and performs n-way merge. If we find out at start of a directory that all trees are the same (by comparing OID) and cache-tree happens to be available for that directory as well, we could avoid walking the trees because we already know what these trees contain: it's flattened in what's called "the index". The upside is of course a lot less I/O since we can potentially skip lots of trees (think subtrees). We also save CPU because we don't have to inflate and apply the deltas. The downside is of course more fragile code since the logic in some functions are now duplicated elsewhere. "checkout -" with this patch on webkit.git (275k files): baseline new -------------------------------------------------------------------- 0.056651714 0.080394752 s: read cache .git/index 0.183101080 0.216010838 s: preload index 0.008584433 0.008534301 s: refresh index 0.633767589 0.251992198 s: traverse_trees 0.340265448 0.377031383 s: check_updates 0.381884638 0.372768105 s: cache_tree_update 1.401562947 1.045887251 s: unpack_trees 0.338687914 0.314983512 s: write index, changed mask = 2e 0.411927922 0.062572653 s: traverse_trees 0.000023335 0.000022544 s: check_updates 0.423697246 0.073795585 s: unpack_trees 0.423708360 0.073807557 s: diff-index 2.559524127 1.938191592 s: git command: git checkout - Another measurement from Ben's running "git checkout" with over 500k trees (on the whole series): baseline new ---------------------------------------------------------------------- 0.535510167 0.556558733 s: read cache .git/index 0.3057373 0.3147105 s: initialize name hash 0.0184082 0.023558433 s: preload index 0.086910967 0.089085967 s: refresh index 7.889590767 2.191554433 s: unpack trees 0.120760833 0.131941267 s: update worktree after a merge 2.2583504 2.572663167 s: repair cache-tree 0.8916137 0.959495233 s: write index, changed mask = 28 3.405199233 0.2710663 s: unpack trees 0.000999667 0.0021554 s: update worktree after a merge 3.4063306 0.273318333 s: diff-index 16.9524923 9.462943133 s: git command: git.exe checkout This command calls unpack_trees() twice, the first time on 2way merge and the second 1way merge. In both times, "unpack trees" time is reduced to one third. Overall time reduction is not that impressive of course because index operations take a big chunk. And there's that repair cache-tree line. PS. A note about cache-tree invalidation and the use of it in this code. We do invalidate cache-tree in _source_ index when we add new entries to the (temporary) "result" index. But we also use the cache-tree from source index in this optimization. Does this mean we end up having no cache-tree in the source index to activate this optimization? The answer is twisted: the order of finding a good cache-tree and invalidating it matters. In this case we check for a good cache-tree first in all_trees_same_as_cache_tree(), then we start to merge things and potentially invalidate that same cache-tree in the process. Since cache-tree invalidation happens after the optimization kicks in, we're still good. But we may lose that cache-tree at the very first call_unpack_fn() call in traverse_by_cache_tree(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-18 09:47:46 -07:00
Nguyễn Thái Ngọc Duy	0d1ed5963d	unpack-trees: add performance tracing We're going to optimize unpack_trees() a bit in the following patches. Let's add some tracing to measure how long it takes before and after. This is the baseline ("git checkout -" on webkit.git, 275k files on worktree) performance: 0.056651714 s: read cache .git/index performance: 0.183101080 s: preload index performance: 0.008584433 s: refresh index performance: 0.633767589 s: traverse_trees performance: 0.340265448 s: check_updates performance: 0.381884638 s: cache_tree_update performance: 1.401562947 s: unpack_trees performance: 0.338687914 s: write index, changed mask = 2e performance: 0.411927922 s: traverse_trees performance: 0.000023335 s: check_updates performance: 0.423697246 s: unpack_trees performance: 0.423708360 s: diff-index performance: 2.559524127 s: git command: git checkout - Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-18 09:47:46 -07:00
Duy Nguyen	b878579ae7	clone: report duplicate entries on case-insensitive filesystems Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. In the case that we can't rely on filesystem (via inode number) to do this check, fall back to fspathcmp() which is not perfect but should not give false positives. This patch is tested with vim-colorschemes and Sublime-Gitignore repositories on a JFS partition with case insensitive support on Linux. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-17 12:10:37 -07:00
Nguyễn Thái Ngọc Duy	c4500e251f	attr: remove index from git_attr_set_direction() Since attr checking API now take the index, there's no need to set an index in advance with this call. Most call sites are straightforward because they either pass the_index or NULL (which defaults back to the_index previously). There's only one suspicious call site in unpack-trees.c where it sets a different index. This code in unpack-trees is about to check out entries from the new/temporary index after merging is done in it. The attributes will be used by entry.c code to do crlf conversion if needed. entry.c now respects struct checkout's istate field, and this field is correctly set in unpack-trees.c, there should be no regression from this change. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 14:14:43 -07:00
Nguyễn Thái Ngọc Duy	c7f3259d0d	unpack-trees: avoid the_index in verify_absent() Both functions that are updated in this commit are called by verify_absent(), which is part of the "unpack-trees" operation that is supposed to work on any index file specified by the caller. Thanks to Brandon [1] [2], an implicit dependency on the_index is exposed. This commit fixes it. In both functions, it makes sense to use src_index to check for exclusion because it's almost unchanged and should give us the same outcome as if running the exclude check before the unpack. It's "almost unchanged" because we do invalidate cache-tree and untracked cache in the source index. But this should not affect how exclude machinery uses the index: to see if a file is tracked, and to read a blob from the index instead of worktree if it's marked skip-worktree (i.e. it's not available in worktree) [1] `a0bba65b10` (dir: convert is_excluded to take an index - 2017-05-05 [2] `2c1eb10454` (dir: convert read_directory to take an index - 2017-05-05) Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 14:14:43 -07:00
Nguyễn Thái Ngọc Duy	27c82fb3b4	unpack-trees: convert clear_ce_flags* to avoid the_index Prior to `fba92be8f7`, this code implicitly (and incorrectly) assumes the_index when running the exclude machinery. `fba92be8f7` helps show this problem clearer because unpack-trees operation is supposed to work on whatever index the caller specifies... not specifically the_index. Update the code to use "istate" argument that's originally from mark_new_skip_worktree(). From the call sites, both in unpack_trees(), you can see that this function works on two separate indexes: o->src_index and o->result. The second mark_new_skip_worktree() so far has incorecctly applied exclude rules on o->src_index instead of o->result. It's unclear what is the consequences of this, but it's definitely wrong. [1] `fba92be8f7` (dir: convert is_excluded_from_list to take an index - 2017-05-05) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 14:14:43 -07:00
Nguyễn Thái Ngọc Duy	86016ec304	unpack-trees: don't shadow global var the_index This function mark_new_skip_worktree() has an argument named the_index which is also the name of a global variable. While they have different types (the global the_index is not a pointer) mistakes can easily happen and it's also confusing for readers. Rename the function argument to something other than the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 14:14:43 -07:00
Nguyễn Thái Ngọc Duy	383480ba4f	unpack-trees: add a note about path invalidation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 14:14:43 -07:00
Junio C Hamano	ae533c4a92	Merge branch 'jm/cache-entry-from-mem-pool' For a large tree, the index needs to hold many cache entries allocated on heap. These cache entries are now allocated out of a dedicated memory pool to amortize malloc(3) overhead. * jm/cache-entry-from-mem-pool: block alloc: add validations around cache_entry lifecyle block alloc: allocate cache entries from mem_pool mem-pool: fill out functionality mem-pool: add life cycle management functions mem-pool: only search head block for available space block alloc: add lifecycle APIs for cache_entry structs read-cache: teach make_cache_entry to take object_id read-cache: teach refresh_cache_entry to take istate	2018-08-02 15:30:43 -07:00
Junio C Hamano	284b444932	Merge branch 'mk/merge-in-sparse-checkout' "git reset --merge" (hence "git merge ---abort") and "git reset --hard" had trouble working correctly in a sparsely checked out working tree after a conflict, which has been corrected. * mk/merge-in-sparse-checkout: unpack-trees: do not fail reset because of unmerged skipped entry	2018-07-24 14:50:48 -07:00
Junio C Hamano	00624d608c	Merge branch 'sb/object-store-grafts' The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h	2018-07-18 12:20:28 -07:00
Max Kirillov	b33fdfc34c	unpack-trees: do not fail reset because of unmerged skipped entry After modify/delete merge conflict happens in a file skipped by sparse checkout, "git reset --merge", which implements the "--abort" actions, and "git reset --hard" fail with message "Entry * not uptodate. Cannot update sparse checkout." As explained in [1], the up-to-date checker mistakenly treats conflicted entry which does not exist in HEAD as still skipped by sparse checkout. Use the fix suggested in [1]. Also, add test case which verifies the issue is fixed. [1] https://public-inbox.org/git/20180616051444.GA29754@duynguyen.home/ Signed-off-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Max Kirillov <max@max630.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-11 09:35:41 -07:00
Jameson Miller	8e72d67529	block alloc: allocate cache entries from mem_pool When reading large indexes from disk, a portion of the time is dominated in malloc() calls. This can be mitigated by allocating a large block of memory and manage it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated from. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a Split Index, the following rules are followed. 1st, some terminology is defined: Terminology: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Managing transient cache_entry structs: Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this: Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field. Alternatives: 1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not. Pro: No extra memory overhead to track this state Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted. 2) cache_entry would gain knowledge about which mem_pool allocated it Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry Con: cache_entry would grow heavier by a pointer, instead of int We didn't see a tangible benefit to this approach 3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools. Pro: No extra memory overhead to track state Con: Extra computation is performed when freeing cache entries We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this stae. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-03 10:58:27 -07:00
Jameson Miller	a849735bfb	block alloc: add lifecycle APIs for cache_entry structs It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made to allocate cahce entries when loading an index. Add an API to allocate and discard cache entries, abstracting the details of managing the memory backing the cache entries. This commit does actually change how memory is managed - this will be done in a later commit in the series. This change makes the distinction between cache entries that are associated with an index and cache entries that are not associated with an index. A main use of cache entries is with an index, and we can optimize the memory management around this. We still have other cases where a cache entry is not persisted with an index, and so we need to handle the "transient" use case as well. To keep the congnitive overhead of managing the cache entries, there will only be a single discard function. This means there must be enough information kept with the cache entry so that we know how to discard them. A summary of the main functions in the API is: make_cache_entry: create cache entry for use in an index. Uses specified parameters to populate cache_entry fields. make_empty_cache_entry: Create an empty cache entry for use in an index. Returns cache entry with empty fields. make_transient_cache_entry: create cache entry that is not used in an index. Uses specified parameters to populate cache_entry fields. make_empty_transient_cache_entry: create cache entry that is not used in an index. Returns cache entry with empty fields. discard_cache_entry: A single function that knows how to discard a cache entry regardless of how it was allocated. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-03 10:58:27 -07:00
Junio C Hamano	e47dbece39	Merge branch 'ma/unpack-trees-free-msgs' Leak plugging. * ma/unpack-trees-free-msgs: unpack_trees_options: free messages when done argv-array: return the pushed string from argv_push*() merge-recursive: provide pair of `unpack_trees_{start,finish}()` merge: setup `opts` later in `checkout_fast_forward()`	2018-05-30 21:51:29 +09:00
Junio C Hamano	42c8ce1c49	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (42 commits) merge-one-file: compute empty blob object ID add--interactive: compute the empty tree value Update shell scripts to compute empty tree object ID sha1_file: only expose empty object constants through git_hash_algo dir: use the_hash_algo for empty blob object ID sequencer: use the_hash_algo for empty tree object ID cache-tree: use is_empty_tree_oid sha1_file: convert cached object code to struct object_id builtin/reset: convert use of EMPTY_TREE_SHA1_BIN builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX wt-status: convert two uses of EMPTY_TREE_SHA1_HEX submodule: convert several uses of EMPTY_TREE_SHA1_HEX sequencer: convert one use of EMPTY_TREE_SHA1_HEX merge: convert empty tree constant to the_hash_algo builtin/merge: switch tree functions to use object_id builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo sha1-file: add functions for hex empty tree and blob OIDs builtin/receive-pack: avoid hard-coded constants for push certs diff: specify abbreviation size in terms of the_hash_algo upload-pack: replace use of several hard-coded constants ...	2018-05-30 14:04:10 +09:00
Junio C Hamano	50f08db594	Merge branch 'js/use-bug-macro' Developer support update, by using BUG() macro instead of die() to mark codepaths that should not happen more clearly. * js/use-bug-macro: BUG_exit_code: fix sparse "symbol not declared" warning Convert remaining die*(BUG) messages Replace all die("BUG: ...") calls by BUG() ones run-command: use BUG() to report bugs, not die() test-tool: help verifying BUG() code paths	2018-05-30 14:04:07 +09:00
Junio C Hamano	d658196f3c	Merge branch 'en/unpack-trees-split-index-fix' The split-index feature had a long-standing and dormant bug in certain use of the in-core merge machinery, which has been fixed. * en/unpack-trees-split-index-fix: unpack_trees: fix breakage when o->src_index != o->dst_index	2018-05-23 14:38:22 +09:00
Junio C Hamano	c67de747f4	Merge branch 'en/rename-directory-detection-reboot' Rename detection logic in "diff" family that is used in "merge" has learned to guess when all of x/a, x/b and x/c have moved to z/a, z/b and z/c, it is likely that x/d added in the meantime would also want to move to z/d by taking the hint that the entire directory 'x' moved to 'z'. A bug causing dirty files involved in a rename to be overwritten during merge has also been fixed as part of this work. Incidentally, this also avoids updating a file in the working tree after a (non-trivial) merge whose result matches what our side originally had. * en/rename-directory-detection-reboot: (36 commits) merge-recursive: fix check for skipability of working tree updates merge-recursive: make "Auto-merging" comment show for other merges merge-recursive: fix remainder of was_dirty() to use original index merge-recursive: fix was_tracked() to quit lying with some renamed paths t6046: testcases checking whether updates can be skipped in a merge merge-recursive: avoid triggering add_cacheinfo error with dirty mod merge-recursive: move more is_dirty handling to merge_content merge-recursive: improve add_cacheinfo error handling merge-recursive: avoid spurious rename/rename conflict from dir renames directory rename detection: new testcases showcasing a pair of bugs merge-recursive: fix remaining directory rename + dirty overwrite cases merge-recursive: fix overwriting dirty files involved in renames merge-recursive: avoid clobbering untracked files with directory renames merge-recursive: apply necessary modifications for directory renames merge-recursive: when comparing files, don't include trees merge-recursive: check for file level conflicts then get new name merge-recursive: add computation of collisions due to dir rename & merging merge-recursive: check for directory level conflicts merge-recursive: add get_directory_renames() merge-recursive: make a helper function for cleanup for handle_renames ...	2018-05-23 14:38:19 +09:00
Martin Ågren	1c41d2805e	unpack_trees_options: free messages when done The strings allocated in `setup_unpack_trees_porcelain()` are never freed. Provide a function `clear_unpack_trees_porcelain()` to do so and call it where we use `setup_unpack_trees_porcelain()`. The only non-trivial user is `unpack_trees_start()`, where we should place the new call in `unpack_trees_finish()`. We keep the string pointers in an array, mixing pointers to static memory and memory that we allocate on the heap. We also keep several copies of the individual pointers. So we need to make sure that we do not free what we must not free and that we do not double-free. Let a separate argv_array take ownership of all the strings we create so that we can easily free them. Zero the whole array of string pointers to make sure that we do not leave any dangling pointers. Note that we only take responsibility for the memory allocated in `setup_unpack_trees_porcelain()` and not any other members of the `struct unpack_trees_options`. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-22 11:59:31 +09:00
Stefan Beller	cbd53a2193	object-store: move object access functions to object-store.h This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-16 11:42:03 +09:00
Elijah Newren	64b1abe962	merge-recursive: fix overwriting dirty files involved in renames This fixes an issue that existed before my directory rename detection patches that affects both normal renames and renames implied by directory rename detection. Additional codepaths that only affect overwriting of dirty files that are involved in directory rename detection will be added in a subsequent commit. Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-08 16:11:00 +09:00
Junio C Hamano	0c7ecb7c31	Merge branch 'sb/submodule-move-nested' Moving a submodule that itself has submodule in it with "git mv" forgot to make necessary adjustment to the nested sub-submodules; now the codepath learned to recurse into the submodules. * sb/submodule-move-nested: submodule: fixup nested submodules after moving the submodule submodule-config: remove submodule_from_cache submodule-config: add repository argument to submodule_from_{name, path} submodule-config: allow submodule_free to handle arbitrary repositories grep: remove "repo" arg from non-supporting funcs submodule.h: drop declaration of connect_work_tree_and_git_dir	2018-05-08 15:59:17 +09:00
Johannes Schindelin	033abf97fc	Replace all die("BUG: ...") calls by BUG() ones In `d8193743e0` (usage.c: add BUG() function, 2017-05-12), a new macro was introduced to use for reporting bugs instead of die(). It was then subsequently used to convert one single caller in `588a538ae5` (setup_git_env: convert die("BUG") to BUG(), 2017-05-12). The cover letter of the patch series containing this patch (cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not terribly clear why only one call site was converted, or what the plan is for other, similar calls to die() to report bugs. Let's just convert all remaining ones in one fell swoop. This trick was performed by this invocation: sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-06 19:06:13 +09:00
brian m. carlson	75691ea345	Update struct index_state to use struct object_id Adjust struct index_state to use struct object_id instead of unsigned char [20]. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-02 13:59:50 +09:00
Elijah Newren	7db118303a	unpack_trees: fix breakage when o->src_index != o->dst_index Currently, all callers of unpack_trees() set o->src_index == o->dst_index. The code in unpack_trees() does not correctly handle them being different. There are two separate issues: First, there is the possibility of memory corruption. Since unpack_trees() creates a temporary index in o->result and then discards o->dst_index and overwrites it with o->result, in the special case that o->src_index == o->dst_index, it is safe to just reuse o->src_index's split_index for o->result. However, when src and dst are different, reusing o->src_index's split_index for o->result will cause the split_index to be shared. If either index then has entries replaced or removed, it will result in the other index referring to free()'d memory. Second, we can drop the index extensions. Previously, we were moving index extensions from o->dst_index to o->result. Since o->src_index is the one that will have the necessary extensions (o->dst_index is likely to be a new index temporary index created to store the results), we should be moving the index extensions from there. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-02 10:21:28 +09:00
Junio C Hamano	8b026edac3	Revert "Merge branch 'en/rename-directory-detection'" This reverts commit `e4bb62fa1e`, reversing changes made to `468165c1d8`. The topic appears to inflict severe regression in renaming merges, even though the promise of it was that it would improve them. We do not yet know which exact change in the topic was wrong, but in the meantime, let's play it safe and revert it out of 'master' before real Git-using projects are harmed. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-11 18:07:11 +09:00
Junio C Hamano	e4bb62fa1e	Merge branch 'en/rename-directory-detection' Rename detection logic in "diff" family that is used in "merge" has learned to guess when all of x/a, x/b and x/c have moved to z/a, z/b and z/c, it is likely that x/d added in the meantime would also want to move to z/d by taking the hint that the entire directory 'x' moved to 'z'. A bug causing dirty files involved in a rename to be overwritten during merge has also been fixed as part of this work. * en/rename-directory-detection: (29 commits) merge-recursive: ensure we write updates for directory-renamed file merge-recursive: avoid spurious rename/rename conflict from dir renames directory rename detection: new testcases showcasing a pair of bugs merge-recursive: fix remaining directory rename + dirty overwrite cases merge-recursive: fix overwriting dirty files involved in renames merge-recursive: avoid clobbering untracked files with directory renames merge-recursive: apply necessary modifications for directory renames merge-recursive: when comparing files, don't include trees merge-recursive: check for file level conflicts then get new name merge-recursive: add computation of collisions due to dir rename & merging merge-recursive: check for directory level conflicts merge-recursive: add get_directory_renames() merge-recursive: make a helper function for cleanup for handle_renames merge-recursive: split out code for determining diff_filepairs merge-recursive: make !o->detect_rename codepath more obvious merge-recursive: fix leaks of allocated renames and diff_filepairs merge-recursive: introduce new functions to handle rename logic merge-recursive: move the get_renames() function directory rename detection: tests for handling overwriting dirty files directory rename detection: tests for handling overwriting untracked files ...	2018-04-10 08:25:43 +09:00
Junio C Hamano	c2a499e6c3	Merge branch 'jh/partial-clone' Hotfix. * jh/partial-clone: upload-pack: disable object filtering when disabled by config unpack-trees: release oid_array after use in check_updates()	2018-03-29 15:39:59 -07:00
Stefan Beller	f793b895fd	submodule-config: allow submodule_free to handle arbitrary repositories At some point we may want to rename the function so that it describes what it actually does as 'submodule_free' doesn't quite describe that this clears a repository's submodule cache. But that's beyond the scope of this series. While at it remove the extern key word from its declaration. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-29 09:44:50 -07:00
René Scharfe	9f242a1336	unpack-trees: release oid_array after use in check_updates() Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-25 10:51:46 -07:00
Junio C Hamano	169c9c0169	Merge branch 'bw/c-plus-plus' Avoid using identifiers that clash with C++ keywords. Even though it is not a goal to compile Git with C++ compilers, changes like this help use of code analysis tools that targets C++ on our codebase. * bw/c-plus-plus: (37 commits) replace: rename 'new' variables trailer: rename 'template' variables tempfile: rename 'template' variables wrapper: rename 'template' variables environment: rename 'namespace' variables diff: rename 'template' variables environment: rename 'template' variables init-db: rename 'template' variables unpack-trees: rename 'new' variables trailer: rename 'new' variables submodule: rename 'new' variables split-index: rename 'new' variables remote: rename 'new' variables ref-filter: rename 'new' variables read-cache: rename 'new' variables line-log: rename 'new' variables imap-send: rename 'new' variables http: rename 'new' variables entry: rename 'new' variables diffcore-delta: rename 'new' variables ...	2018-03-06 14:54:07 -08:00
Elijah Newren	e0052f4613	merge-recursive: fix overwriting dirty files involved in renames This fixes an issue that existed before my directory rename detection patches that affects both normal renames and renames implied by directory rename detection. Additional codepaths that only affect overwriting of dirty files that are involved in directory rename detection will be added in a subsequent commit. Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-27 14:11:58 -08:00
Junio C Hamano	cf44c1e0a2	Merge branch 'nd/fix-untracked-cache-invalidation' Some bugs around "untracked cache" feature have been fixed. * nd/fix-untracked-cache-invalidation: dir.c: ignore paths containing .git when invalidating untracked cache dir.c: stop ignoring opendir() error in open_cached_dir() dir.c: fix missing dir invalidation in untracked code dir.c: avoid stat() in valid_cached_dir() status: add a failing test showing a core.untrackedCache bug	2018-02-27 10:33:50 -08:00
Brandon Williams	69caed593e	unpack-trees: rename 'new' variables Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-22 10:08:05 -08:00
Junio C Hamano	6bed209a20	Merge branch 'jh/partial-clone' The machinery to clone & fetch, which in turn involves packing and unpacking objects, have been told how to omit certain objects using the filtering mechanism introduced by the jh/object-filtering topic, and also mark the resulting pack as a promisor pack to tolerate missing objects, taking advantage of the mechanism introduced by the jh/fsck-promisors topic. * jh/partial-clone: t5616: test bulk prefetch after partial fetch fetch: inherit filter-spec from partial clone t5616: end-to-end tests for partial clone fetch-pack: restore save_commit_buffer after use unpack-trees: batch fetching of missing blobs clone: partial clone partial-clone: define partial clone settings in config fetch: support filters fetch: refactor calculation of remote list fetch-pack: test support excluding large blobs fetch-pack: add --no-filter fetch-pack, index-pack, transport: partial clone upload-pack: add object filtering for partial clone	2018-02-13 13:39:04 -08:00
Nguyễn Thái Ngọc Duy	0cacebf099	dir.c: ignore paths containing .git when invalidating untracked cache read_directory() code ignores all paths named ".git" even if it's not a valid git repository. See treat_path() for details. Since ".git" is basically invisible to read_directory(), when we are asked to invalidate a path that contains ".git", we can safely ignore it because the slow path would not consider it anyway. This helps when fsmonitor is used and we have a real ".git" repo at worktree top. Occasionally .git/index will be updated and if the fsmonitor hook does not filter it, untracked cache is asked to invalidate the path ".git/index". Without this patch, we invalidate the root directory unncessarily, which: - makes read_directory() fall back to slow path for root directory (slower) - makes the index dirty (because UNTR extension is updated). Depending on the index size, writing it down could also be slow. A note about the new "safe_path" knob. Since this new check could be relatively expensive, avoid it when we know it's not needed. If the path comes from the index, it can't contain ".git". If it does contain, we may be screwed up at many more levels, not just this one. Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-07 12:27:02 -08:00
Stefan Beller	ad17312e11	unpack-trees: oneway_merge to update submodules When there is a one way merge, each submodule needs to be one way merged as well, if we're asked to recurse into submodules. In case of a submodule, check if it is up-to-date, otherwise set the flag CE_UPDATE, which will trigger an update of it in the phase updating the tree later. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-05 12:35:35 -08:00
Jonathan Tan	c0c578b33c	unpack-trees: batch fetching of missing blobs When running checkout, first prefetch all blobs that are to be updated but are missing. This means that only one pack is downloaded during such operations, instead of one per missing blob. This operates only on the blob level - if a repository has a missing tree, they are still fetched one at a time. This does not use the delayed checkout mechanism introduced in commit `2841e8f` ("convert: add "status=delayed" to filter process protocol", 2017-06-30) due to significant conceptual differences - in particular, for partial clones, we already know what needs to be fetched based on the contents of the local repo alone, whereas for status=delayed, it is the filter process that tells us what needs to be checked in the end. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-12-08 09:58:51 -08:00
Junio C Hamano	e05336bdda	Merge branch 'bp/fsmonitor' We learned to talk to watchman to speed up "git status" and other operations that need to see which paths have been modified. * bp/fsmonitor: fsmonitor: preserve utf8 filenames in fsmonitor-watchman log fsmonitor: read entirety of watchman output fsmonitor: MINGW support for watchman integration fsmonitor: add a performance test fsmonitor: add a sample integration script for Watchman fsmonitor: add test cases for fsmonitor extension split-index: disable the fsmonitor extension when running the split index test fsmonitor: add a test tool to dump the index extension update-index: add fsmonitor support to update-index ls-files: Add support in ls-files to display the fsmonitor valid bit fsmonitor: add documentation for the fsmonitor extension. fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files. update-index: add a new --force-write-index option preload-index: add override to enable testing preload-index bswap: add 64 bit endianness helper get_be64	2017-11-21 14:07:50 +09:00
brian m. carlson	a98e6101f0	refs: convert resolve_gitlink_ref to struct object_id Convert the declaration and definition of resolve_gitlink_ref to use struct object_id and apply the following semantic patch: @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3.hash) + resolve_gitlink_ref(E1, E2, &E3) @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3->hash) + resolve_gitlink_ref(E1, E2, E3) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-16 11:05:51 +09:00
brian m. carlson	1053fe829c	Convert remaining callers of resolve_gitlink_ref to object_id Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-16 11:05:51 +09:00
Ben Peart	883e248b8a	fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files. When the index is read from disk, the fsmonitor index extension is used to flag the last known potentially dirty index entries. The registered core.fsmonitor command is called with the time the index was last updated and returns the list of files changed since that time. This list is used to flag any additional dirty cache entries and untracked cache directories. We can then use this valid state to speed up preload_index(), ie_match_stat(), and refresh_cache_ent() as they do not need to lstat() files to detect potential changes for those entries marked CE_FSMONITOR_VALID. In addition, if the untracked cache is turned on valid_cached_dir() can skip checking directories for new or changed files as fsmonitor will invalidate the cache only for those directories that have been identified as having potential changes. To keep the CE_FSMONITOR_VALID state accurate during git operations; when git updates a cache entry to match the current state on disk, it will now set the CE_FSMONITOR_VALID bit. Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit is cleared and the corresponding untracked cache directory is marked invalid. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-01 17:23:01 +09:00
Junio C Hamano	7fbbd3ec0f	Merge branch 'ls/convert-filter-progress' The codepath to call external process filter for smudge/clean operation learned to show the progress meter. * ls/convert-filter-progress: convert: display progress for filtered objects that have been delayed	2017-09-10 17:08:22 +09:00
Junio C Hamano	8e36002add	Merge branch 'ma/up-to-date' Message and doc updates. * ma/up-to-date: treewide: correct several "up-to-date" to "up to date" Documentation/user-manual: update outdated example output	2017-09-10 17:08:22 +09:00
Junio C Hamano	614ea03a71	Merge branch 'bw/submodule-config-cleanup' Code clean-up to avoid mixing values read from the .gitmodules file and values read from the .git/config file. * bw/submodule-config-cleanup: submodule: remove gitmodules_config unpack-trees: improve loading of .gitmodules submodule-config: lazy-load a repository's .gitmodules file submodule-config: move submodule-config functions to submodule-config.c submodule-config: remove support for overlaying repository config diff: stop allowing diff to have submodules configured in .git/config submodule: remove submodule_config callback routine unpack-trees: don't respect submodule.update submodule: don't rely on overlayed config when setting diffopts fetch: don't overlay config with submodule-config submodule--helper: don't overlay config in update-clone submodule--helper: don't overlay config in remote_submodule_branch add, reset: ensure submodules can be added or reset submodule: don't use submodule_from_name t7411: check configuration parsing errors	2017-08-26 22:55:08 -07:00
Lars Schneider	52f1d62eb4	convert: display progress for filtered objects that have been delayed In `2841e8f` ("convert: add "status=delayed" to filter process protocol", 2017-06-30) we taught the filter process protocol to delayed responses. These responses are processed after the "Checking out files" phase. If the processing takes noticeable time, then the user might think Git is stuck. Display the progress of the delayed responses to let the user know that Git is still processing objects. This works very well for objects that can be filtered quickly. If filtering of an individual object takes noticeable time, then the user might still think that Git is stuck. However, in that case the user would at least know what Git is doing. It would be technical more correct to display "Checking out files whose content filtering has been delayed". For brevity we only print "Filtering content". The finish_delayed_checkout() call was moved below the stop_progress() call in unpack-trees.c to ensure that the "Checking out files" progress is properly stopped before the "Filtering content" progress starts in finish_delayed_checkout(). Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Suggested-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-24 12:41:20 -07:00
Junio C Hamano	d33a433236	Merge branch 'jc/simplify-progress' The API to start showing progress meter after a short delay has been simplified. * jc/simplify-progress: progress: simplify "delayed" progress API	2017-08-24 10:20:02 -07:00
Junio C Hamano	11bd95604a	Merge branch 'rs/object-id' Conversion from uchar[20] to struct object_id continues. * rs/object-id: tree-walk: convert fill_tree_descriptor() to object_id	2017-08-24 10:20:02 -07:00
Martin Ågren	7560f547e6	treewide: correct several "up-to-date" to "up to date" Follow the Oxford style, which says to use "up-to-date" before the noun, but "up to date" after it. Don't change plumbing (specifically send-pack.c, but transport.c (git push) also has the same string). This was produced by grepping for "up-to-date" and "up to date". It turned out we only had to edit in one direction, removing the hyphens. Fix a typo in Documentation/git-diff-index.txt while we're there. Reported-by: Jeffrey Manian <jeffrey.manian@gmail.com> Reported-by: STEVEN WHITE <stevencharleswhitevoices@gmail.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 12:17:22 -07:00
Junio C Hamano	5aa0b6c506	Merge branch 'bw/grep-recurse-submodules' "git grep --recurse-submodules" has been reworked to give a more consistent output across submodule boundary (and do its thing without having to fork a separate process). * bw/grep-recurse-submodules: grep: recurse in-process using 'struct repository' submodule: merge repo_read_gitmodules and gitmodules_config submodule: check for unmerged .gitmodules outside of config parsing submodule: check for unstaged .gitmodules outside of config parsing submodule: remove fetch.recursesubmodules from submodule-config parsing submodule: remove submodule.fetchjobs from submodule-config parsing config: add config_from_gitmodules cache.h: add GITMODULES_FILE macro repository: have the_repository use the_index repo_read_index: don't discard the index	2017-08-22 10:29:01 -07:00
Junio C Hamano	8aade107dd	progress: simplify "delayed" progress API We used to expose the full power of the delayed progress API to the callers, so that they can specify, not just the message to show and expected total amount of work that is used to compute the percentage of work performed so far, the percent-threshold parameter P and the delay-seconds parameter N. The progress meter starts to show at N seconds into the operation only if we have not yet completed P per-cent of the total work. Most callers used either (0%, 2s) or (50%, 1s) as (P, N), but there are oddballs that chose more random-looking values like 95%. For a smoother workload, (50%, 1s) would allow us to start showing the progress meter earlier than (0%, 2s), while keeping the chance of not showing progress meter for long running operation the same as the latter. For a task that would take 2s or more to complete, it is likely that less than half of it would complete within the first second, if the workload is smooth. But for a spiky workload whose earlier part is easier, such a setting is likely to fail to show the progress meter entirely and (0%, 2s) is more appropriate. But that is merely a theory. Realistically, it is of dubious value to ask each codepath to carefully consider smoothness of their workload and specify their own setting by passing two extra parameters. Let's simplify the API by dropping both parameters and have everybody use (0%, 2s). Oh, by the way, the percent-threshold parameter and the structure member were consistently misspelled, which also is now fixed ;-) Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-19 14:01:34 -07:00
René Scharfe	5c377d3d59	tree-walk: convert fill_tree_descriptor() to object_id All callers of fill_tree_descriptor() have been converted to object_id already, so convert that function as well. As a nice side-effect we get rid of NULL checks in tree-diff.c, as fill_tree_descriptor() already does them for us. Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-14 12:38:54 -07:00
Junio C Hamano	51b8aecabe	Merge branch 'ls/filter-process-delayed' The filter-process interface learned to allow a process with long latency give a "delayed" response. * ls/filter-process-delayed: convert: add "status=delayed" to filter process protocol convert: refactor capabilities negotiation convert: move multiple file filter error handling to separate function convert: put the flags field before the flag itself for consistent style t0021: write "OUT <size>" only on success t0021: make debug log file name configurable t0021: keep filter log files on comparison	2017-08-11 13:27:00 -07:00
Brandon Williams	3302871320	unpack-trees: improve loading of .gitmodules When recursing submodules 'check_updates()' needs to have strict control over the submodule-config subsystem to ensure that the gitmodules file has been read before checking cache entries which are marked for removal as well ensuring the proper gitmodules file is read before updating cache entries. Because of this let's not rely on callers of 'check_updates()' to read the gitmodules file before calling 'check_updates()' and handle the reading explicitly. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-03 13:11:02 -07:00
Brandon Williams	7463e2ec3e	unpack-trees: don't respect submodule.update The 'submodule.update' config was historically used and respected by the 'submodule update' command because update handled a variety of different ways it updated a submodule. As we begin teaching other commands about submodules it makes more sense for the different settings of 'submodule.update' to be handled by the individual commands themselves (checkout, rebase, merge, etc) so it shouldn't be respected by the native checkout command. Also remove the overlaying of the repository's config (via using 'submodule_config()') from the commands which use the unpack-trees logic (checkout, read-tree, reset). Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-03 13:11:01 -07:00
Brandon Williams	4c0eeafe47	cache.h: add GITMODULES_FILE macro Add a macro to be used when specifying the '.gitmodules' file and convert any existing hard coded '.gitmodules' file strings to use the new macro. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-02 14:26:46 -07:00
Junio C Hamano	487fe1ffcd	Merge branch 'ls/filter-process-delayed' into jt/subprocess-handshake * ls/filter-process-delayed: convert: add "status=delayed" to filter process protocol convert: refactor capabilities negotiation convert: move multiple file filter error handling to separate function convert: put the flags field before the flag itself for consistent style t0021: write "OUT <size>" only on success t0021: make debug log file name configurable t0021: keep filter log files on comparison	2017-07-26 12:56:19 -07:00
Lars Schneider	2841e8f81c	convert: add "status=delayed" to filter process protocol Some `clean` / `smudge` filters may require a significant amount of time to process a single blob (e.g. the Git LFS smudge filter might perform network requests). During this process the Git checkout operation is blocked and Git needs to wait until the filter is done to continue with the checkout. Teach the filter process protocol, introduced in `edcc8581` ("convert: add filter.<driver>.process option", 2016-10-16), to accept the status "delayed" as response to a filter request. Upon this response Git continues with the checkout operation. After the checkout operation Git calls "finish_delayed_checkout" which queries the filter for remaining blobs. If the filter is still working on the completion, then the filter is expected to block. If the filter has completed all remaining blobs then an empty response is expected. Git has a multiple code paths that checkout a blob. Support delayed checkouts only in `clone` (in unpack-trees.c) and `checkout` operations for now. The optimization is most effective in these code paths as all files of the tree are processed. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-30 13:50:41 -07:00
Junio C Hamano	f31d23a399	Merge branch 'bw/config-h' Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h	2017-06-24 14:28:41 -07:00
Brandon Williams	b2141fc1d2	config: don't include config.h by default Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-15 12:56:22 -07:00
Junio C Hamano	fa0624f79f	Merge branch 'dt/unpack-save-untracked-cache-extension' When "git checkout", "git merge", etc. manipulates the in-core index, various pieces of information in the index extensions are discarded from the original state, as it is usually not the case that they are kept up-to-date and in-sync with the operation on the main index. The untracked cache extension is copied across these operations now, which would speed up "git status" (as long as the cache is properly invalidated). * dt/unpack-save-untracked-cache-extension: unpack-trees: preserve index extensions	2017-05-30 11:16:45 +09:00
Junio C Hamano	4eeed27e16	Merge branch 'bw/dir-c-stops-relying-on-the-index' API update. * bw/dir-c-stops-relying-on-the-index: dir: convert fill_directory to take an index dir: convert read_directory to take an index dir: convert read_directory_recursive to take an index dir: convert open_cached_dir to take an index dir: convert is_excluded to take an index dir: convert prep_exclude to take an index dir: convert add_excludes to take an index dir: convert is_excluded_from_list to take an index dir: convert last_exclude_matching_from_list to take an index dir: convert dir_add* to take an index dir: convert get_dtype to take index dir: convert directory_exists_in_index to take index dir: convert read_skip_worktree_file_from_index to take an index dir: stop using the index compatibility macros	2017-05-29 12:34:41 +09:00
Junio C Hamano	5f074ca7e8	Merge branch 'sb/reset-recurse-submodules' "git reset" learned "--recurse-submodules" option. * sb/reset-recurse-submodules: builtin/reset: add --recurse-submodules switch submodule.c: submodule_move_head works with broken submodules submodule.c: uninitialized submodules are ignored in recursive commands entry.c: submodule recursing: respect force flag correctly	2017-05-29 12:34:40 +09:00
David Turner	edf3b90553	unpack-trees: preserve index extensions Make git checkout (and other unpack_tree operations) preserve the untracked cache. This is valuable for two reasons: 1. Often, an unpack_tree operation will not touch large parts of the working tree, and thus most of the untracked cache will continue to be valid. 2. Even if the untracked cache were entirely invalidated by such an operation, the user has signaled their intention to have such a cache, and we don't want to throw it away. [jes: backed out the watchman-specific parts] Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-20 18:26:45 +09:00
Brandon Williams	2c1eb10454	dir: convert read_directory to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-06 19:15:39 +09:00
Brandon Williams	a0bba65b10	dir: convert is_excluded to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-06 19:15:39 +09:00
Brandon Williams	473e39307d	dir: convert add_excludes to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-06 19:15:39 +09:00
Brandon Williams	fba92be8f7	dir: convert is_excluded_from_list to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-06 19:15:39 +09:00
Junio C Hamano	8868ba1962	Merge branch 'jh/unpack-trees-micro-optim' In a 2- and 3-way merge of trees, more than one source trees often end up sharing an identical subtree; optimize by not reading the same tree multiple times in such a case. * jh/unpack-trees-micro-optim: unpack-trees: avoid duplicate ODB lookups during checkout	2017-04-23 22:07:48 -07:00
Stefan Beller	cd279e2e1b	entry.c: submodule recursing: respect force flag correctly In case of a non-forced worktree update, the submodule movement is tested in a dry run first, such that it doesn't matter if the actual update is done via the force flag. However for correctness, we want to give the flag as specified by the user. All callers have been inspected and updated if needed. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-18 21:18:29 -07:00
Jeff Hostetler	d12a8cf0af	unpack-trees: avoid duplicate ODB lookups during checkout Teach traverse_trees_recursive() to not do redundant ODB lookups when both directories refer to the same OID. In operations such as read-tree and checkout, there will likely be many peer directories that have the same OID when the differences between the commits are relatively small. In these cases we can avoid hitting the ODB multiple times for the same OID. This patch handles n=2 and n=3 cases and simply copies the data rather than repeating the fill_tree_descriptor(). ================ On the Windows repo (500K trees, 3.1M files, 450MB index), this reduced the overall time by 0.75 seconds when cycling between 2 commits with a single file difference. (avg) before: 22.699 (avg) after: 21.955 =============== ================ On Linux using p0006-read-tree-checkout.sh with linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------------------------------------- 0006.2: read-tree br_base br_ballast (57994) 0.24(0.20+0.03) 0.24(0.22+0.01) +0.0% 0006.3: switch between br_base br_ballast (57994) 10.58(6.23+2.86) 10.67(5.94+2.87) +0.9% 0006.4: switch between br_ballast br_ballast_plus_1 (57994) 0.60(0.44+0.17) 0.57(0.44+0.14) -5.0% 0006.5: switch between aliases (57994) 0.59(0.48+0.13) 0.57(0.44+0.15) -3.4% ================ Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-15 02:50:44 -07:00
Stefan Beller	6362ed0b29	unpack-trees.c: align submodule error message to the other error messages As the place holder in the error message is for multiple submodules, we don't want to encapsulate the string place holder in single quotes. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-29 15:47:36 -07:00
Stefan Beller	a7bc845a9a	unpack-trees: check if we can perform the operation for submodules In a later patch we'll support submodule entries to be in sync with the tree in working tree changing commands, such as checkout or read-tree. When a new submodule entry changes in the tree, we need to check if there are conflicts (directory/file conflicts) for the tree. Add this check for submodules to be performed before the working tree is touched. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-16 14:07:16 -07:00
Stefan Beller	d6b1230067	unpack-trees: pass old oid to verify_clean_submodule The check (which uses the old oid) is yet to be implemented, but this part is just a refactor, so it can go separately first. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-16 14:07:16 -07:00
Junio C Hamano	2243d229f7	Merge branch 'sb/unpack-trees-super-prefix' "git read-tree" and its underlying unpack_trees() machinery learned to report problematic paths prefixed with the --super-prefix option. * sb/unpack-trees-super-prefix: unpack-trees: support super-prefix option t1001: modernize style t1000: modernize style read-tree: use OPT_BOOL instead of OPT_SET_INT	2017-02-03 11:25:18 -08:00
Stefan Beller	3d415425c7	unpack-trees: support super-prefix option In the future we want to support working tree operations within submodules, e.g. "git checkout --recurse-submodules", which will update the submodule to the commit as recorded in its superproject. In the submodule the unpack-tree operation is carried out as usual, but the reporting to the user needs to prefix any path with the superproject. The mechanism for this is the super-prefix. (see `74866d757`, git: make super-prefix option) Add support for the super-prefix option for commands that unpack trees by wrapping any path output in unpacking trees in the newly introduced super_prefixed function. This new function prefixes any path with the super-prefix if there is one. Assuming the submodule case doesn't happen in the majority of the cases, we'd want to have a fast behavior for no super prefix, i.e. no reallocation/copying, but just returning path. Another aspect of introducing the `super_prefixed` function is to consider who owns the memory and if this is the right place where the path gets modified. As the super prefix ought to change the output behavior only and not the actual unpack tree part, it is fine to be that late in the line. As we get passed in 'const char *path', we cannot change the path itself, which means in case of a super prefix we have to copy over the path. We need two static buffers in that function as the error messages contain at most two paths. For testing purposes enable it in read-tree, which has no output of paths other than an unpack-trees.c. These are all converted in this patch. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-25 12:33:33 -08:00
Junio C Hamano	0c0e0fd0ca	Merge branch 'sb/unpack-trees-cleanup' Code cleanup. * sb/unpack-trees-cleanup: unpack-trees: factor progress setup out of check_updates unpack-trees: remove unneeded continue unpack-trees: move checkout state into check_updates	2017-01-18 15:12:17 -08:00
Stefan Beller	384f1a167b	unpack-trees: factor progress setup out of check_updates This makes check_updates shorter and easier to understand. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-10 11:53:33 -08:00
Stefan Beller	c4bfc7728b	unpack-trees: remove unneeded continue The continue is the last statement in the loop, so not needed. This situation arose in `700e66d66` (2010-07-30, unpack-trees: let read-tree -u remove index entries outside sparse area) when statements after the continue were removed. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-10 11:51:19 -08:00
Stefan Beller	30ac275b1c	unpack-trees: move checkout state into check_updates The checkout state was introduced via `16da134b1f` (read-trees: refactor the unpack_trees() part, 2006-07-30). An attempt to refactor the checkout state was done in `b56aa5b268` (unpack-trees: pass checkout state explicitly to check_updates(), 2016-09-13), but we can go even further. The `struct checkout state` is not used in unpack_trees apart from initializing it, so move it into the function that makes use of it, which is `check_updates`. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-10 11:51:14 -08:00
Junio C Hamano	0f30315ba1	Merge branch 'sb/unpack-trees-grammofix' * sb/unpack-trees-grammofix: unpack-trees: fix grammar for untracked files in directories	2016-12-19 14:45:31 -08:00
Stefan Beller	584f99c87b	unpack-trees: fix grammar for untracked files in directories Noticed-by: David Turner <dturner@twosigma.com> Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-12-05 12:17:02 -08:00
René Scharfe	68e3d6292f	introduce CHECKOUT_INIT Add a static initializer for struct checkout and use it throughout the code base. It's shorter, avoids a memset(3) call and makes sure the base_dir member is initialized to a valid (empty) string. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-22 13:42:18 -07:00
Junio C Hamano	e9c6d3dcc2	Merge branch 'rs/unpack-trees-reduce-file-scope-global' Code cleanup. * rs/unpack-trees-reduce-file-scope-global: unpack-trees: pass checkout state explicitly to check_updates()	2016-09-21 15:15:26 -07:00
Junio C Hamano	4af9a7d344	Merge branch 'bc/object-id' The "unsigned char sha1[20]" to "struct object_id" conversion continues. Notable changes in this round includes that ce->sha1, i.e. the object name recorded in the cache_entry, turns into an object_id. It had merge conflicts with a few topics in flight (Christian's "apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's "status --porcelain-v2"). Extra sets of eyes double-checking for mismerges are highly appreciated. * bc/object-id: builtin/reset: convert to use struct object_id builtin/commit-tree: convert to struct object_id builtin/am: convert to struct object_id refs: add an update_ref_oid function. sha1_name: convert get_sha1_mb to struct object_id builtin/update-index: convert file to struct object_id notes: convert init_notes to use struct object_id builtin/rm: convert to use struct object_id builtin/blame: convert file to use struct object_id Convert read_mmblob to take struct object_id. notes-merge: convert struct notes_merge_pair to struct object_id builtin/checkout: convert some static functions to struct object_id streaming: make stream_blob_to_fd take struct object_id builtin: convert textconv_object to use struct object_id builtin/cat-file: convert some static functions to struct object_id builtin/cat-file: convert struct expand_data to use struct object_id builtin/log: convert some static functions to use struct object_id builtin/blame: convert struct origin to use struct object_id builtin/apply: convert static functions to struct object_id cache: convert struct cache_entry to use struct object_id	2016-09-19 13:47:19 -07:00
René Scharfe	b56aa5b268	unpack-trees: pass checkout state explicitly to check_updates() Add a parameter for the struct checkout variable to check_updates() instead of using a static global variable. Passing it explicitly makes object ownership and usage more easily apparent. And we get rid of a static variable; those can be problematic in library-like code. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-13 16:26:12 -07:00
Alex Henrie	a1c8044662	unpack-trees: do not capitalize "working" In English, only proper nouns are capitalized. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-08 12:17:23 -07:00
brian m. carlson	99d1a9861a	cache: convert struct cache_entry to use struct object_id Convert struct cache_entry to use struct object_id by applying the following semantic patch and the object_id transforms from contrib, plus the actual change to the struct: @@ struct cache_entry E1; @@ - E1.sha1 + E1.oid.hash @@ struct cache_entry *E1; @@ - E1->sha1 + E1->oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-07 12:59:42 -07:00
Alex Henrie	c2691e2add	unpack-trees: fix English grammar in do-this-before-that messages Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-27 08:29:36 -07:00
Junio C Hamano	40cfc95856	Merge branch 'nd/error-errno' The code for warning_errno/die_errno has been refactored and a new error_errno() reporting helper is introduced. * nd/error-errno: (41 commits) wrapper.c: use warning_errno() vcs-svn: use error_errno() upload-pack.c: use error_errno() unpack-trees.c: use error_errno() transport-helper.c: use error_errno() sha1_file.c: use {error,die,warning}_errno() server-info.c: use error_errno() sequencer.c: use error_errno() run-command.c: use error_errno() rerere.c: use error_errno() and warning_errno() reachable.c: use error_errno() mailmap.c: use error_errno() ident.c: use warning_errno() http.c: use error_errno() and warning_errno() grep.c: use error_errno() gpg-interface.c: use error_errno() fast-import.c: use error_errno() entry.c: use error_errno() editor.c: use error_errno() diff-no-index.c: use error_errno() ...	2016-05-17 14:38:28 -07:00
Junio C Hamano	e5e7a9115d	Merge branch 'va/i18n-misc-updates' Mark several messages for translation. * va/i18n-misc-updates: i18n: unpack-trees: avoid substituting only a verb in sentences i18n: builtin/pull.c: split strings marked for translation i18n: builtin/pull.c: mark placeholders for translation i18n: git-parse-remote.sh: mark strings for translation i18n: branch: move comment for translators i18n: branch: unmark string for translation i18n: builtin/rm.c: remove a comma ',' from string i18n: unpack-trees: mark strings for translation i18n: builtin/branch.c: mark option for translation i18n: index-pack: use plural string instead of normal one	2016-05-17 14:38:23 -07:00
Vasco Almeida	2e3926b948	i18n: unpack-trees: avoid substituting only a verb in sentences Instead of reusing the same set of message templates for checkout and other actions and substituting the verb with "%s", prepare separate message templates for each known action. That would make it easier for translation into languages where the same verb may conjugate differently depending on the message we are giving. See gettext documentation for details: http://www.gnu.org/software/gettext/manual/html_node/Preparing-Strings.html Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-12 16:28:43 -07:00
Nguyễn Thái Ngọc Duy	43c728e2c2	unpack-trees.c: use error_errno() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-09 12:29:08 -07:00
brian m. carlson	7d924c9139	struct name_entry: use struct object_id instead of unsigned char sha1[20] Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-25 14:23:42 -07:00
Vasco Almeida	ed47fdf7fa	i18n: unpack-trees: mark strings for translation Mark strings seen by the user inside setup_unpack_trees_porcelain() and display_error_msgs() functions for translation. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-12 10:29:16 -07:00
David Turner	a6720955f1	unpack-trees: fix accidentally quadratic behavior While unpacking trees (e.g. during git checkout), when we hit a cache entry that's past and outside our path, we cut off iteration. This provides about a 45% speedup on git checkout between master and master^20000 on Twitter's monorepo. Speedup in general will depend on repostitory structure, number of changes, and packfile packing decisions. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-22 13:03:10 -08:00
David Turner	d9c2bd560e	do_compare_entry: use already-computed path In traverse_trees, we generate the complete traverse path for a traverse_info. Later, in do_compare_entry, we used to go do a bunch of work to compare the traverse_info to a cache_entry's name without computing that path. But since we already have that path, we don't need to do all that work. Instead, we can just put the generated path into the traverse_info, and do the comparison more directly. We copy the path because prune_traversal might mutate `base`. This doesn't happen in any codepaths where do_compare_entry is called, but it's better to be safe. This makes git checkout much faster -- about 25% on Twitter's monorepo. Deeper directory trees are likely to benefit more than shallower ones. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-05 13:39:46 -08:00
Jeff King	75faa45ae0	replace trivial malloc + sprintf / strcpy calls with xstrfmt It's a common pattern to do: foo = xmalloc(strlen(one) + strlen(two) + 1 + 1); sprintf(foo, "%s %s", one, two); (or possibly some variant with strcpy()s or a more complicated length computation). We can switch these to use xstrfmt, which is shorter, involves less error-prone manual computation, and removes many sprintf and strcpy calls which make it harder to audit the code for real buffer overflows. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Junio C Hamano	f0bc854623	Sync with 2.5.2	2015-09-09 14:30:35 -07:00
Junio C Hamano	3d3caf0b78	Sync with 2.4.9	2015-09-04 10:43:23 -07:00
Junio C Hamano	8267cd11d6	Sync with 2.2.3	2015-09-04 10:29:28 -07:00
Jeff King	f514ef9787	verify_absent: allow filenames longer than PATH_MAX When unpack-trees wants to know whether a path will overwrite anything in the working tree, we use lstat() to see if there is anything there. But if we are going to write "foo/bar", we can't just lstat("foo/bar"); we need to look for leading prefixes (e.g., "foo"). So we use the lstat cache to find the length of the leading prefix, and copy the filename up to that length into a temporary buffer (since the original name is const, we cannot just stick a NUL in it). The copy we make goes into a PATH_MAX-sized buffer, which will overflow if the prefix is longer than PATH_MAX. How this happens is a little tricky, since in theory PATH_MAX is the biggest path we will have read from the filesystem. But this can happen if: - the compiled-in PATH_MAX does not accurately reflect what the filesystem is capable of - the leading prefix is not _quite_ what is on disk; it contains the next element from the name we are checking. So if we want to write "aaa/bbb/ccc/ddd" and "aaa/bbb" exists, the prefix of interest is "aaa/bbb/ccc". If "aaa/bbb" approaches PATH_MAX, then "ccc" can overflow it. So this can be triggered, but it's hard to do. In particular, you cannot just "git clone" a bogus repo. The verify_absent checks happen before unpack-trees writes anything to the filesystem, so there are never any leading prefixes during the initial checkout, and the bug doesn't trigger. And by definition, these files are larger than PATH_MAX, so writing them will fail, and clone will complain (though it may write a partial path, which will cause a subsequent "git checkout" to hit the bug). We can fix it by creating the temporary path on the heap. The extra malloc overhead is not important, as we are already making at least one stat() call (and probably more for the prefix discovery). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-04 08:50:50 -07:00
Junio C Hamano	8c9155e031	Merge branch 'jk/git-path' git_path() and mkpath() are handy helper functions but it is easy to misuse, as the callers need to be careful to keep the number of active results below 4. Their uses have been reduced. * jk/git-path: memoize common git-path "constant" files get_repo_path: refactor path-allocation find_hook: keep our own static buffer refs.c: remove_empty_directories can take a strbuf refs.c: avoid git_path assignment in lock_ref_sha1_basic refs.c: avoid repeated git_path calls in rename_tmp_log refs.c: simplify strbufs in reflog setup and writing path.c: drop git_path_submodule refs.c: remove extra git_path calls from read_loose_refs remote.c: drop extraneous local variable from migrate_file prefer mkpathdup to mkpath in assignments prefer git_pathdup to git_path in some possibly-dangerous cases add_to_alternates_file: don't add duplicate entries t5700: modernize style cache.h: complete set of git_path_submodule helpers cache.h: clarify documentation for git_path, et al	2015-08-19 14:48:56 -07:00
Junio C Hamano	4f66e44300	Merge branch 'as/sparse-checkout-removal' into maint "sparse checkout" misbehaved for a path that is excluded from the checkout when switching between branches that differ at the path. * as/sparse-checkout-removal: unpack-trees: don't update files with CE_WT_REMOVE set	2015-08-19 14:41:27 -07:00
Junio C Hamano	9ad8474b98	Merge branch 'dt/unpack-trees-cache-tree-revalidate' The code to perform multi-tree merges has been taught to repopulate the cache-tree upon a successful merge into the index, so that subsequent "diff-index --cached" (hence "status") and "write-tree" (hence "commit") will go faster. The same logic in "git checkout" may now be removed, but that is a separate issue. * dt/unpack-trees-cache-tree-revalidate: unpack-trees: populate cache-tree on successful merge	2015-08-12 14:09:57 -07:00
Jeff King	fcd12db6af	prefer git_pathdup to git_path in some possibly-dangerous cases Because git_path uses a static buffer that is shared with calls to git_path, mkpath, etc, it can be dangerous to assign the result to a variable or pass it to a non-trivial function. The value may change unexpectedly due to other calls. None of the cases changed here has a known bug, but they're worth converting away from git_path because: 1. It's easy to use git_pathdup in these cases. 2. They use constructs (like assignment) that make it hard to tell whether they're safe or not. The extra malloc overhead should be trivial, as an allocation should be an order of magnitude cheaper than a system call (which we are clearly about to make, since we are constructing a filename). The real cost is that we must remember to free the result. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-10 15:37:12 -07:00
Junio C Hamano	8348bf1b69	Merge branch 'as/sparse-checkout-removal' "sparse checkout" misbehaved for a path that is excluded from the checkout when switching between branches that differ at the path. * as/sparse-checkout-removal: unpack-trees: don't update files with CE_WT_REMOVE set	2015-08-03 11:01:28 -07:00
Brian Degenhardt	52fca2184d	unpack-trees: populate cache-tree on successful merge When we unpack trees into an existing index, we discard the old index and replace it with the new, merged index. Ensure that this index has its cache-tree populated. This will make subsequent git status and commit commands faster. Signed-off-by: Brian Degenhardt <bmd@bmdhacks.com> Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-07-28 13:43:13 -07:00
David Turner	7d782416cb	unpack-trees: don't update files with CE_WT_REMOVE set Don't update files in the worktree from cache entries which are flagged with CE_WT_REMOVE. When a user does a sparse checkout, git removes files that are marked with CE_WT_REMOVE (because they are out-of-scope for the sparse checkout). If those files are also marked CE_UPDATE (for instance, because they differ in the branch that is being checked out and the outgoing branch), git would previously recreate them. This patch prevents them from being recreated. These erroneously-created files would also interfere with merges, causing pre-merge revisions of out-of-scope files to appear in the worktree. apply_sparse_checkout() is the function where all "action" manipulation (add, delete, update files..) for sparse checkout occurs; it should not ask to delete and update both at the same time. Signed-off-by: Anatole Shaw <git-devel@omni.poc.net> Signed-off-by: David Turner <dturner@twopensource.com> Helped-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-07-21 13:19:20 -07:00
Nguyễn Thái Ngọc Duy	e931371a8f	untracked cache: invalidate at index addition or removal Ideally we should implement untracked_cache_remove_from_index() and untracked_cache_add_to_index() so that they update untracked cache right away instead of invalidating it and wait for read_directory() next time to deal with it. But that may need some more work in unpack-trees.c. So stay simple as the first step. The new call in add_index_entry_with_check() may look strange because new calls usually stay close to cache_tree_invalidate_path(). We do it a bit later than c_t_i_p() in this function because if it's about replacing the entry with the same name, we don't care (but cache-tree does). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-03-12 13:45:16 -07:00
Junio C Hamano	3f1509809e	Sync with v2.2.1 * maint: Git 2.2.1 Git 2.1.4 Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-18 12:30:53 -08:00
Junio C Hamano	58f1d950e3	Sync with v2.0.5 * maint-2.0: Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:42:28 -08:00
Junio C Hamano	5e519fb8b0	Sync with v1.9.5 * maint-1.9: Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:28:54 -08:00
Junio C Hamano	6898b79721	Sync with v1.8.5.6 * maint-1.8.5: Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:20:31 -08:00
Junio C Hamano	2aa9100846	Merge branch 'dotgit-case-maint-1.8.5' into maint-1.8.5 * dotgit-case-maint-1.8.5: fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:11:15 -08:00

... 2 3 4 5 6 ...

588 commits