development/git - git

mirror of https://github.com/git/git synced 2024-06-30 22:54:27 +00:00

Author	SHA1	Message	Date
Junio C Hamano	d11b0c75ec	Merge branch 'th/quiet-lazy-fetch-from-promisor' The promisor.quiet configuration knob can be set to true to make lazy fetching from promisor remotes silent. * th/quiet-lazy-fetch-from-promisor: promisor-remote: add promisor.quiet configuration option	2024-06-06 12:49:24 -07:00
Junio C Hamano	cf792653ad	Merge branch 'ps/leakfixes' Leakfixes. * ps/leakfixes: builtin/mv: fix leaks for submodule gitfile paths builtin/mv: refactor to use `struct strvec` builtin/mv duplicate string list memory builtin/mv: refactor `add_slash()` to always return allocated strings strvec: add functions to replace and remove strings submodule: fix leaking memory for submodule entries commit-reach: fix memory leak in `ahead_behind()` builtin/credential: clear credential before exit config: plug various memory leaks config: clarify memory ownership in `git_config_string()` builtin/log: stop using globals for format config builtin/log: stop using globals for log config convert: refactor code to clarify ownership of check_roundtrip_encoding diff: refactor code to clarify memory ownership of prefixes config: clarify memory ownership in `git_config_pathname()` http: refactor code to clarify memory ownership checkout: clarify memory ownership in `unique_tracking_name()` strbuf: fix leak when `appendwholeline()` fails with EOF transport-helper: fix leaking helper name	2024-06-06 12:49:23 -07:00
Aaron Plattner	27db485c34	credential: clear expired c->credential, unify secret clearing When a struct credential expires, credential_fill() clears c->password so that clients don't try to use it later. However, a struct cred that uses an alternate authtype won't have a password, but might have a credential stored in c->credential. This is a problem, for example, when an OAuth2 bearer token is used. In the system I'm using, the OAuth2 configuration generates and caches a bearer token that is valid for an hour. After the token expires, git needs to call back into the credential helper to use a stored refresh token to get a new bearer token. But if c->credential is still non-NULL, git will instead try to use the expired token and fail with an error: fatal: Authentication failed for 'https://<oauth2-enabled-server>/repository' And on the server: [auth_openidc:error] [client <ip>:34012] oidc_proto_validate_exp: "exp" validation failure (1717522989): JWT expired 224 seconds ago Fix this by clearing both c->password and c->credential for an expired struct credential. While we're at it, use credential_clear_secrets() wherever both c->password and c->credential are being cleared. Update comments in credential.h to mention the new struct fields. Signed-off-by: Aaron Plattner <aplattner@nvidia.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 11:42:40 -07:00
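The expiry fix described above can be sketched in a few lines of C. This is only an illustration: `struct cred_sketch` and both helper functions are hypothetical stand-ins, not git's actual `struct credential` or `credential_clear_secrets()`, and the expiry rule (0 means "never expires") is an assumption made for the sketch.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical, simplified stand-in for git's struct credential. */
struct cred_sketch {
	char *password;             /* secret for password-based auth */
	char *credential;           /* secret for alternate authtypes, e.g. a bearer token */
	time_t password_expiry_utc; /* assumption for this sketch: 0 means "never expires" */
};

/* Free and reset both secrets so that neither can be reused by accident. */
static void cred_sketch_clear_secrets(struct cred_sketch *c)
{
	free(c->password);
	c->password = NULL;
	free(c->credential);
	c->credential = NULL;
}

/* Drop the secrets if the expiry time has passed; returns 1 if they were cleared. */
static int cred_sketch_expire(struct cred_sketch *c, time_t now)
{
	if (c->password_expiry_utc == 0 || c->password_expiry_utc >= now)
		return 0;
	cred_sketch_clear_secrets(c);
	c->password_expiry_utc = 0;
	return 1;
}
```

Clearing both fields in one helper means a caller that only ever set `credential` (the OAuth2 case) is forced back to the credential helper once the token has expired.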
Jeff King	62c71ace44	test-terminal: drop stdin handling Since `18d8c26930` (test_terminal: redirect child process' stdin to a pty, 2015-08-04), we set up a pty and copy stdin to the child program. But this ends up being racy; once we send all of the bytes and close the descriptor, the child program will no longer see a terminal! isatty() will return 0, and trying to read may return EIO, even if we didn't yet get all of the bytes. This was mentioned even in the commit message of `18d8c26930`, but we hacked around it by just sending an infinite input from /dev/zero (in the intended case, we only cared about isatty(0), not reading actual input). And it came up again recently in: https://lore.kernel.org/git/d42a55b1-1ba9-4cfb-9c3d-98ea4d86da33@gmail.com/ where we tried to actually send bytes, but they don't always all come through. So this interface is somewhat of an accident waiting to happen; a caller might not even care about stdin being a tty, but will get bit by the flaky behavior. One solution would probably be to avoid closing test_terminal's end of the pty altogether. But then the other side would never see EOF on its stdin. That may be OK for some cases, but it's another gotcha that might cause races or deadlocks, depending on what the child expects to read. Let's instead just drop test_terminal's stdin feature completely. Since the previous commit dropped the two cases from t4153 for which the feature was originally added, there are no callers left that need it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 10:07:41 -07:00
Jeff King	53ce2e3f0a	am: add explicit "--retry" option After a patch fails, you can ask "git am" to try applying it again with new options by running without any of the resume options. E.g.: git am <patch # oops, it failed; let's try again git am --3way But since this second command has no explicit resume option (like "--continue"), it looks just like an invocation to read a fresh patch from stdin. To avoid confusing the two cases, there are some heuristics, courtesy of `8d18550318` (builtin-am: reject patches when there's a session in progress, 2015-08-04): if (in_progress) { /* * Catch user error to feed us patches when there is a session * in progress: * * 1. mbox path(s) are provided on the command-line. * 2. stdin is not a tty: the user is trying to feed us a patch * from standard input. This is somewhat unreliable -- stdin * could be /dev/null for example and the caller did not * intend to feed us a patch but wanted to continue * unattended. */ if (argc || (resume_mode == RESUME_FALSE && !isatty(0))) die(_("previous rebase directory %s still exists but mbox given."), state.dir); if (resume_mode == RESUME_FALSE) resume_mode = RESUME_APPLY; [...] So if no resume command is given, then we require that stdin be a tty, and otherwise complain about (potentially) receiving an mbox on stdin. But of course you might not actually have a terminal available! And sadly there is no explicit way to hit this same code path; this is the only place that sets RESUME_APPLY. So you're stuck, and scripts like our test suite have to bend over backwards to create a pseudo-tty. Let's provide an explicit option to trigger this mode. The code turns out to be quite simple; just setting "resume_mode" to RESUME_FALSE is enough to dodge the tty check, and then our state is the same as it would be with the heuristic case (which we'll continue to allow).
When we don't have a session in progress, there's already code to complain when resume_mode is set (but we'll add a new test to cover that). To test the new option, we'll convert the existing tests that rely on the fake stdin tty. That lets us test them on more platforms, and will let us simplify test_terminal a bit in a future patch. It does, however, mean we're not testing the tty heuristic at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 10:07:41 -07:00
Patrick Steinhardt	df651330ab	ci: fix check for Ubuntu 20.04 In `5ca0c455f1` (ci: fix Python dependency on Ubuntu 24.04, 2024-05-06), we made the use of Python 2 conditional on whether or not the CI job runs Ubuntu 20.04. There was a brown-paper-bag-style bug though, where the condition forgot to invoke the `test` builtin. The result of it is that the check always fails, and thus all of our jobs run with Python 3 by accident. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:59:27 -07:00
Patrick Steinhardt	25a0023f28	builtin/refs: new command to migrate ref storage formats Introduce a new command that allows the user to migrate a repository between ref storage formats. This new command is implemented as part of a new git-refs(1) executable. This is due to two reasons: - There is no good place to put the migration logic in existing commands. git-maintenance(1) felt unwieldy, and git-pack-refs(1) is not the correct place to put it, either. - I had it in my mind to create a new low-level command for accessing refs for quite a while already. git-refs(1) is that command and can over time grow more functionality relating to refs. This should help discoverability by consolidating low-level access to refs into a single executable. As mentioned in the preceding commit that introduces the ref storage format migration logic, the new `git refs migrate` command still has a bunch of restrictions. These restrictions are documented accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:34 -07:00
Patrick Steinhardt	6d6a3a99c7	refs: implement logic to migrate between ref storage formats With the introduction of the new "reftable" backend, users may want to migrate repositories between the backends without having to recreate the whole repository. Add the logic to do so. The implementation is generic and works with arbitrary ref storage formats so that a backend does not need to implement any migration logic. It does have a few limitations though: - We do not migrate repositories with worktrees, because worktrees have separate ref storages. It makes the overall affair more complex if we have to migrate multiple storages at once. - We do not migrate reflogs, because we have no interfaces to write many reflog entries. - We do not lock the repository for concurrent access, and thus concurrent writes may end up with weird in-between states. There is no way to fully lock the "files" backend for writes due to its format, and thus we punt on this topic altogether and defer to the user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which often have neither worktrees nor reflogs. But it will not work for many other repositories without some preparations. These limitations are not set into stone though, and ideally we will eventually address them over time. The logic is not yet used by anything, and thus there are no tests for it. Those will be added in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:33 -07:00
Patrick Steinhardt	64a6dd8ffc	refs: implement removal of ref storages We're about to introduce logic to migrate ref storages. One part of the migration will be to delete the files that are part of the old ref storage format. We don't yet have a way to delete such data generically across ref backends though. Implement a new `delete` callback and expose it via a new `ref_storage_delete()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:33 -07:00
Patrick Steinhardt	1339cb3c47	worktree: don't store main worktree twice In `get_worktree_ref_store()` we either return the repository's main ref store, or we look up the ref store via the map of worktree ref stores. Which of these worktrees gets picked depends on the `is_current` bit of the worktree, which indicates whether the worktree is the one that corresponds to `the_repository`. The bit is getting set in `get_worktrees()`, but only after we have computed the list of all worktrees. This is too late though, because at that time we have already called `get_worktree_ref_store()` on each of the worktrees via `add_head_info()`. The consequence is that the current worktree will not have been marked accordingly, which means that we did not use the main ref store, but instead created a new ref store. We thus have two separate ref stores now that map to the same ref database. Fix this by setting `is_current` before we call `add_head_info()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:33 -07:00
Patrick Steinhardt	b5d7db9e83	reftable: inline `merged_table_release()` The function `merged_table_release()` releases a merged table, whereas `reftable_merged_table_free()` releases a merged table and then also frees its pointer. But all callsites of `merged_table_release()` are in fact followed by `reftable_merged_table_free()`, which is redundant. Inline `merged_table_release()` into `reftable_merged_table_free()` to get rid of this redundancy. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:32 -07:00
Patrick Steinhardt	b3e098d6e7	refs/files: fix NULL pointer deref when releasing ref store The `free_ref_cache()` function is not `NULL` safe and will thus segfault when being passed such a pointer. This can easily happen when trying to release a partially initialized "files" ref store. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:32 -07:00
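A NULL-safe release function of the kind described above might look like the following. The type and function names are hypothetical and much simpler than git's real ref cache; the point is only the early return on `NULL`.

```c
#include <stdlib.h>

/* Hypothetical, cut-down cache type; not git's struct ref_cache. */
struct ref_cache_sketch {
	char *entries;
};

/* Safe to call on a store that was only partially initialized. */
static void free_ref_cache_sketch(struct ref_cache_sketch *cache)
{
	if (!cache)
		return; /* nothing was allocated yet */
	free(cache->entries);
	free(cache);
}
```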
Patrick Steinhardt	120b67172f	refs/files: extract function to iterate through root refs Extract a new function that can be used to iterate through all root refs known to the "files" backend. This will be used in the next commit, where we start to teach ref backends to remove themselves. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:32 -07:00
Patrick Steinhardt	66275a6311	refs/files: refactor `add_pseudoref_and_head_entries()` The `add_pseudoref_and_head_entries()` function accepts both the ref store as well as a directory name as input. This is unnecessary though as the ref store already uniquely identifies the root directory of the ref store anyway. Furthermore, the function is misnamed now that we have clarified the meaning of pseudorefs as it doesn't add pseudorefs, but root refs. Rename it accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:32 -07:00
Patrick Steinhardt	fbd1a693c7	refs: allow to skip creation of reflog entries The ref backends do not have any way to disable the creation of reflog entries. This will be required for upcoming ref format migration logic so that we do not create any entries that didn't exist in the original ref database. Provide a new `REF_SKIP_CREATE_REFLOG` flag that allows the caller to disable reflog entry creation. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:31 -07:00
Patrick Steinhardt	6e1683ace9	refs: pass storage format to `ref_store_init()` explicitly We're about to introduce logic to migrate refs from one storage format to another one. This will require us to initialize a ref store with a different format than the one used by the passed-in repository. Prepare for this by accepting the desired ref storage format as parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:31 -07:00
Patrick Steinhardt	318efb966b	refs: convert ref storage format to an enum The ref storage format is tracked as a simple unsigned integer, which makes it harder than necessary to discover what that integer actually is or where its values are defined. Convert the ref storage format to instead be an enum. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:31 -07:00
Patrick Steinhardt	a83f7f51e1	setup: unset ref storage when reinitializing repository version When reinitializing a repository's version we may end up unsetting the hash algorithm when it matches the default hash algorithm. If we didn't do that then the previously configured value might remain intact. While the same issue exists for the ref storage extension, we don't do this here. This has been fine for the most part because it is not supported to re-initialize a repository with a different ref storage format anyway. We're about to introduce a new command to migrate ref storages though, so this is about to become an issue there. Prepare for this and unset the ref storage format when reinitializing a repository with the "files" format. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:31 -07:00
Patrick Steinhardt	f60fec6a16	ci/test-documentation: work around SyntaxWarning in Python 3.12 In Python 3.6, unrecognized escape sequences in regular expressions started to produce a DeprecationWarning [1]. In Python 3.12, this was upgraded to a SyntaxWarning and will eventually be raised even further to a SyntaxError. We indirectly hit such unrecognized escape sequences via Asciidoc, which results in a bunch of warnings: $ asciidoc -o /dev/null git-cat-file.txt <unknown>:1: SyntaxWarning: invalid escape sequence '\S' <unknown>:1: SyntaxWarning: invalid escape sequence '\S' This in turn causes our "ci/test-documentation.sh" script to fail, as it checks that stderr of `make doc` is empty. These escape sequences seem to be part of Asciidoc itself. In the long term, we should probably consider dropping support for Asciidoc in favor of Asciidoctor. Upstream also considers itself to be legacy software and recommends to move away from it [2]: It is suggested that unless you specifically require the AsciiDoc.py toolchain, you should find a processor that handles the modern AsciiDoc syntax. For now though, let's expand its lifetime a little bit more by filtering out these new warnings. We should probably reconsider once the warnings are upgraded to errors by Python. [1]: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals [2]: `6d9f76cff0/README.md (asciidocpy)` Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 08:20:51 -07:00
Patrick Steinhardt	401151de9e	gitlab-ci: add job to run `make check-docs` Add another job to execute `make check-docs`, which lints our documentation and makes sure that expected manpages exist. This job mirrors the same job that we already have for GitHub Actions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 08:20:51 -07:00
Patrick Steinhardt	6423920974	Documentation/lint-manpages: bubble up errors The "lint-manpages.sh" script does not return an error in case any of its checks fail. While this is faithful to the implementation that we had as part of the "check-docs" target before the preceding commit, it makes it hard to spot any violations of the rules via the corresponding CI job, which will of course exit successfully, too. Adapt the script to bubble up errors. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 08:20:51 -07:00
Patrick Steinhardt	2dd100c513	Makefile: extract script to lint missing/extraneous manpages The "check-docs" target of our top-level Makefile fulfills two different roles. For one it runs the "lint-docs" target of the "Documentation/" Makefile. And second it performs some checks of whether there are any manpages that are missing or extraneous via some inline scripts. The second set of checks feels quite misplaced in the top-level Makefile as it would fit in much better with our "lint-docs" target. Back when the checks were introduced in `8c989ec528` (Makefile: $(MAKE) check-docs, 2006-04-13), that target did not yet exist though. Furthermore, the script makes use of several Makefile variables which are defined in the top-level Makefile, which makes it hard to access their contents from elsewhere. There is a trick though that we already use in "check-builtins.sh" to gain access: we can create an ad-hoc Makefile that has an extra target to print those variables. Pull out the script into a separate "lint-manpages.sh" script by using that trick. Wire up that script via the "lint-docs" target. For one, normal shell scripts are way easier to reason about than those which are embedded in a Makefile. Second, it allows one to easily execute the script standalone without any of the other checks. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 08:20:50 -07:00
Junio C Hamano	a74c0686fa	add-i: finally retire add.interactive.useBuiltin The configuration variable stopped doing anything (other than announcing itself as a variable that does not do anything useful, when it is used) in Git 2.40. At this point, it is not even worth giving the warning, which was meant to be a way to help users notice they are carrying unused cruft in their configuration files and give them a chance to clean-up. Let's remove the warning and documentation for it, and truly stop paying attention to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Documentation/config/add.txt | 6 ------ builtin/add.c | 6 +----- t/t3701-add-interactive.sh | 15 --------------- 3 files changed, 1 insertion(+), 26 deletions(-)	2024-06-05 14:53:26 -07:00
Junio C Hamano	5c71d6b63a	attr.tree: HEAD:.gitattributes is no longer the default in a bare repo `51441e64` (stop using HEAD for attributes in bare repository by default, 2024-05-03) has addressed a recent performance regression by partially reverting a topic that was merged at `26dd307c` (Merge branch 'jc/attr-tree-config', 2023-10-30). But it forgot to update the documentation to remove the mention of a special case in bare repositories. Let's update the document before the update hits the next release. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 14:52:53 -07:00
Jeff King	6d107751b2	sparse-checkout: free duplicate hashmap entries In insert_recursive_pattern(), we create a new pattern_entry to insert into the parent_hashmap. If we find that the same entry already exists in the hashmap, we skip adding the new one. But we forget to free the new one, creating a leak. We can fix it by cleaning up the discarded entry. It would probably be possible to avoid creating it in the first place, but it's non-trivial. We'd have to define a "keydata" struct that lets us compare the existing entries to the broken-out fields. It's probably not worth the complexity, so we'll punt on that for now. There is one subtlety here: our insertion is happening in a loop, with each iteration looking at the pattern we just inserted (hence the "recursive" in the name). So if we skip insertion, what do we look at? The obvious answer is that we should remember the existing duplicate we found and use that. But I _think_ in that case, we probably already have all of the recursive bits already (from when the original entry was added). And so just breaking out of the loop would be correct. But I'm not 100% sure on that; after all, the original leaky code could have done the same break, but it didn't. So I went with the "obvious answer" above, which has no chance of changing the behavior aside from fixing the leak. With this patch, t1091 can now be marked leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 09:51:43 -07:00
Jeff King	a544b7da2c	sparse-checkout: free string list after displaying In sparse_checkout_list(), we put the hashmap entries into a string_list so we can sort them. But after printing, we forget to free the list. This patch drops 5 leaks from t1091. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 09:51:43 -07:00
Jeff King	521e04e6e8	sparse-checkout: free pattern list in sparse_checkout_list() In sparse_checkout_list(), we create a pattern_list that needs to eventually be cleared. We remember to do so in the regular code path, but the cone-mode path does an early return, and forgets to clean up. We could fix the leak by adding a new call to clear_pattern_list(). But we can simplify even further by just skipping the early return, pushing the other code path (which consists now of only one line!) into an else block. That also matches the same cone/non-cone if/else used in some other functions. This fixes 15 leaks found in t1091. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 09:51:43 -07:00
Jeff King	008f59d2d6	sparse-checkout: free sparse_filename after use We allocate a heap buffer via get_sparse_checkout_filename(). Most calls remember to free it, but sparse_checkout_init() forgets to, causing a leak. Ironically, it remembers to do so in the error return paths, but not in the path that makes it all the way to the function end! Fixing this clears up 6 leaks from t1091. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 09:51:43 -07:00
Jeff King	a14d49ca84	sparse-checkout: refactor temporary sparse_checkout_patterns In update_working_directory(), we take in a pattern_list, attach it to the repository index by assigning it to index->sparse_checkout_patterns, and then call unpack_trees. Afterwards, we remove it by setting index->sparse_checkout_patterns back to NULL. But there are two possible leaks here: 1. If the index already had a populated sparse_checkout_patterns, we've obliterated it. We can fix this by saving and restoring it, rather than always setting it back to NULL. 2. We may call the function with a NULL pattern_list, expecting it to use the on-disk sparse file. In that case, the index routines will lazy-load the sparse patterns automatically. But now at the end of the function when we restore the patterns, we'll leak those lazy-loaded ones! We can fix this by freeing the pattern list before overwriting its pointer whenever it does not match what was passed in (in practice this should only happen when the passed-in list is NULL, but this is erring on the defensive side). Together these remove 48 indirect leaks found in t1091. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 09:51:43 -07:00
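The save-and-restore approach from the two numbered cases above can be sketched as follows. All names here are hypothetical; `lazy_load_patterns()` merely imitates the index routines loading patterns on demand and is not git code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the pattern list and the index. */
struct patterns_sketch {
	int nr;
};

struct index_sketch {
	struct patterns_sketch *sparse;
};

/* Imitates lazy loading: allocate patterns only if none are attached. */
static void lazy_load_patterns(struct index_sketch *idx)
{
	if (!idx->sparse)
		idx->sparse = calloc(1, sizeof(*idx->sparse));
}

/*
 * Attach `pl` for the duration of `work`, then put back whatever was
 * there before. Anything that appeared in the meantime and is not the
 * caller's list was lazy-loaded, so it is freed here.
 */
static void with_temporary_patterns(struct index_sketch *idx,
				    struct patterns_sketch *pl,
				    void (*work)(struct index_sketch *))
{
	struct patterns_sketch *saved = idx->sparse;

	idx->sparse = pl;
	work(idx);
	if (idx->sparse != pl)
		free(idx->sparse);
	idx->sparse = saved;
}
```

The caller keeps ownership of `pl` in this sketch; only lists the helper did not receive from the caller are released.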
Jeff King	d765fa0331	sparse-checkout: always free "line" strbuf after reading input In add_patterns_from_input(), we may read lines from a file with a loop like this: while (!strbuf_getline(&line, file)) { ... strbuf_to_cone_pattern(&line, pl); } /* we don't strbuf_release(&line) here! */ This generally is OK because strbuf_to_cone_pattern() consumes the buffer via strbuf_detach(). But we can leak in a few cases: 1. We don't always consume the buffer! If the line ends up empty after trimming, we leave strbuf_to_cone_pattern() without detaching. In most cases this is OK, because a subsequent getline() call will use the same buffer. But if you had an empty line at the end of file, for example, it would leak. 2. Even if strbuf_to_cone_pattern() always consumed the buffer, there's a subtle issue with strbuf_getline(). As we saw in `94e2aa555e` (strbuf: fix leak when `appendwholeline()` fails with EOF, 2024-05-27), it's possible for it to return EOF with an allocated buffer (e.g., if the underlying getdelim() call saw an error). So we should always strbuf_release() after finishing a read loop like this. Note that even the code to read patterns from argv has the same problem. Because that also uses strbuf_to_cone_pattern(), we stuff each argv entry into a strbuf. It uses the same "line" strbuf as the getline code, but we should position the strbuf_release() to cover both code paths. This fixes at least 9 leaks found in t1091. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 09:51:43 -07:00
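The same "always release after the read loop" rule applies to plain POSIX `getline()`, which is used here as an analogy because git's strbuf API is not available outside the git tree. The function name `count_nonempty_lines()` is made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Count the non-empty lines in `in`. getline() may leave `line`
 * allocated even when it reports EOF or an error, so the buffer is
 * freed unconditionally once the loop is done.
 */
static size_t count_nonempty_lines(FILE *in)
{
	char *line = NULL;
	size_t cap = 0, count = 0;
	ssize_t len;

	while ((len = getline(&line, &cap, in)) != -1) {
		if (len > 0 && line[len - 1] == '\n')
			len--; /* ignore the trailing newline */
		if (len > 0)
			count++;
	}
	free(line); /* covers the empty-last-line and error cases too */
	return count;
}
```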
Jeff King	c3324649ed	sparse-checkout: reuse --stdin buffer when reading patterns When we read patterns from --stdin, we loop on strbuf_getline(), and detach each line we read to pass into add_pattern(). This used to be necessary because add_pattern() required that the pattern strings remain valid while the pattern_list was in use. But it also created a leak, since we didn't record the detached buffers anywhere else. Now that add_pattern() has been modified to make its own copy of the strings, we can stop detaching and fix the leak. This fixes 4 leaks detected in t1091. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 09:51:42 -07:00
Jeff King	eed1fbe73b	dir.c: always copy input to add_pattern() The add_pattern() function has a subtle and undocumented gotcha: the pattern string you pass in must remain valid as long as the pattern_list is in use (and nor do we take ownership of it). This is easy to get wrong, causing either subtle bugs (because you free or reuse the string buffer) or leaks (because you copy the string, but don't track ownership separately). All of this "pattern" code was originally the "exclude" mechanism. So this _usually_ works OK because you add entries in one of two ways: 1. From the command-line (e.g., "--exclude"), in which case we're pointing to an argv entry which remains valid for the lifetime of the program. 2. From a file (e.g., ".gitignore"), in which case we read the whole file into a buffer, attach it to the pattern_list's "filebuf" entry, then parse the buffer in-place (adding NULs). The strings point into the filebuf, which is cleaned up when the whole pattern_list goes away. But other code, like sparse-checkout, reads individual lines from stdin and passes them one by one to add_pattern(), leaking each. We could fix this by refactoring it to take in the whole buffer at once, like (2) above, and stuff it in "filebuf". But given how subtle the interface is, let's just fix it to always copy the string. That seems at first like we'd be wasting extra memory, but we can mitigate that: a. The path_pattern struct already uses a FLEXPTR, since we sometimes make a copy (when we see "foo/", we strip off the trailing slash, requiring a modifiable copy of the string). Since we'll now always embed the string inside the struct, we can switch to the regular FLEX_ARRAY pattern, saving us 8 bytes of pointer. So patterns with a trailing slash and ones under 8 bytes actually get smaller. b. Now that we don't need the original string to hang around, we can get rid of the "filebuf" mechanism entirely, and just free the file contents after parsing. 
Since files are the sources we'd expect to have the largest pattern sets, we should mostly break even on stuffing the same data into the individual structs. This patch just adjusts the add_pattern() interface; it doesn't fix any leaky callers yet. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 09:51:42 -07:00
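The "always copy into the struct" idea from point (a) can be illustrated with a C99 flexible array member. `struct pattern_sketch` and `pattern_sketch_new()` are hypothetical and far smaller than git's `struct path_pattern`; the trailing-slash handling is a simplified version of what the message describes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down pattern entry; not git's struct path_pattern. */
struct pattern_sketch {
	size_t len;
	char text[]; /* flexible array member: the copy lives inside the struct */
};

/* Allocate an entry holding its own copy of `s`, minus one trailing slash. */
static struct pattern_sketch *pattern_sketch_new(const char *s)
{
	size_t len = strlen(s);
	struct pattern_sketch *p;

	if (len > 0 && s[len - 1] == '/')
		len--;
	p = malloc(sizeof(*p) + len + 1);
	if (!p)
		return NULL;
	p->len = len;
	memcpy(p->text, s, len);
	p->text[len] = '\0';
	return p;
}
```

Because the entry owns its text, the caller may free or reuse its input buffer immediately, and a single `free()` releases both the struct and the string.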
Jeff King	e7c3d1ddba	dir.c: reduce max pattern file size to 100MB In `a2bc523e1e` (dir.c: skip .gitignore, etc larger than INT_MAX, 2024-05-31) we capped the size of some files whose parsing code and data structures used ints. Setting the limit to INT_MAX was a natural spot, since we know the parsing code would misbehave above that. But it also leaves the possibility of overflow errors when we multiply that limit to allocate memory. For instance, a file consisting only of "a\na\n..." could have INT_MAX/2 entries. Allocating an array of pointers for each would need INT_MAX*4 bytes on a 64-bit system, enough to overflow a 32-bit int. So let's give ourselves a bit more safety margin by giving a much smaller limit. The size 100MB is somewhat arbitrary, but is based on the similar value for attribute files added by `3c50032ff5` (attr: ignore overly large gitattributes files, 2022-12-01). There's no particular reason these have to be the same, but the idea is that they are in the ballpark of "so huge that nobody would care, but small enough to avoid malicious overflow". So lacking a better guess, it makes sense to use the same value. The implementation here doesn't share the same constant, but we could change that later (or even give it a runtime config knob, though nobody has complained yet about the attribute limit). And likewise, let's add a few tests that exercise the limits, based on the attr ones. In this case, though, we never read .gitignore from the index; the blob code is exercised only for sparse filters. So we'll trigger it that way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-05 09:23:42 -07:00
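The overflow arithmetic above can be checked with a small helper. The function name is hypothetical, and reading "100MB" as 100 * 1024 * 1024 bytes is an assumption of this sketch.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Bytes needed for one pointer per entry, computed in 64 bits so the
 * result itself cannot wrap for the inputs discussed here.
 */
static uint64_t pointer_array_bytes(uint64_t file_size,
				    uint64_t bytes_per_entry,
				    uint64_t pointer_size)
{
	return (file_size / bytes_per_entry) * pointer_size;
}
```

With a 32-bit `int`, a file of `INT_MAX` bytes made of two-byte "a\n" entries and 8-byte pointers gives `pointer_array_bytes(INT_MAX, 2, 8)` = 8589934584, well above `INT_MAX`. With the assumed 100MB cap, `pointer_array_bytes(100 * 1024 * 1024, 2, 8)` = 419430400, which still fits.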
Junio C Hamano	607c3d372e	show-ref: introduce --branches and deprecate --heads We call the tips of branches "heads", but this command calls the option to show only branches "--heads", which confuses the branches themselves and the tips of branches. Straighten the terminology by introducing "--branches" option that limits the output to branches, and deprecate "--heads" option used that way. We do not plan to remove "--heads" or "-h" yet; we may want to do so at Git 3.0, in which case, we may need to start advertising upcoming removal with an extra warning when they are used. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-04 15:07:08 -07:00
Junio C Hamano	b773fb8822	ls-remote: introduce --branches and deprecate --heads We call the tips of branches "heads", but this command calls the option to show only branches "--heads", which confuses the branches themselves and the tips of branches. Straighten the terminology by introducing "--branches" option that limits the output to branches, and deprecate "--heads" option used that way. We do not plan to remove "--heads" or "-h" yet; we may want to do so at Git 3.0, in which case, we may need to start advertising upcoming removal with an extra warning when they are used. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-04 15:07:08 -07:00
Junio C Hamano	a096e70c78	refs: call branches branches These things in refs/heads/ hierarchy are called "branches" in human parlance. Replace REF_HEADS with REF_BRANCHES to make it clearer. No end-user visible change intended at this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-04 15:07:08 -07:00
Junio C Hamano	56f4f4a29d	imap-send: minimum leakfix Even with the minimum "no-op" invocation t1517 makes, "git imap-send" leaks an empty strbuf it used to read a 0-byte string into. There are a few other topics cooking in 'next' that plug many other leaks in this program, so let's minimally fix this one, barely enough to make CI pass, leaving the rest for the other topic. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-04 11:48:20 -07:00
Jeff King	4c844c2f49	dir.c: free removed sparse-pattern hashmap entries In add_pattern_to_hashsets(), we remove entries from the recursive_hashmap when adding similar ones to the parent_hashmap. I won't pretend to understand all of what's going on here, but there's an obvious leak: whatever we removed from recursive_hashmap is not referenced anywhere else, and is never free()d. We can easily fix this by asking the hashmap to return a pointer to the old entry. This makes t7002 now completely leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-04 10:38:23 -07:00
Jeff King	db83b64cda	sparse-checkout: clear patterns when init() sees existing sparse file In sparse_checkout_init(), we first try to load patterns from an existing file. If we found any, we return immediately, but end up leaking the patterns we parsed. Fixing this reduces the number of leaks in t7002 from 9 down to 5. Note that there are two other exits from the function, but they don't need the same treatment: - if we can't resolve HEAD, we write out a hard-coded sparse file and return. But we know the pattern list is empty there, since we didn't find any in the on-disk file and we haven't yet added any of our own. - otherwise, we do populate the list and then tail-call into write_patterns_and_update(). But that function frees the pattern_list itself, so we don't need to. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-04 10:38:23 -07:00
Jeff King	4318d3ab65	dir.c: free strings in sparse cone pattern hashmaps The pattern_list structs used for cone-mode sparse lookups use a few extra hashmaps. These store pattern_entry structs, each of which has its own heap-allocated pattern string. When we clean up the hashmaps, we free the individual pattern_entry structs, but forget to clean up the embedded strings, causing memory leaks. We can fix this by iterating over the hashmaps to free the extra strings. This reduces the number of leaks in t7002 from 22 to 9. One alternative here would be to make the string a FLEX_ARRAY member of the pattern_entry. Then there's no extra free() required, and as a bonus it would be a little more efficient. However, some of the refactoring gets awkward, as we are often assigning strings allocated by helper functions. So let's just fix the leak for now, and we can explore bigger refactoring separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-04 10:38:23 -07:00
Jeff King	4d7f95ed1f	sparse-checkout: pass string literals directly to add_pattern() The add_pattern() function takes a pattern string, but neither makes a copy of it nor takes ownership of the memory. So it is the caller's responsibility to make sure the string hangs around as long as the pattern_list which references it. There are a few cases in sparse-checkout where we use string literal patterns by stuffing them into a strbuf, detaching the buffer, and then passing the result into add_pattern(). This creates a leak when the pattern_list is eventually cleared, since we don't retain a copy of the detached buffer to free. But we can observe that the whole strbuf dance is unnecessary. The point was presumably[1] to satisfy the lifetime requirement of the string. But string literals have static duration; we can count on them lasting for the whole program. So we can fix the leak by just passing them directly. And as a bonus, that simplifies the code. The leaks can be seen in t7002, which drops from 25 leaks to 22 with this patch. It also makes t3602 and t1090 leak-free. In the long run, we will also want to clean up this (undocumented!) memory lifetime requirement of add_pattern(). But that can come in a later patch; passing the string literals directly will be the right thing either way. [1] The code in question comes from `416adc8711` (sparse-checkout: update working directory in-process for 'init', 2019-11-21) and `99dfa6f970` (sparse-checkout: use in-process update for disable subcommand, 2019-11-21), but I didn't see anything in their commit messages or on the list explaining the strbufs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-04 10:38:23 -07:00
Jeff King	2181fe6e46	sparse-checkout: free string list in write_cone_to_file() We use a string list to hold sorted and de-duped patterns, but don't free it before leaving the function, causing a leak. This drops the number of leaks found in t7002 from 27 to 25. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-04 10:38:22 -07:00
Junio C Hamano	7b0defb391	The tenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-03 13:14:52 -07:00
Junio C Hamano	eb6392fb4f	Merge branch 'th/push-local-ff-check-without-lazy-fetch' When "git push" notices that the commit at the tip of the ref on the other side it is about to overwrite does not exist locally, it used to first try fetching it if the local repository is a partial clone. The command has been taught not to do so and immediately fail instead. * th/push-local-ff-check-without-lazy-fetch: push: don't fetch commit object when checking existence	2024-06-03 13:11:12 -07:00
Junio C Hamano	5c7c063c1f	Merge branch 'ps/fix-reinit-includeif-onbranch' "git init" in an already created directory, when the user configuration has includeif.onbranch, started to fail recently, which has been corrected. * ps/fix-reinit-includeif-onbranch: setup: fix bug with "includeIf.onbranch" when initializing dir	2024-06-03 13:11:11 -07:00
Junio C Hamano	9eaef5822c	Sync with 'maint'	2024-05-31 15:50:54 -07:00
Ian Wienand	291ef5b61c	run-command: show prepared command This adds a trace point in start_command so we can see the full command invocation without having to resort to strace/code inspection. For example: $ GIT_TRACE=1 git test foo git.c:755 trace: exec: git-test foo run-command.c:657 trace: run_command: git-test foo run-command.c:657 trace: run_command: 'echo $' foo run-command.c:749 trace: start_command: /bin/sh -c 'echo $ "$@"' 'echo $*' foo Prior changes have made the documentation around the internals of the alias command execution clearer, but I have still found this detailed view of the aliased command being run helpful for debugging purposes. A test case is added to ensure the full command output is present in the execution flow. Signed-off-by: Ian Wienand <iwienand@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-31 15:47:55 -07:00
Ian Wienand	d35a743659	Documentation: alias: add notes on shell expansion When writing inline shell for shell-expansion aliases (i.e. prefixed with "!"), there are some caveats around argument parsing to be aware of. This series of notes attempts to explain what is happening more clearly. Signed-off-by: Ian Wienand <iwienand@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-31 15:47:55 -07:00
Jeff King	a2bc523e1e	dir.c: skip .gitignore, etc larger than INT_MAX We use add_patterns() to read .gitignore, .git/info/exclude, etc, as well as other pattern-like files like sparse-checkout. The parser for these uses an "int" as an index, meaning that files over 2GB will generally cause signed integer overflow and out-of-bounds access. This is unlikely to happen in any real files, but we do read .gitignore files from the tree. A malicious tree could cause an out-of-bounds read and segfault (we also write NULs over newlines, so in theory it could be an out-of-bounds write, too, but as we go char-by-char, the first thing that happens is trying to read a negative 2GB offset). We could fix the most obvious issue by replacing one "int" with a "size_t". But there are tons of "int" sprinkled throughout this code for things like pattern lengths, number of patterns, and so on. Since nobody would actually want a 2GB .gitignore file, an easy defensive measure is to just refuse to parse them. The "int" in question is in add_patterns_from_buffer(), so we could catch it there. But by putting the checks in its two callers, we can produce more useful error messages. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-31 15:30:32 -07:00
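The defensive measure described in `a2bc523e1e` above (refuse to parse a pattern file that an "int" index cannot cover) might look roughly like the sketch below. This is not git's actual code: `pattern_file_size_ok()` is a hypothetical name, and the real checks sit in the two callers of add_patterns_from_buffer() so that each can report its own error message.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical guard: returns 1 when a pattern file of `size` bytes
 * can be indexed safely with an int, 0 when the caller should skip
 * the file rather than parse it.
 */
int pattern_file_size_ok(size_t size)
{
	return size <= (size_t)INT_MAX;
}
```

A caller would test the size reported for the file or blob before handing the buffer to the parser, and warn and skip the file when the check fails.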
Junio C Hamano	715ae27382	Post 2.45.2 updates Merge down a handful of topics to adjust tests and CI to make them work better, without changing Git itself, and a bit of developer docs update: * Tests that try to corrupt in-repository files in chunked format did not work well on macOS due to its broken "mv", which has been worked around. * Unbreak CI jobs so that we do not attempt to use Python 2 that has been removed from the platform. * Git 2.43 started using the tree of HEAD as the source of attributes in a bare repository, which has severe performance implications. For now, revert the change, without ripping out a more explicit support for the attr.tree configuration variable. * Windows CI running in GitHub Actions started complaining about the order of arguments given to calloc(); the imported regex code uses the wrong order almost consistently, which has been corrected. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-31 15:28:22 -07:00


73809 Commits