1
0
mirror of https://github.com/git/git synced 2024-06-30 22:54:27 +00:00
Commit Graph

461 Commits

Author SHA1 Message Date
Junio C Hamano
f0a462ecd5 Merge branch 'tb/precompose-getcwd'
We forgot to normalize the result of getcwd() to NFC on macOS where
all other paths are normalized, which has been corrected.  This still
does not address the case where core.precomposeUnicode configuration
is not defined globally.

* tb/precompose-getcwd:
  macOS: ls-files path fails if path of workdir is NFD
2024-06-24 16:39:14 -07:00
Junio C Hamano
40a163f217 Merge branch 'ps/ref-storage-migration'
A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.

* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version
2024-06-17 15:55:55 -07:00
Junio C Hamano
b8bdb2f283 Merge branch 'jc/safe-directory-leading-path'
The safe.directory configuration knob has been updated to
optionally allow leading path matches.

* jc/safe-directory-leading-path:
  safe.directory: allow "lead/ing/path/*" match
2024-06-12 13:37:16 -07:00
Junio C Hamano
cf792653ad Merge branch 'ps/leakfixes'
Leakfixes.

* ps/leakfixes:
  builtin/mv: fix leaks for submodule gitfile paths
  builtin/mv: refactor to use `struct strvec`
  builtin/mv duplicate string list memory
  builtin/mv: refactor `add_slash()` to always return allocated strings
  strvec: add functions to replace and remove strings
  submodule: fix leaking memory for submodule entries
  commit-reach: fix memory leak in `ahead_behind()`
  builtin/credential: clear credential before exit
  config: plug various memory leaks
  config: clarify memory ownership in `git_config_string()`
  builtin/log: stop using globals for format config
  builtin/log: stop using globals for log config
  convert: refactor code to clarify ownership of check_roundtrip_encoding
  diff: refactor code to clarify memory ownership of prefixes
  config: clarify memory ownership in `git_config_pathname()`
  http: refactor code to clarify memory ownership
  checkout: clarify memory ownership in `unique_tracking_name()`
  strbuf: fix leak when `appendwholeline()` fails with EOF
  transport-helper: fix leaking helper name
2024-06-06 12:49:23 -07:00
Patrick Steinhardt
318efb966b refs: convert ref storage format to an enum
The ref storage format is tracked as a simple unsigned integer, which
makes it harder than necessary to discover what that integer actually is
or where its values are defined.

Convert the ref storage format to instead be an enum.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 09:04:31 -07:00
Patrick Steinhardt
a83f7f51e1 setup: unset ref storage when reinitializing repository version
When reinitializing a repository's version we may end up unsetting the
hash algorithm when it matches the default hash algorithm. If we didn't
do that then the previously configured value might remain intact.

While the same issue exists for the ref storage extension, we don't do
this here. This has been fine for most of the part because it is not
supported to re-initialize a repository with a different ref storage
format anyway. We're about to introduce a new command to migrate ref
storages though, so this is about to become an issue there.

Prepare for this and unset the ref storage format when reinitializing a
repository with the "files" format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 09:04:31 -07:00
Junio C Hamano
5c7c063c1f Merge branch 'ps/fix-reinit-includeif-onbranch'
"git init" in an already created directory, when the user
configuration has includeif.onbranch, started to fail recently,
which has been corrected.

* ps/fix-reinit-includeif-onbranch:
  setup: fix bug with "includeIf.onbranch" when initializing dir
2024-06-03 13:11:11 -07:00
Torsten Bögershausen
71fa8d2212 macOS: ls-files path fails if path of workdir is NFD
Under macOS, `git ls-files path` does not work (gives an error)
if the absolute 'path' contains characters in NFD (decomposed).
This happens when core.precomposeunicode is true, which is the
most common case. The bug report says:

$ cd somewhere          # some safe place, /tmp or ~/tmp etc.
$ mkdir $'u\xcc\x88'    # ü in NFD
$ cd ü                  # or cd $'u\xcc\x88' or cd $'\xc3\xbc'
$ git init
$ git ls-files $'/somewhere/u\xcc\x88'   # NFD
  fatal: /somewhere/ü: '/somewhere/ü' is outside repository at '/somewhere/ü'
$ git ls-files $'/somewhere/\xc3\xbc'    # NFC
(the same error as above)

In the 'fatal:' error message, there are three ü;
the 1st and 2nd are in NFC, the 3rd is in NFD.

Add test cases that follows the bug report, with the simplification
that the 'ü' is replaced by an 'ä', which is already used as NFD and
NFC in t3910.

The solution is to add a call to precompose_string_if_needed()
to this code in setup.c :
`work_tree = precompose_string_if_needed(get_git_work_tree());`

There is, however, a limitation with this very usage of Git:
The (repo) local .gitconfig file is not used, only the global
"core.precomposeunicode" is taken into account, if it is set (or not).
To set it to true is a good recommendation anyway, and here is the
analyzes from Jun T :

The problem is the_repository->config->hash_initialized
is set to 1 before the_repository->commondir is set to ".git".
Due to this, .git/config is never read, and precomposed_unicode
is never set to 1 (remains -1).

run_builtin() {
    setup_git_directory() {
        strbuf_getcwd() {   # setup.c:1542
            precompose_{strbuf,string}_if_needed() {
                # precomposed_unicode is still -1
                git_congig_get_bool("core.precomposeunicode") {
                    git_config_check_init() {
                        repo_read_config() {
                            git_config_init() {
                                # !!!
                                the_repository->config->hash_initialized=1
                                # !!!
                            }
                            # does not read .git/config since
                            # the_repository->commondir is still NULL
                        }
                    }
                }
                returns without converting to NFC
            }
            returns cwd in NFD
        }

        setup_discovered_git_dir() {
            set_git_work_tree(".") {
                repo_set_worktree() {
                    # this function indirectly calls strbuf_getcwd()
                    # --> precompose_{strbuf,string}_if_needed() -->
                    # {git,repo}_config_get_bool("core.precomposeunicode"),
                    # but does not try to read .git/config since
                    # the_repository->config->hash_initialized
                    # is already set to 1 above. And it will not read
                    # .git/config even if hash_initialized is 0
                    # since the_repository->commondir is still NULL.

                    the_repository->worktree = NFD
                }
            }
        }

        setup_git_env() {
            repo_setup_gitdir() {
                repo_set_commondir() {
                    # finally commondir is set here
                    the_repository->commondir = ".git"
                }
            }
        }

    } // END setup_git_directory

Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-31 13:13:40 -07:00
Junio C Hamano
988499e295 Merge branch 'ps/refs-without-the-repository-updates'
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.

* ps/refs-without-the-repository-updates:
  refs/packed: remove references to `the_hash_algo`
  refs/files: remove references to `the_hash_algo`
  refs/files: use correct repository
  refs: remove `dwim_log()`
  refs: drop `git_default_branch_name()`
  refs: pass repo when peeling objects
  refs: move object peeling into "object.c"
  refs: pass ref store when detecting dangling symrefs
  refs: convert iteration over replace refs to accept ref store
  refs: retrieve worktree ref stores via associated repository
  refs: refactor `resolve_gitlink_ref()` to accept a repository
  refs: pass repo when retrieving submodule ref store
  refs: track ref stores via strmap
  refs: implement releasing ref storages
  refs: rename `init_db` callback to avoid confusion
  refs: adjust names for `init` and `init_db` callbacks
2024-05-30 14:15:13 -07:00
Junio C Hamano
a60c21b720 Merge branch 'ps/undecided-is-not-necessarily-sha1'
Before discovering the repository details, We used to assume SHA-1
as the "default" hash function, which has been corrected. Hopefully
this will smoke out codepaths that rely on such an unwarranted
assumptions.

* ps/undecided-is-not-necessarily-sha1:
  repository: stop setting SHA1 as the default object hash
  oss-fuzz/commit-graph: set up hash algorithm
  builtin/shortlog: don't set up revisions without repo
  builtin/diff: explicitly set hash algo when there is no repo
  builtin/bundle: abort "verify" early when there is no repository
  builtin/blame: don't access potentially unitialized `the_hash_algo`
  builtin/rev-parse: allow shortening to more than 40 hex characters
  remote-curl: fix parsing of detached SHA256 heads
  attr: fix BUG() when parsing attrs outside of repo
  attr: don't recompute default attribute source
  parse-options-cb: only abbreviate hashes when hash algo is known
  path: move `validate_headref()` to its only user
  path: harden validation of HEAD with non-standard hashes
2024-05-30 14:15:11 -07:00
Junio C Hamano
313eec177a safe.directory: allow "lead/ing/path/*" match
When safe.directory was introduced in v2.30.3 timeframe, 8959555c
(setup_git_directory(): add an owner check for the top-level
directory, 2022-03-02), it only allowed specific opt-out
directories.  Immediately after an embargoed release that included
the change, 0f85c4a3 (setup: opt-out of check with safe.directory=*,
2022-04-13) was done as a response to loosen the check so that a
single '*' can be used to say "I trust all repositories" for folks
who host too many repositories to list individually.

Let's further loosen the check to allow people to say "everything
under this hierarchy is deemed safe" by specifying such a leading
directory with "/*" appended to it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-29 12:06:27 -07:00
Patrick Steinhardt
6073b3b5c3 config: clarify memory ownership in git_config_pathname()
The out parameter of `git_config_pathname()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:19:59 -07:00
Junio C Hamano
e55f364398 Merge branch 'ps/refs-without-the-repository-updates' into ps/ref-storage-migration
* ps/refs-without-the-repository-updates:
  refs/packed: remove references to `the_hash_algo`
  refs/files: remove references to `the_hash_algo`
  refs/files: use correct repository
  refs: remove `dwim_log()`
  refs: drop `git_default_branch_name()`
  refs: pass repo when peeling objects
  refs: move object peeling into "object.c"
  refs: pass ref store when detecting dangling symrefs
  refs: convert iteration over replace refs to accept ref store
  refs: retrieve worktree ref stores via associated repository
  refs: refactor `resolve_gitlink_ref()` to accept a repository
  refs: pass repo when retrieving submodule ref store
  refs: track ref stores via strmap
  refs: implement releasing ref storages
  refs: rename `init_db` callback to avoid confusion
  refs: adjust names for `init` and `init_db` callbacks
2024-05-23 09:14:08 -07:00
Patrick Steinhardt
407997c1dd setup: fix bug with "includeIf.onbranch" when initializing dir
It was reported that git-init(1) can fail when initializing an existing
directory in case the config contains an "includeIf.onbranch:"
condition:

    $ mkdir repo
    $ git -c includeIf.onbranch:main.path=nonexistent init repo
    BUG: refs.c:2056: reference backend is unknown

The same error can also be triggered when re-initializing an already
existing repository.

The bug has been introduced in 173761e21b (setup: start tracking ref
storage format, 2023-12-29), which wired up the ref storage format. The
root cause is in `init_db()`, which tries to read the config before we
have initialized `the_repository` and most importantly its ref storage
format. We eventually end up calling `include_by_branch()` and execute
`refs_resolve_ref_unsafe()`, but because we have not initialized the ref
storage format yet this will trigger the above bug.

Interestingly, `include_by_branch()` has a mechanism that will only
cause us to resolve the ref when `the_repository->gitdir` is set. This
is also the reason why this only happens when we initialize an already
existing directory or repository: `gitdir` is set in those cases, but
not when creating a new directory.

Now there are two ways to address the issue:

  - We can adapt `include_by_branch()` to also make the code conditional
    on whether `the_repository->ref_storage_format` is set.

  - We can shift around code such that we initialize the repository
    format before we read the config.

While the first approach would be safe, it may also cause us to paper
over issues where a ref store should have been set up. In our case for
example, it may be reasonable to expect that re-initializing the repo
will cause the "onbranch:" condition to trigger, but we would not do
that if the ref storage format was not set up yet. This also used to
work before the above commit that introduced this bug.

Rearrange the code such that we set up the repository format before
reading the config. This fixes the bug and ensures that "onbranch:"
conditions can trigger.

Reported-by: Heghedus Razvan <heghedus.razvan@protonmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Tested-by: Heghedus Razvan <heghedus.razvan@protonmail.com>
[jc: fixed a test and backported to v2.44.0 codebase]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-22 18:24:48 -07:00
Junio C Hamano
4beb7a3b06 Merge branch 'kn/ref-transaction-symref'
Updates to symbolic refs can now be made as a part of ref
transaction.

* kn/ref-transaction-symref:
  refs: remove `create_symref` and associated dead code
  refs: rename `refs_create_symref()` to `refs_update_symref()`
  refs: use transaction in `refs_create_symref()`
  refs: add support for transactional symref updates
  refs: move `original_update_refname` to 'refs.c'
  refs: support symrefs in 'reference-transaction' hook
  files-backend: extract out `create_symref_lock()`
  refs: accept symref values in `ref_transaction_update()`
2024-05-20 11:20:04 -07:00
Patrick Steinhardt
97abaab5f6 refs: drop git_default_branch_name()
The `git_default_branch_name()` function is a thin wrapper around
`repo_default_branch_name()` with two differences:

  - We implicitly rely on `the_repository`.

  - We cache the default branch name.

None of the callsites of `git_default_branch_name()` are hot code paths
though, so the caching of the branch name is not really required.

Refactor the callsites to use `repo_default_branch_name()` instead and
drop `git_default_branch_name()`, thus getting rid of one more case
where we rely on `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:39 -07:00
Patrick Steinhardt
ed93ea1602 refs: rename init_db callback to avoid confusion
Reference backends have two callbacks `init` and `init_db`. The
similarity of these two callbacks has repeatedly confused me whenever I
was looking at them, where I always had to look up which of them does
what.

Rename the `init_db` callback to `create_on_disk`, which should
hopefully be clearer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:36 -07:00
Junio C Hamano
bca900904d Merge branch 'ps/refs-without-the-repository'
The refs API lost functions that implicitly assumes to work on the
primary ref_store by forcing the callers to pass a ref_store as an
argument.

* ps/refs-without-the-repository:
  refs: remove functions without ref store
  cocci: apply rules to rewrite callers of "refs" interfaces
  cocci: introduce rules to transform "refs" to pass ref store
  refs: add `exclude_patterns` parameter to `for_each_fullref_in()`
  refs: introduce missing functions that accept a `struct ref_store`
2024-05-16 10:10:14 -07:00
Junio C Hamano
46536278a8 Merge branch 'ps/refs-without-the-repository' into ps/refs-without-the-repository-updates
* ps/refs-without-the-repository:
  refs: remove functions without ref store
  cocci: apply rules to rewrite callers of "refs" interfaces
  cocci: introduce rules to transform "refs" to pass ref store
  refs: add `exclude_patterns` parameter to `for_each_fullref_in()`
  refs: introduce missing functions that accept a `struct ref_store`
2024-05-16 09:48:46 -07:00
Patrick Steinhardt
2e5c4758b7 cocci: apply rules to rewrite callers of "refs" interfaces
Apply the rules that rewrite callers of "refs" interfaces to explicitly
pass `struct ref_store`. The resulting patch has been applied with the
`--whitespace=fix` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 10:06:59 -07:00
Patrick Steinhardt
0c6bd2b81d path: move validate_headref() to its only user
While `validate_headref()` is only called from `is_git_directory()` in
"setup.c", it is currently implemented in "path.c". Move it over such
that it becomes clear that it is only really used during setup in order
to discover repositories.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-06 22:50:48 -07:00
Johannes Schindelin
1c00f92eb5 Sync with 2.44.1
* maint-2.44: (41 commits)
  Git 2.44.1
  Git 2.43.4
  Git 2.42.2
  Git 2.41.1
  Git 2.40.2
  Git 2.39.4
  fsck: warn about symlink pointing inside a gitdir
  core.hooksPath: add some protection while cloning
  init.templateDir: consider this config setting protected
  clone: prevent hooks from running during a clone
  Add a helper function to compare file contents
  init: refactor the template directory discovery into its own function
  find_hook(): refactor the `STRIP_EXTENSION` logic
  clone: when symbolic links collide with directories, keep the latter
  entry: report more colliding paths
  t5510: verify that D/F confusion cannot lead to an RCE
  submodule: require the submodule path to contain directories only
  clone_submodule: avoid using `access()` on directories
  submodules: submodule paths must not contain symlinks
  clone: prevent clashing git dirs when cloning submodule in parallel
  ...
2024-04-29 20:42:30 +02:00
Johannes Schindelin
e5e6663e69 Sync with 2.43.4
* maint-2.43: (40 commits)
  Git 2.43.4
  Git 2.42.2
  Git 2.41.1
  Git 2.40.2
  Git 2.39.4
  fsck: warn about symlink pointing inside a gitdir
  core.hooksPath: add some protection while cloning
  init.templateDir: consider this config setting protected
  clone: prevent hooks from running during a clone
  Add a helper function to compare file contents
  init: refactor the template directory discovery into its own function
  find_hook(): refactor the `STRIP_EXTENSION` logic
  clone: when symbolic links collide with directories, keep the latter
  entry: report more colliding paths
  t5510: verify that D/F confusion cannot lead to an RCE
  submodule: require the submodule path to contain directories only
  clone_submodule: avoid using `access()` on directories
  submodules: submodule paths must not contain symlinks
  clone: prevent clashing git dirs when cloning submodule in parallel
  t7423: add tests for symlinked submodule directories
  ...
2024-04-19 12:38:54 +02:00
Johannes Schindelin
8e97ec3662 Sync with 2.42.2
* maint-2.42: (39 commits)
  Git 2.42.2
  Git 2.41.1
  Git 2.40.2
  Git 2.39.4
  fsck: warn about symlink pointing inside a gitdir
  core.hooksPath: add some protection while cloning
  init.templateDir: consider this config setting protected
  clone: prevent hooks from running during a clone
  Add a helper function to compare file contents
  init: refactor the template directory discovery into its own function
  find_hook(): refactor the `STRIP_EXTENSION` logic
  clone: when symbolic links collide with directories, keep the latter
  entry: report more colliding paths
  t5510: verify that D/F confusion cannot lead to an RCE
  submodule: require the submodule path to contain directories only
  clone_submodule: avoid using `access()` on directories
  submodules: submodule paths must not contain symlinks
  clone: prevent clashing git dirs when cloning submodule in parallel
  t7423: add tests for symlinked submodule directories
  has_dir_name(): do not get confused by characters < '/'
  ...
2024-04-19 12:38:50 +02:00
Johannes Schindelin
be348e9815 Sync with 2.41.1
* maint-2.41: (38 commits)
  Git 2.41.1
  Git 2.40.2
  Git 2.39.4
  fsck: warn about symlink pointing inside a gitdir
  core.hooksPath: add some protection while cloning
  init.templateDir: consider this config setting protected
  clone: prevent hooks from running during a clone
  Add a helper function to compare file contents
  init: refactor the template directory discovery into its own function
  find_hook(): refactor the `STRIP_EXTENSION` logic
  clone: when symbolic links collide with directories, keep the latter
  entry: report more colliding paths
  t5510: verify that D/F confusion cannot lead to an RCE
  submodule: require the submodule path to contain directories only
  clone_submodule: avoid using `access()` on directories
  submodules: submodule paths must not contain symlinks
  clone: prevent clashing git dirs when cloning submodule in parallel
  t7423: add tests for symlinked submodule directories
  has_dir_name(): do not get confused by characters < '/'
  docs: document security issues around untrusted .git dirs
  ...
2024-04-19 12:38:46 +02:00
Johannes Schindelin
f5b2af06f5 Sync with 2.40.2
* maint-2.40: (39 commits)
  Git 2.40.2
  Git 2.39.4
  fsck: warn about symlink pointing inside a gitdir
  core.hooksPath: add some protection while cloning
  init.templateDir: consider this config setting protected
  clone: prevent hooks from running during a clone
  Add a helper function to compare file contents
  init: refactor the template directory discovery into its own function
  find_hook(): refactor the `STRIP_EXTENSION` logic
  clone: when symbolic links collide with directories, keep the latter
  entry: report more colliding paths
  t5510: verify that D/F confusion cannot lead to an RCE
  submodule: require the submodule path to contain directories only
  clone_submodule: avoid using `access()` on directories
  submodules: submodule paths must not contain symlinks
  clone: prevent clashing git dirs when cloning submodule in parallel
  t7423: add tests for symlinked submodule directories
  has_dir_name(): do not get confused by characters < '/'
  docs: document security issues around untrusted .git dirs
  upload-pack: disable lazy-fetching by default
  ...
2024-04-19 12:38:42 +02:00
Johannes Schindelin
4412a04fe6 init.templateDir: consider this config setting protected
The ability to configuring the template directory is a delicate feature:
It allows defining hooks that will be run e.g. during a `git clone`
operation, such as the `post-checkout` hook.

As such, it is of utmost importance that Git would not allow that config
setting to be changed during a `git clone` by mistake, allowing an
attacker a chance for a Remote Code Execution, allowing attackers to run
arbitrary code on unsuspecting users' machines.

As a defense-in-depth measure, to prevent minor vulnerabilities in the
`git clone` code from ballooning into higher-serverity attack vectors,
let's make this a protected setting just like `safe.directory` and
friends, i.e. ignore any `init.templateDir` entries from any local
config.

Note: This does not change the behavior of any recursive clone (modulo
bugs), as the local repository config is not even supposed to be written
while cloning the superproject, except in one scenario: If a config
template is configured that sets the template directory. This might be
done because `git clone --recurse-submodules --template=<directory>`
does not pass that template directory on to the submodules'
initialization.

Another scenario where this commit changes behavior is where
repositories are _not_ cloned recursively, and then some (intentional,
benign) automation configures the template directory to be used before
initializing the submodules.

So the caveat is that this could theoretically break existing processes.

In both scenarios, there is a way out, though: configuring the template
directory via the environment variable `GIT_TEMPLATE_DIR`.

This change in behavior is a trade-off between security and
backwards-compatibility that is struck in favor of security.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-04-19 12:38:24 +02:00
Johannes Schindelin
df93e407f0 init: refactor the template directory discovery into its own function
We will need to call this function from `hook.c` to be able to prevent
hooks from running that were written as part of a `clone` but did not
originate from the template directory.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-04-17 22:30:10 +02:00
Johannes Schindelin
f4aa8c8bb1 fetch/clone: detect dubious ownership of local repositories
When cloning from somebody else's repositories, it is possible that,
say, the `upload-pack` command is overridden in the repository that is
about to be cloned, which would then be run in the user's context who
started the clone.

To remind the user that this is a potentially unsafe operation, let's
extend the ownership checks we have already established for regular
gitdir discovery to extend also to local repositories that are about to
be cloned.

This protection extends also to file:// URLs.

The fixes in this commit address CVE-2024-32004.

Note: This commit does not touch the `fetch`/`clone` code directly, but
instead the function used implicitly by both: `enter_repo()`. This
function is also used by `git receive-pack` (i.e. pushes), by `git
upload-archive`, by `git daemon` and by `git http-backend`. In setups
that want to serve repositories owned by different users than the
account running the service, this will require `safe.*` settings to be
configured accordingly.

Also note: there are tiny time windows where a time-of-check-time-of-use
("TOCTOU") race is possible. The real solution to those would be to work
with `fstat()` and `openat()`. However, the latter function is not
available on Windows (and would have to be emulated with rather
expensive low-level `NtCreateFile()` calls), and the changes would be
quite extensive, for my taste too extensive for the little gain given
that embargoed releases need to pay extra attention to avoid introducing
inadvertent bugs.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-04-17 22:29:54 +02:00
Junio C Hamano
1002f28a52 Merge branch 'eb/hash-transition'
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.

* eb/hash-transition: (30 commits)
  t1016-compatObjectFormat: add tests to verify the conversion between objects
  t1006: test oid compatibility with cat-file
  t1006: rename sha1 to oid
  test-lib: compute the compatibility hash so tests may use it
  builtin/ls-tree: let the oid determine the output algorithm
  object-file: handle compat objects in check_object_signature
  tree-walk: init_tree_desc take an oid to get the hash algorithm
  builtin/cat-file: let the oid determine the output algorithm
  rev-parse: add an --output-object-format parameter
  repository: implement extensions.compatObjectFormat
  object-file: update object_info_extended to reencode objects
  object-file-convert: convert commits that embed signed tags
  object-file-convert: convert commit objects when writing
  object-file-convert: don't leak when converting tag objects
  object-file-convert: convert tag objects when writing
  object-file-convert: add a function to convert trees between algorithms
  object: factor out parse_mode out of fast-import and tree-walk into in object.h
  cache: add a function to read an OID of a specific algorithm
  tag: sign both hashes
  commit: export add_header_signature to support handling signatures on tags
  ...
2024-03-28 14:13:50 -07:00
Junio C Hamano
dc97afdcb9 Merge branch 'jc/safe-implicit-bare'
Users with safe.bareRepository=explicit can still work from within
$GIT_DIR of a seconary worktree (which resides at .git/worktrees/$name/)
of the primary worktree without explicitly specifying the $GIT_DIR
environment variable or the --git-dir=<path> option.

* jc/safe-implicit-bare:
  setup: notice more types of implicit bare repositories
2024-03-21 14:55:13 -07:00
Junio C Hamano
272fd9125a Merge branch 'gt/core-bare-in-templates'
Code simplification.

* gt/core-bare-in-templates:
  setup: remove unnecessary variable
2024-03-14 14:05:24 -07:00
Junio C Hamano
30b7c4bdca setup: notice more types of implicit bare repositories
Setting the safe.bareRepository configuration variable to explicit
stops git from using a bare repository, unless the repository is
explicitly specified, either by the "--git-dir=<path>" command line
option, or by exporting $GIT_DIR environment variable.  This may be
a reasonable measure to safeguard users from accidentally straying
into a bare repository in unexpected places, but often gets in the
way of users who need valid accesses to the repository.

Earlier, 45bb9162 (setup: allow cwd=.git w/ bareRepository=explicit,
2024-01-20) loosened the rule such that being inside the ".git"
directory of a non-bare repository does not really count as
accessing a "bare" repository.  The reason why such a loosening is
needed is because often hooks and third-party tools run from within
$GIT_DIR while working with a non-bare repository.

More importantly, the reason why this is safe is because a directory
whose contents look like that of a "bare" repository cannot be a
bare repository that came embedded within a checkout of a malicious
project, as long as its directory name is ".git", because ".git" is
not a name allowed for a directory in payload.

There are at least two other cases where tools have to work in a
bare-repository looking directory that is not an embedded bare
repository, and accesses to them are still not allowed by the recent
change.

 - A secondary worktree (whose name is $name) has its $GIT_DIR
   inside "worktrees/$name/" subdirectory of the $GIT_DIR of the
   primary worktree of the same repository.

 - A submodule worktree (whose name is $name) has its $GIT_DIR
   inside "modules/$name/" subdirectory of the $GIT_DIR of its
   superproject.

As long as the primary worktree or the superproject in these cases
are not bare, the pathname of these "looks like bare but not really"
directories will have "/.git/worktrees/" and "/.git/modules/" as a
substring in its leading part, and we can take advantage of the same
security guarantee allow git to work from these places.

Extend the earlier "in a directory called '.git' we are OK" logic
used for the primary worktree to also cover the secondary worktree's
and non-embedded submodule's $GIT_DIR, by moving the logic to a
helper function "is_implicit_bare_repo()".  We deliberately exclude
secondary worktrees and submodules of a bare repository, as these
are exactly what safe.bareRepository=explicit setting is designed to
forbid accesses to without an explicit GIT_DIR/--git-dir=<path>

Helped-by: Kyle Lippincott <spectral@google.com>
Helped-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-11 13:51:36 -07:00
Ghanshyam Thakkar
8145a8fd02 setup: remove unnecessary variable
The TODO comment suggested to heed core.bare from template config file
if no command line override given. And the prev_bare_repository
variable seems to have been placed for this sole purpose as it is not
used anywhere else.

However, it was clarified by Junio [1] that such values (including
core.bare) are ignored intentionally and does not make sense to
propagate them from template config to repository config. Also, the
directories for the worktree and repository are already created, and
therefore the bare/non-bare decision has already been made, by the
point we reach the codepath where the TODO comment is placed.
Therefore, prev_bare_repository does not have a usecase with/without
supporting core.bare from template. And the removal of
prev_bare_repository is safe as proved by the later part of the
comment:

    "Unfortunately, the line above is equivalent to
        is_bare_repository_cfg = !work_tree;
    which ignores the config entirely even if no `--[no-]bare`
    command line option was present.

    To see why, note that before this function, there was this call:
        prev_bare_repository = is_bare_repository()
    expanding the right hand side:
        = is_bare_repository_cfg && !get_git_work_tree()
        = is_bare_repository_cfg && !work_tree
    note that the last simplification above is valid because nothing
    calls repo_init() or set_git_work_tree() between any of the
    relevant calls in the code, and thus the !get_git_work_tree()
    calls will return the same result each time.  So, what we are
    interested in computing is the right hand side of the line of
    code just above this comment:
        prev_bare_repository || !work_tree
        = is_bare_repository_cfg && !work_tree || !work_tree
        = !work_tree
    because "A && !B || !B == !B" for all boolean values of A & B."

Therefore, remove the TODO comment and remove prev_bare_repository
variable. Also, update relevant testcases and remove one redundant
testcase.

[1]: https://lore.kernel.org/git/xmqqjzonpy9l.fsf@gitster.g/

Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:18:31 -08:00
Patrick Steinhardt
199f44cb2e builtin/clone: allow remote helpers to detect repo
In 18c9cb7524 (builtin/clone: create the refdb with the correct object
format, 2023-12-12), we have changed git-clone(1) so that it delays
creation of the refdb until after it has learned about the remote's
object format. This change was required for the reftable backend, which
encodes the object format into the tables. So if we pre-initialized the
refdb with the default object format, but the remote uses a different
object format than that, then the resulting tables would have encoded
the wrong object format.

This change unfortunately breaks remote helpers which try to access the
repository that is about to be created. Because the refdb has not yet
been initialized at the point where we spawn the remote helper, we also
don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by
the remote helper which try to access the repository would fail because
it cannot be discovered.

This is essentially a chicken-and-egg problem: we cannot initialize the
refdb because we don't know about the object format. But we cannot learn
about the object format because the remote helper may be unable to
access the partially-initialized repository.

Ideally, we would address this issue via capabilities. But the remote
helper protocol is not structured in a way that guarantees that the
capability announcement happens before the remote helper tries to access
the repository.

Instead, fix this issue by partially initializing the refdb up to the
point where it becomes discoverable by Git commands.

Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27 12:58:57 -08:00
Junio C Hamano
0f7a10a3aa Merge branch 'en/header-cleanup' into maint-2.43
Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
2024-02-08 16:22:10 -08:00
Junio C Hamano
a8bf3c0cac Merge branch 'kl/allow-working-in-dot-git-in-non-bare-repository'
The "disable repository discovery of a bare repository" check,
triggered by setting safe.bareRepository configuration variable to
'explicit', has been loosened to exclude the ".git/" directory inside
a non-bare repository from the check.  So you can do "cd .git &&
git cmd" to run a Git command that works on a bare repository without
explicitly specifying $GIT_DIR now.

* kl/allow-working-in-dot-git-in-non-bare-repository:
  setup: allow cwd=.git w/ bareRepository=explicit
2024-01-30 13:34:12 -08:00
Junio C Hamano
dc8ce995a2 Merge branch 'ps/worktree-refdb-initialization'
Instead of manually creating refs/ hierarchy on disk upon a
creation of a secondary worktree, which is only usable via the
files backend, use the refs API to populate it.

* ps/worktree-refdb-initialization:
  builtin/worktree: create refdb via ref backend
  worktree: expose interface to look up worktree by name
  builtin/worktree: move setup of commondir file earlier
  refs/files: skip creation of "refs/{heads,tags}" for worktrees
  setup: move creation of "refs/" into the files backend
  refs: prepare `refs_init_db()` for initializing worktree refs
2024-01-26 08:54:46 -08:00
Kyle Lippincott
45bb916248 setup: allow cwd=.git w/ bareRepository=explicit
The safe.bareRepository setting can be set to 'explicit' to disallow
implicit uses of bare repositories, preventing an attack [1] where an
artificial and malicious bare repository is embedded in another git
repository. Unfortunately, some tooling uses myrepo/.git/ as the cwd
when executing commands, and this is blocked when
safe.bareRepository=explicit. Blocking is unnecessary, as git already
prevents nested .git directories.

Teach git to not reject uses of git inside of the .git directory: check
if cwd is .git (or a subdirectory of it) and allow it even if
safe.bareRepository=explicit.

[1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md

Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-20 14:11:49 -08:00
Junio C Hamano
32c6fc3e30 Merge branch 'ps/refstorage-extension'
Introduce a new extension "refstorage" so that we can mark a
repository that uses a non-default ref backend, like reftable.

* ps/refstorage-extension:
  t9500: write "extensions.refstorage" into config
  builtin/clone: introduce `--ref-format=` value flag
  builtin/init: introduce `--ref-format=` value flag
  builtin/rev-parse: introduce `--show-ref-format` flag
  t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
  setup: introduce GIT_DEFAULT_REF_FORMAT envvar
  setup: introduce "extensions.refStorage" extension
  setup: set repository's formats on init
  setup: start tracking ref storage format
  refs: refactor logic to look up storage backends
  worktree: skip reading HEAD when repairing worktrees
  t: introduce DEFAULT_REPO_FORMAT prereq
2024-01-16 10:11:57 -08:00
Junio C Hamano
492ee03f60 Merge branch 'en/header-cleanup'
Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
2024-01-08 14:05:15 -08:00
Patrick Steinhardt
c358d165f2 setup: move creation of "refs/" into the files backend
When creating the ref database we unconditionally create the "refs/"
directory in "setup.c". This is a mandatory prerequisite for all Git
repositories regardless of the ref backend in use, because Git will be
unable to detect the directory as a repository if "refs/" doesn't exist.

We are about to add another new caller that will want to create a ref
database when creating worktrees. We would require the same logic to
create the "refs/" directory even though the caller really should not
care about such low-level details. Ideally, the ref database should be
fully initialized after calling `refs_init_db()`.

Move the code to create the directory into the files backend itself to
make it so. This means that future ref backends will also need to have
equivalent logic around to ensure that the directory exists, but it
seems a lot more sensible to have it this way round than to require
callers to create the directory themselves.

An alternative to this would be to create "refs/" in `refs_init_db()`
directly. This feels conceptually unclean though as the creation of the
refdb is now cluttered across different callsites. Furthermore, both the
"files" and the upcoming "reftable" backend write backend-specific data
into the "refs/" directory anyway, so splitting up this logic would only
make it harder to reason about.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08 13:17:30 -08:00
Patrick Steinhardt
2e573d61ff refs: prepare refs_init_db() for initializing worktree refs
The purpose of `refs_init_db()` is to initialize the on-disk files of a
new ref database. The function is quite inflexible right now though, as
callers can neither specify the `struct ref_store` nor can they pass any
flags.

Refactor the interface to accept both of these. This will be required so
that we can start initializing per-worktree ref databases via the ref
backend instead of open-coding the initialization in "worktree.c".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08 13:17:30 -08:00
Patrick Steinhardt
aa19619a98 setup: introduce GIT_DEFAULT_REF_FORMAT envvar
Introduce a new GIT_DEFAULT_REF_FORMAT environment variable that lets
users control the default ref format used by both git-init(1) and
git-clone(1). This is modeled after GIT_DEFAULT_OBJECT_FORMAT, which
does the same thing for the repository's object format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02 09:24:48 -08:00
Patrick Steinhardt
d7497a42b0 setup: introduce "extensions.refStorage" extension
Introduce a new "extensions.refStorage" extension that allows us to
specify the ref storage format used by a repository. For now, the only
supported format is the "files" format, but this list will likely soon
be extended to also support the upcoming "reftable" format.

There have been discussions on the Git mailing list in the past around
how exactly this extension should look like. One alternative [1] that
was discussed was whether it would make sense to model the extension in
such a way that backends are arbitrarily stackable. This would allow for
a combined value of e.g. "loose,packed-refs" or "loose,reftable", which
indicates that new refs would be written via "loose" files backend and
compressed into "packed-refs" or "reftable" backends, respectively.

It is arguable though whether this flexibility and the complexity that
it brings with it is really required for now. It is not foreseeable that
there will be a proliferation of backends in the near-term future, and
the current set of existing formats and formats which are on the horizon
can easily be configured with the much simpler proposal where we have a
single value, only.

Furthermore, if we ever see that we indeed want to gain the ability to
arbitrarily stack the ref formats, then we can adapt the current
extension rather easily. Given that Git clients will refuse any unknown
value for the "extensions.refStorage" extension they would also know to
ignore a stacked "loose,packed-refs" in the future.

So let's stick with the easy proposal for the time being and wire up the
extension.

[1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02 09:24:48 -08:00
Patrick Steinhardt
58be32fff9 setup: set repository's formats on init
The proper hash algorithm and ref storage format that will be used for a
newly initialized repository will be figured out in `init_db()` via
`validate_hash_algorithm()` and `validate_ref_storage_format()`. Until
now though, we never set up the hash algorithm or ref storage format of
`the_repository` accordingly.

There are only two callsites of `init_db()`, one in git-init(1) and one
in git-clone(1). The former function doesn't care for the formats to be
set up properly because it never access the repository after calling the
function in the first place.

For git-clone(1) it's a different story though, as we call `init_db()`
before listing remote refs. While we do indeed have the wrong hash
function in `the_repository` when `init_db()` sets up a non-default
object format for the repository, it never mattered because we adjust
the hash after learning about the remote's hash function via the listed
refs.

So the current state is correct for the hash algo, but it's not for the
ref storage format because git-clone(1) wouldn't know to set it up
properly. But instead of adjusting only the `ref_storage_format`, set
both the hash algo and the ref storage format so that `the_repository`
is in the correct state when `init_db()` exits. This is fine as we will
adjust the hash later on anyway and makes it easier to reason about the
end state of `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02 09:24:47 -08:00
Patrick Steinhardt
173761e21b setup: start tracking ref storage format
In order to discern which ref storage format a repository is supposed to
use we need to start setting up and/or discovering the format. This
needs to happen in two separate code paths.

  - The first path is when we create a repository via `init_db()`. When
    we are re-initializing a preexisting repository we need to retain
    the previously used ref storage format -- if the user asked for a
    different format then this indicates an error and we error out.
    Otherwise we either initialize the repository with the format asked
    for by the user or the default format, which currently is the
    "files" backend.

  - The second path is when discovering repositories, where we need to
    read the config of that repository. There is not yet any way to
    configure something other than the "files" backend, so we can just
    blindly set the ref storage format to this backend.

Wire up this logic so that we have the ref storage format always readily
available when needed. As there is only a single backend and because it
is not configurable we cannot yet verify that this tracking works as
expected via tests, but tests will be added in subsequent commits. To
countermand this ommission now though, raise a BUG() in case the ref
storage format is not set up properly in `ref_store_init()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02 09:24:47 -08:00
Junio C Hamano
94e8e404a7 Merge branch 'ps/clone-into-reftable-repository'
"git clone" has been prepared to allow cloning a repository with
non-default hash function into a repository that uses the reftable
backend.

* ps/clone-into-reftable-repository:
  builtin/clone: create the refdb with the correct object format
  builtin/clone: skip reading HEAD when retrieving remote
  builtin/clone: set up sparse checkout later
  builtin/clone: fix bundle URIs with mismatching object formats
  remote-curl: rediscover repository when fetching refs
  setup: allow skipping creation of the refdb
  setup: extract function to create the refdb
2023-12-27 14:52:28 -08:00
Elijah Newren
eea0e59ffb treewide: remove unnecessary includes in source files
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26 12:04:31 -08:00
Junio C Hamano
7a75e131d6 Merge branch 'ps/clone-into-reftable-repository' into ps/refstorage-extension
* ps/clone-into-reftable-repository:
  builtin/clone: create the refdb with the correct object format
  builtin/clone: skip reading HEAD when retrieving remote
  builtin/clone: set up sparse checkout later
  builtin/clone: fix bundle URIs with mismatching object formats
  remote-curl: rediscover repository when fetching refs
  setup: allow skipping creation of the refdb
  setup: extract function to create the refdb
2023-12-20 10:19:58 -08:00