Commit graph

865 commits

Author SHA1 Message Date
Jeff King 02c3c59e62 hashmap: mark unused callback parameters
Hashmap comparison functions must conform to a particular callback
interface, but many don't use all of their parameters. Especially the
void cmp_data pointer, but some do not use keydata either (because they
can easily form a full struct to pass when doing lookups). Let's mark
these to make -Wunused-parameter happy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:55 -07:00
Jeff King 783a86c142 config: mark unused callback parameters
The callback passed to git_config() must conform to a particular
interface. But most callbacks don't actually look at the extra "void
*data" parameter. Let's mark the unused parameters to make
-Wunused-parameter happy.

Note there's one unusual case here in get_remote_default() where we
actually ignore the "value" parameter. That's because it's only checking
whether the option is found at all, and not parsing its value.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:55 -07:00
Glen Choo 776f184893 config.c: NULL check when reading protected config
In read_protected_config(), check whether each file name is NULL before
attempting to read it, and add a BUG() call to
git_config_from_file_with_options() to make this error easier to catch
in the future.

The NULL checks mirror what do_git_config_sequence() does (which
read_protected_config() is modeled after). Without these NULL checks,
multiple tests fail with "make SANITIZE=address", e.g. in the final test
of t4010, xdg_config is NULL causing us to call fopen(NULL).

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-26 23:46:01 -07:00
Glen Choo 5b3c650777 config: learn git_protected_config()
`uploadpack.packObjectsHook` is the only 'protected configuration only'
variable today, but we've noted that `safe.directory` and the upcoming
`safe.bareRepository` should also be 'protected configuration only'. So,
for consistency, we'd like to have a single implementation for protected
configuration.

The primary constraints are:

1. Reading from protected configuration should be fast. Nearly all "git"
   commands inside a bare repository will read both `safe.directory` and
   `safe.bareRepository`, so we cannot afford to be slow.

2. Protected configuration must be readable when the gitdir is not
   known. `safe.directory` and `safe.bareRepository` both affect
   repository discovery and the gitdir is not known at that point [1].

The chosen implementation in this commit is to read protected
configuration and cache the values in a global configset. This is
similar to the caching behavior we get with the_repository->config.

Introduce git_protected_config(), which reads protected configuration
and caches them in the global configset protected_config. Then, refactor
`uploadpack.packObjectsHook` to use git_protected_config().

The protected configuration functions are named similarly to their
non-protected counterparts, e.g. git_protected_config_check_init() vs
git_config_check_init().

In light of constraint 1, this implementation can still be improved.
git_protected_config() iterates through every variable in
protected_config, which is wasteful, but it makes the conversion simple
because it matches existing patterns. We will likely implement constant
time lookup functions for protected configuration in a future series
(such functions already exist for non-protected configuration, i.e.
repo_config_get_*()).

An alternative that avoids introducing another configset is to continue
to read all config using git_config(), but only accept values that have
the correct config scope [2]. This technically fulfills constraint 2,
because git_config() simply ignores the local and worktree config when
the gitdir is not known. However, this would read incomplete config into
the_repository->config, which would need to be reset when the gitdir is
known and git_config() needs to read the local and worktree config.
Resetting the_repository->config might be reasonable while we only have
these 'protected configuration only' variables, but it's not clear
whether this extends well to future variables.

[1] In this case, we do have a candidate gitdir though, so with a little
refactoring, it might be possible to provide a gitdir.
[2] This is how `uploadpack.packObjectsHook` was implemented prior to
this commit.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14 15:08:29 -07:00
Junio C Hamano 83937e9592 Merge branch 'ns/batch-fsync'
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.

* ns/batch-fsync:
  core.fsyncmethod: performance tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: tests for batch mode
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsync: use batch mode and sync loose objects by default on Windows
  unpack-objects: use the bulk-checkin infrastructure
  update-index: use the bulk-checkin infrastructure
  builtin/add: add ODB transaction around add_files_to_cache
  cache-tree: use ODB transaction around writing a tree
  core.fsyncmethod: batched disk flushes for loose-objects
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
2022-06-03 14:30:34 -07:00
Junio C Hamano f49c478f62 Merge branch 'tk/simple-autosetupmerge'
"git -c branch.autosetupmerge=simple branch $A $B" will set the $B
as $A's upstream only when $A and $B shares the same name, and "git
-c push.default=simple" on branch $A would push to update the
branch $A at the remote $B came from.  Also more places use the
sole remote, if exists, before defaulting to 'origin'.

* tk/simple-autosetupmerge:
  push: new config option "push.autoSetupRemote" supports "simple" push
  push: default to single remote even when not named origin
  branch: new autosetupmerge option 'simple' for matching branches
2022-05-26 14:51:30 -07:00
Junio C Hamano 538dc459a0 Merge branch 'ep/maint-equals-null-cocci'
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano 2b0a58d164 Merge branch 'ep/maint-equals-null-cocci' for maint-2.35
* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-02 10:06:04 -07:00
Junio C Hamano afe8a9070b tree-wide: apply equals-null.cocci
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-02 09:50:37 -07:00
Tao Klerks bdaf1dfae7 branch: new autosetupmerge option 'simple' for matching branches
With the default push.default option, "simple", beginners are
protected from accidentally pushing to the "wrong" branch in
centralized workflows: if the remote tracking branch they would push
to does not have the same name as the local branch, and they try to do
a "default push", they get an error and explanation with options.

There is a particular centralized workflow where this often happens:
a user branches to a new local topic branch from an existing
remote branch, eg with "checkout -b feature1 origin/master". With
the default branch.autosetupmerge configuration (value "true"), git
will automatically add origin/master as the upstream tracking branch.

When the user pushes with a default "git push", with the intention of
pushing their (new) topic branch to the remote, they get an error, and
(amongst other things) a suggestion to run "git push origin HEAD".

If they follow this suggestion the push succeeds, but on subsequent
default pushes they continue to get an error - so eventually they
figure out to add "-u" to change the tracking branch, or they spelunk
the push.default config doc as proposed and set it to "current", or
some GUI tooling does one or the other of these things for them.

When one of their coworkers later works on the same topic branch,
they don't get any of that "weirdness". They just "git checkout
feature1" and everything works exactly as they expect, with the shared
remote branch set up as remote tracking branch, and push and pull
working out of the box.

The "stable state" for this way of working is that local branches have
the same-name remote tracking branch (origin/feature1 in this
example), and multiple people can work on that remote feature branch
at the same time, trusting "git pull" to merge or rebase as required
for them to be able to push their interim changes to that same feature
branch on that same remote.

(merging from the upstream "master" branch, and merging back to it,
are separate more involved processes in this flow).

There is a problem in this flow/way of working, however, which is that
the first user, when they first branched from origin/master, ended up
with the "wrong" remote tracking branch (different from the stable
state). For a while, before they pushed (and maybe longer, if they
don't use -u/--set-upstream), their "git pull" wasn't getting other
users' changes to the feature branch - it was getting any changes from
the remote "master" branch instead (a completely different class of
changes!)

An experienced git user might say "well yeah, that's what it means to
have the remote tracking branch set to origin/master!" - but the
original user above didn't *ask* to have the remote master branch
added as remote tracking branch - that just happened automatically
when they branched their feature branch. They didn't necessarily even
notice or understand the meaning of the "set up to track 'origin/master'"
message when they created the branch - especially if they are using a
GUI.

Looking at how to fix this, you might think "OK, so disable auto setup
of remote tracking - set branch.autosetupmerge to false" - but that
will inconvenience the *second* user in this story - the one who just
wanted to start working on the topic branch. The first and second
users swap roles at different points in time of course - they should
both have a sane configuration that does the right thing in both
situations.

Make this "branches have the same name locally as on the remote"
workflow less painful / more obvious by introducing a new
branch.autosetupmerge option called "simple", to match the same-name
"push.default" option that makes similar assumptions.

This new option automatically sets up tracking in a *subset* of the
current default situations: when the original ref is a remote tracking
branch *and* has the same branch name on the remote (as the new local
branch name).

Update the error displayed when the 'push.default=simple' configuration
rejects a mismatching-upstream-name default push, to offer this new
branch.autosetupmerge option that will prevent this class of error.

With this new configuration, in the example situation above, the first
user does *not* get origin/master set up as the tracking branch for
the new local branch. If they "git pull" in their new local-only
branch, they get an error explaining there is no upstream branch -
which makes sense and is helpful. If they "git push", they get an
error explaining how to push *and* suggesting they specify
--set-upstream - which is exactly the right thing to do for them.

This new option is likely not appropriate for users intentionally
implementing a "triangular workflow" with a shared upstream tracking
branch, that they "git pull" in and a "private" feature branch that
they push/force-push to just for remote safe-keeping until they are
ready to push up to the shared branch explicitly/separately. Such
users are likely to prefer keeping the current default
merge.autosetupmerge=true behavior, and change their push.default to
"current".

Also extend the existing branch tests with three new cases testing
this option - the obvious matching-name and non-matching-name cases,
and also a non-matching-ref-type case. The matching-name case needs to
temporarily create an independent repo to fetch from, as the general
strategy of using the local repo as the remote in these tests
precludes locally branching with the same name as in the "remote".

Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-29 11:20:55 -07:00
Neeraj Singh 8a94d83349 core.fsync: use batch mode and sync loose objects by default on Windows
Git for Windows has defaulted to core.fsyncObjectFiles=true since
September 2017. We turn on syncing of loose object files with batch mode
in upstream Git so that we can get broad coverage of the new code
upstream.

We don't actually do fsyncs in the most of the test suite, since
GIT_TEST_FSYNC is set to 0. However, we do exercise all of the
surrounding batch mode code since GIT_TEST_FSYNC merely makes the
maybe_fsync wrapper always appear to succeed.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06 13:13:26 -07:00
Neeraj Singh c0f4752ed2 core.fsyncmethod: batched disk flushes for loose-objects
When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin odb-transaction functionality, takes
advantage of tmp-objdir, and uses the writeout-only support code.

When the new mode is enabled, we do the following for each new object:
1a. Create the object in a tmp-objdir.
2a. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:
1b. Issue an fsync against a dummy file to flush the log and hardware
   writeback cache, which should by now have seen the tmp-objdir writes.
2b. Rename all of the tmp-objdir files to their final names.
3b. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the default
   today, but the user now has the option of syncing the index and there
   is a separate patch series to implement syncing of refs.

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns. This sequence also ensures that no object files
appear in the main object store unless they are fsync-durable.

Batch mode is only enabled if core.fsync includes loose-objects. If
the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does
not include loose-objects, we will use file-by-file fsyncing.

In step (1a) of the sequence, the tmp-objdir is created lazily to avoid
work if no loose objects are ever added to the ODB. We use a tmp-objdir
to maintain the invariant that no loose-objects are visible in the main
ODB unless they are properly fsync-durable. This is important since
future ODB operations that try to create an object with specific
contents will silently drop the new data if an object with the target
hash exists without checking that the loose-object contents match the
hash. Only a full git-fsck would restore the ODB to a functional state
where dataloss doesn't occur.

In step (1b) of the sequence, we issue a fsync against a dummy file
created specifically for the purpose. This method has a little higher
cost than using one of the input object files, but makes adding new
callers of this mechanism easier, since we don't need to figure out
which object file is "last" or risk sharing violations by caching the fd
of the last object file.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.

Adding 500 files to the repo with 'git add' Times reported in seconds.

object file syncing | Linux | Mac   | Windows
--------------------|-------|-------|--------
           disabled | 0.06  |  0.35 | 0.61
              fsync | 1.88  | 11.18 | 2.47
              batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06 13:13:01 -07:00
Junio C Hamano fca85986bb Merge branch 'ns/core-fsyncmethod' into ns/batch-fsync
* ns/core-fsyncmethod:
  configure.ac: fix HAVE_SYNC_FILE_RANGE definition
  core.fsyncmethod: correctly camel-case warning message
  core.fsync: fix incorrect expression for default configuration
  core.fsync: documentation and user-friendly aggregate options
  core.fsync: new option to harden the index
  core.fsync: add configuration parsing
  core.fsync: introduce granular fsync control infrastructure
  core.fsyncmethod: add writeout-only mode
  wrapper: make inclusion of Windows csprng header tightly scoped
2022-04-06 13:01:54 -07:00
Junio C Hamano 439c1e6d5d Merge branch 'jh/builtin-fsmonitor-part2'
Built-in fsmonitor (part 2).

* jh/builtin-fsmonitor-part2: (30 commits)
  t7527: test status with untracked-cache and fsmonitor--daemon
  fsmonitor: force update index after large responses
  fsmonitor--daemon: use a cookie file to sync with file system
  fsmonitor--daemon: periodically truncate list of modified files
  t/perf/p7519: add fsmonitor--daemon test cases
  t/perf/p7519: speed up test on Windows
  t/perf/p7519: fix coding style
  t/helper/test-chmtime: skip directories on Windows
  t/perf: avoid copying builtin fsmonitor files into test repo
  t7527: create test for fsmonitor--daemon
  t/helper/fsmonitor-client: create IPC client to talk to FSMonitor Daemon
  help: include fsmonitor--daemon feature flag in version info
  fsmonitor--daemon: implement handle_client callback
  compat/fsmonitor/fsm-listen-darwin: implement FSEvent listener on MacOS
  compat/fsmonitor/fsm-listen-darwin: add MacOS header files for FSEvent
  compat/fsmonitor/fsm-listen-win32: implement FSMonitor backend on Windows
  fsmonitor--daemon: create token-based changed path cache
  fsmonitor--daemon: define token-ids
  fsmonitor--daemon: add pathname classification
  fsmonitor--daemon: implement 'start' command
  ...
2022-04-04 10:56:24 -07:00
Junio C Hamano 27dd460799 Merge branch 'ns/core-fsyncmethod'
A couple of fix-up to a topic that is now in 'master'.

* ns/core-fsyncmethod:
  core.fsyncmethod: correctly camel-case warning message
  core.fsync: fix incorrect expression for default configuration
2022-04-04 10:56:22 -07:00
Neeraj Singh f12f3b9807 core.fsyncmethod: correctly camel-case warning message
The warning for an unrecognized fsyncMethod was not
camel-cased.

Reported-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-30 14:46:08 -07:00
Jeff Hostetler 1e0ea5c431 fsmonitor: config settings are repository-specific
Move fsmonitor config settings to a new and opaque
`struct fsmonitor_settings` structure.  Add a lazily-loaded pointer
to this into `struct repo_settings`

Create an `enum fsmonitor_mode` type in `struct fsmonitor_settings` to
represent the state of fsmonitor.  This lets us represent which, if
any, fsmonitor provider (hook or IPC) is enabled.

Create `fsm_settings__get_*()` getters to lazily look up fsmonitor-
related config settings.

Get rid of the `core_fsmonitor` global variable.  Move the code to
lookup the existing `core.fsmonitor` config value into the fsmonitor
settings.

Create a hook pathname variable in `struct fsmonitor-settings` and
only set it when in hook mode.

Extend the definition of `core.fsmonitor` to be either a boolean
or a hook pathname.  When true, the builtin FSMonitor is used.
When false or unset, no FSMonitor (neither builtin nor hook) is
used.

The existing `core_fsmonitor` global variable was used to store the
pathname to the fsmonitor hook *and* it was used as a boolean to see
if fsmonitor was enabled.  This dual usage and global visibility leads
to confusion when we add the IPC-based provider.  So lets hide the
details in fsmonitor-settings.c and let it decide which provider to
use in the case of multiple settings.  This avoids cluttering up
repo-settings.c with these private details.

A future commit in builtin-fsmonitor series will add the ability to
disqualify worktrees for various reasons, such as being mounted from a
remote volume, where fsmonitor should not be started.  Having the
config settings hidden in fsmonitor-settings.c allows such worktree
restrictions to override the config values used.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-25 16:04:15 -07:00
Patrick Steinhardt bc22d845c4 core.fsync: new option to harden references
When writing both loose and packed references to disk we first create a
lockfile, write the updated values into that lockfile, and on commit we
rename the file into place. According to filesystem developers, this
behaviour is broken because applications should always sync data to disk
before doing the final rename to ensure data consistency [1][2][3]. If
applications fail to do this correctly, a hard crash of the machine can
easily result in corrupted on-disk data.

This kind of corruption can in fact be easily observed with Git when the
machine hard-resets shortly after writing references to disk. On
machines with ext4, this will likely lead to the "empty files" problem:
the file has been renamed, but its data has not been synced to disk. The
result is that the reference is corrupt, and in the worst case this can
lead to data loss.

Implement a new option to harden references so that users and admins can
avoid this scenario by syncing locked loose and packed references to
disk before we rename them into place.

[1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
[2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename)
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-15 13:30:58 -07:00
Junio C Hamano 0099792400 Merge branch 'ns/core-fsyncmethod' into ps/fsync-refs
* ns/core-fsyncmethod:
  core.fsync: documentation and user-friendly aggregate options
  core.fsync: new option to harden the index
  core.fsync: add configuration parsing
  core.fsync: introduce granular fsync control infrastructure
  core.fsyncmethod: add writeout-only mode
  wrapper: make inclusion of Windows csprng header tightly scoped
2022-03-15 13:30:37 -07:00
Neeraj Singh b9f5d0358d core.fsync: documentation and user-friendly aggregate options
This commit adds aggregate options for the core.fsync setting that are
more user-friendly. These options are specified in terms of 'levels of
safety', indicating which Git operations are considered to be sync
points for durability.

The new documentation is also included here in its entirety for ease of
review.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-15 12:32:55 -07:00
Neeraj Singh ba95e96d4c core.fsync: new option to harden the index
This commit introduces the new ability for the user to harden
the index. In the event of a system crash, the index must be
durable for the user to actually find a file that has been added
to the repo and then deleted from the working tree.

We use the presence of the COMMIT_LOCK flag and absence of the
alternate_index_output as a proxy for determining whether we're
updating the persistent index of the repo or some temporary
index. We don't sync these temporary indexes.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-10 15:10:22 -08:00
Neeraj Singh 844a8ad4f8 core.fsync: add configuration parsing
This change introduces code to parse the core.fsync setting and
configure the fsync_components variable.

core.fsync is configured as a comma-separated list of component names to
sync. Each time a core.fsync variable is encountered in the
configuration heirarchy, we start off with a clean state with the
platform default value. Passing 'none' resets the value to indicate
nothing will be synced. We gather all negative and positive entries from
the comma separated list and then compute the new value by removing all
the negative entries and adding all of the positive entries.

We issue a warning for components that are not recognized so that the
configuration code is compatible with configs from future versions of
Git with more repo components.

Complete documentation for the new setting is included in a later patch
in the series so that it can be reviewed once in final form.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-10 15:10:22 -08:00
Neeraj Singh abf38abec2 core.fsyncmethod: add writeout-only mode
This commit introduces the `core.fsyncMethod` configuration
knob, which can currently be set to `fsync` or `writeout-only`.

The new writeout-only mode attempts to tell the operating system to
flush its in-memory page cache to the storage hardware without issuing a
CACHE_FLUSH command to the storage controller.

Writeout-only fsync is significantly faster than a vanilla fsync on
common hardware, since data is written to a disk-side cache rather than
all the way to a durable medium. Later changes in this patch series will
take advantage of this primitive to implement batching of hardware
flushes.

When git_fsync is called with FSYNC_WRITEOUT_ONLY, it may fail and the
caller is expected to do an ordinary fsync as needed.

On Apple platforms, the fsync system call does not issue a CACHE_FLUSH
directive to the storage controller. This change updates fsync to do
fcntl(F_FULLFSYNC) to make fsync actually durable. We maintain parity
with existing behavior on Apple platforms by setting the default value
of the new core.fsyncMethod option.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-10 15:10:22 -08:00
Junio C Hamano 82386b4496 Merge branch 'en/present-despite-skipped'
In sparse-checkouts, files mis-marked as missing from the working tree
could lead to later problems.  Such files were hard to discover, and
harder to correct.  Automatically detecting and correcting the marking
of such files has been added to avoid these problems.

* en/present-despite-skipped:
  repo_read_index: add config to expect files outside sparse patterns
  Accelerate clear_skip_worktree_from_present_files() by caching
  Update documentation related to sparsity and the skip-worktree bit
  repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  unpack-trees: fix accidental loss of user changes
  t1011: add testcase demonstrating accidental loss of user modifications
2022-03-09 13:38:23 -08:00
Elijah Newren ecc7c8841d repo_read_index: add config to expect files outside sparse patterns
Typically with sparse checkouts, we expect files outside the sparsity
patterns to be marked as SKIP_WORKTREE and be missing from the working
tree.  Sometimes this expectation would be violated however; including
in cases such as:
  * users grabbing files from elsewhere and writing them to the worktree
    (perhaps by editing a cached copy in an editor, copying/renaming, or
     even untarring)
  * various git commands having incomplete or no support for the
    SKIP_WORKTREE bit[1,2]
  * users attempting to "abort" a sparse-checkout operation with a
    not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
    working tree is not atomic)[3].
When the SKIP_WORKTREE bit in the index did not reflect the presence of
the file in the working tree, it traditionally caused confusion and was
difficult to detect and recover from.  So, in a sparse checkout, since
af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present
in worktree, 2022-01-14), Git automatically clears the SKIP_WORKTREE
bit at index read time for entries corresponding to files that are
present in the working tree.

There is another workflow, however, where it is expected that paths
outside the sparsity patterns appear to exist in the working tree and
that they do not lose the SKIP_WORKTREE bit, at least until they get
modified.  A Git-aware virtual file system[4] takes advantage of its
position as a file system driver to expose all files in the working
tree, fetch them on demand using partial clone on access, and tell Git
to pay attention to them on demand by updating the sparse checkout
pattern on writes.  This means that commands like "git status" only have
to examine files that have potentially been modified, whereas commands
like "ls" are able to show the entire codebase without requiring manual
updates to the sparse checkout pattern.

Thus since af6a51875a, Git with such Git-aware virtual file systems
unsets the SKIP_WORKTREE bit for all files and commands like "git
status" have to fetch and examine them all.

Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to
allow limiting the tracked set of files to a small set once again.  A
Git-aware virtual file system or other application that wants to
maintain files outside of the sparse checkout can set this in a
repository to instruct Git not to check for the presence of
SKIP_WORKTREE files.  The setting defaults to false, so most users of
sparse checkout will still get the benefit of an automatically updating
index to recover from the variety of difficult issues detailed in
af6a51875a for paths with SKIP_WORKTREE set despite the path being
present.

[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] The three long paragraphs in the middle of
    https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
[4] such as the vfsd described in
https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 23:37:48 -08:00
Junio C Hamano 0a01df08c0 Merge branch 'ab/date-mode-release'
Plug (some) memory leaks around parse_date_format().

* ab/date-mode-release:
  date API: add and use a date_mode_release()
  date API: add basic API docs
  date API: provide and use a DATE_MODE_INIT
  date API: create a date.h, split from cache.h
  cache.h: remove always unused show_date_human() declaration
2022-02-25 15:47:36 -08:00
Junio C Hamano 6249ce2d1b Merge branch 'ds/sparse-checkout-requires-per-worktree-config'
"git sparse-checkout" wants to work with per-worktree configuration,
but did not work well in a worktree attached to a bare repository.

* ds/sparse-checkout-requires-per-worktree-config:
  config: make git_configset_get_string_tmp() private
  worktree: copy sparse-checkout patterns and config on add
  sparse-checkout: set worktree-config correctly
  config: add repo_config_set_worktree_gently()
  worktree: create init_worktree_config()
  Documentation: add extensions.worktreeConfig details
2022-02-25 15:47:33 -08:00
Ævar Arnfjörð Bjarmason 88c7b4c3c8 date API: create a date.h, split from cache.h
Move the declaration of the date.c functions from cache.h, and adjust
the relevant users to include the new date.h header.

The show_ident_date() function belonged in pretty.h (it's defined in
pretty.c), its two users outside of pretty.c didn't strictly need to
include pretty.h, as they get it indirectly, but let's add it to them
anyway.

Similarly, the change to "builtin/{fast-import,show-branch,tag}.c"
isn't needed as far as the compiler is concerned, but since they all
use the "DATE_MODE()" macro we now define in date.h, let's have them
include it.

We could simply include this new header in "cache.h", but as this
change shows these functions weren't common enough to warrant
including in it in the first place. By moving them out of cache.h
changes to this API will no longer cause a (mostly) full re-build of
the project when "make" is run.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16 09:40:00 -08:00
Junio C Hamano 13ce8f9f14 Merge branch 'jt/conditional-config-on-remote-url'
The conditional inclusion mechanism of configuration files using
"[includeIf <condition>]" learns to base its decision on the
URL of the remote repository the repository interacts with.

* jt/conditional-config-on-remote-url:
  config: include file if remote URL matches a glob
  config: make git_config_include() static
2022-02-09 14:20:59 -08:00
Derrick Stolee 3ce1138272 config: make git_configset_get_string_tmp() private
This method was created in f1de981e8 (config: fix leaks from
git_config_get_string_const(), 2020-08-14) but its only use was in the
repo_config_get_string_tmp() method, also declared in config.h and
implemented in config.c. Since this is otherwise unused and is a very
similar implementation to git_configset_get_value(), let's remove this
declaration.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-08 09:49:21 -08:00
Derrick Stolee fe18733927 config: add repo_config_set_worktree_gently()
Some config settings, such as those for sparse-checkout, are likely
intended to only apply to one worktree at a time. To make this write
easier, add a new config API method, repo_config_set_worktree_gently().

This method will attempt to write to the worktree-specific config, but
will instead write to the common config file if worktree config is not
enabled.  The next change will introduce a consumer of this method.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-08 09:49:20 -08:00
Jonathan Tan 399b198489 config: include file if remote URL matches a glob
This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager - it could, for example,
contain a file to be included conditionally and a post-install script
that adds the include directive to the system-wide config file).

In order to do this, Git reruns the config parsing mechanism upon
noticing the first URL-conditional include in order to find all remote
URLs, and these remote URLs are then used to determine if that first and
all subsequent includes are executed. Remote URLs are not allowed to be
configued in any URL-conditionally-included file.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-18 13:55:53 -08:00
Jonathan Tan ed69e11b89 config: make git_config_include() static
It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-18 13:55:53 -08:00
Junio C Hamano 0669bdf4eb Merge branch 'js/branch-track-inherit'
"git -c branch.autosetupmerge=inherit branch new old" makes "new"
to have the same upstream as the "old" branch, instead of marking
"old" itself as its upstream.

* js/branch-track-inherit:
  config: require lowercase for branch.*.autosetupmerge
  branch: add flags and config to inherit tracking
  branch: accept multiple upstream branches for tracking
2022-01-10 11:52:54 -08:00
Josh Steadmon 44f14a9d24 config: require lowercase for branch.*.autosetupmerge
Although we only documented that branch.*.autosetupmerge would accept
"always" as a value, the actual implementation would accept any
combination of upper- or lower-case. Fix this to be consistent with
documentation and with other values of this config variable.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-20 22:40:21 -08:00
Josh Steadmon d3115660b4 branch: add flags and config to inherit tracking
It can be helpful when creating a new branch to use the existing
tracking configuration from the branch point. However, there is
currently not a method to automatically do so.

Teach git-{branch,checkout,switch} an "inherit" argument to the
"--track" option. When this is set, creating a new branch will cause the
tracking configuration to default to the configuration of the branch
point, if set.

For example, if branch "main" tracks "origin/main", and we run
`git checkout --track=inherit -b feature main`, then branch "feature"
will track "origin/main". Thus, `git status` will show us how far
ahead/behind we are from origin, and `git pull` will pull from origin.

This is particularly useful when creating branches across many
submodules, such as with `git submodule foreach ...` (or if running with
a patch such as [1], which we use at $job), as it avoids having to
manually set tracking info for each submodule.

Since we've added an argument to "--track", also add "--track=direct" as
another way to explicitly get the original "--track" behavior ("--track"
without an argument still works as well).

Finally, teach branch.autoSetupMerge a new "inherit" option. When this
is set, "--track=inherit" becomes the default behavior.

[1]: https://lore.kernel.org/git/20180927221603.148025-1-sbeller@google.com/

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-20 22:40:21 -08:00
Ævar Arnfjörð Bjarmason f5c39c3268 config API: use get_error_routine(), not vreportf()
Change the git_die_config() function added in 5a80e97c82 (config: add
`git_die_config()` to the config-set API, 2014-08-07) to use the
public callbacks in the usage.[ch] API instead of the the underlying
vreportf() function.

In preceding commits the rest of the vreportf() users outside of
usage.c was migrated to die_message(), so we can now make it "static".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-07 13:25:16 -08:00
Ævar Arnfjörð Bjarmason 25ad722126 config.c: don't leak memory in handle_path_include()
Fix a memory leak in the error() path in handle_path_include(), this
allows us to run t1305-config-include.sh under SANITIZE=leak,
previously 4 tests there would fail. This fixes up a leak in
9b25a0b52e (config: add include directive, 2012-02-06).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-21 16:26:45 -07:00
Ævar Arnfjörð Bjarmason 73c5f67071 config.c: remove unused git_config_key_is_valid()
The git_config_key_is_valid() function got left behind in a
refactoring in a9bcf6586d (alias: use the early config machinery to
expand aliases, 2017-06-14),

It previously had two users when it was added in 9e9de18f1a (config:
silence warnings for command names with invalid keys, 2015-08-24), and
after 6a1e1bc0a1 (pager: use callbacks instead of configset,
2016-09-12) only one remained.

By removing it we can get rid of the "quiet" branches in this
function, as well as cases where "store_key" is NULL, for which there
are no other users.

Out of the 5 callers of git_config_parse_key() only one needs to pass
a non-NULL "size_t *baselen_", so we could remove the third parameter
from the public interface. I did not find that potential
simplification to be worthwhile.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28 14:54:15 -07:00
Junio C Hamano e3b77a2d03 Merge branch 'rs/drop-core-compression-vars'
Code clean-up.

* rs/drop-core-compression-vars:
  compression: drop write-only core_compression_* variables
2021-09-23 13:44:46 -07:00
Junio C Hamano 11e5d0a262 Merge branch 'jt/grep-wo-submodule-odb-as-alternate'
The code to make "git grep" recurse into submodules has been
updated to migrate away from the "add submodule's object store as
an alternate object store" mechanism (which is suboptimal).

* jt/grep-wo-submodule-odb-as-alternate:
  t7814: show lack of alternate ODB-adding
  submodule-config: pass repo upon blob config read
  grep: add repository to OID grep sources
  grep: allocate subrepos on heap
  grep: read submodule entry with explicit repo
  grep: typesafe versions of grep_source_init
  grep: use submodule-ODB-as-alternate lazy-addition
  submodule: lazily add submodule ODBs as alternates
2021-09-20 15:20:39 -07:00
René Scharfe 8f0f110156 compression: drop write-only core_compression_* variables
Since 8de7eeb54b (compression: unify pack.compression configuration
parsing, 2016-11-15) the variables core_compression_level and
core_compression_seen are only set, but never read.  Remove them.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-12 16:23:28 -07:00
Jonathan Tan e3e8bf046e submodule-config: pass repo upon blob config read
When reading the config of a submodule, if reading from a blob, read
using an explicitly specified repository instead of by adding the
submodule's ODB as an alternate and then reading an object from
the_repository.

This makes the "grep --recurse-submodules with submodules without
.gitmodules in the working tree" test in t7814 work when
GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB is true.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:48:09 -07:00
Junio C Hamano aab0eeaba5 Merge branch 'js/expand-runtime-prefix'
Pathname expansion (like "~username/") learned a way to specify a
location relative to Git installation (e.g. its $sharedir which is
$(prefix)/share), with "%(prefix)".

* js/expand-runtime-prefix:
  expand_user_path: allow in-flight topics to keep using the old name
  interpolate_path(): allow specifying paths relative to the runtime prefix
  Use a better name for the function interpolating paths
  expand_user_path(): clarify the role of the `real_home` parameter
  expand_user_path(): remove stale part of the comment
  tests: exercise the RUNTIME_PREFIX feature
2021-08-24 15:32:38 -07:00
Johannes Schindelin a03b097d63 Use a better name for the function interpolating paths
It is not immediately clear what `expand_user_path()` means, so let's
rename it to `interpolate_path()`. This also opens the path for
interpolating more than just a home directory.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26 12:17:16 -07:00
Junio C Hamano 8e62a85352 Merge branch 'ds/gender-neutral-doc'
Update the documentation not to assume users are of certain gender
and adds to guidelines to do so.

* ds/gender-neutral-doc:
  *: fix typos
  comments: avoid using the gender of our users
  doc: avoid using the gender of other people
2021-07-16 17:42:53 -07:00
Junio C Hamano bd4232fac3 Merge branch 'ab/struct-init'
Code cleanup around struct_type_init() functions.

* ab/struct-init:
  string-list.h users: change to use *_{nodup,dup}()
  string-list.[ch]: add a string_list_init_{nodup,dup}()
  dir.[ch]: replace dir_init() with DIR_INIT
  *.c *_init(): define in terms of corresponding *_INIT macro
  *.h: move some *_INIT to designated initializers
2021-07-16 17:42:53 -07:00
Junio C Hamano a93c6fd677 Merge branch 'ew/mmap-failures'
Error message update.

* ew/mmap-failures:
  xmmap: inform Linux users of tuning knobs on ENOMEM
2021-07-16 17:42:47 -07:00
Ævar Arnfjörð Bjarmason bc40dfb10a string-list.h users: change to use *_{nodup,dup}()
Change all in-tree users of the string_list_init(LIST, BOOL) API to
use string_list_init_{nodup,dup}(LIST) instead.

As noted in the preceding commit let's leave the now-unused
string_list_init() wrapper in-place for any in-flight users, it can be
removed at some later date.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-01 12:32:22 -07:00
Eric Wong dc05929411 xmmap: inform Linux users of tuning knobs on ENOMEM
Linux users may benefit from additional information on how to
avoid ENOMEM from mmap despite the system having enough RAM to
accomodate them.  We can't reliably unmap pack windows to work
around the issue since malloc and other library routines may
mmap without our knowledge.

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-29 23:14:25 -07:00
Johannes Schindelin e355307692 config: normalize the path of the system gitconfig
Git for Windows is compiled with a runtime prefix, and that runtime
prefix is typically `C:/Program Files/Git/mingw64`. As we want the
system gitconfig to live in the sibling directory `etc`, we define the
relative path as `../etc/gitconfig`.

However, as reported by Philip Oakley, the output of `git config
--show-origin --system -l` looks rather ugly, as it shows the path as
`file:C:/Program Files/Git/mingw64/../etc/gitconfig`, i.e. with the
`mingw64/../` part.

By normalizing the path, we get a prettier path.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-28 20:11:51 -07:00
Derrick Stolee 46a237f42f *: fix typos
These typos were found while searching the codebase for gendered
pronouns. In the case of t9300-fast-import.sh, remove a confusing
comment that is unnecessary to the understanding of the test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-16 11:26:20 +09:00
Junio C Hamano e706aaf3bc Merge branch 'ps/config-global-override'
Replace GIT_CONFIG_NOSYSTEM mechanism to decline from reading the
system-wide configuration file with GIT_CONFIG_SYSTEM that lets
users specify from which file to read the system-wide configuration
(setting it to an empty file would essentially be the same as
setting NOSYSTEM), and introduce GIT_CONFIG_GLOBAL to override the
per-user configuration in $HOME/.gitconfig.

* ps/config-global-override:
  t1300: fix unset of GIT_CONFIG_NOSYSTEM leaking into subsequent tests
  config: allow overriding of global and system configuration
  config: unify code paths to get global config paths
  config: rename `git_etc_config()`
2021-05-07 12:47:39 +09:00
Patrick Steinhardt 4179b4897f config: allow overriding of global and system configuration
In order to have git run in a fully controlled environment without any
misconfiguration, it may be desirable for users or scripts to override
global- and system-level configuration files. We already have a way of
doing this, which is to unset both HOME and XDG_CONFIG_HOME environment
variables and to set `GIT_CONFIG_NOGLOBAL=true`. This is quite kludgy,
and unsetting the first two variables likely has an impact on other
executables spawned by such a script.

The obvious way to fix this would be to introduce `GIT_CONFIG_NOGLOBAL`
as an equivalent to `GIT_CONFIG_NOSYSTEM`. But in the past, it has
turned out that this design is inflexible: we cannot test system-level
parsing of the git configuration in our test harness because there is no
way to change its location, so all tests run with `GIT_CONFIG_NOSYSTEM`
set.

Instead of doing the same mistake with `GIT_CONFIG_NOGLOBAL`, introduce
two new variables `GIT_CONFIG_GLOBAL` and `GIT_CONFIG_SYSTEM`:

    - If unset, git continues to use the usual locations.

    - If set to a specific path, we skip reading the normal
      configuration files and instead take the path. By setting the path
      to `/dev/null`, no configuration will be loaded for the respective
      level.

This implements the usecase where we want to execute code in a sanitized
environment without any potential misconfigurations via `/dev/null`, but
is more flexible and allows for more usecases than simply adding
`GIT_CONFIG_NOGLOBAL`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19 14:16:59 -07:00
Patrick Steinhardt 1e06eb9b5d config: unify code paths to get global config paths
There's two callsites which assemble global config paths, once in the
config loading code and once in the git-config(1) builtin. We're about
to implement a way to override global config paths via an environment
variable which would require us to adjust both sites.

Unify both code paths into a single `git_global_config()` function which
returns both paths for `~/.gitconfig` and the XDG config file. This will
make the subsequent patch which introduces the new envvar easier to
implement.

No functional changes are expected from this patch.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19 14:16:59 -07:00
Patrick Steinhardt c62a999c6e config: rename git_etc_config()
The `git_etc_gitconfig()` function retrieves the system-level path of
the configuration file. We're about to introduce a way to override it
via an environment variable, at which point the name of this function
would start to become misleading.

Rename the function to `git_system_config()` as a preparatory step.
While at it, the function is also refactored to pass memory ownership to
the caller. This is done to better match semantics of
`git_global_config()`, which is going to be introduced in the next
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19 14:16:59 -07:00
Ævar Arnfjörð Bjarmason 39e12650d7 config.c: remove last remnant of GIT_TEST_GETTEXT_POISON
Remove a use of GIT_TEST_GETTEXT_POISON added in f276e2a469 (config:
improve error message for boolean config, 2021-02-11).

This was simultaneously in-flight with my d162b25f95 (tests: remove
support for GIT_TEST_GETTEXT_POISON, 2021-01-20) which removed the
rest of the GIT_TEST_GETTEXT_POISON code.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08 10:54:08 -07:00
René Scharfe ca56dadb4b use CALLOC_ARRAY
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead.  It shortens the code and infers the
element size automatically.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13 16:00:09 -08:00
Junio C Hamano 483e09e810 Merge branch 'ak/config-bad-bool-error'
The error message given when a configuration variable that is
expected to have a boolean value has been improved.

* ak/config-bad-bool-error:
  config: improve error message for boolean config
2021-02-17 17:21:43 -08:00
Andrew Klotz f276e2a469 config: improve error message for boolean config
Currently invalid boolean config values return messages about 'bad
numeric', which is slightly misleading when the error was due to a
boolean value. We can improve the developer experience by returning a
boolean error message when we know the value is neither a bool text or
int.

before with an invalid boolean value of `non-boolean`, its unclear what
numeric is referring to:
  fatal: bad numeric config value 'non-boolean' for 'commit.gpgsign': invalid unit

now the error message mentions `non-boolean` is a bad boolean value:
  fatal: bad boolean config value 'non-boolean' for 'commit.gpgsign'

Signed-off-by: Andrew Klotz <agc.klotz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-11 13:44:55 -08:00
Junio C Hamano 466f94ec45 Merge branch 'ab/detox-gettext-tests'
Get rid of "GETTEXT_POISON" support altogether, which may or may
not be controversial.

* ab/detox-gettext-tests:
  tests: remove uses of GIT_TEST_GETTEXT_POISON=false
  tests: remove support for GIT_TEST_GETTEXT_POISON
  ci: remove GETTEXT_POISON jobs
2021-02-10 14:48:33 -08:00
Junio C Hamano 294e949fa2 Merge branch 'ps/config-env-pairs'
Introduce two new ways to feed configuration variable-value pairs
via environment variables, and tweak the way GIT_CONFIG_PARAMETERS
encodes variable/value pairs to make it more robust.

* ps/config-env-pairs:
  config: allow specifying config entries via envvar pairs
  environment: make `getenv_safe()` a public function
  config: store "git -c" variables using more robust format
  config: parse more robust format in GIT_CONFIG_PARAMETERS
  config: extract function to parse config pairs
  quote: make sq_dequote_step() a public function
  config: add new way to pass config via `--config-env`
  git: add `--super-prefix` to usage string
2021-01-25 14:19:19 -08:00
Ævar Arnfjörð Bjarmason d162b25f95 tests: remove support for GIT_TEST_GETTEXT_POISON
This removes the ability to inject "poison" gettext() messages via the
GIT_TEST_GETTEXT_POISON special test setup.

I initially added this as a compile-time option in bb946bba76 (i18n:
add GETTEXT_POISON to simulate unfriendly translator, 2011-02-22), and
most recently modified to be toggleable at runtime in
6cdccfce1e (i18n: make GETTEXT_POISON a runtime option, 2018-11-08)..

The reason for its removal is that the trade-off of maintaining it
v.s. what it's getting us has long since flipped. When gettext was
integrated in 5e9637c629 (i18n: add infrastructure for translating
Git with gettext, 2011-11-18) there was understandable concern on the
Git ML that in marking messages for translation en-masse we'd
inadvertently mark plumbing messages. The GETTEXT_POISON facility was
a way to smoke those out via our test suite.

Nowadays however we're done (or almost entirely done) with any marking
of messages for translation. New messages are usually marked by their
authors, who'll know whether it makes sense to translate them or
not. If not any errors in marking the messages are much more likely to
be spotted in review than in the the initial deluge of i18n patches in
the 2011-2012 era.

So let's just remove this. This leaves the test suite in a state where
we still have a lot of test_i18n, C_LOCALE_OUTPUT
etc. uses. Subsequent commits will remove those too.

The change to t/lib-rebase.sh is a selective revert of the relevant
part of f2d17068fd (i18n: rebase-interactive: mark comments of squash
for translation, 2016-06-17), and the comment in
t/t3406-rebase-message.sh is from c7108bf9ed (i18n: rebase: mark
messages for translation, 2012-07-25).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21 15:50:01 -08:00
Junio C Hamano 6dbbae17d9 Merge branch 'ew/decline-core-abbrev'
The configuration variable 'core.abbrev' can be set to 'no' to
force no abbreviation regardless of the hash algorithm.

* ew/decline-core-abbrev:
  core.abbrev=no disables abbreviations
2021-01-15 15:20:28 -08:00
Patrick Steinhardt d8d77153ea config: allow specifying config entries via envvar pairs
While we currently have the `GIT_CONFIG_PARAMETERS` environment variable
which can be used to pass runtime configuration data to git processes,
it's an internal implementation detail and not supposed to be used by
end users.

Next to being for internal use only, this way of passing config entries
has a major downside: the config keys need to be parsed as they contain
both key and value in a single variable. As such, it is left to the user
to escape any potentially harmful characters in the value, which is
quite hard to do if values are controlled by a third party.

This commit thus adds a new way of adding config entries via the
environment which gets rid of this shortcoming. If the user passes the
`GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment
variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each
`i` in `[0,n)`.

While the same can be achieved with `git -c <name>=<value>`, one may
wish to not do so for potentially sensitive information. E.g. if one
wants to set `http.extraHeader` to contain an authentication token,
doing so via `-c` would trivially leak those credentials via e.g. ps(1),
which typically also shows command arguments.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:45 -08:00
Patrick Steinhardt 1ff21c05ba config: store "git -c" variables using more robust format
The previous commit added a new format for $GIT_CONFIG_PARAMETERS which
is able to robustly handle subsections with "=" in them. Let's start
writing the new format. Unfortunately, this does much less than you'd
hope, because "git -c" itself has the same ambiguity problem! But it's
still worth doing:

  - we've now pushed the problem from the inter-process communication
    into the "-c" command-line parser. This would free us up to later
    add an unambiguous format there (e.g., separate arguments like "git
    --config key value", etc).

  - for --config-env, the parser already disallows "=" in the
    environment variable name. So:

      git --config-env section.with=equals.key=ENVVAR

    will robustly set section.with=equals.key to the contents of
    $ENVVAR.

The new test shows the improvement for --config-env.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:18 -08:00
Jeff King f9dbb64fad config: parse more robust format in GIT_CONFIG_PARAMETERS
When we stuff config options into GIT_CONFIG_PARAMETERS, we shell-quote
each one as a single unit, like:

  'section.one=value1' 'section.two=value2'

On the reading side, we de-quote to get the individual strings, and then
parse them by splitting on the first "=" we find. This format is
ambiguous, because an "=" may appear in a subsection. So the config
represented in a file by both:

  [section "subsection=with=equals"]
  key = value

and:

  [section]
  subsection = with=equals.key=value

ends up in this flattened format like:

  'section.subsection=with=equals.key=value'

and we can't tell which was desired. We have traditionally resolved this
by taking the first "=" we see starting from the left, meaning that we
allowed arbitrary content in the value, but not in the subsection.

Let's make our environment format a bit more robust by separately
quoting the key and value. That turns those examples into:

  'section.subsection=with=equals.key'='value'

and:

  'section.subsection'='with=equals.key=value'

respectively, and we can tell the difference between them. We can detect
which format is in use for any given element of the list based on the
presence of the unquoted "=". That means we can continue to allow the
old format to work to support any callers which manually used the old
format, and we can even intermingle the two formats. The old format
wasn't documented, and nobody was supposed to be using it. But it's
likely that such callers exist in the wild, so it's nice if we can avoid
breaking them. Likewise, it may be possible to trigger an older version
of "git -c" that runs a script that calls into a newer version of "git
-c"; that new version would see the intermingled format.

This does create one complication, which is that the obvious format in
the new scheme for

  [section]
  some-bool

is:

  'section.some-bool'

with no equals. We'd mistake that for an old-style variable. And it even
has the same meaning in the old style, but:

  [section "with=equals"]
  some-bool

does not. It would be:

  'section.with=equals=some-bool'

which we'd take to mean:

  [section]
  with = equals=some-bool

in the old, ambiguous style. Likewise, we can't use:

  'section.some-bool'=''

because that's ambiguous with an actual empty string. Instead, we'll
again use the shell-quoting to give us a hint, and use:

  'section.some-bool'=

to show that we have no value.

Note that this commit just expands the reading side. We'll start writing
the new format via "git -c" in a future patch. In the meantime, the
existing "git -c" tests will make sure we didn't break reading the old
format. But we'll also add some explicit coverage of the two formats to
make sure we continue to handle the old one after we move the writing
side over.

And one final note: since we're now using the shell-quoting as a
semantically meaningful hint, this closes the door to us ever allowing
arbitrary shell quoting, like:

  'a'shell'would'be'ok'with'this'.key=value

But we have never supported that (only what sq_quote() would produce),
and we are probably better off keeping things simple, robust, and
backwards-compatible, than trying to make it easier for humans. We'll
continue not to advertise the format of the variable to users, and
instead keep "git -c" as the recommended mechanism for setting config
(even if we are trying to be kind not to break users who may be relying
on the current undocumented format).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:18 -08:00
Patrick Steinhardt b342ae61b3 config: extract function to parse config pairs
The function `git_config_parse_parameter` is responsible for parsing a
`foo.bar=baz`-formatted configuration key, sanitizing the key and then
processing it via the given callback function. Given that we're about to
add a second user which is going to process keys which already has keys
and values separated, this commit extracts a function
`config_parse_pair` which only does the sanitization and processing
part as a preparatory step.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-12 12:03:18 -08:00
Patrick Steinhardt ce81b1da23 config: add new way to pass config via --config-env
While it's already possible to pass runtime configuration via `git -c
<key>=<value>`, it may be undesirable to use when the value contains
sensitive information. E.g. if one wants to set `http.extraHeader` to
contain an authentication token, doing so via `-c` would trivially leak
those credentials via e.g. ps(1), which typically also shows command
arguments.

To enable this usecase without leaking credentials, this commit
introduces a new switch `--config-env=<key>=<envvar>`. Instead of
directly passing a value for the given key, it instead allows the user
to specify the name of an environment variable. The value of that
variable will then be used as value of the key.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-12 12:03:18 -08:00
Eric Wong a9ecaa06a7 core.abbrev=no disables abbreviations
This allows users to write hash-agnostic scripts and configs by
disabling abbreviations.  Using "-c core.abbrev=40" will be
insufficient with SHA-256, and "-c core.abbrev=64" won't work with
SHA-1 repos today.

Signed-off-by: Eric Wong <e@80x24.org>
[jc: tweaked implementation, added doc and a test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-23 13:40:09 -08:00
Junio C Hamano a10e7842ab Merge branch 'ds/config-literal-value'
Various subcommands of "git config" that takes value_regex
learn the "--literal-value" option to take the value_regex option
as a literal string.

* ds/config-literal-value:
  config doc: value-pattern is not necessarily a regexp
  config: implement --fixed-value with --get*
  config: plumb --fixed-value into config API
  config: add --fixed-value option, un-implemented
  t1300: add test for --replace-all with value-pattern
  t1300: test "set all" mode with value-pattern
  config: replace 'value_regex' with 'value_pattern'
  config: convert multi_replace to flags
2020-12-08 15:11:19 -08:00
Derrick Stolee c90702a1f6 config: plumb --fixed-value into config API
The git_config_set_multivar_in_file_gently() and related methods now
take a 'flags' bitfield, so add a new bit representing the --fixed-value
option from 'git config'. This alters the purpose of the value_pattern
parameter to be an exact string match. This requires some initialization
changes in git_config_set_multivar_in_file_gently() and a new strcmp()
call in the matches() method.

The new CONFIG_FLAGS_FIXED_VALUE flag is initialized in builtin/config.c
based on the --fixed-value option, and that needs to be updated in
several callers.

This patch only affects some of the modes of 'git config', and the rest
will be completed in the next change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-25 14:43:48 -08:00
Derrick Stolee 247e2f822e config: replace 'value_regex' with 'value_pattern'
The 'value_regex' argument in the 'git config' builtin is poorly named,
especially related to an upcoming change that allows exact string
matches instead of ERE pattern matches.

Perform a mostly mechanical change of every instance of 'value_regex' to
'value_pattern' in the codebase. This is only critical for documentation
and error messages, but it is best to be consistent inside the codebase,
too.

For documentation, use 'value-pattern' which is better punctuation. This
affects Documentation/git-config.txt and the usage in builtin/config.c,
which was already mixed between 'value_regex' and 'value-regex'.

I gave some thought to leaving the value_regex variables inside config.c
that are regex_t pointers. However, it is probably best to keep the name
consistent with the rest of the variables.

This does not update the translations inside the po/ directory, as that
creates conflicts with ongoing work. The input strings should
automatically update through automation, and a few of the output strings
currently use "[value_regex]" directly.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-25 14:43:48 -08:00
Derrick Stolee 504ee1290e config: convert multi_replace to flags
We will extend the flexibility of the config API. Before doing so, let's
take an existing 'int multi_replace' parameter and replace it with a new
'unsigned flags' parameter that can take multiple options as a bit field.

Update all callers that specified multi_replace to now specify the
CONFIG_FLAGS_MULTI_REPLACE flag. To add more clarity, extend the
documentation of git_config_set_multivar_in_file() including a clear
labeling of its arguments. Other config API methods in config.h require
only a change of the final parameter from 'int' to 'unsigned'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-25 14:43:47 -08:00
Elijah Newren 6da1a25814 hashmap: provide deallocation function names
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 12:15:50 -08:00
Junio C Hamano 0d9a8e33f9 Merge branch 'jk/leakfix'
Code clean-up.

* jk/leakfix:
  submodule--helper: fix leak of core.worktree value
  config: fix leak in git_config_get_expiry_in_days()
  config: drop git_config_get_string_const()
  config: fix leaks from git_config_get_string_const()
  checkout: fix leak of non-existent branch names
  submodule--helper: use strbuf_release() to free strbufs
  clear_pattern_list(): clear embedded hashmaps
2020-08-27 14:04:49 -07:00
Jeff King 1c890016a1 config: fix leak in git_config_get_expiry_in_days()
We use git_config_get_string() to retrieve the expiry value in a newly
allocated string. But after parsing it, we never free it, leaking the
memory.

We could fix this with a free() obviously, but there's an even better
solution: we can use the non-allocating "tmp" variant of the function;
we only need it to be valid for the lifetime of our parse function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 15:35:47 -07:00
Jeff King 9a53219f69 config: drop git_config_get_string_const()
As evidenced by the leak fixes in the previous commit, the "const" in
git_config_get_string_const() clearly misleads people into thinking that
it does not allocate a copy of the string. We can fix this by renaming
it, but it's easier still to just drop it. Of the four remaining
callers:

  - The one in git_config_parse_expiry() still needs to allocate, since
    that's what its callers expect. We can just use the non-const
    version and cast our pointer. Slightly ugly, but the damage is
    contained in one spot.

  - The two in apply are writing to global "const char *" variables, and
    need to continue allocating. We often mark these as const because we
    assign default string literals to them. But in this case we don't do
    that, so we can just declare them as real "char *" pointers and use
    the non-const version.

  - The call in checkout doesn't actually need a copy; it can just use
    the non-allocating "tmp" version of the function.

The function is also mentioned in the MyFirstContribution document. We
can swap that call out for the non-allocating "tmp" variant, which fits
well in the example given.

We'll drop the "configset" and "repo" variants, as well (which are
unused).

Note that this frees up the "const" name, so we could rename the "tmp"
variant back to that. But let's give some time for topics in flight to
adapt to the new code before doing so (if we do it too soon, the
function semantics will change but the compiler won't alert us).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 15:35:47 -07:00
Jeff King f1de981e8b config: fix leaks from git_config_get_string_const()
There are two functions to get a single config string:

  - git_config_get_string()

  - git_config_get_string_const()

One might naively think that the first one allocates a new string and
the second one just points us to the internal configset storage. But
in fact they both allocate a new copy; the second one exists only to
avoid having to cast when using it with a const global which we never
intend to free.

The documentation for the function explains that clearly, but it seems
I'm not alone in being surprised by this. Of 17 calls to the function,
13 of them leak the resulting value.

We could obviously fix these by adding the appropriate free(). But it
would be simpler still if we actually had a non-allocating way to get
the string. There's git_config_get_value() but that doesn't quite do
what we want. If the config key is present but is a boolean with no
value (e.g., "[foo]bar" in the file), then we'll get NULL (whereas the
string versions will print an error and die).

So let's introduce a new variant, git_config_get_string_tmp(), that
behaves as these callers expect. We need a new name because we have new
semantics but the same function signature (so even if we converted the
four remaining callers, topics in flight might be surprised). The "tmp"
is because this value should only be held onto for a short time. In
practice it's rare for us to clear and refresh the configset,
invalidating the pointer, but hopefully the "tmp" makes callers think
about the lifetime. In each of the converted cases here the value only
needs to last within the local function or its immediate caller.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-14 10:52:04 -07:00
Jeff King aec0bba106 config: work around gcc-10 -Wstringop-overflow warning
Compiling with gcc-10, -O2, and -fsanitize=undefined results in a
compiler warning:

  config.c: In function ‘git_config_copy_or_rename_section_in_file’:
  config.c:3170:17: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
   3170 |       output[0] = '\t';
        |       ~~~~~~~~~~^~~~~~
  config.c:3076:7: note: at offset -1 to object ‘buf’ with size 1024 declared here
   3076 |  char buf[1024];
        |       ^~~

This is a false positive. The interesting lines of code are:

  int i;
  char *output = buf;
  ...
  for (i = 0; buf[i] && isspace(buf[i]); i++)
          ; /* do nothing */
  ...
  int offset;
  offset = section_name_match(&buf[i], old_name);
  if (offset > 0) {
          ...
          output += offset + i;
          if (strlen(output) > 0) {
		  /*
		   * More content means there's
		   * a declaration to put on the
		   * next line; indent with a
		   * tab
		   */
		  output -= 1;
		  output[0] = '\t';
	  }
  }

So we do assign output to buf initially. Later we increment it based on
"offset" and "i" and then subtract "1" from it. That latter step is what
the compiler is complaining about; it could lead to going off the left
side of the array if "output == buf" at the moment of the subtraction.
For that to be the case, then "offset + i" would have to be 0. But that
can't happen:

  - we know that "offset" is at least 1, since we're in a conditional
    block that checks that

  - we know that "i" is not negative, since it started at 0 and only
    incremented over whitespace

So the sum must be at least 1, and therefore it's OK to subtract one
from "output".

But that's not quite the whole story. Since "i" is an int, it could in
theory be possible to overflow to negative (when counting whitespace on
a very large string). But we know that's impossible because we're
counting the 1024-byte buffer we just fed to fgets(), so it can never be
larger than that.

Switching the type of "i" to "unsigned" makes the warning go away, so
let's do that.

Arguably size_t is an even better type (for this and for the other
length fields), but switching to it produces a similar but distinct
warning:

  config.c: In function ‘git_config_copy_or_rename_section_in_file’:
  config.c:3170:13: error: array subscript -1 is outside array bounds of ‘char[1024]’ [-Werror=array-bounds]
   3170 |       output[0] = '\t';
        |       ~~~~~~^~~
  config.c:3076:7: note: while referencing ‘buf’
   3076 |  char buf[1024];
        |       ^~~

If we were to ever switch off of fgets() to strbuf_getline() or similar,
we'd probably need to use size_t to avoid other overflow problems. But
for now we know we're safe because of the small fixed size of our
buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-04 09:31:01 -07:00
Jeff King 348482dede config: reject parsing of files over INT_MAX
While the last few commits have made it possible for the config parser
to handle config files up to the limits of size_t, the rest of the code
isn't really ready for this. In particular, we often feed the keys as
strings into printf "%s" format specifiers. And because the printf
family of functions must return an int to specify the result, they
complain. Here are two concrete examples (using glibc; we're in
uncharted territory here so results may vary):

Generate a gigantic .gitmodules file like this:

   git submodule add /some/other/repo foo
   {
           printf '[submodule "'
           perl -e 'print "a" x 2**31'
	   echo '"]path = foo'
   } >.gitmodules
   git commit -m 'huge gitmodule'

then try this:

   $ git show
   BUG: strbuf.c:397: your vsnprintf is broken (returned -1)

The problem is that we end up calling:

   strbuf_addf(&sb, "submodule.%s.ignore", submodule_name);

which relies on vsnprintf(), and that function has no way to report back
a size larger than INT_MAX.

Taking that same file, try this:

  git config --file=.gitmodules --list --name-only

On my system it produces an output with exactly 4GB of spaces. I
confirmed in a debugger that we reach the config callback with the key
intact: it's 2147483663 bytes and full of a's. But when we print it with
this call:

  printf("%s%c", key_, term);

we just get the spaces.

So given the fact that these are insane cases which we have no need to
support, the weird behavior from feeding the results to printf even if
the code is careful, and the possibility of uncareful code introducing
its own integer truncation issues, let's just declare INT_MAX as a limit
for parsing config files.

We'll enforce the limit in get_next_char(), which generalizes over all
sources (blobs, files, etc) and covers any element we're parsing
(whether section, key, value, etc). For simplicity, the limit is over
the length of the _whole_ file, so you couldn't have two 1GB values in
the same file. This should be perfectly fine, as the expected size for
config files is generally kilobytes at most.

With this patch both cases above will yield:

  fatal: bad config line 1 in file .gitmodules

That's not an amazing error message, but the parser isn't set up to
provide specific messages (it just breaks out of the parsing loop and
gives that generic error even if see a syntactic issue). And we really
wouldn't expect to see this case outside of somebody maliciously probing
the limits of the config system.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 14:58:21 -07:00
Jeff King 6a9c235eb4 config: use size_t to store parsed variable baselen
Most of the config parsing infrastructure is limited in what it can
parse only by the size of memory, because it parses character by
character, building up strbufs for keys, values, etc. One exception is
the "baselen" value we keep in git_parse_source(), which is an int.

That stores the length of the section.subsection base, to which we can
then append individual key names (by truncating back to the baselen with
strbuf_setlen(), and then appending characters for the key name). But
because it's an int, if we see an absurdly long section or subsection,
we may overflow the integer, wrapping negative. That negative value is
then implicitly cast to a size_t when we pass it to strbuf_setlen(),
creating a very large value and triggering a BUG. For example:

  $ {
       printf '[foo "'
       perl -e 'print "a" x 2**31'
       echo '"]bar = value'
    } >huge
  $ git config --file=huge --list
  fatal: BUG: strbuf_setlen() beyond buffer

While this is obviously a silly case that we don't care about
supporting, it's worth fixing it by switching to a size_t for a few
reasons:

  - we should try to avoid hitting BUG assertions at all

  - avoiding integer truncation or overflow sets a good example and
    makes it easier to audit the code for more important issues

  - the BUG outcome is what happens in _this_ instance, because we wrap
    negative. If we used a 2**32 subsection, we'd wrap to a small
    positive value and actually generate wrong output (the subsection of
    our key would be truncated).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 14:56:45 -07:00
Jeff King f011a9654d git_config_parse_key(): return baselen as size_t
As with the recent change to parse_config_key(), the best type to return
a string length is a size_t, as it won't cause integer truncation for a
gigantic key. And as with that change, this is mostly a clarity /
hygiene issue for now, as our config parser would choke on such a large
key anyway.

There are a few ripple effects within the config code, as callers switch
to using size_t. I also adjusted a few related variables that iterate
over strings. The most unexpected change is that a call to strbuf_addf()
had to switch to strbuf_add(). We can't use a size_t with "%.*s",
because printf precisions must have type "int" (we could cast, of
course, but that would miss the point of using size_t in the first
place).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 14:52:22 -07:00
Jeff King 6c7e6963c1 config: drop useless length variable in write_pair()
We compute the length of a subset of a string, but then use that length
only to feed a "%.*s" printf placeholder for the same string. We can
just use "%s" to achieve the same thing.

The variable became useless in cb891a5989 (Use a strbuf for building up
section header and key/value pair strings., 2007-12-14), which swapped
out a write() which _did_ use the length for a strbuf_addf() call.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 14:47:36 -07:00
Jeff King f5914f4b6b parse_config_key(): return subsection len as size_t
We return the length to a subset of a string using an "int *"
out-parameter. This is fine most of the time, as we'd expect config keys
to be relatively short, but it could behave oddly if we had a gigantic
config key. A more appropriate type is size_t.

Let's switch over, which lets our callers use size_t as appropriate
(they are bound by our type because they must pass the out-parameter as
a pointer). This is mostly just a cleanup to make it clear this code
handles long strings correctly. In practice, our config parser already
chokes on long key names (because of a similar int/size_t mixup!).

When doing an int/size_t conversion, we have to be careful that nobody
was trying to assign a negative value to the variable. I manually
confirmed that for each case here. They tend to just feed the result to
xmemdupz() or similar; in a few cases I adjusted the parameter types for
helper functions to make sure the size_t is preserved.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 14:44:29 -07:00
Junio C Hamano d0038f4b31 Merge branch 'bw/remote-rename-update-config'
"git remote rename X Y" needs to adjust configuration variables
(e.g. branch.<name>.remote) whose value used to be X to Y.
branch.<name>.pushRemote is now also updated.

* bw/remote-rename-update-config:
  remote rename/remove: gently handle remote.pushDefault config
  config: provide access to the current line number
  remote rename/remove: handle branch.<name>.pushRemote config values
  remote: clean-up config callback
  remote: clean-up by returning early to avoid one indentation
  pull --rebase/remote rename: document and honor single-letter abbreviations rebase types
2020-02-25 11:18:32 -08:00
Junio C Hamano 5d55554b1d Merge branch 'mr/show-config-scope'
"git config" learned to show in which "scope", in addition to in
which file, each config setting comes from.

* mr/show-config-scope:
  config: add '--show-scope' to print the scope of a config value
  submodule-config: add subomdule config scope
  config: teach git_config_source to remember its scope
  config: preserve scope in do_git_config_sequence
  config: clarify meaning of command line scoping
  config: split repo scope to local and worktree
  config: make scope_name non-static and rename it
  t1300: create custom config file without special characters
  t1300: fix over-indented HERE-DOCs
  config: fix typo in variable name
2020-02-17 13:22:17 -08:00
Bert Wesarg f2a2327a4a config: provide access to the current line number
Users are nowadays trained to see message from CLI tools in the form

    <file>:<lno>: …

To be able to give such messages when notifying the user about
configurations in any config file, it is currently only possible to get
the file name (if the value originates from a file to begin with) via
`current_config_name()`. Now it is also possible to query the current line
number for the configuration.

Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:52:10 -08:00
Matthew Rogers 9a83d088ee submodule-config: add subomdule config scope
Before the changes to teach git_config_source to remember scope
information submodule-config.c never needed to consider the question of
config scope.  Even though zeroing out git_config_source is still
correct and preserved the previous behavior of setting the scope to
CONFIG_SCOPE_UNKNOWN, it's better to be explicit about such situations
by explicitly setting the scope.  As none of the current config_scope
enumerations make sense we create CONFIG_SCOPE_SUBMODULE to describe the
situation.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:49:12 -08:00
Matthew Rogers e37efa40e1 config: teach git_config_source to remember its scope
There are many situations where the scope of a config command is known
beforehand, such as passing of '--local', '--file', etc. to an
invocation of git config.  However, this information is lost when moving
from builtin/config.c to /config.c.  This historically hasn't been a big
deal, but to prepare for the upcoming --show-scope option we teach
git_config_source to keep track of the source and the config machinery
to use that information to set current_parsing_scope appropriately.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:49:10 -08:00
Matthew Rogers 5c105a842e config: preserve scope in do_git_config_sequence
do_git_config_sequence operated under the assumption that it was correct
to set current_parsing_scope to CONFIG_SCOPE_UNKNOWN as part of the
cleanup it does after it finishes execution.  This is incorrect, as it
blows away the current_parsing_scope if do_git_config_sequence is called
recursively.  As such situations are rare (git config running with the
'--blob' option is one example) this has yet to cause a problem, but the
upcoming '--show-scope' option will experience issues in that case, lets
teach do_git_config_sequence to preserve the current_parsing_scope from
before it started execution.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:39:04 -08:00
Matthew Rogers 6766e41b8a config: clarify meaning of command line scoping
CONFIG_SCOPE_CMDLINE is generally used in the code to refer to config
values passed in via the -c option.  Options passed in using this
mechanism share similar scoping characteristics with the --file and
--blob options of the 'config' command, namely that they are only in use
for that single invocation of git, and that they supersede the normal
system/global/local hierarchy.  This patch introduces
CONFIG_SCOPE_COMMAND to reflect this new idea, which also makes
CONFIG_SCOPE_CMDLINE redundant.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:39:02 -08:00
Matthew Rogers 6dc905d974 config: split repo scope to local and worktree
Previously when iterating through git config variables, worktree config
and local config were both considered "CONFIG_SCOPE_REPO".  This was
never a problem before as no one had needed to differentiate between the
two cases, but future functionality may care whether or not the config
options come from a worktree or from the repository's actual local
config file.  For example, the planned feature to add a '--show-scope'
to config to allow a user to see which scope listed config options come
from would confuse users if it just printed 'repo' rather than 'local'
or 'worktree' as the documentation would lead them to expect.  As well
as the additional benefit of making the implementation look more like
how the documentation describes the interface.

To accomplish this we split out what was previously considered repo
scope to be local and worktree.

The clients of 'current_config_scope()' who cared about
CONFIG_SCOPE_REPO are also modified to similarly care about
CONFIG_SCOPE_WORKTREE and CONFIG_SCOPE_LOCAL to preserve previous behavior.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:32:20 -08:00
Matthew Rogers a5cb4204b6 config: make scope_name non-static and rename it
To prepare for the upcoming --show-scope option, we require the ability
to convert a config_scope enum to a string.  As this was originally
implemented as a static function 'scope_name()' in
t/helper/test-config.c, we expose it via config.h and give it a less
ambiguous name 'config_scope_name()'

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:32:18 -08:00
René Scharfe a91cc7fad0 strbuf: add and use strbuf_insertstr()
Add a function for inserting a C string into a strbuf.  Use it
throughout the source to get rid of magic string length constants and
explicit strlen() calls.

Like strbuf_addstr(), implement it as an inline function to avoid the
implicit strlen() calls to cause runtime overhead.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 09:04:45 -08:00
Derrick Stolee 879321eb0b sparse-checkout: add 'cone' mode
The sparse-checkout feature can have quadratic performance as
the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can
be very significant.

Create a new Boolean config option, core.sparseCheckoutCone, to
indicate that we expect the sparse-checkout file to contain a
more limited set of patterns. This is a separate config setting
from core.sparseCheckout to avoid breaking older clients by
introducing a tri-state option.

The config option does nothing right now, but will be expanded
upon in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Junio C Hamano 5efabc7ed9 Merge branch 'ew/hashmap'
Code clean-up of the hashmap API, both users and implementation.

* ew/hashmap:
  hashmap_entry: remove first member requirement from docs
  hashmap: remove type arg from hashmap_{get,put,remove}_entry
  OFFSETOF_VAR macro to simplify hashmap iterators
  hashmap: introduce hashmap_free_entries
  hashmap: hashmap_{put,remove} return hashmap_entry *
  hashmap: use *_entry APIs for iteration
  hashmap_cmp_fn takes hashmap_entry params
  hashmap_get{,_from_hash} return "struct hashmap_entry *"
  hashmap: use *_entry APIs to wrap container_of
  hashmap_get_next returns "struct hashmap_entry *"
  introduce container_of macro
  hashmap_put takes "struct hashmap_entry *"
  hashmap_remove takes "const struct hashmap_entry *"
  hashmap_get takes "const struct hashmap_entry *"
  hashmap_add takes "struct hashmap_entry *"
  hashmap_get_next takes "const struct hashmap_entry *"
  hashmap_entry_init takes "struct hashmap_entry *"
  packfile: use hashmap_entry in delta_base_cache_entry
  coccicheck: detect hashmap_entry.hash assignment
  diff: use hashmap_entry_init on moved_entry.ent
2019-10-15 13:48:02 +09:00
Junio C Hamano 676278f8ea Merge branch 'bc/object-id-part17'
Preparation for SHA-256 upgrade continues.

* bc/object-id-part17: (26 commits)
  midx: switch to using the_hash_algo
  builtin/show-index: replace sha1_to_hex
  rerere: replace sha1_to_hex
  builtin/receive-pack: replace sha1_to_hex
  builtin/index-pack: replace sha1_to_hex
  packfile: replace sha1_to_hex
  wt-status: convert struct wt_status to object_id
  cache: remove null_sha1
  builtin/worktree: switch null_sha1 to null_oid
  builtin/repack: write object IDs of the proper length
  pack-write: use hash_to_hex when writing checksums
  sequencer: convert to use the_hash_algo
  bisect: switch to using the_hash_algo
  sha1-lookup: switch hard-coded constants to the_hash_algo
  config: use the_hash_algo in abbrev comparison
  combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo
  bundle: switch to use the_hash_algo
  connected: switch GIT_SHA1_HEXSZ to the_hash_algo
  show-index: switch hard-coded constants to the_hash_algo
  blame: remove needless comparison with GIT_SHA1_HEXSZ
  ...
2019-10-11 14:24:46 +09:00
Eric Wong 404ab78e39 hashmap: remove type arg from hashmap_{get,put,remove}_entry
Since these macros already take a `keyvar' pointer of a known type,
we can rely on OFFSETOF_VAR to get the correct offset without
relying on non-portable `__typeof__' and `offsetof'.

Argument order is also rearranged, so `keyvar' and `member' are
sequential as they are used as: `keyvar->member'

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:12 +09:00
Eric Wong 23dee69f53 OFFSETOF_VAR macro to simplify hashmap iterators
While we cannot rely on a `__typeof__' operator being portable
to use with `offsetof'; we can calculate the pointer offset
using an existing pointer and the address of a member using
pointer arithmetic for compilers without `__typeof__'.

This allows us to simplify usage of hashmap iterator macros
by not having to specify a type when a pointer of that type
is already given.

In the future, list iterator macros (e.g. list_for_each_entry)
may also be implemented using OFFSETOF_VAR to save hackers the
trouble of using container_of/list_entry macros and without
relying on non-portable `__typeof__'.

v3: use `__typeof__' to avoid clang warnings

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong c8e424c9c9 hashmap: introduce hashmap_free_entries
`hashmap_free_entries' behaves like `container_of' and passes
the offset of the hashmap_entry struct to the internal
`hashmap_free_' function, allowing the function to free any
struct pointer regardless of where the hashmap_entry field
is located.

`hashmap_free' no longer takes any arguments aside from
the hashmap itself.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong 87571c3f71 hashmap: use *_entry APIs for iteration
Inspired by list_for_each_entry in the Linux kernel.
Once again, these are somewhat compromised usability-wise
by compilers lacking __typeof__ support.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong 939af16eac hashmap_cmp_fn takes hashmap_entry params
Another step in eliminating the requirement of hashmap_entry
being the first member of a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong f23a465132 hashmap_get{,_from_hash} return "struct hashmap_entry *"
Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash
or container_of as appropriate.

This is another step towards eliminating the requirement of
hashmap_entry being the first field in a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong b6c5241606 hashmap_get takes "const struct hashmap_entry *"
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong b94e5c1df6 hashmap_add takes "struct hashmap_entry *"
This is less error-prone than "void *" as the compiler now
detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong d22245a2e3 hashmap_entry_init takes "struct hashmap_entry *"
C compilers do type checking to make life easier for us.  So
rely on that and update all hashmap_entry_init callers to take
"struct hashmap_entry *" to avoid future bugs while improving
safety and readability.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:09 +09:00
Junio C Hamano b9ac6c59b8 Merge branch 'cc/multi-promisor'
Teach the lazy clone machinery that there can be more than one
promisor remote and consult them in order when downloading missing
objects on demand.

* cc/multi-promisor:
  Move core_partial_clone_filter_default to promisor-remote.c
  Move repository_format_partial_clone to promisor-remote.c
  Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
  remote: add promisor and partial clone config to the doc
  partial-clone: add multiple remotes in the doc
  t0410: test fetching from many promisor remotes
  builtin/fetch: remove unique promisor remote limitation
  promisor-remote: parse remote.*.partialclonefilter
  Use promisor_remote_get_direct() and has_promisor_remote()
  promisor-remote: use repository_format_partial_clone
  promisor-remote: add promisor_remote_reinit()
  promisor-remote: implement promisor_remote_get_direct()
  Add initial support for many promisor remotes
  fetch-object: make functions return an error code
  t0410: remove pipes after git commands
2019-09-18 11:50:09 -07:00
Junio C Hamano f4f8dfe127 Merge branch 'ds/feature-macros'
A mechanism to affect the default setting for a (related) group of
configuration variables is introduced.

* ds/feature-macros:
  repo-settings: create feature.experimental setting
  repo-settings: create feature.manyFiles setting
  repo-settings: parse core.untrackedCache
  commit-graph: turn on commit-graph by default
  t6501: use 'git gc' in quiet mode
  repo-settings: consolidate some config settings
2019-09-09 12:26:36 -07:00
brian m. carlson fe9fec45f6 config: use the_hash_algo in abbrev comparison
Switch one use of a hard-coded 40 constant to use the_hash_algo.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 15:04:58 -07:00
Derrick Stolee ad0fb65999 repo-settings: parse core.untrackedCache
The core.untrackedCache config setting is slightly complicated,
so clarify its use and centralize its parsing into the repo
settings.

The default value is "keep" (returned as -1), which persists the
untracked cache if it exists.

If the value is set as "false" (returned as 0), then remove the
untracked cache if it exists.

If the value is set as "true" (returned as 1), then write the
untracked cache and persist it.

Instead of relying on magic values of -1, 0, and 1, split these
options into an enum. This allows the use of "-1" as a
default value. After parsing the config options, if the value is
unset we can initialize it to UNTRACKED_CACHE_KEEP.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13 13:33:55 -07:00
Jeff King 22932d9169 config: stop checking whether the_repository is NULL
Since the previous commit, our invariant that the_repository is never
NULL is restored, and we can stop being defensive in include_by_branch().

We can confirm the fix by showing that an onbranch config include will
not cause a segfault when run outside a git repository. I've put this in
t1309-early-config since it's related to the case added by 85fe0e800c
(config: work around bug with includeif:onbranch and early config,
2019-07-31), though technically the issue was with
read_very_early_config() and not read_early_config().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-06 13:09:01 -07:00
Johannes Schindelin 85fe0e800c config: work around bug with includeif:onbranch and early config
Since 07b2c0eaca (config: learn the "onbranch:" includeIf condition,
2019-06-05), there is a potential catch-22 in the early config path: if
the `include.onbranch:` feature is used, Git assumes that the Git
directory has been initialized already. However, in the early config
code path that is not true.

One way to trigger this is to call the following commands in any
repository:

	git config includeif.onbranch:refs/heads/master.path broken
	git help -a

The symptom triggered by the `git help -a` invocation reads like this:

BUG: refs.c:1851: attempting to get main_ref_store outside of repository

Let's work around this, simply by ignoring the `includeif.onbranch:`
setting when parsing the config when the ref store has not been
initialized (yet).

Technically, there is a way to solve this properly: teach the refs
machinery to initialize the ref_store from a given gitdir/commondir pair
(which we _do_ have in the early config code path), and then use that in
`include_by_branch()`. This, however, is a pretty involved project, and
we're already in the feature freeze for Git v2.23.0.

Note: when calling above-mentioned two commands _outside_ of any Git
worktree (passing the `--global` flag to `git config`, as there is
obviously no repository config available), at the point when
`include_by_branch()` is called, `the_repository` is `NULL`, therefore
we have to be extra careful not to dereference it in that case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-31 15:20:56 -07:00
Junio C Hamano 023ff4cdf5 Merge branch 'ab/test-env'
Many GIT_TEST_* environment variables control various aspects of
how our tests are run, but a few followed "non-empty is true, empty
or unset is false" while others followed the usual "there are a few
ways to spell true, like yes, on, etc., and also ways to spell
false, like no, off, etc." convention.

* ab/test-env:
  env--helper: mark a file-local symbol as static
  tests: make GIT_TEST_FAIL_PREREQS a boolean
  tests: replace test_tristate with "git env--helper"
  tests README: re-flow a previously changed paragraph
  tests: make GIT_TEST_GETTEXT_POISON a boolean
  t6040 test: stop using global "script" variable
  config.c: refactor die_bad_number() to not call gettext() early
  env--helper: new undocumented builtin wrapping git_env_*()
  config tests: simplify include cycle test
2019-07-25 13:59:20 -07:00
Junio C Hamano 9331bdbaf4 Merge branch 'rs/config-unit-parsing'
The code to parse scaled numbers out of configuration files has
been made more robust and also easier to follow.

* rs/config-unit-parsing:
  config: simplify parsing of unit factors
  config: don't multiply in parse_unit_factor()
  config: use unsigned_mult_overflows to check for overflows
2019-07-09 15:25:41 -07:00
Junio C Hamano 7785f9ba03 Merge branch 'js/gcc-8-and-9'
Code clean-up for new compilers.

* js/gcc-8-and-9:
  config: avoid calling `labs()` on too-large data type
  winansi: simplify loading the GetCurrentConsoleFontEx() function
  kwset: allow building with GCC 8
  poll (mingw): allow compiling with GCC 8 and DEVELOPER=1
2019-07-09 15:25:41 -07:00
Christian Couder 4ca9474efa Move core_partial_clone_filter_default to promisor-remote.c
Now that we can have a different default partial clone filter for
each promisor remote, let's hide core_partial_clone_filter_default
as a static in promisor-remote.c to avoid it being use for
anything other than managing backward compatibility.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25 14:05:38 -07:00
René Scharfe 39c575c969 config: simplify parsing of unit factors
Just return the value of the factor or zero for unrecognized strings
instead of using an output reference and a separate return value to
indicate success.  This is shorter and simpler.

It basically reverts that function to before c8deb5a146 ("Improve error
messages when int/long cannot be parsed from config", 2007-12-25), while
keeping the better messages, so restore its old name, get_unit_factor(),
as well.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-24 12:34:20 -07:00
René Scharfe 664178e8e2 config: don't multiply in parse_unit_factor()
parse_unit_factor() multiplies the number that is passed to it with the
value of a recognized unit factor (K, M or G for 2^10, 2^20 and 2^30,
respectively).  All callers pass in 1 as a number, though, which allows
them to check the actual multiplication for overflow before they are
doing it themselves.

Ignore the passed in number and don't multiply, as this feature of
parse_unit_factor() is not used anymore.  Rename the output parameter to
reflect that it's not about the end result anymore, but just about the
unit factor.

Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-24 12:34:19 -07:00
René Scharfe 3fb72c7b4d config: use unsigned_mult_overflows to check for overflows
parse_unit_factor() checks if a K, M or G is present after a number and
multiplies it by 2^10, 2^20 or 2^30, respectively.  One of its callers
checks if the result is smaller than the number alone to detect
overflows.  The other one passes 1 as the number and does multiplication
and overflow check itself in a similar manner.

This works, but is inconsistent, and it would break if we added support
for a bigger unit factor.  E.g. 16777217T is 2^64 + 2^40, i.e. too big
for a 64-bit number.  Modulo 2^64 we get 2^40 == 1TB, which is bigger
than the raw number 16777217 == 2^24 + 1, so the overflow would go
undetected by that method.

Let both callers pass 1 and handle overflow check and multiplication
themselves.  Do the check before the multiplication, using
unsigned_mult_overflows, which is simpler and can deal with larger unit
factors.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-24 12:34:17 -07:00
Ævar Arnfjörð Bjarmason 1ff750b128 tests: make GIT_TEST_GETTEXT_POISON a boolean
Change the GIT_TEST_GETTEXT_POISON variable from being "non-empty?" to
being a more standard boolean variable.

Since it needed to be checked in both C code and shellscript (via test
-n) it was one of the remaining shellscript-like variables. Now that
we have "env--helper" we can change that.

There's a couple of tricky edge cases that arise because we're using
git_env_bool() early, and the config-reading "env--helper".

If GIT_TEST_GETTEXT_POISON is set to an invalid value die_bad_number()
will die, but to do so it would usually call gettext(). Let's detect
the special case of GIT_TEST_GETTEXT_POISON and always emit that
message in the C locale, lest we infinitely loop.

As seen in the updated tests in t0017-env-helper.sh there's also a
caveat related to "env--helper" needing to read the config for trace2
purposes.

Since the C_LOCALE_OUTPUT prerequisite is lazy and relies on
"env--helper" we could get invalid results if we failed to read the
config (e.g. because we'd loop on includes) when combined with
e.g. "test_i18ngrep" wanting to check with "env--helper" if
GIT_TEST_GETTEXT_POISON was true or not.

I'm crossing my fingers and hoping that a test similar to the one I
removed in the earlier "config tests: simplify include cycle test"
change in this series won't happen again, and testing for this
explicitly in "env--helper"'s own tests.

This change breaks existing uses of
e.g. GIT_TEST_GETTEXT_POISON=YesPlease, which we've documented in
po/README and other places. As noted in [1] we might want to consider
also accepting "YesPlease" in "env--helper" as a special-case.

But as the lack of uproar over 6cdccfce1e ("i18n: make GETTEXT_POISON
a runtime option", 2018-11-08) demonstrates the audience for this
option is a really narrow set of git developers, who shouldn't have
much trouble modifying their test scripts, so I think it's better to
deal with that minor headache now and make all the relevant GIT_TEST_*
variables boolean in the same way than carry the "YesPlease"
special-case forward.

1. https://public-inbox.org/git/xmqqtvckm3h8.fsf@gitster-ct.c.googlers.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-21 09:42:49 -07:00
Ævar Arnfjörð Bjarmason 2e43cd4caa config.c: refactor die_bad_number() to not call gettext() early
Prepare die_bad_number() for a change to specially handle
GIT_TEST_GETTEXT_POISON calling git_env_bool() by making
die_bad_number() not call gettext() early, which would in turn call
git_env_bool().

There's no meaningful change here yet, just a re-arrangement of the
current code to make that subsequent change easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-21 09:42:49 -07:00
Johannes Schindelin 9dae4fe79f config: avoid calling labs() on too-large data type
The `labs()` function operates, as the initial `l` suggests, on `long`
parameters. However, in `config.c` we tried to use it on values of type
`intmax_t`.

This problem was found by GCC v9.x.

To fix it, let's just "unroll" the function (i.e. negate the value if it
is negative).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13 09:34:17 -07:00
Denton Liu 07b2c0eaca config: learn the "onbranch:" includeIf condition
Currently, if a user wishes to have individual settings per branch, they
are required to manually keep track of the settings in their head and
manually set the options on the command-line or change the config at
each branch.

Teach config the "onbranch:" includeIf condition so that it can
conditionally include configuration files if the branch that is checked
out in the current worktree matches the pattern given.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-05 14:38:28 -07:00
Junio C Hamano 5b2d1c0c6e Merge branch 'jh/trace2-sid-fix'
Polishing of the new trace2 facility continues.  The system-level
configuration can specify site-wide trace2 settings, which can be
overridden with per-user configuration and environment variables.

* jh/trace2-sid-fix:
  trace2: fixup access problem on /etc/gitconfig in read_very_early_config
  trace2: update docs to describe system/global config settings
  trace2: make SIDs more unique
  trace2: clarify UTC datetime formatting
  trace2: report peak memory usage of the process
  trace2: use system/global config for default trace2 settings
  config: add read_very_early_config()
  trace2: find exec-dir before trace2 initialization
  trace2: add absolute elapsed time to start event
  trace2: refactor setting process starting time
  config: initialize opts structure in repo_read_config()
2019-05-13 23:50:31 +09:00
Jeff Hostetler f672deec2d trace2: fixup access problem on /etc/gitconfig in read_very_early_config
Teach do_git_config_sequence() to optionally gently check for access to
the system config.  Use this option in read_very_early_config() when
initializing trace2.

In [1] SZEDER Gábor reported that my changes in [2] introduced a
regression when the user does not have permission to read the system
config.

This commit addresses that problem by optionally ignoring that error.

[1] https://public-inbox.org/git/285beb2b2d740ce20fdd8af1becf371ab39703db.1554995916.git.gitgitgadget@gmail.com/T/#m342e839289aec515523a98b5e34d7f42d3f1fd79
[2] https://public-inbox.org/git/285beb2b2d740ce20fdd8af1becf371ab39703db.1554995916.git.gitgitgadget@gmail.com/T/#m11b59c9228c698442f750ee8f9b10c629399ae48

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-07 10:13:54 +09:00
Junio C Hamano 078b254deb Merge branch 'nd/include-if-wildmatch'
A buglet in configuration parser has been fixed.

* nd/include-if-wildmatch:
  config: correct '**' matching in includeIf patterns
2019-04-22 11:14:46 +09:00
Jeff Hostetler 800a7f99a8 config: add read_very_early_config()
Created an even lighter version of read_early_config() that
only looks at system and global config settings.  It omits
repo-local, worktree-local, and command-line settings.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16 13:37:06 +09:00
Jeff Hostetler 1703751f21 config: initialize opts structure in repo_read_config()
Initialize opts structure in repo_read_config().

This change fixes a crash in later commit after a new field is added
to the structure.

In commit 3b256228a6, repo_read_config()
was added.  It only initializes 3 fields in the opts structure.  It is
passed to config_with_options() and then to do_git_config_sequence().
However, do_git_config_sequence() drops the opts on the floor and calls
git_config_from_file() rather than git_config_from_file_with_options(),
so that may be why this hasn't been a problem in the past.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16 13:37:06 +09:00
Nguyễn Thái Ngọc Duy 19e7fdaa58 config: correct '**' matching in includeIf patterns
The current wildmatch() call for includeIf's gitdir pattern does not
pass the WM_PATHNAME flag. Without this flag, '*' is treated _almost_
the same as '**' (because '*' also matches slashes) with one exception:

'/**/' can match a single slash. The pattern 'foo/**/bar' matches
'foo/bar'.

But '/*/', which is essentially what wildmatch engine sees without
WM_PATHNAME, has to match two slashes (and '*' matches nothing). Which
means 'foo/*/bar' cannot match 'foo/bar'. It can only match 'foo//bar'.

The result of this is the current wildmatch() call works most of the
time until the user depends on '/**/' matching no path component. And
also '*' matches slashes while it should not, but people probably
haven't noticed this yet. The fix is straightforward.

Reported-by: Jason Karns <jason.karns@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 14:19:47 +09:00
Junio C Hamano 32038fef00 Merge branch 'jh/trace2'
A more structured way to obtain execution trace has been added.

* jh/trace2:
  trace2: add for_each macros to clang-format
  trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh
  trace2:data: add subverb for rebase
  trace2:data: add subverb to reset command
  trace2:data: add subverb to checkout command
  trace2:data: pack-objects: add trace2 regions
  trace2:data: add trace2 instrumentation to index read/write
  trace2:data: add trace2 hook classification
  trace2:data: add trace2 transport child classification
  trace2:data: add trace2 sub-process classification
  trace2:data: add editor/pager child classification
  trace2:data: add trace2 regions to wt-status
  trace2: collect Windows-specific process information
  trace2: create new combined trace facility
  trace2: Documentation/technical/api-trace2.txt
2019-03-07 09:59:56 +09:00
Junio C Hamano 4e021dc28e Merge branch 'wh/author-committer-ident-config'
Four new configuration variables {author,committer}.{name,email}
have been introduced to override user.{name,email} in more specific
cases.

* wh/author-committer-ident-config:
  config: allow giving separate author and committer idents
2019-03-07 09:59:53 +09:00
Jeff Hostetler ee4512ed48 trace2: create new combined trace facility
Create a new unified tracing facility for git.  The eventual intent is to
replace the current trace_printf* and trace_performance* routines with a
unified set of git_trace2* routines.

In addition to the usual printf-style API, trace2 provides higer-level
event verbs with fixed-fields allowing structured data to be written.
This makes post-processing and analysis easier for external tools.

Trace2 defines 3 output targets.  These are set using the environment
variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT".  These may be
set to "1" or to an absolute pathname (just like the current GIT_TRACE).

* GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command
  summary data.

* GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE.
  It extends the output with columns for the command process, thread,
  repo, absolute and relative elapsed times.  It reports events for
  child process start/stop, thread start/stop, and per-thread function
  nesting.

* GIT_TR2_EVENT is a new structured format. It writes event data as a
  series of JSON records.

Calls to trace2 functions log to any of the 3 output targets enabled
without the need to call different trace_printf* or trace_performance*
routines.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22 15:27:59 -08:00
William Hubbs 39ab4d0951 config: allow giving separate author and committer idents
The author.email, author.name, committer.email and committer.name
settings are analogous to the GIT_AUTHOR_* and GIT_COMMITTER_*
environment variables, but for the git config system. This allows them
to be set separately for each repository.

Git supports setting different authorship and committer
information with environment variables. However, environment variables
are set in the shell, so if different authorship and committer
information is needed for different repositories an external tool is
required.

This adds support to git config for author.email, author.name,
committer.email and committer.name  settings so this information
can be set per repository.

Also, it generalizes the fmt_ident function so it can handle author vs
committer identification.

Signed-off-by: William Hubbs <williamh@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-04 12:18:13 -08:00
Jeff King a263ea84d1 config: drop unused parameter from maybe_remove_section()
We don't need the contents buffer to drop a section; the parse
information in the config_store_data parameter is enough for our logic.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24 12:35:44 -08:00
Jonathan Nieder 2a9dedef2e index: make index.threads=true enable ieot and eoie
If a user explicitly sets

	[index]
		threads = true

to read the index using multiple threads, ensure that index writes
include the offset table by default to make that possible.  This
ensures that the user's intent of turning on threading is respected.

In other words, permit the following configurations:

- index.threads and index.recordOffsetTable unspecified: do not write
  the offset table yet (to avoid alarming the user with "ignoring IEOT
  extension" messages when an older version of Git accesses the
  repository) but do make use of multiple threads to read the index if
  the supporting offset table is present.

  This can also be requested explicitly by setting index.threads=true,
  0, or >1 and index.recordOffsetTable=false.

- index.threads=false or 1: do not write the offset table, and do not
  make use of the offset table.

  One can set index.recordOffsetTable=false as well, to be more
  explicit.

- index.threads=true, 0, or >1 and index.recordOffsetTable unspecified:
  write the offset table and make use of threads at read time.

  This can also be requested by setting index.threads=true, 0, >1, or
  unspecified and index.recordOffsetTable=true.

Fortunately the complication is temporary: once most Git installations
have upgraded to a version with support for the IEOT and EOIE
extensions, we can flip the defaults for index.recordEndOfIndexEntries
and index.recordOffsetTable to true and eliminate the settings.

Helped-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-21 16:46:54 +09:00
Johannes Schindelin 44004872c8 config: report a bug if git_dir exists without commondir
This did happen at some stage, and was fixed relatively quickly. Make
sure that we detect very quickly, too, should that happen again.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-16 11:54:01 +09:00
Junio C Hamano 6c268fdda9 Merge branch 'js/mingw-perl5lib'
Windows fix.

* js/mingw-perl5lib:
  mingw: unset PERL5LIB by default
  config: move Windows-specific config settings into compat/mingw.c
  config: allow for platform-specific core.* config settings
  config: rename `dummy` parameter to `cb` in git_default_config()
2018-11-13 22:37:20 +09:00
Johannes Schindelin bdfbb0ea93 config: move Windows-specific config settings into compat/mingw.c
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-31 12:46:27 +09:00
Johannes Schindelin 70fc5793df config: allow for platform-specific core.* config settings
In the Git for Windows project, we have ample precendent for config
settings that apply to Windows, and to Windows only.

Let's formalize this concept by introducing a platform_core_config()
function that can be #define'd in a platform-specific manner.

This will allow us to contain platform-specific code better, as the
corresponding variables no longer need to be exported so that they can
be defined in environment.c and be set in config.c

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-31 12:46:21 +09:00
Johannes Schindelin 409670f34e config: rename dummy parameter to cb in git_default_config()
This is the convention elsewhere (and prepares for the case where we may
need to pass callback data).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-31 12:45:30 +09:00
Nguyễn Thái Ngọc Duy 58b284a2e9 worktree: add per-worktree config files
A new repo extension is added, worktreeConfig. When it is present:

 - Repository config reading by default includes $GIT_DIR/config _and_
   $GIT_DIR/config.worktree. "config" file remains shared in multiple
   worktree setup.

 - The special treatment for core.bare and core.worktree, to stay
   effective only in main worktree, is gone. These config settings are
   supposed to be in config.worktree.

This extension is most useful in multiple worktree setup because you
now have an option to store per-worktree config (which is either
.git/config.worktree for main worktree, or
.git/worktrees/xx/config.worktree for linked ones).

This extension can be used in single worktree mode, even though it's
pretty much useless (but this can happen after you remove all linked
worktrees and move back to single worktree).

"git config" reads from both "config" and "config.worktree" by default
(i.e. without either --user, --file...) when this extension is
present. Default writes still go to "config", not "config.worktree". A
new option --worktree is added for that (*).

Since a new repo extension is introduced, existing git binaries should
refuse to access to the repo (both from main and linked worktrees). So
they will not misread the config file (i.e. skip the config.worktree
part). They may still accidentally write to the config file anyway if
they use with "git config --file <path>".

This design places a bet on the assumption that the majority of config
variables are shared so it is the default mode. A safer move would be
default writes go to per-worktree file, so that accidental changes are
isolated.

(*) "git config --worktree" points back to "config" file when this
    extension is not present and there is only one worktree so that it
    works in any both single and multiple worktree setups.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-22 13:17:04 +09:00
Junio C Hamano e27bfaaee3 Merge branch 'bp/read-cache-parallel'
A new extension to the index file has been introduced, which allows
the file to be read in parallel.

* bp/read-cache-parallel:
  read-cache: load cache entries on worker threads
  ieot: add Index Entry Offset Table (IEOT) extension
  read-cache: load cache extensions on a worker thread
  config: add new index.threads config setting
  eoie: add End of Index Entry (EOIE) extension
  read-cache: clean up casting and byte decoding
  read-cache.c: optimize reading index format v4
2018-10-19 13:34:03 +09:00
Ben Peart c780b9cfe8 config: add new index.threads config setting
Add support for a new index.threads config setting which will be used to
control the threading code in do_read_index().  A value of 0 will tell the
index code to automatically determine the correct number of threads to use.
A value of 1 will make the code single threaded.  A value greater than 1
will set the maximum number of threads to use.

For testing purposes, this setting can be overwritten by setting the
GIT_TEST_INDEX_THREADS=<n> environment variable to a value greater than 0.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11 15:32:48 +09:00
Ben Peart 4cb54d0aa8 fsmonitor: update GIT_TEST_FSMONITOR support
Rename GIT_FSMONITOR_TEST to GIT_TEST_FSMONITOR for consistency with the
other GIT_TEST_ special setups and properly document its use.

Add logic in t/test-lib.sh to give a warning when the old variable is set to
let people know they need to update their environment to use the new
variable.

Remove the outdated instructions on how to run the test suite utilizing
fsmonitor now that it is properly documented in t/README.

Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-28 11:40:38 -07:00
Jean-Noël Avila 27c929edd6 i18n: fix mistakes in translated strings
Fix typos and convert a question which does not expect to be replied
to a simple advice.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 14:29:12 -07:00
Junio C Hamano 2a2c18f1c3 Merge branch 'sb/config-write-fix'
Recent update to "git config" broke updating variable in a
subsection, which has been corrected.

* sb/config-write-fix:
  git-config: document accidental multi-line setting in deprecated syntax
  config: fix case sensitive subsection names on writing
  t1300: document current behavior of setting options
2018-08-20 12:41:32 -07:00
Junio C Hamano 5ade034464 Merge branch 'en/incl-forward-decl'
Code hygiene improvement for the header files.

* en/incl-forward-decl:
  Remove forward declaration of an enum
  compat/precompose_utf8.h: use more common include guard style
  urlmatch.h: fix include guard
  Move definition of enum branch_track from cache.h to branch.h
  alloc: make allocate_alloc_state and clear_alloc_state more consistent
  Add missing includes and forward declarations
2018-08-20 12:41:32 -07:00
Junio C Hamano c5d276cb18 Merge branch 'mk/http-backend-content-length'
The http-backend (used for smart-http transport) used to slurp the
whole input until EOF, without paying attention to CONTENT_LENGTH
that is supplied in the environment and instead expecting the Web
server to close the input stream.  This has been fixed.

* mk/http-backend-content-length:
  t5562: avoid non-portable "export FOO=bar" construct
  http-backend: respect CONTENT_LENGTH for receive-pack
  http-backend: respect CONTENT_LENGTH as specified by rfc3875
  http-backend: cleanup writing to child process
2018-08-17 13:09:57 -07:00
Junio C Hamano 4bea8485e3 Merge branch 'nd/i18n'
Many more strings are prepared for l10n.

* nd/i18n: (23 commits)
  transport-helper.c: mark more strings for translation
  transport.c: mark more strings for translation
  sha1-file.c: mark more strings for translation
  sequencer.c: mark more strings for translation
  replace-object.c: mark more strings for translation
  refspec.c: mark more strings for translation
  refs.c: mark more strings for translation
  pkt-line.c: mark more strings for translation
  object.c: mark more strings for translation
  exec-cmd.c: mark more strings for translation
  environment.c: mark more strings for translation
  dir.c: mark more strings for translation
  convert.c: mark more strings for translation
  connect.c: mark more strings for translation
  config.c: mark more strings for translation
  commit-graph.c: mark more strings for translation
  builtin/replace.c: mark more strings for translation
  builtin/pack-objects.c: mark more strings for translation
  builtin/grep.c: mark strings for translation
  builtin/config.c: mark more strings for translation
  ...
2018-08-15 15:08:23 -07:00