Commit graph

61950 commits

Author SHA1 Message Date
Jeff King ee4e22554f run-command: document use_shell option
It's unclear how run-command's use_shell option should impact the
arguments fed to a command. Plausibly it could mean that we glue all of
the arguments together into a string to pass to the shell, in which case
that opens the question of whether the caller needs to quote them.

But in fact we don't implement it that way (and even if we did, we'd
probably auto-quote the arguments as part of the glue step). And we must
not receive quoted arguments, because we might actually optimize out the
shell entirely (i.e., the caller does not even know if a shell will be
involved in the end or not).

Since this ambiguity may have been the cause of a recent bug, let's
document the option a bit.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-22 14:21:32 -08:00
Phil Hord 8198907795 use delete_refs when deleting tags or branches
'git tag -d' accepts one or more tag refs to delete, but each deletion
is done by calling `delete_ref` on each argv. This is very slow when
removing from packed refs. Use delete_refs instead so all the removals
can be done inside a single transaction with a single update.

Do the same for 'git branch -d'.

Since delete_refs performs all the packed-refs delete operations
inside a single transaction, if any of the deletes fail then all
them will be skipped. In practice, none of them should fail since
we verify the hash of each one before calling delete_refs, but some
network error or odd permissions problem could have different results
after this change.

Also, since the file-backed deletions are not performed in the same
transaction, those could succeed even when the packed-refs transaction
fails.

After deleting branches, remove the branch config only if the branch
ref was removed and was not subsequently added back in.

A manual test deleting 24,000 tags took about 30 minutes using
delete_ref.  It takes about 5 seconds using delete_refs.

Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21 16:05:05 -08:00
Jeff King 36a317929b refs: switch peel_ref() to peel_iterated_oid()
The peel_ref() interface is confusing and error-prone:

  - it's typically used by ref iteration callbacks that have both a
    refname and oid. But since they pass only the refname, we may load
    the ref value from the filesystem again. This is inefficient, but
    also means we are open to a race if somebody simultaneously updates
    the ref. E.g., this:

      int some_ref_cb(const char *refname, const struct object_id *oid, ...)
      {
              if (!peel_ref(refname, &peeled))
                      printf("%s peels to %s",
                             oid_to_hex(oid), oid_to_hex(&peeled);
      }

    could print nonsense. It is correct to say "refname peels to..."
    (you may see the "before" value or the "after" value, either of
    which is consistent), but mentioning both oids may be mixing
    before/after values.

    Worse, whether this is possible depends on whether the optimization
    to read from the current iterator value kicks in. So it is actually
    not possible with:

      for_each_ref(some_ref_cb);

    but it _is_ possible with:

      head_ref(some_ref_cb);

    which does not use the iterator mechanism (though in practice, HEAD
    should never peel to anything, so this may not be triggerable).

  - it must take a fully-qualified refname for the read_ref_full() code
    path to work. Yet we routinely pass it partial refnames from
    callbacks to for_each_tag_ref(), etc. This happens to work when
    iterating because there we do not call read_ref_full() at all, and
    only use the passed refname to check if it is the same as the
    iterator. But the requirements for the function parameters are quite
    unclear.

Instead of taking a refname, let's instead take an oid. That fixes both
problems. It's a little funny for a "ref" function not to involve refs
at all. The key thing is that it's optimizing under the hood based on
having access to the ref iterator. So let's change the name to make it
clear why you'd want this function versus just peel_object().

There are two other directions I considered but rejected:

  - we could pass the peel information into the each_ref_fn callback.
    However, we don't know if the caller actually wants it or not. For
    packed-refs, providing it is essentially free. But for loose refs,
    we actually have to peel the object, which would be wasteful in most
    cases. We could likewise pass in a flag to the callback indicating
    whether the peeled information is known, but that complicates those
    callbacks, as they then have to decide whether to manually peel
    themselves. Plus it requires changing the interface of every
    callback, whether they care about peeling or not, and there are many
    of them.

  - we could make a function to return the peeled value of the current
    iterated ref (computing it if necessary), and BUG() otherwise. I.e.:

      int peel_current_iterated_ref(struct object_id *out);

    Each of the current callers is an each_ref_fn callback, so they'd
    mostly be happy. But:

      - we use those callbacks with functions like head_ref(), which do
        not use the iteration code. So we'd need to handle the fallback
        case there, anyway.

      - it's possible that a caller would want to call into generic code
        that sometimes is used during iteration and sometimes not. This
        encapsulates the logic to do the fast thing when possible, and
        fallback when necessary.

The implementation is mostly obvious, but I want to call out a few
things in the patch:

  - the test-tool coverage for peel_ref() is now meaningless, as it all
    collapses to a single peel_object() call (arguably they were pretty
    uninteresting before; the tricky part of that function is the
    fast-path we see during iteration, but these calls didn't trigger
    that). I've just dropped it entirely, though note that some other
    tests relied on the tags we created; I've moved that creation to the
    tests where it matters.

  - we no longer need to take a ref_store parameter, since we'd never
    look up a ref now. We do still rely on a global "current iterator"
    variable which _could_ be kept per-ref-store. But in practice this
    is only useful if there are multiple recursive iterations, at which
    point the more appropriate solution is probably a stack of
    iterators. No caller used the actual ref-store parameter anyway
    (they all call the wrapper that passes the_repository).

  - the original only kicked in the optimization when the "refname"
    pointer matched (i.e., not string comparison). We do likewise with
    the "oid" parameter here, but fall back to doing an actual oideq()
    call. This in theory lets us kick in the optimization more often,
    though in practice no current caller cares. It should never be
    wrong, though (peeling is a property of an object, so two refs
    pointing to the same object would peel identically).

  - the original took care not to touch the peeled out-parameter unless
    we found something to put in it. But no caller cares about this, and
    anyway, it is enforced by peel_object() itself (and even in the
    optimized iterator case, that's where we eventually end up). We can
    shorten the code and avoid an extra copy by just passing the
    out-parameter through the stack.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21 15:51:31 -08:00
Johannes Schindelin 4a5ec7d166 SKIP_DASHED_BUILT_INS: respect config.mak
When `SKIP_DASHED_BUILT_INS` is specified in `config.mak`, the dashed
form of the built-ins was still generated.

By moving the `SKIP_DASHED_BUILT_INS` handling after `config.mak` was
read, this can be avoided.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21 14:59:55 -08:00
Ævar Arnfjörð Bjarmason 28cc00a13d fsck doc: remove ancient out-of-date diagnostics
Remove diagnostics that haven't been emitted by "fsck" or its
predecessors for around 15 years. This documentation was added in
c64b9b8860 (Reference documentation for the core git commands.,
2005-05-05), but was out-of-date quickly after that.

Notes on individual diagnostics:

 - "expect dangling commits": Added in bcee6fd8e7 (Make 'fsck' able
   to[...], 2005-04-13), documented in c64b9b8860. Not emitted since
   1024932f01 (fsck-cache: walk the 'refs' directory[...],
   2005-05-18).

 - "missing sha1 directory": Added in 20222118ae (Add first cut at
   "fsck-cache"[...], 2005-04-08), documented in c64b9b8860. Not
   emitted since 230f13225d (Create object subdirectories on demand,
   2005-10-08).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-20 19:10:42 -08:00
Jonathan Tan bfc2a36ff2 Doc: clarify contents of packfile sent as URI
Clarify that, when the packfile-uri feature is used, the client should
not assume that the extra packfiles downloaded would only contain a
single blob, but support packfiles containing multiple objects of all
types.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-20 19:06:50 -08:00
Derrick Stolee 3cf5f221be t7900: clean up some broken refs
The tests for the 'prefetch' task create remotes and fetch refs into
'refs/prefetch/<remote>/' and tags into 'refs/tags/'. These tests use
the remotes to create objects not intended to be seen by the "local"
repository.

In that sense, the incrmental-repack tasks did not have these objects
and refs in mind. That test replaces the object directory with a
specific pack-file layout for testing the batch-size logic. However,
this causes some operations to start showing warnings such as:

 error: refs/prefetch/remote1/one does not point to a valid object!
 error: refs/tags/one does not point to a valid object!

This only shows up if you run the tests verbosely and watch the output.
It caught my eye and I _thought_ that there was a bug where 'git gc' or
'git repack' wouldn't check 'refs/prefetch/' before pruning objects.
That is incorrect. Those commands do handle 'refs/prefetch/' correctly.

All that is left is to clean up the tests in t7900-maintenance.sh to
remove these tags and refs that are not being repacked for the
incremental-repack tests. Use update-ref to ensure this works with all
ref backends.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-20 18:46:22 -08:00
Derrick Stolee 96eaffebbf maintenance: set log.excludeDecoration durin prefetch
The 'prefetch' task fetches refs from all remotes and places them in the
refs/prefetch/<remote>/ refspace. As this task is intended to run in the
background, this allows users to keep their local data very close to the
remote servers' data while not updating the users' understanding of the
remote refs in refs/remotes/<remote>/.

However, this can clutter 'git log' decorations with copies of the refs
with the full name 'refs/prefetch/<remote>/<branch>'.

The log.excludeDecoration config option was added in a6be5e67 (log: add
log.excludeDecoration config option, 2020-05-16) for exactly this
purpose.

Ensure we set this only for users that would benefit from it by
assigning it at the beginning of the prefetch task. Other alternatives
would be during 'git maintenance register' or 'git maintenance start',
but those might assign the config even when the prefetch task is
disabled by existing config. Further, users could run 'git maintenance
run --task=prefetch' using their own scripting or scheduling. This
provides the best coverage to automatically update the config when
valuable.

It is improbable, but possible, that users might want to run the
prefetch task _and_ see these refs in their log decorations. This seems
incredibly unlikely to me, but users can always opt-in on a
command-by-command basis using --decorate-refs=refs/prefetch/.

Test that this works in a few cases. In particular, ensure that our
assignment of log.excludeDecoration=refs/prefetch/ is additive to other
existing exclusions. Further, ensure we do not add multiple copies in
multiple runs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-20 18:46:22 -08:00
Derrick Stolee a4b6d202ca cache-tree: speed up consecutive path comparisons
The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.

To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:

  git -c index.threads=1 commit --amend --allow-empty --no-edit

Here are the measurements over 10 runs after a 5-run warmup:

  Benchmark #1: v2.30.0
    Time (mean ± σ):     854.5 ms ±  18.2 ms
    Range (min … max):   825.0 ms … 892.8 ms

  Benchmark #2: Previous change
    Time (mean ± σ):     833.2 ms ±  10.3 ms
    Range (min … max):   815.8 ms … 849.7 ms

  Benchmark #3: This change
    Time (mean ± σ):     815.5 ms ±  18.1 ms
    Range (min … max):   795.4 ms … 849.5 ms

This change is 2% faster than the previous change and 5% faster than
v2.30.0.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:05:13 -08:00
René Scharfe 0b72536a0b cache-tree: use ce_namelen() instead of strlen()
Use the name length field of cache entries instead of calculating its
value anew.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:05:13 -08:00
Derrick Stolee 4bdde337f4 index-format: discuss recursion of cache-tree better
The end of the cache tree index extension format trails off with
ellipses ever since 23fcc98 (doc: technical details about the index
file format, 2011-03-01). While an intuitive reader could gather what
this means, it could be better to use "and so on" instead.

Really, this is only justified because I also wanted to point out that
the number of subtrees in the index format is used to determine when the
recursive depth-first-search stack should be "popped." This should help
to add clarity to the format.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:04:59 -08:00
Derrick Stolee 22ad8600c1 index-format: update preamble to cache tree extension
I had difficulty in my efforts to learn about the cache tree extension
based on the documentation and code because I had an incorrect
assumption about how it behaved. This might be due to some ambiguity in
the documentation, so this change modifies the beginning of the cache
tree format by expanding the description of the feature.

My hope is that this documentation clarifies a few things:

1. There is an in-memory recursive tree structure that is constructed
   from the extension data. This structure has a few differences, such
   as where the name is stored.

2. What does it mean for an entry to be invalid?

3. When exactly are "new" trees created?

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:04:46 -08:00
Derrick Stolee 845d15d4d0 index-format: use 'cache tree' over 'cached tree'
The index has a "cache tree" extension. This corresponds to a
significant API implemented in cache-tree.[ch]. However, there are a few
places that refer to this erroneously as "cached tree". These are rare,
but notably the index-format.txt file itself makes this error.

The only other reference is in t7104-reset-hard.sh.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:04:38 -08:00
Derrick Stolee 0e5c950267 cache-tree: trace regions for prime_cache_tree
Commands such as "git reset --hard" rebuild the in-memory representation
of the cache tree index extension by parsing tree objects starting at a
known root tree. The performance of this operation can vary widely
depending on the width and depth of the repository's working directory
structure. Measure the time in this operation using trace2 regions in
prime_cache_tree().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:04:32 -08:00
Derrick Stolee 4c3e18723c cache-tree: trace regions for I/O
As we write or read the cache tree index extension, it can be good to
isolate how much of the file I/O time is spent constructing this
in-memory tree from the existing index or writing it out again to the
new index file. Use trace2 regions to indicate that we are spending time
on this operation.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 23:04:21 -08:00
Junio C Hamano 66e871b664 The third batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 21:48:47 -08:00
Junio C Hamano 49656f9445 Merge branch 'jc/macos-install-dependencies-fix'
Fix for procedure to building CI test environment for mac.

* jc/macos-install-dependencies-fix:
  ci/install-depends: attempt to fix "brew cask" stuff
2021-01-15 21:48:47 -08:00
Junio C Hamano 8782bfbf01 Merge branch 'tb/local-clone-race-doc'
Doc update.

* tb/local-clone-race-doc:
  Documentation/git-clone.txt: document race with --local
2021-01-15 21:48:47 -08:00
Junio C Hamano 644d85e751 Merge branch 'bc/doc-status-short'
Doc update.

* bc/doc-status-short:
  docs: rephrase and clarify the git status --short format
2021-01-15 21:48:47 -08:00
Junio C Hamano 453e149c8a Merge branch 'dl/p4-encode-after-kw-expansion'
Text encoding fix for "git p4".

* dl/p4-encode-after-kw-expansion:
  git-p4: fix syncing file types with pattern
2021-01-15 21:48:47 -08:00
Junio C Hamano cf2870adda Merge branch 'ab/gettext-charset-comment-fix'
Comments update.

* ab/gettext-charset-comment-fix:
  gettext.c: remove/reword a mostly-useless comment
  Makefile: remove a warning about old GETTEXT_POISON flag
2021-01-15 21:48:46 -08:00
Junio C Hamano eecc5f0775 Merge branch 'ug/doc-lose-dircache'
Doc update.

* ug/doc-lose-dircache:
  doc: remove "directory cache" from man pages
2021-01-15 21:48:46 -08:00
Junio C Hamano d9e1cd555d Merge branch 'ad/t4129-setfacl-target-fix'
Test fix.

* ad/t4129-setfacl-target-fix:
  t4129: fix setfacl-related permissions failure
2021-01-15 21:48:46 -08:00
Junio C Hamano 2b8cef2307 Merge branch 'jk/t5516-deflake'
Test fix.

* jk/t5516-deflake:
  t5516: loosen "not our ref" error check
2021-01-15 21:48:46 -08:00
Junio C Hamano 788f488b33 Merge branch 'vv/send-email-with-less-secure-apps-access'
Doc update.

* vv/send-email-with-less-secure-apps-access:
  git-send-email.txt: mention less secure app access with Gmail
2021-01-15 21:48:46 -08:00
Junio C Hamano 073552d7ae Merge branch 'pb/mergetool-tool-help-fix'
Fix 2.29 regression where "git mergetool --tool-help" fails to list
all the available tools.

* pb/mergetool-tool-help-fix:
  mergetool--lib: fix '--tool-help' to correctly show available tools
2021-01-15 21:48:46 -08:00
Junio C Hamano aa08688362 Merge branch 'ds/for-each-repo-noopfix'
"git for-each-repo --config=<var> <cmd>" should not run <cmd> for
any repository when the configuration variable <var> is not defined
even once.

* ds/for-each-repo-noopfix:
  for-each-repo: do nothing on empty config
2021-01-15 21:48:46 -08:00
Junio C Hamano 6a393f36d9 Merge branch 'jc/sign-off'
Doc update.

* jc/sign-off:
  SubmittingPatches: tighten wording on "sign-off" procedure
2021-01-15 21:48:45 -08:00
Junio C Hamano 8dbabb31df Merge branch 'mt/t4129-with-setgid-dir'
Some tests expect that "ls -l" output has either '-' or 'x' for
group executable bit, but setgid bit can be inherited from parent
directory and make these fields 'S' or 's' instead, causing test
failures.

* mt/t4129-with-setgid-dir:
  t4129: don't fail if setgid is set in the test directory
2021-01-15 21:48:45 -08:00
Junio C Hamano b2ace18759 Merge branch 'ds/maintenance-part-4'
Follow-up on the "maintenance part-3" which introduced scheduled
maintenance tasks to support platforms whose native scheduling
methods are not 'cron'.

* ds/maintenance-part-4:
  maintenance: use Windows scheduled tasks
  maintenance: use launchctl on macOS
  maintenance: include 'cron' details in docs
  maintenance: extract platform-specific scheduling
2021-01-15 21:48:45 -08:00
Junio C Hamano 4151fdb1c7 The second batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 15:20:30 -08:00
Junio C Hamano f9fb9063fd Merge branch 'fc/completion-aliases-support'
Bash completion (in contrib/) update to make it easier for
end-users to add completion for their custom "git" subcommands.

* fc/completion-aliases-support:
  completion: add proper public __git_complete
  test: completion: add tests for __git_complete
  completion: bash: improve function detection
  completion: bash: add __git_have_func helper
2021-01-15 15:20:30 -08:00
Junio C Hamano 62fb47a4d3 Merge branch 'en/stash-apply-sparse-checkout'
"git stash" did not work well in a sparsely checked out working
tree.

* en/stash-apply-sparse-checkout:
  stash: fix stash application in sparse-checkouts
  stash: remove unnecessary process forking
  t7012: add a testcase demonstrating stash apply bugs in sparse checkouts
2021-01-15 15:20:29 -08:00
Junio C Hamano 1ee70a916d Merge branch 'ar/t6016-modernise'
Test update.

* ar/t6016-modernise:
  t6016: move to lib-log-graph.sh framework
2021-01-15 15:20:29 -08:00
Junio C Hamano 2ce8de6bf9 Merge branch 'zh/arg-help-format'
Clean up option descriptions in "git cmd --help".

* zh/arg-help-format:
  builtin/*: update usage format
  parse-options: format argh like error messages
2021-01-15 15:20:29 -08:00
Junio C Hamano 7bfa022993 Merge branch 'nk/perf-fsmonitor-cleanup'
Test fix.

* nk/perf-fsmonitor-cleanup:
  p7519: allow running without watchman prereq
2021-01-15 15:20:29 -08:00
Junio C Hamano 02feca721e Merge branch 'ds/trace2-topo-walk'
The topological walk codepath is covered by new trace2 stats.

* ds/trace2-topo-walk:
  revision: trace topo-walk statistics
2021-01-15 15:20:29 -08:00
Junio C Hamano df26861c56 Merge branch 'rs/rebase-commit-validation'
Diagnose command line error of "git rebase" early.

* rs/rebase-commit-validation:
  rebase: verify commit parameter
2021-01-15 15:20:29 -08:00
Junio C Hamano 8b327f1784 Merge branch 'ma/sha1-is-a-hash'
Retire more names with "sha1" in it.

* ma/sha1-is-a-hash:
  hash-lookup: rename from sha1-lookup
  sha1-lookup: rename `sha1_pos()` as `hash_pos()`
  object-file.c: rename from sha1-file.c
  object-name.c: rename from sha1-name.c
2021-01-15 15:20:29 -08:00
Junio C Hamano 16a8055dae Merge branch 'ma/doc-pack-format-varint-for-sizes'
Doc update.

* ma/doc-pack-format-varint-for-sizes:
  pack-format.txt: document sizes at start of delta data
2021-01-15 15:20:29 -08:00
Junio C Hamano a11571bb7f Merge branch 'ma/t1300-cleanup'
Code clean-up.

* ma/t1300-cleanup:
  t1300: don't needlessly work with `core.foo` configs
  t1300: remove duplicate test for `--file no-such-file`
  t1300: remove duplicate test for `--file ../foo`
2021-01-15 15:20:28 -08:00
Junio C Hamano 40876260ef Merge branch 'pb/doc-modules-git-work-tree-typofix'
Doc fix.

* pb/doc-modules-git-work-tree-typofix:
  gitmodules.txt: fix 'GIT_WORK_TREE' variable name
2021-01-15 15:20:28 -08:00
Junio C Hamano b17eb5b4e4 Merge branch 'ta/doc-typofix'
Doc fix.

* ta/doc-typofix:
  doc: fix some typos
2021-01-15 15:20:28 -08:00
Junio C Hamano 9ba366f12b Merge branch 'bc/rev-parse-path-format'
"git rev-parse" can be explicitly told to give output as absolute
or relative path with the `--path-format=(absolute|relative)` option.

* bc/rev-parse-path-format:
  rev-parse: add option for absolute or relative path formatting
  abspath: add a function to resolve paths with missing components
2021-01-15 15:20:28 -08:00
Junio C Hamano 6dbbae17d9 Merge branch 'ew/decline-core-abbrev'
The configuration variable 'core.abbrev' can be set to 'no' to
force no abbreviation regardless of the hash algorithm.

* ew/decline-core-abbrev:
  core.abbrev=no disables abbreviations
2021-01-15 15:20:28 -08:00
Patrick Steinhardt d8d77153ea config: allow specifying config entries via envvar pairs
While we currently have the `GIT_CONFIG_PARAMETERS` environment variable
which can be used to pass runtime configuration data to git processes,
it's an internal implementation detail and not supposed to be used by
end users.

Next to being for internal use only, this way of passing config entries
has a major downside: the config keys need to be parsed as they contain
both key and value in a single variable. As such, it is left to the user
to escape any potentially harmful characters in the value, which is
quite hard to do if values are controlled by a third party.

This commit thus adds a new way of adding config entries via the
environment which gets rid of this shortcoming. If the user passes the
`GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment
variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each
`i` in `[0,n)`.

While the same can be achieved with `git -c <name>=<value>`, one may
wish to not do so for potentially sensitive information. E.g. if one
wants to set `http.extraHeader` to contain an authentication token,
doing so via `-c` would trivially leak those credentials via e.g. ps(1),
which typically also shows command arguments.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:45 -08:00
Patrick Steinhardt b9d147fb15 environment: make getenv_safe() a public function
The `getenv_safe()` helper function helps to safely retrieve multiple
environment values without the need to depend on platform-specific
behaviour for the return value's lifetime. We'll make use of this
function in a following patch, so let's make it available by making it
non-static and adding a declaration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:45 -08:00
Patrick Steinhardt 1ff21c05ba config: store "git -c" variables using more robust format
The previous commit added a new format for $GIT_CONFIG_PARAMETERS which
is able to robustly handle subsections with "=" in them. Let's start
writing the new format. Unfortunately, this does much less than you'd
hope, because "git -c" itself has the same ambiguity problem! But it's
still worth doing:

  - we've now pushed the problem from the inter-process communication
    into the "-c" command-line parser. This would free us up to later
    add an unambiguous format there (e.g., separate arguments like "git
    --config key value", etc).

  - for --config-env, the parser already disallows "=" in the
    environment variable name. So:

      git --config-env section.with=equals.key=ENVVAR

    will robustly set section.with=equals.key to the contents of
    $ENVVAR.

The new test shows the improvement for --config-env.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:18 -08:00
Jeff King f9dbb64fad config: parse more robust format in GIT_CONFIG_PARAMETERS
When we stuff config options into GIT_CONFIG_PARAMETERS, we shell-quote
each one as a single unit, like:

  'section.one=value1' 'section.two=value2'

On the reading side, we de-quote to get the individual strings, and then
parse them by splitting on the first "=" we find. This format is
ambiguous, because an "=" may appear in a subsection. So the config
represented in a file by both:

  [section "subsection=with=equals"]
  key = value

and:

  [section]
  subsection = with=equals.key=value

ends up in this flattened format like:

  'section.subsection=with=equals.key=value'

and we can't tell which was desired. We have traditionally resolved this
by taking the first "=" we see starting from the left, meaning that we
allowed arbitrary content in the value, but not in the subsection.

Let's make our environment format a bit more robust by separately
quoting the key and value. That turns those examples into:

  'section.subsection=with=equals.key'='value'

and:

  'section.subsection'='with=equals.key=value'

respectively, and we can tell the difference between them. We can detect
which format is in use for any given element of the list based on the
presence of the unquoted "=". That means we can continue to allow the
old format to work to support any callers which manually used the old
format, and we can even intermingle the two formats. The old format
wasn't documented, and nobody was supposed to be using it. But it's
likely that such callers exist in the wild, so it's nice if we can avoid
breaking them. Likewise, it may be possible to trigger an older version
of "git -c" that runs a script that calls into a newer version of "git
-c"; that new version would see the intermingled format.

This does create one complication, which is that the obvious format in
the new scheme for

  [section]
  some-bool

is:

  'section.some-bool'

with no equals. We'd mistake that for an old-style variable. And it even
has the same meaning in the old style, but:

  [section "with=equals"]
  some-bool

does not. It would be:

  'section.with=equals=some-bool'

which we'd take to mean:

  [section]
  with = equals=some-bool

in the old, ambiguous style. Likewise, we can't use:

  'section.some-bool'=''

because that's ambiguous with an actual empty string. Instead, we'll
again use the shell-quoting to give us a hint, and use:

  'section.some-bool'=

to show that we have no value.

Note that this commit just expands the reading side. We'll start writing
the new format via "git -c" in a future patch. In the meantime, the
existing "git -c" tests will make sure we didn't break reading the old
format. But we'll also add some explicit coverage of the two formats to
make sure we continue to handle the old one after we move the writing
side over.

And one final note: since we're now using the shell-quoting as a
semantically meaningful hint, this closes the door to us ever allowing
arbitrary shell quoting, like:

  'a'shell'would'be'ok'with'this'.key=value

But we have never supported that (only what sq_quote() would produce),
and we are probably better off keeping things simple, robust, and
backwards-compatible, than trying to make it easier for humans. We'll
continue not to advertise the format of the variable to users, and
instead keep "git -c" as the recommended mechanism for setting config
(even if we are trying to be kind not to break users who may be relying
on the current undocumented format).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:18 -08:00
Junio C Hamano 2d02bc91c0 t4203: make blame output massaging more robust
In the "git blame --porcelain" output, lines that ends with three
integers may not be the line that shows a commit object with line
numbers and block length (the contents from the blamed file or the
summary field can have a line that happens to match).  Also, the
names of the author may have more than three SP separated tokens
("git blame -L242,+1 cf6de18aab Documentation/SubmittingPatches"
gives an example).  The existing "grep -E | cut" pipeline is a bit
too loose on these two points.

While they can be assumed on the test data, it is not so hard to
use the right pattern from the documented format, so let's do so.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-14 21:54:52 -08:00