Commit graph

66232 commits

Author SHA1 Message Date
Victoria Dye 2c66a7c8ce read-tree: integrate with sparse index
Enable use of sparse index in 'git read-tree'. The integration in this patch
is limited only to usage of 'read-tree' that does not need additional
functional changes for the sparse index to behave as expected (i.e., produce
the same user-facing results as a non-sparse index sparse-checkout). To
ensure no unexpected behavior occurs, the index is explicitly expanded when:

* '--no-sparse-checkout' is specified (because it disables sparse-checkout)
* '--prefix' is specified (if the prefix is inside a sparse directory, the
  prefixed tree cannot be properly traversed)
* two or more <tree-ish> arguments are specified ('twoway_merge' and
  'threeway_merge' do not yet support merging sparse directories)

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:36:01 -08:00
Victoria Dye 14bf38cfcf read-tree: expand sparse checkout test coverage
Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
emphasis is placed on interactions with files outside the sparse cone, e.g.
merges with out-of-cone conflicts.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:36:01 -08:00
Victoria Dye cc89331ddc read-tree: explicitly disallow prefixes with a leading '/'
Exit with an error if a prefix provided to `git read-tree --prefix` begins
with '/'. In most cases, prefixes like this result in an "invalid path"
error; however, the repository root would be interpreted as valid when
specified as '--prefix=/'. This is due to leniency around trailing directory
separators on prefixes (e.g., allowing both '--prefix=my-dir' and
'--prefix=my-dir/') - the '/' in the prefix is actually the *trailing*
slash, although it could be misinterpreted as a *leading* slash.

To remove the confusing repo root-as-'/' case and make it clear that
prefixes should not begin with '/', exit with an error if the first
character of the provided prefix is '/'.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:36:00 -08:00
Victoria Dye 2c521b0e49 status: fix nested sparse directory diff in sparse index
Enable the 'recursive' diff option for the diff executed as part of 'git
status'. Without the 'recursive' enabled, 'git status' reports index
changes incorrectly when the following conditions were met:

* sparse index is enabled
* there is a difference between the index and HEAD in a file inside a
  *subdirectory* of a sparse directory
* the sparse directory index entry is *not* expanded in-core

Because it is not recursive by default, the diff in 'git status' reports
changes only at the level of files and directories that are immediate
children of a sparse directory, rather than recursing into directories with
changes to identify the modified file(s). As a result, 'git status' reports
the immediate subdirectory itself as "modified".

Example:

$ git init
$ mkdir -p sparse/sub
$ echo test >sparse/sub/foo
$ git add .
$ git commit -m "commit 1"
$ echo somethingelse >sparse/sub/foo
$ git add .
$ git commit -a -m "commit 2"
$ git sparse-checkout set --cone --sparse-index 'sparse'
$ git reset --soft HEAD~1
$ git status
On branch master
You are in a sparse checkout.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   sparse/sub

Enabling the 'recursive' diff option in 'wt_status_collect_changes_index'
corrects this issue by allowing the diff to recurse into subdirectories of
sparse directories to find modified files. Given the same repository setup
as the example above, the corrected result of `git status` is:

$ git status
On branch master
You are in a sparse checkout.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   sparse/sub/foo

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:36:00 -08:00
Victoria Dye 287fd17e3a sparse-index: prevent repo root from becoming sparse
Prevent the repository root from being collapsed into a sparse directory by
treating an empty path as "inside the sparse-checkout". When collapsing a
sparse index (e.g. in 'git sparse-checkout reapply'), the root directory
typically could not become a sparse directory due to the presence of in-cone
root-level files and directories. However, if no such in-cone files or
directories were present, there was no explicit check signaling that the
"repository root path" (an empty string, in the case of
'convert_to_sparse(...)') was in-cone, and a sparse directory index entry
would be created from the repository root directory.

The documentation in Documentation/git-sparse-checkout.txt explicitly states
that the files in the root directory are expected to be in-cone for a
cone-mode sparse-checkout. Collapsing the root into a sparse directory entry
violates that assumption, as sparse directory entries are expected to be
outside the sparse cone and have SKIP_WORKTREE enabled. This invalid state
in turn causes issues with commands that interact with the index, e.g.
'git status'.

Treating an empty (root) path as in-cone prevents the creation of a root
sparse directory in 'convert_to_sparse(...)'. Because the repository root is
otherwise never compared with sparse patterns (in both cone-mode and
non-cone sparse-checkouts), the new check does not cause additional changes
to how sparse patterns are applied.

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:36:00 -08:00
Derrick Stolee c8d67b9a68 commit-graph: fix generation number v2 overflow values
The Generation Data Chunk was implemented and tested in e8b63005c
(commit-graph: implement generation data chunk, 2021-01-16), but the
test was carefully constructed to work on systems with 32-bit dates.
Since the corrected commit date offsets still required more than 31
bits, this triggered writing the generation_data_overflow chunk.

However, upon closer look, the
write_graph_chunk_generation_data_overflow() method writes the offsets
to the chunk (as dictated by the format) but fill_commit_graph_info()
treats the value in the chunk as if it is the full corrected commit date
(not an offset). For some reason, this does not cause an issue when
using the FUTURE_DATE specified in t5318-commit-graph.sh, but it does
show up as a failure in 'git commit-graph verify' if we increase that
FUTURE_DATE to be above four billion.

Fix this error and create a 64-bit timestamp version of the test so we
can test these larger values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:15:06 -08:00
Derrick Stolee 3b0199d4c3 commit-graph: start parsing generation v2 (again)
The 'read_generation_data' member of 'struct commit_graph' was
introduced by 1fdc383c5 (commit-graph: use generation v2 only if entire
chain does, 2021-01-16). The intention was to avoid using corrected
commit dates if not all layers of a commit-graph had that data stored.
The logic in validate_mixed_generation_chain() at that point incorrectly
initialized read_generation_data to 1 if and only if the tip
commit-graph contained the Corrected Commit Date chunk.

This was "fixed" in 448a39e65 (commit-graph: validate layers for
generation data, 2021-02-02) to validate that read_generation_data was
either non-zero for all layers, or it would set read_generation_data to
zero for all layers.

The problem here is that read_generation_data is not initialized to be
non-zero anywhere!

This change initializes read_generation_data immediately after the chunk
is parsed, so each layer will have its value present as soon as
possible.

The read_generation_data member is used in fill_commit_graph_info() to
determine if we should use the corrected commit date or the topological
levels stored in the Commit Data chunk. Due to this bug, all previous
versions of Git were defaulting to topological levels in all cases!

This can be measured with some performance tests. Using the Linux kernel
as a testbed, I generated a complete commit-graph containing corrected
commit dates and tested the 'new' version against the previous, 'old'
version.

First, rev-list with --topo-order demonstrates a 26% improvement using
corrected commit dates:

hyperfine \
	-n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \
	-n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \
	--warmup=10

Benchmark 1: old
  Time (mean ± σ):      57.1 ms ±   3.1 ms
  Range (min … max):    52.9 ms …  62.0 ms    55 runs

Benchmark 2: new
  Time (mean ± σ):      45.5 ms ±   3.3 ms
  Range (min … max):    39.9 ms …  51.7 ms    59 runs

Summary
  'new' ran
    1.26 ± 0.11 times faster than 'old'

These performance improvements are due to the algorithmic improvements
given by walking fewer commits due to the higher cutoffs from corrected
commit dates.

However, this comes at a cost. The additional I/O cost of parsing the
corrected commit dates is visible in case of merge-base commands that do
not reduce the overall number of walked commits.

hyperfine \
        -n "old" "$OLD_GIT merge-base v4.8 v4.9" \
        -n "new" "$NEW_GIT merge-base v4.8 v4.9" \
        --warmup=10

Benchmark 1: old
  Time (mean ± σ):     110.4 ms ±   6.4 ms
  Range (min … max):    96.0 ms … 118.3 ms    25 runs

Benchmark 2: new
  Time (mean ± σ):     150.7 ms ±   1.1 ms
  Range (min … max):   149.3 ms … 153.4 ms    19 runs

Summary
  'old' ran
    1.36 ± 0.08 times faster than 'new'

Performance issues like this are what motivated 702110aac (commit-graph:
use config to specify generation type, 2021-02-25).

In the future, we could fix this performance problem by inserting the
corrected commit date offsets into the Commit Date chunk instead of
having that data in an extra chunk.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:15:06 -08:00
Derrick Stolee 75979d9460 commit-graph: fix ordering bug in generation numbers
When computing the generation numbers for a commit-graph, we compute
the corrected commit dates and then check if their offsets from the
actual dates is too large to fit in the 32-bit Generation Data chunk.
However, there is a problem with this approach: if we have parsed the
generation data from the previous commit-graph, then we continue the
loop because the corrected commit date is already computed. This causes
an under-count in the number of overflow values.

It is incorrect to add an increment to num_generation_data_overflows
next to this 'continue' statement, because we might start
double-counting commits that are computed because of the depth-first
search walk from a commit with an earlier OID.

Instead, iterate over the full commit list at the end, checking the
offsets to see how many grow beyond the maximum value.

Create a new t5328-commit-graph-64-bit-time.sh test script to handle
special cases of testing 64-bit timestamps. This helps demonstrate this
bug in more cases. It still won't hit all potential cases until the next
change, which reenables reading generation numbers. Use the skip_all
trick from 0a2bfccb9c (t0051: use "skip_all" under !MINGW in
single-test file, 2022-02-04) to make the output clean when run on a
32-bit system.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:14:57 -08:00
Derrick Stolee 17925e0602 t5318: extract helpers to lib-commit-graph.sh
The graph_git_behavior helper is useful for testing that certain Git
commands behave the same when using the commit-graph and when not using
the commit-graph. Extract it to a new lib-commit-graph.sh file for use
in new test scripts that will split out from t5318.

While doing this extraction, also extract graph_read_expect and the
logic for priming the test_oid_cache.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:09:55 -08:00
Derrick Stolee c78c7a959c test-read-graph: include extra post-parse info
It can be helpful to verify that the 'struct commit_graph' that results
from parsing a commit-graph is correctly structured. The existence of
different chunks is not enough to verify that all of the optional
features are correctly enabled.

Update 'test-tool read-graph' to output an "options:" line that includes
information for different parts of the struct commit_graph.

In particular, this change demonstrates that the read_generation_data
option is never being enabled, which will be fixed in a later change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 12:09:55 -08:00
Patrick Steinhardt 0a7b38707d refs/files-backend: optimize reading of symbolic refs
When reading references via `files_read_raw_ref()` we always consult
both the loose reference, and if that wasn't found, we also consult the
packed-refs file. While this makes sense to read a generic reference, it
is wasteful in the case where we only care about symbolic references:
the packed-refs backend does not support them, and thus it cannot ever
return one for us.

Special-case reading of symbolic references for the files backend such
that we always skip asking the packed-refs backend.

We use `refs_read_symbolic_ref()` extensively to determine whether we
need to skip updating local symbolic references during a fetch, which is
why the change results in a significant speedup when doing fetches in
repositories with huge numbers of references. The following benchmark
executes a mirror-fetch in a repository with about 2 million references
via `git fetch --prune --no-write-fetch-head +refs/*:refs/*`:

    Benchmark 1: HEAD~
      Time (mean ± σ):     68.372 s ±  2.344 s    [User: 65.629 s, System: 8.786 s]
      Range (min … max):   65.745 s … 70.246 s    3 runs

    Benchmark 2: HEAD
      Time (mean ± σ):     60.259 s ±  0.343 s    [User: 61.019 s, System: 7.245 s]
      Range (min … max):   60.003 s … 60.649 s    3 runs

    Summary
      'HEAD' ran
        1.13 ± 0.04 times faster than 'HEAD~'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 10:13:46 -08:00
Patrick Steinhardt 1553f5e76c remote: read symbolic refs via refs_read_symbolic_ref()
We have two cases in the remote code where we check whether a reference
is symbolic or not, but don't mind in case it doesn't exist or in case
it exists but is a non-symbolic reference. Convert these two callsites
to use the new `refs_read_symbolic_ref()` function, whose intent is to
implement exactly that usecase.

No change in behaviour is expected from this change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 10:13:46 -08:00
Patrick Steinhardt cd475b3b03 refs: add ability for backends to special-case reading of symbolic refs
Reading of symbolic and non-symbolic references is currently treated the
same in reference backends: we always call `refs_read_raw_ref()` and
then decide based on the returned flags what type it is. This has one
downside though: symbolic references may be treated different from
normal references in a backend from normal references. The packed-refs
backend for example doesn't even know about symbolic references, and as
a result it is pointless to even ask it for one.

There are cases where we really only care about whether a reference is
symbolic or not, but don't care about whether it exists at all or may be
a non-symbolic reference. But it is not possible to optimize for this
case right now, and as a consequence we will always first check for a
loose reference to exist, and if it doesn't, we'll query the packed-refs
backend for a known-to-not-be-symbolic reference. This is inefficient
and requires us to search all packed references even though we know to
not care for the result at all.

Introduce a new function `refs_read_symbolic_ref()` which allows us to
fix this case. This function will only ever return symbolic references
and can thus optimize for the scenario layed out above. By default, if
the backend doesn't provide an implementation for it, we just use the
old code path and fall back to `read_raw_ref()`. But in case the backend
provides its own, more efficient implementation, we will use that one
instead.

Note that this function is explicitly designed to not distinguish
between missing references and non-symbolic references. If it did, we'd
be forced to always search the packed-refs backend to see whether the
symbolic reference the user asked for really doesn't exist, or if it
exists as a non-symbolic reference.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 10:13:46 -08:00
Patrick Steinhardt 8e55634b47 fetch: avoid lookup of commits when not appending to FETCH_HEAD
When fetching from a remote repository we will by default write what has
been fetched into the special FETCH_HEAD reference. The order in which
references are written depends on whether the reference is for merge or
not, which, despite some other conditions, is also determined based on
whether the old object ID the reference is being updated from actually
exists in the repository.

To write FETCH_HEAD we thus loop through all references thrice: once for
the references that are about to be merged, once for the references that
are not for merge, and finally for all references that are ignored. For
every iteration, we then look up the old object ID to determine whether
the referenced object exists so that we can label it as "not-for-merge"
if it doesn't exist. It goes without saying that this can be expensive
in case where we are fetching a lot of references.

While this is hard to avoid in the case where we're writing FETCH_HEAD,
users can in fact ask us to skip this work via `--no-write-fetch-head`.
In that case, we do not care for the result of those lookups at all
because we don't have to order writes to FETCH_HEAD in the first place.

Skip this busywork in case we're not writing to FETCH_HEAD. The
following benchmark performs a mirror-fetch in a repository with about
two million references via `git fetch --prune --no-write-fetch-head
+refs/*:refs/*`:

    Benchmark 1: HEAD~
      Time (mean ± σ):     75.388 s ±  1.942 s    [User: 71.103 s, System: 8.953 s]
      Range (min … max):   73.184 s … 76.845 s    3 runs

    Benchmark 2: HEAD
      Time (mean ± σ):     69.486 s ±  1.016 s    [User: 65.941 s, System: 8.806 s]
      Range (min … max):   68.864 s … 70.659 s    3 runs

    Summary
      'HEAD' ran
        1.08 ± 0.03 times faster than 'HEAD~'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 10:13:46 -08:00
Patrick Steinhardt 4de656263a upload-pack: look up "want" lines via commit-graph
During packfile negotiation the client will send "want" and "want-ref"
lines to the server to tell it which objects it is interested in. The
server-side parses each of those and looks them up to see whether it
actually has requested objects. This lookup is performed by calling
`parse_object()` directly, which thus hits the object database. In the
general case though most of the objects the client requests will be
commits. We can thus try to look up the object via the commit-graph
opportunistically, which is much faster than doing the same via the
object database.

Refactor parsing of both "want" and "want-ref" lines to do so.

The following benchmark is executed in a repository with a huge number
of references. It uses cached request from git-fetch(1) as input to
git-upload-pack(1) that contains about 876,000 "want" lines:

    Benchmark 1: HEAD~
      Time (mean ± σ):      7.113 s ±  0.028 s    [User: 6.900 s, System: 0.662 s]
      Range (min … max):    7.072 s …  7.168 s    10 runs

    Benchmark 2: HEAD
      Time (mean ± σ):      6.622 s ±  0.061 s    [User: 6.452 s, System: 0.650 s]
      Range (min … max):    6.535 s …  6.727 s    10 runs

    Summary
      'HEAD' ran
        1.07 ± 0.01 times faster than 'HEAD~'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 10:13:45 -08:00
Junio C Hamano 34363403a2 Merge branch 'ps/fetch-atomic' into ps/fetch-mirror-optim
* ps/fetch-atomic:
  fetch: make `--atomic` flag cover pruning of refs
  fetch: make `--atomic` flag cover backfilling of tags
  refs: add interface to iterate over queued transactional updates
  fetch: report errors when backfilling tags fails
  fetch: control lifecycle of FETCH_HEAD in a single place
  fetch: backfill tags before setting upstream
  fetch: increase test coverage of fetches
2022-03-01 10:11:00 -08:00
Ævar Arnfjörð Bjarmason 71f26798f2 test-lib: add "fast_unwind_on_malloc=0" to LSAN_OPTIONS
Add "fast_unwind_on_malloc=0" to LSAN_OPTIONS to get more meaningful
stack traces from LSAN. This isn't required under ASAN which will emit
traces such as this one for a leak in "t/t0006-date.sh":

    $ ASAN_OPTIONS=detect_leaks=1 ./t0006-date.sh -vixd
    [...]
    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x488b94 in strdup (t/helper/test-tool+0x488b94)
        #1 0x9444a4 in xstrdup wrapper.c:29:14
        #2 0x5995fa in parse_date_format date.c:991:24
        #3 0x4d2056 in show_dates t/helper/test-date.c:39:2
        #4 0x4d174a in cmd__date t/helper/test-date.c:116:3
        #5 0x4cce89 in cmd_main t/helper/test-tool.c:127:11
        #6 0x4cd1e3 in main common-main.c:52:11
        #7 0x7fef3c695e49 in __libc_start_main csu/../csu/libc-start.c:314:16
        #8 0x422b09 in _start (t/helper/test-tool+0x422b09)

    SUMMARY: AddressSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

Whereas LSAN would emit this instead:

    $ ./t0006-date.sh -vixd
    [...]
    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x4323b8 in malloc (t/helper/test-tool+0x4323b8)
        #1 0x7f2be1d614aa in strdup string/strdup.c:42:15

    SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

Now we'll instead git this sensible stack trace under
LSAN. I.e. almost the same one (but starting with "malloc", as is
usual for LSAN) as under ASAN:

    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x4323b8 in malloc (t/helper/test-tool+0x4323b8)
        #1 0x7f012af5c4aa in strdup string/strdup.c:42:15
        #2 0x5cb164 in xstrdup wrapper.c:29:14
        #3 0x495ee9 in parse_date_format date.c:991:24
        #4 0x453aac in show_dates t/helper/test-date.c:39:2
        #5 0x453782 in cmd__date t/helper/test-date.c:116:3
        #6 0x451d95 in cmd_main t/helper/test-tool.c:127:11
        #7 0x451f1e in main common-main.c:52:11
        #8 0x7f012aef5e49 in __libc_start_main csu/../csu/libc-start.c:314:16
        #9 0x42e0a9 in _start (t/helper/test-tool+0x42e0a9)

    SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

As the option name suggests this does make things slower, e.g. for
t0001-init.sh we're around 10% slower:

    $ hyperfine -L v 0,1 'LSAN_OPTIONS=fast_unwind_on_malloc={v} make T=t0001-init.sh' -r 3
    Benchmark 1: LSAN_OPTIONS=fast_unwind_on_malloc=0 make T=t0001-init.sh
      Time (mean ± σ):      2.135 s ±  0.015 s    [User: 1.951 s, System: 0.554 s]
      Range (min … max):    2.122 s …  2.152 s    3 runs

    Benchmark 2: LSAN_OPTIONS=fast_unwind_on_malloc=1 make T=t0001-init.sh
      Time (mean ± σ):      1.981 s ±  0.055 s    [User: 1.769 s, System: 0.488 s]
      Range (min … max):    1.941 s …  2.044 s    3 runs

    Summary
      'LSAN_OPTIONS=fast_unwind_on_malloc=1 make T=t0001-init.sh' ran
        1.08 ± 0.03 times faster than 'LSAN_OPTIONS=fast_unwind_on_malloc=0 make T=t0001-init.sh'

I think that's more than worth it to get the more meaningful stack
traces, we can always provide LSAN_OPTIONS=fast_unwind_on_malloc=0 for
one-off "fast" runs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-28 13:35:56 -08:00
Ævar Arnfjörð Bjarmason b9638d7286 test-lib: make $GIT_BUILD_DIR an absolute path
Change the GIT_BUILD_DIR from a path like "/path/to/build/t/.." to
"/path/to/build". The "TEST_DIRECTORY" here is already made an
absolute path a few lines above this.

We could simply do $(cd "$TEST_DIRECTORY"/.." && pwd) here, but as
noted in the preceding commit the "$TEST_DIRECTORY" can't be anything
except the path containing this test-lib.sh file at this point, so we
can more cheaply and equally strip the "/t" off the end.

This change will be helpful to LSAN_OPTIONS which will want to strip
the build directory path from filenames, which we couldn't do if we
had a "/.." in there.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-28 13:35:56 -08:00
Ævar Arnfjörð Bjarmason 9dbf20e7f6 test-lib: correct and assert TEST_DIRECTORY overriding
Correct a misleading comment added by me in 62f539043c (test-lib:
Allow overriding of TEST_DIRECTORY, 2010-08-19), and add an assertion
that TEST_DIRECTORY cannot point to any directory except the "t"
directory in the top-level of git.git.

This assertion is in effect not new, since we'd already die if that
wasn't the case[1], but it and the updated commentary help to make
that clearer.

The existing comments were also on the wrong arms of the
"if". I.e. the "allow tests to override this" was on the "test -z"
arm. That came about due to a combination of 62f539043c and
85176d7251 (test-lib.sh: convert $TEST_DIRECTORY to an absolute path,
2013-11-17).

Those earlier comments could be read as allowing the "$TEST_DIRECTORY"
to be some path outside of t/. As explained in the updated comment
that's impossible, rather it was meant for *tests* that ran outside of
t/, i.e. the "t0000-basic.sh" tests that use "lib-subtest.sh".

Those tests have a different working directory, but they set the
"TEST_DIRECTORY" to the same path for bootstrapping. The comments now
reflect that, and further comment on why we have a hard dependency on
this.

1. https://lore.kernel.org/git/220222.86o82z8als.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-28 13:35:56 -08:00
Ævar Arnfjörð Bjarmason 66c1a56870 test-lib: add GIT_SAN_OPTIONS, inherit [AL]SAN_OPTIONS
Change our ASAN_OPTIONS and LSAN_OPTIONS to set defaults for those
variables, rather than punting out entirely if we already have them in
the environment.

We want to take any user-provided settings over our own, but we can do
that by prepending our defaults to the variable. The libsanitizer
options parsing has "last option wins" semantics.

It's now possible to do e.g.:

    LSAN_OPTIONS=report_objects=1 ./t0006-date.sh

And not have the "report_objects=1" setting overwrite our sensible
default of "abort_on_error=1", but by prepending to the list we ensure
that:

    LSAN_OPTIONS=report_objects=1:abort_on_error=0 ./t0006-date.sh

Will take the desired "abort_on_error=0" over our default.

See b0f4c9087e (t: support clang/gcc AddressSanitizer, 2014-12-08)
for the original pattern being altered here, and
85b81b35ff (test-lib: set LSAN_OPTIONS to abort by default,
2017-09-05) for when LSAN_OPTIONS was added in addition to the
then-existing ASAN_OPTIONS.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-28 13:35:56 -08:00
Tao Klerks 317956d912 untracked-cache: write index when populating empty untracked cache
It is expected that an empty/unpopulated untracked cache structure can
be written to the index - by update-index, or by a "git status" call
that sees the untracked cache should be enabled and is not, but is
running with options that make the untracked cache non-applicable in
that run (eg a pathspec).

Currently, if that happens, then subsequent "git status" calls end up
populating the untracked cache, but not writing the index (not saving
their work) - so the performance outcome is almost identical to the
cache being altogether disabled.

This continues until the index gets written with the untracked cache
populated, for some *other* reason, such as a working tree change.

Detect the condition where an empty untracked cache exists in the
index and we will collect the list of untracked paths, and queue an
index write under that condition, so that the collected untracked
paths can be written out to the untracked cache extension in the
index.

This change depends on previous fixes to t7519 for the "ignore .git
changes when invalidating UNTR" test case to pass - before this fix,
the test never actually did anything as it was not set up correctly.

Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-28 10:02:18 -08:00
Tao Klerks 37482b4080 t7519: populate untracked cache before test
In its current state, the t7519 test dealing with untracked cache
assumes that "git update-index --untracked-cache" will *populate* the
untracked cache. This is not correct - it will only add an empty
untracked cache structure to the index.

If we're going to compare two git status runs with something
interesting happening in-between, we need to ensure that the index is
in a stable/steady state *before* that first run.

Achieve this by adding another prior "git status" run.

At this stage this change does nothing, because there is a bug,
addressed in the next patch, whereby once the empty untracked cache
structure is added by the update-index invocation, the untracked cache
gets updated in every subsequent "git status" call, but the index with
these updates does not get written down.

That bug actually invalidates this entire test case - but we're fixing
that next.

Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-28 10:02:18 -08:00
Tao Klerks a67d178be4 t7519: avoid file to index mtime race for untracked cache
In t7519 there is a test that writes files to disk, and immediately
writes the index with the untracked cache. Because of
mtime-comparison logic that uses a 1-second resolution, this means
the cached entries are not trusted/used under some circumstances
(see read-cache.c#is_racy_stat()).

Untracked cache tests in t7063 use a 1-second delay to avoid this
issue, but we don't want to introduce arbitrary slowdowns, so instead
use test-tool chmtime to backdate the files slightly. The t7063
delays are a #leftoverbit, to be worked on in a separate series.

This change doesn't actually affect the outcome of the test, but does
enhance its validity, and becomes relevant after later changes.

Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-28 10:02:17 -08:00
Junio C Hamano 2587df669b rerere-train: two fixes to the use of "git show -s"
The script uses "git show -s" to display the title of the merge
commit being studied, without explicitly disabling the pager, which
is not a safe thing to do in a script.

For example, when the pager is set to "less" with "-SF" options (-S
tells the pager not to fold lines but allow horizontal scrolling to
show the overly long lines, -F tells the pager not to wait if the
output in its entirety is shown on a single page), and the title of
the merge commit is longer than the width of the terminal, the pager
will wait until the end-user tells it to quit after showing the
single line.

Explicitly disable the pager with this "git show" invocation to fix
this.

The command uses the "--pretty=format:..." format, which adds LF in
between each pair of commits it outputs, which means that the label
for the merge being learned from will be followed by the next
message on the same line.  "--pretty=tformat:..." is what we should
instead, which adds LF after each commit, or a more modern way to
spell it, i.e. "--format=...".  This existing breakage becomes
easier to see, now we no longer use the pager.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-27 14:14:03 -08:00
Alex Henrie 808213ba36 switch: mention the --detach option when dying due to lack of a branch
Users who are accustomed to doing `git checkout <tag>` assume that
`git switch <tag>` will do the same thing. Inform them of the --detach
option so they aren't left wondering why `git switch` doesn't work but
`git checkout` does.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 22:21:48 -08:00
Ævar Arnfjörð Bjarmason 6aea6baeb3 object-file API: pass an enum to read_object_with_reference()
Change the read_object_with_reference() function to take an "enum
object_type". It was not prepared to handle an arbitrary "const
char *type", as it was itself calling type_from_string().

Let's change the only caller that passes in user data to use
type_from_string(), and convert the rest to use e.g. "OBJ_TREE"
instead of "tree_type".

The "cat-file" caller is not on the codepath that
handles"--allow-unknown", so the type_from_string() there is safe. Its
use of type_from_string() doesn't functionally differ from that of the
pre-image.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason 2bbb28a3ee object-file.c: add a literal version of write_object_file_prepare()
Split off a *_literally() variant of the write_object_file_prepare()
function. To do this create a new "hash_object_body()" static helper.

We now defer the type_name() call until the very last moment in
format_object_header() for those callers that aren't "hash-object
--literally".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason 44439c1c58 object-file API: have hash_object_file() take "enum object_type"
Change the hash_object_file() function to take an "enum
object_type".

Since a preceding commit all of its callers are passing either
"{commit,tree,blob,tag}_type", or the result of a call to type_name(),
the parse_object() caller that would pass NULL is now using
stream_object_signature().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason 0ff7b4f976 object API: rename hash_object_file_literally() to write_*()
Before 0c3db67cc8 (hash-object --literally: fix buffer overrun with
extra-long object type, 2015-05-04) the hash-object code being changed
here called write_sha1_file() to both hash and write a loose
object. Before that we'd use hash_sha1_file() to if "-w" wasn't
provided, and otherwise call write_sha1_file().

Now we'll always call the same function for both writing. Let's rename
it from hash_*_literally() to write_*_literally(). Even though the
write_*() might not actually write if HASH_WRITE_OBJECT isn't in
"flags", having it be more similar to write_object_file_flags() than
hash_object_file(), but carrying a name that would suggest that it's a
variant of the latter is confusing.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason 0f156dbb04 object-file API: split up and simplify check_object_signature()
Split up the check_object_signature() function into that non-streaming
version (it accepts an already filled "buf"), and a new
stream_object_signature() which will retrieve the object from storage,
and hash it on-the-fly.

All of the callers of check_object_signature() were effectively
calling two different functions, if we go by cyclomatic
complexity. I.e. they'd either take the early "if (map)" branch and
return early, or not. This has been the case since the "if (map)"
condition was added in 090ea12671 (parse_object: avoid putting whole
blob in core, 2012-03-07).

We can then further simplify the resulting check_object_signature()
function since only one caller wanted to pass a non-NULL "buf" and a
non-NULL "real_oidp". That "read_loose_object()" codepath used by "git
fsck" can instead use hash_object_file() followed by oideq().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason ee213de22d object API users + docs: check <0, not !0 with check_object_signature()
Change those users of the object API that misused
check_object_signature() by assuming it returned any non-zero when the
OID didn't match the expected value to check <0 instead. In practice
all of this code worked before, but it wasn't consistent with rest of
the users of the API.

Let's also clarify what the <0 return value means in API docs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason cdcaaec9a6 object API docs: move check_object_signature() docs to cache.h
Move the API documentation for check_object_signature() to cache.h,
where its prototype is declared. This is in preparation for adding a
companion function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason 73182b2d84 object API: correct "buf" v.s. "map" mismatch in *.c and *.h
Change the name of the second argument to check_object_signature() to
be "buf" in object-file.c, making it consistent with the prototype in
cache.h

This fixes an inconsistency that's been with us since 2ade934026 (Add
"check_sha1_signature()" helper function, 2005-04-08), and makes a
subsequent commit's diff smaller, as we'll move these API docs to
cache.h.

While we're at it fix a small grammar error in the documentation,
dropping an "an" before "in-core object-data".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason c80d226a04 object-file API: have write_object_file() take "enum object_type"
Change the write_object_file() function to take an "enum object_type"
instead of a "const char *type". Its callers either passed
{commit,tree,blob,tag}_type and can pass the corresponding OBJ_* type
instead, or were hardcoding strings like "blob".

This avoids the back & forth fragility where the callers of
write_object_file() would have the enum type, and convert it
themselves via type_name(). We do have to now do that conversion
ourselves before calling write_object_file_prepare(), but those
codepaths will be similarly adjusted in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason b04cdea46c object-file API: add a format_object_header() function
Add a convenience function to wrap the xsnprintf() command that
generates loose object headers. This code was copy/pasted in various
parts of the codebase, let's define it in one place and re-use it from
there.

All except one caller of it had a valid "enum object_type" for us,
it's only write_object_file_prepare() which might need to deal with
"git hash-object --literally" and a potential garbage type. Let's have
the primary API use an "enum object_type", and define a *_literally()
function that can take an arbitrary "const char *" for the type.

See [1] for the discussion that prompted this patch, i.e. new code in
object-file.c that wanted to copy/paste the xsnprintf() invocation.

In the case of fast-import.c the callers unfortunately need to cast
back & forth between "unsigned char *" and "char *", since
format_object_header() ad encode_in_pack_object_header() take
different signedness.

1. https://lore.kernel.org/git/211213.86bl1l9bfz.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason 63e05f9056 object-file API: return "void", not "int" from hash_object_file()
The hash_object_file() function added in abdc3fc842 (Add
hash_sha1_file(), 2006-10-14) did not have a meaningful return value,
and it never has.

One was seemingly added to avoid adding braces to the "ret = "
assignments being modified here. Let's instead assign "0" to the "ret"
variables at the beginning of the relevant functions, and have them
return "void".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason bbea0ddeb9 object-file.c: split up declaration of unrelated variables
Split up the declaration of the "ret" and "re_allocated"
variables. It's not our usual style to group variable declarations
simply because they share a type, we'd only prefer to do so when the
two are closely related (e.g. "int i, j"). This change makes a
subsequent and meaningful change's diff smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
Junio C Hamano 715d08a9e5 The eighth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 15:47:38 -08:00
Junio C Hamano 0fd097b9a0 Merge branch 'tb/coc-plc-update'
Document Taylor as a new member of Git PLC at SFC.  Welcome.

* tb/coc-plc-update:
  CODE_OF_CONDUCT.md: update PLC members list
2022-02-25 15:47:38 -08:00
Junio C Hamano b3db182886 Merge branch 'en/ort-inner-merge-conflict-report'
Messages "ort" merge backend prepares while dealing with conflicted
paths were unnecessarily confusing since it did not differentiate
inner merges and outer merges.

* en/ort-inner-merge-conflict-report:
  merge-ort: make informational messages from recursive merges clearer
2022-02-25 15:47:38 -08:00
Junio C Hamano 5c4f3804a7 Merge branch 'rs/pcre-invalid-utf8-fix-fix'
Workaround we have for versions of PCRE2 before their version 10.36
were in effect only for their versions newer than 10.36 by mistake,
which has been corrected.

* rs/pcre-invalid-utf8-fix-fix:
  grep: fix triggering PCRE2_NO_START_OPTIMIZE workaround
2022-02-25 15:47:38 -08:00
Junio C Hamano 80f7f618b6 Merge branch 'ds/core-untracked-cache-config'
Setting core.untrackedCache to true failed to add the untracked
cache extension to the index.

* ds/core-untracked-cache-config:
  dir: force untracked cache with core.untrackedCache
2022-02-25 15:47:36 -08:00
Junio C Hamano 362f869ff2 Merge branch 'ab/diff-free-more'
Leakfixes.

* ab/diff-free-more:
  diff.[ch]: have diff_free() free options->parseopts
  diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec)
2022-02-25 15:47:36 -08:00
Junio C Hamano 0a01df08c0 Merge branch 'ab/date-mode-release'
Plug (some) memory leaks around parse_date_format().

* ab/date-mode-release:
  date API: add and use a date_mode_release()
  date API: add basic API docs
  date API: provide and use a DATE_MODE_INIT
  date API: create a date.h, split from cache.h
  cache.h: remove always unused show_date_human() declaration
2022-02-25 15:47:36 -08:00
Junio C Hamano 294f296292 Merge branch 'jc/name-rev-stdin'
Finishing touches to an earlier "name-rev --annotate-stdin" series.

* jc/name-rev-stdin:
  name-rev: replace --stdin with --annotate-stdin in synopsis
2022-02-25 15:47:36 -08:00
Junio C Hamano 5b84280c65 Merge branch 'ab/grep-patterntype'
Some code clean-up in the "git grep" machinery.

* ab/grep-patterntype:
  grep: simplify config parsing and option parsing
  grep.c: do "if (bool && memchr())" not "if (memchr() && bool)"
  grep.h: make "grep_opt.pattern_type_option" use its enum
  grep API: call grep_config() after grep_init()
  grep.c: don't pass along NULL callback value
  built-ins: trust the "prefix" from run_builtin()
  grep tests: add missing "grep.patternType" config tests
  grep tests: create a helper function for "BRE" or "ERE"
  log tests: check if grep_config() is called by "log"-like cmds
  grep.h: remove unused "regex_t regexp" from grep_opt
2022-02-25 15:47:36 -08:00
Junio C Hamano 2e65591ed6 Merge branch 'js/apply-partial-clone-filters-recursively'
"git clone --filter=... --recurse-submodules" only makes the
top-level a partial clone, while submodules are fully cloned.  This
behaviour is changed to pass the same filter down to the submodules.

* js/apply-partial-clone-filters-recursively:
  clone, submodule: pass partial clone filters to submodules
2022-02-25 15:47:35 -08:00
Junio C Hamano d21d5ddfe6 Merge branch 'ja/i18n-common-messages'
Unify more messages to help l10n.

* ja/i18n-common-messages:
  i18n: fix some misformated placeholders in command synopsis
  i18n: remove from i18n strings that do not hold translatable parts
  i18n: factorize "invalid value" messages
  i18n: factorize more 'incompatible options' messages
2022-02-25 15:47:35 -08:00
Junio C Hamano a47fcfe871 Merge branch 'ab/only-single-progress-at-once'
Further tweaks on progress API.

* ab/only-single-progress-at-once:
  pack-bitmap-write.c: don't return without stop_progress()
  progress API: unify stop_progress{,_msg}(), fix trace2 bug
  progress.c: refactor stop_progress{,_msg}() to use helpers
  progress.c: use dereferenced "progress" variable, not "(*p_progress)"
  progress.h: format and be consistent with progress.c naming
  progress.c tests: test some invalid usage
  progress.c tests: make start/stop commands on stdin
  progress.c test helper: add missing braces
  leak tests: fix a memory leak in "test-progress" helper
2022-02-25 15:47:35 -08:00
Junio C Hamano 6249ce2d1b Merge branch 'ds/sparse-checkout-requires-per-worktree-config'
"git sparse-checkout" wants to work with per-worktree configuration,
but did not work well in a worktree attached to a bare repository.

* ds/sparse-checkout-requires-per-worktree-config:
  config: make git_configset_get_string_tmp() private
  worktree: copy sparse-checkout patterns and config on add
  sparse-checkout: set worktree-config correctly
  config: add repo_config_set_worktree_gently()
  worktree: create init_worktree_config()
  Documentation: add extensions.worktreeConfig details
2022-02-25 15:47:33 -08:00