core doc: modernize core.bigFileThreshold documentation

The core.bigFileThreshold documentation has been largely unchanged
since 5eef828bc0 (fast-import: Stream very large blobs directly to
pack, 2010-02-01).

But since then this setting has been expanded to affect a lot more
than that description indicated. Most notably in how "git diff" treats
them, see 6bf3b81348 (diff --stat: mark any file larger than
core.bigfilethreshold binary, 2014-08-16).

In addition to that, numerous commands and APIs make use of a
streaming mode for files above this threshold.

So let's attempt to summarize 12 years of changes in behavior, which
can be seen with:

    git log --oneline -Gbig_file_thre 5eef828bc03.. -- '*.c'

To do that turn this into a bullet-point list. The summary Han Xin
produced in [1] helped a lot, but is a bit too detailed for
documentation aimed at users. Let's instead summarize how
user-observable behavior differs, and generally describe how we tend
to stream these files in various commands.

1. https://lore.kernel.org/git/20220120112114.47618-5-chiyutianyi@gmail.com/

Helped-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Ævar Arnfjörð Bjarmason 2022-06-11 10:44:20 +08:00 committed by Junio C Hamano
parent 2b6070ac4c
commit 3c3ca0b0c1

View file

@ -444,17 +444,32 @@ You probably do not need to adjust this value.
Common unit suffixes of 'k', 'm', or 'g' are supported.
core.bigFileThreshold::
Files larger than this size are stored deflated, without
attempting delta compression. Storing large files without
delta compression avoids excessive memory usage, at the
slight expense of increased disk usage. Additionally files
larger than this size are always treated as binary.
The size of files considered "big", which as discussed below
changes the behavior of numerous git commands, as well as how
such files are stored within the repository. The default is
512 MiB. Common unit suffixes of 'k', 'm', or 'g' are
supported.
+
Default is 512 MiB on all platforms. This should be reasonable
for most projects as source code and other text files can still
be delta compressed, but larger binary media files won't be.
Files above the configured limit will be:
+
Common unit suffixes of 'k', 'm', or 'g' are supported.
* Stored deflated in packfiles, without attempting delta compression.
+
The default limit is primarily set with this use-case in mind. With it,
most projects will have their source code and other text files delta
compressed, but not larger binary media files.
+
Storing large files without delta compression avoids excessive memory
usage, at the slight expense of increased disk usage.
+
* Will be treated as if they were labeled "binary" (see
linkgit:gitattributes[5]). e.g. linkgit:git-log[1] and
linkgit:git-diff[1] will not compute diffs for files above this limit.
+
* Will generally be streamed when written, which avoids excessive
memory usage, at the cost of some fixed overhead. Commands that make
use of this include linkgit:git-archive[1],
linkgit:git-fast-import[1], linkgit:git-index-pack[1] and
linkgit:git-fsck[1].
core.excludesFile::
Specifies the pathname to the file that contains patterns to