Derrick Stolee a4c22c16fa pack-objects: add --full-name-hash option
The pack_name_hash() method has not been materially changed since it was
introduced in ce0bd64299 (pack-objects: improve path grouping
heuristics., 2006-06-05). The intention here is to group objects by path
name, but also attempt to group similar file types together by making
the final characters determine the most-significant bits of the
hash.

Here's the crux of the implementation:

	/*
	 * This effectively just creates a sortable number from the
	 * last sixteen non-whitespace characters. Last characters
	 * count "most", so things that end in ".c" sort together.
	 */
	while ((c = *name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}

As the comment mentions, this only cares about the last sixteen
non-whitespace characters. This causes some filenames to collide more
than others. Here are some examples that I've seen while investigating
repositories that are growing more than they should be:

 * "/CHANGELOG.json" is 15 characters, and is created by the beachball
   [1] tool. Only the final character of the parent directory can
   differentiate different versions of this file, and only through its
   two most-significant bits. If that character is a letter, then this
   is always a collision. Similar issues occur with the
   "/CHANGELOG.md" path, though there is more opportunity for
   differences in the parent directory.

 * Localization files frequently have common filenames but differentiate
   via parent directories. In C#, the name "/strings.resx.lcl" is used
   for these localization files and they will all collide in name-hash.

[1] https://github.com/microsoft/beachball

I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.
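
To make the collision concrete, here is a minimal standalone sketch of
the loop quoted above. The function names, the use of isspace() from
<ctype.h>, and the two example paths are assumptions made for
illustration; this is not the code from Git's tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Standalone copy of the loop quoted above (hypothetical name, not
 * Git's own function): each character is added into the top byte
 * and everything older is shifted down by two bits.
 */
uint32_t sketch_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;

	while ((c = (unsigned char)*name++) != 0) {
		if (isspace((int)c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

/*
 * Print the hashes of two illustrative paths that differ only in the
 * last character of the parent directory, and check that they match.
 */
void show_name_hash_collision(void)
{
	const char *a = "pkg-a/CHANGELOG.json";
	const char *b = "pkg-b/CHANGELOG.json";

	printf("%08x %s\n", (unsigned)sketch_name_hash(a), a);
	printf("%08x %s\n", (unsigned)sketch_name_hash(b), b);
	assert(sketch_name_hash(a) == sketch_name_hash(b));
}
```

In this sketch the two paths hash to the same value: the character that
differs sits sixteenth from the end, where almost all of its bits have
already been shifted out.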

It is clear that the existing name-hash algorithm is optimized for
repositories with short path names, and for packing a single snapshot
of a repository rather than a repository with many versions of the
same file. My testing bears this out: the name-hash algorithm does a
good job of finding peer files as delta bases when no historical
version of that exact file is available.

However, for repositories that have many versions of most files and
directories, it is more important that the objects that appear at the
same path are grouped together.

Create a new pack_full_name_hash() method and a new --full-name-hash
option for 'git pack-objects' to call that method instead. Add a simple
pass-through for 'git repack --full-name-hash' for additional testing in
the context of a full repack, where I expect this will be most
effective.

The hash algorithm is as simple as possible to be reasonably effective:
for each character of the path string, add the product of that
character and a large prime number (chosen arbitrarily, but intended to
be large relative to the size of a uint32_t). Then, rotate the current
hash value to the right by 5 bits, so that the bits shifted out wrap
around to the top. The addition and rotation are standard mechanisms
for creating hard-to-predict behaviors in the bits of the resulting
hash.

This is not meant to be cryptographic at all, only uniformly distributed
across the possible hash values. The resulting hash appears
pseudorandom, so there is no ability to consider similar file types as
being close to each other.
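
The description above can be sketched as follows. This is an
illustration under stated assumptions, not the function added by this
change: the name sketch_full_name_hash, the starting value of 0, and
the multiplier (a prime commonly quoted for multiplicative hashing) are
all stand-ins, since this message does not give the actual constant or
seed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Stand-in multiplier (assumption): a prime often used in
 * multiplicative hashing. The constant used by the actual patch is
 * not given in this message.
 */
#define SKETCH_PRIME 2654435761U

/*
 * Sketch of the described scheme (hypothetical name, seed of 0):
 * add character * prime, then rotate the hash right by 5 bits.
 */
uint32_t sketch_full_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;

	while ((c = (unsigned char)*name++) != 0) {
		hash += c * SKETCH_PRIME;          /* wraps modulo 2^32 */
		hash = (hash >> 5) | (hash << 27); /* rotate right by 5 */
	}
	return hash;
}

/*
 * Print the hashes of two paths that differ only in the parent
 * directory, and check that this scheme tells them apart.
 */
void show_full_name_hash(void)
{
	const char *a = "pkg-a/CHANGELOG.json";
	const char *b = "pkg-b/CHANGELOG.json";

	printf("%08x %s\n", (unsigned)sketch_full_name_hash(a), a);
	printf("%08x %s\n", (unsigned)sketch_full_name_hash(b), b);
	assert(sketch_full_name_hash(a) != sketch_full_name_hash(b));
}
```

Because every step is reversible for a fixed character (an addition
followed by a rotation), two paths of equal length that differ in a
single character cannot produce the same value in this sketch, no
matter how far that character is from the end of the path.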

In a later change, a test-tool will be added so the effectiveness of
this hash can be demonstrated directly.

For now, let's consider how effective this mechanism is when repacking a
repository with and without the --full-name-hash option. Specifically,
let's use 'git repack -adf [--full-name-hash]' as our test.

On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:

| Stage                 | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone           | 260 MB    | N/A         |
| Standard Repack       | 127 MB    | 106s        |
| With --full-name-hash | 126 MB    | 99s         |

This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics with and without --full-name-hash.

However, we can test this in a repository that uses one of the
problematic naming conventions above. The fluentui [2] repo uses
beachball to generate CHANGELOG.json and CHANGELOG.md files, and these
files have very poor delta characteristics when comparing against
versions across parent directories.

| Stage                 | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone           | 694 MB    | N/A         |
| Standard Repack       | 438 MB    | 728s        |
| With --full-name-hash | 168 MB    | 142s        |

[2] https://github.com/microsoft/fluentui

In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.

Using a collection of repositories that use the beachball tool, I was
able to make similar comparisons with dramatic results. While the
fluentui repo is public, the others are private and cannot be shared for
reproduction. The results are so significant that I find it important to
share here:

| Repo     | Standard Repack | With --full-name-hash |
|----------|-----------------|-----------------------|
| fluentui |         438 MB  |               168 MB  |
| Repo B   |       6,255 MB  |               829 MB  |
| Repo C   |      37,737 MB  |             7,125 MB  |
| Repo D   |     130,049 MB  |             6,190 MB  |

Future changes could include making --full-name-hash implied by a config
value or even implied by default during a full repack.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-19 14:43:00 -07:00
.github Merge branch 'jk/ci-linux32-update' 2024-09-16 15:27:08 -07:00
block-sha1
builtin pack-objects: add --full-name-hash option 2024-09-19 14:43:00 -07:00
ci Merge branch 'jk/ci-linux32-update' 2024-09-16 15:27:08 -07:00
compat Merge branch 'rj/compat-terminal-unused-fix' 2024-09-10 13:16:42 -07:00
contrib Merge branch 'ps/clar-unit-test' 2024-09-18 18:02:05 -07:00
Documentation pack-objects: add --full-name-hash option 2024-09-19 14:43:00 -07:00
ewah ewah: bitmap_equals_ewah() 2024-05-24 11:40:44 -07:00
git-gui Merge https://github.com/j6t/git-gui 2024-07-07 22:50:59 -07:00
gitk-git Makefile(s): avoid recipe prefix in conditional statements 2024-04-08 14:42:32 -07:00
gitweb Merge branch 'am/gitweb-feed-use-committer-date' 2024-07-15 10:11:41 -07:00
mergetools mergetools: vscode: new tool 2024-09-01 20:47:58 -07:00
negotiator drop trailing newline from warning/error/die messages 2024-09-05 09:07:12 -07:00
oss-fuzz oss-fuzz: mark unused argv/argc argument 2024-08-17 09:46:11 -07:00
perl git-svn: add public property svn:global-ignores 2024-07-18 15:48:06 -07:00
po l10n: zh_CN: updated translation for 2.46 2024-07-28 19:52:41 +08:00
refs Merge branch 'ps/pack-refs-auto-heuristics' 2024-09-12 11:47:23 -07:00
reftable t: move reftable/stack_test.c to the unit testing framework 2024-09-08 13:24:03 -07:00
sha1
sha1collisiondetection@855827c583
sha1dc doc: refer to internet archive 2023-11-26 10:07:06 +09:00
sha256
t pack-objects: add --full-name-hash option 2024-09-19 14:43:00 -07:00
templates Merge branch 'jp/use-diff-index-in-pre-commit-sample' into maint-2.43 2024-02-08 16:22:02 -08:00
trace2 trace2: implement trace2_printf() for event target 2024-08-22 15:02:31 -07:00
xdiff
.cirrus.yml Merge branch 'cb/use-freebsd-13-2-at-cirrus-ci' 2024-02-06 14:31:22 -08:00
.clang-format Merge branch 'rs/unit-tests-test-run' 2024-08-19 11:07:36 -07:00
.editorconfig editorconfig: add Makefiles to "text files" 2024-03-23 11:42:31 -07:00
.gitattributes Makefile(s): do not enforce "all indents must be done with tab" 2024-05-05 16:54:35 +02:00
.gitignore Makefile: wire up the clar unit testing framework 2024-09-04 08:41:37 -07:00
.gitlab-ci.yml Merge branch 'jk/ci-linux32-update' 2024-09-16 15:27:08 -07:00
.gitmodules
.mailmap .mailmap document current address. 2024-09-06 09:31:15 -07:00
.tsan-suppressions
abspath.c
abspath.h
aclocal.m4
add-interactive.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
add-interactive.h
add-patch.c Merge branch 'ps/config-wo-the-repository' 2024-08-23 09:02:34 -07:00
advice.c Merge branch 'ds/advice-sparse-index-expansion' 2024-07-16 11:18:56 -07:00
advice.h Merge branch 'ds/advice-sparse-index-expansion' 2024-07-16 11:18:56 -07:00
alias.c config: plug various memory leaks 2024-05-27 11:20:00 -07:00
alias.h
alloc.c
alloc.h
apply.c Merge branch 'jk/apply-patch-mode-check-fix' into maint-2.46 2024-09-12 11:02:15 -07:00
apply.h apply: support --ours, --theirs, and --union for three-way merges 2024-09-09 10:07:24 -07:00
archive-tar.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
archive-zip.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
archive.c builtin/upload-archive: fix leaking args passed to write_archive() 2024-08-22 09:18:04 -07:00
archive.h archive.h: remove unnecessary include 2023-12-26 12:04:31 -08:00
attr.c Merge branch 'ps/use-the-repository' 2024-07-02 09:59:00 -07:00
attr.h Merge branch 'jc/varargs-attributes' 2024-06-17 15:55:55 -07:00
banned.h
base85.c
base85.h
bisect.c Merge branch 'ps/pack-refs-auto-heuristics' 2024-09-12 11:47:23 -07:00
bisect.h
blame.c Merge branch 'ps/leakfixes-more' 2024-07-08 14:53:10 -07:00
blame.h blame.h: remove unnecessary includes 2023-12-26 12:04:32 -08:00
blob.c treewide: remove unnecessary includes in source files 2023-12-26 12:04:31 -08:00
blob.h
bloom.c Merge branch 'tb/path-filter-fix' 2024-07-08 14:53:10 -07:00
bloom.h bloom: introduce deinit_bloom_filters() 2024-06-25 13:52:06 -07:00
branch.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
branch.h typo: replace 'commitish' with 'committish' 2024-04-11 15:14:56 -07:00
builtin.h global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
bulk-checkin.c bulk-checkin: fix leaking state TODO 2024-08-14 10:07:57 -07:00
bulk-checkin.h bulk-checkin: only support blobs in index_bulk_checkin 2023-09-26 10:17:56 -07:00
bundle-uri.c fetch: add top-level trace2 regions 2024-08-22 15:02:31 -07:00
bundle-uri.h
bundle.c Merge branch 'ps/leakfixes-part-5' 2024-09-03 09:15:00 -07:00
bundle.h unbundle: extend object verification for fetches 2024-06-20 10:30:08 -07:00
cache-tree.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
cache-tree.h
cbtree.c
cbtree.h
chdir-notify.c
chdir-notify.h
check-builtins.sh
checkout.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
checkout.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
chunk-format.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
chunk-format.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
CODE_OF_CONDUCT.md CoC: whitespace fix 2024-01-23 10:40:10 -08:00
color.c color: add support for 12-bit RGB colors 2024-05-02 09:30:38 -07:00
color.h color: add support for 12-bit RGB colors 2024-05-02 09:30:38 -07:00
column.c column: guard against negative padding 2024-02-13 10:18:57 -08:00
column.h
combine-diff.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
command-list.txt builtin/refs: new command to migrate ref storage formats 2024-06-06 09:04:34 -07:00
commit-graph.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
commit-graph.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
commit-reach.c Merge branch 'ds/for-each-ref-is-base' 2024-08-26 11:32:24 -07:00
commit-reach.h commit-reach: add get_branch_base_for_tip 2024-08-14 10:10:05 -07:00
commit-slab-decl.h
commit-slab-impl.h
commit-slab.h
commit.c Merge branch 'ds/for-each-ref-is-base' 2024-08-26 11:32:24 -07:00
commit.h commit: add gentle reference lookup method 2024-08-14 10:10:05 -07:00
common-main.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
config.c Merge branch 'ps/maintenance-detach-fix' 2024-08-26 11:32:20 -07:00
config.h Merge branch 'ps/maintenance-detach-fix' 2024-08-26 11:32:20 -07:00
config.mak.dev config.mak.dev: enable -Wunused-parameter by default 2024-08-28 09:51:18 -07:00
config.mak.in
config.mak.uname Merge branch 'rj/cygwin-has-dev-tty' 2024-09-13 15:27:44 -07:00
configure.ac global: convert trivial usages of test <expr> -a/-o <expr> 2023-11-11 09:21:00 +09:00
connect.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
connect.h
connected.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
connected.h
convert.c convert: fix leaks when resetting attributes 2024-08-22 09:18:03 -07:00
convert.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
copy.c Merge branch 'fixes/2.45.1/2.41' into fixes/2.45.1/2.42 2024-05-24 16:57:43 -07:00
copy.h Merge branch 'fixes/2.45.1/2.40' into fixes/2.45.1/2.41 2024-05-24 16:57:02 -07:00
COPYING
credential.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
credential.h credential: clear expired c->credential, unify secret clearing 2024-06-06 11:42:40 -07:00
csum-file.c Merge branch 'ps/leakfixes-part-4' 2024-08-23 09:02:33 -07:00
csum-file.h Merge branch 'ps/leakfixes-part-4' 2024-08-23 09:02:33 -07:00
ctype.c
daemon.c Merge branch 'jk/mark-unused-parameters' 2024-08-26 11:32:23 -07:00
date.c date: detect underflow/overflow when parsing dates with timezone offset 2024-06-25 17:07:41 -07:00
date.h date: make DATE_MODE thread-safe 2024-04-05 15:21:14 -07:00
decorate.c decorate: add clear_decoration() function 2023-10-05 14:54:55 -07:00
decorate.h t/: migrate helper/test-example-decorate to the unit testing framework 2024-05-28 13:53:36 -07:00
delta-islands.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
delta-islands.h
delta.h
detect-compiler
diagnose.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
diagnose.h
diff-delta.c
diff-lib.c diff-lib: drop unused index argument from get_stat_data() 2024-08-17 09:44:41 -07:00
diff-merges.c diff-merges: introduce '--dd' option 2023-10-09 12:47:29 -07:00
diff-merges.h
diff-no-index.c remerge-diff: clean up temporary objdir at a central place 2024-08-09 15:42:40 -07:00
diff.c Merge branch 'jc/range-diff-lazy-setup' 2024-09-16 14:22:55 -07:00
diff.h Merge branch 'jc/range-diff-lazy-setup' 2024-09-16 14:22:55 -07:00
diffcore-break.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
diffcore-delta.c Merge branch 'en/diffcore-delta-final-line-fix' into maint-2.43 2024-02-13 14:44:48 -08:00
diffcore-order.c
diffcore-pickaxe.c
diffcore-rename.c Merge branch 'ps/use-the-repository' 2024-07-02 09:59:00 -07:00
diffcore-rotate.c
diffcore.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
dir-iterator.c dir-iterator: support iteration in sorted order 2024-02-21 09:58:05 -08:00
dir-iterator.h dir-iterator: support iteration in sorted order 2024-02-21 09:58:05 -08:00
dir.c win32: override fspathcmp() with a directory separator-aware version 2024-07-13 16:23:36 -07:00
dir.h win32: override fspathcmp() with a directory separator-aware version 2024-07-13 16:23:36 -07:00
editor.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
editor.h editor: do not rely on the_repository for interactive edits 2024-08-13 10:01:00 -07:00
entry.c entry: fix leaking pathnames during delayed checkout 2024-08-01 08:47:37 -07:00
entry.h wrapper: reduce scope of remove_or_warn() 2023-09-29 15:14:56 -07:00
environment.c config: fix leaking comment character config 2024-08-14 10:07:58 -07:00
environment.h config: fix leaking comment character config 2024-08-14 10:07:58 -07:00
exec-cmd.c exec_cmd: RUNTIME_PREFIX on z/OS systems 2024-08-22 08:58:46 -07:00
exec-cmd.h
fetch-negotiator.c
fetch-negotiator.h
fetch-pack.c drop trailing newline from warning/error/die messages 2024-09-05 09:07:12 -07:00
fetch-pack.h fetch-pack: expose fsckObjects configuration logic 2024-06-20 10:30:07 -07:00
fmt-merge-msg.c Merge branch 'ps/use-the-repository' 2024-07-02 09:59:00 -07:00
fmt-merge-msg.h
fsck.c builtin/refs: add verify subcommand 2024-08-08 09:36:53 -07:00
fsck.h fsck: add ref name check for files backend 2024-08-08 09:36:53 -07:00
fsmonitor--daemon.h fsmonitor--daemon.h: remove unnecessary includes 2023-12-26 12:04:32 -08:00
fsmonitor-ipc.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
fsmonitor-ipc.h
fsmonitor-ll.h
fsmonitor-path-utils.h
fsmonitor-settings.c config: clarify memory ownership in git_config_pathname() 2024-05-27 11:19:59 -07:00
fsmonitor-settings.h
fsmonitor.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
fsmonitor.h
generate-cmdlist.sh
generate-configlist.sh
generate-hooklist.sh
gettext.c treewide: remove unnecessary includes in source files 2023-12-26 12:04:31 -08:00
gettext.h
git-archimport.perl perl: bump the required Perl version to 5.8.1 from 5.8.0 2023-11-17 07:26:32 +09:00
git-compat-util.h CodingGuidelines: also mention MAYBE_UNUSED 2024-08-29 11:28:07 -07:00
git-curl-compat.h remote-curl: add Transfer-Encoding header only for older curl 2024-04-10 19:24:48 +02:00
git-cvsexportcommit.perl perl: bump the required Perl version to 5.8.1 from 5.8.0 2023-11-17 07:26:32 +09:00
git-cvsimport.perl Merge branch 'js/update-urls-in-doc-and-comment' into maint-2.43 2024-02-08 16:22:01 -08:00
git-cvsserver.perl perl: bump the required Perl version to 5.8.1 from 5.8.0 2023-11-17 07:26:32 +09:00
git-difftool--helper.sh git-difftool--helper: honor --trust-exit-code with --dir-diff 2024-02-20 09:30:32 -08:00
git-filter-branch.sh
git-instaweb.sh doc: switch links to https 2023-11-26 10:07:05 +09:00
git-merge-octopus.sh
git-merge-one-file.sh
git-merge-resolve.sh
git-mergetool--lib.sh
git-mergetool.sh
git-p4.py git-p4: show Perforce error to the user 2024-05-08 15:44:14 -07:00
git-quiltimport.sh git-quiltimport: avoid an unnecessary subshell 2024-03-16 11:08:57 -07:00
git-request-pull.sh
git-send-email.perl Merge branch 'jk/send-email-mailmap' 2024-09-06 10:38:49 -07:00
git-sh-i18n.sh
git-sh-setup.sh
git-submodule.sh builtin/submodule: allow "add" to use different ref storage format 2024-08-08 09:22:21 -07:00
git-svn.perl git-svn: mention svn:global-ignores in help+docs 2024-08-14 15:10:24 -07:00
GIT-VERSION-GEN Start preparing for Git 2.46.2 2024-09-16 15:19:05 -07:00
git-web--browse.sh
git-zlib.c
git-zlib.h
git.c git: fix leaking system paths 2024-08-14 10:07:56 -07:00
git.rc
gpg-interface.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
gpg-interface.h tag: fix sign_buffer() call to create a signed tag 2024-02-07 10:47:25 -08:00
graph.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
graph.h doc: switch links to https 2023-11-26 10:07:05 +09:00
grep.c grep: prefer UNUSED to MAYBE_UNUSED for pcre allocators 2024-08-29 13:59:46 -07:00
grep.h
hash-lookup.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
hash-lookup.h
hash.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
hashmap.c
hashmap.h
help.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
help.h builtin/merge: fix leaking struct cmdnames in get_strategy() 2024-06-11 13:15:07 -07:00
hex-ll.c hex-ll: separate out non-hash-algo functions 2023-09-29 15:14:56 -07:00
hex-ll.h hex-ll: separate out non-hash-algo functions 2023-09-29 15:14:56 -07:00
hex.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
hex.h hex: guard declarations with USE_THE_REPOSITORY_VARIABLE 2024-06-14 10:26:35 -07:00
hook.c hooks: remove implicit dependency on the_repository 2024-08-13 10:01:01 -07:00
hook.h hooks: remove implicit dependency on the_repository 2024-08-13 10:01:01 -07:00
http-backend.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
http-fetch.c http-fetch: don't crash when parsing packfile without a repo 2024-06-14 10:26:34 -07:00
http-push.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
http-walker.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
http.c http: do not ignore proxy path 2024-08-02 08:30:08 -07:00
http.h http: add support for authtype and credential 2024-04-16 22:39:07 -07:00
ident.c ident: add casts for fallback name and GECOS 2024-06-07 10:30:51 -07:00
ident.h
imap-send.c Merge branch 'jk/mark-unused-parameters' 2024-08-26 11:32:23 -07:00
INSTALL Sync with 2.42.2 2024-04-19 12:38:50 +02:00
iterator.h
json-writer.c strbuf: introduce strbuf_addstrings() to repeatedly add a string 2024-05-29 09:09:39 -07:00
json-writer.h doc: switch links to https 2023-11-26 10:07:05 +09:00
khash.h
kwset.c doc: switch links to https 2023-11-26 10:07:05 +09:00
kwset.h doc: switch links to https 2023-11-26 10:07:05 +09:00
levenshtein.c
levenshtein.h
LGPL-2.1
line-log.c line-log: always allocate the output prefix 2024-06-07 10:30:51 -07:00
line-log.h line-log.h: remove unnecessary include 2023-12-26 12:04:32 -08:00
line-range.c line-range: plug leaking find functions 2024-06-11 13:15:08 -07:00
line-range.h
linear-assignment.c
linear-assignment.h
list-objects-filter-options.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
list-objects-filter-options.h
list-objects-filter.c Merge branch 'ps/leakfixes-more' 2024-07-08 14:53:10 -07:00
list-objects-filter.h
list-objects.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
list-objects.h
list.h doc: switch links to https 2023-11-26 10:07:05 +09:00
lockfile.c
lockfile.h lockfile: report when rollback fails 2024-03-07 12:34:13 -08:00
log-tree.c Merge branch 'jc/range-diff-lazy-setup' 2024-09-16 14:22:55 -07:00
log-tree.h format-patch: return an allocated string from log_write_email_headers() 2024-03-19 17:54:16 -07:00
loose.c drop trailing newline from warning/error/die messages 2024-09-05 09:07:12 -07:00
loose.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
ls-refs.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
ls-refs.h
mailinfo.c Merge branch 'jc/mailinfo-header-cleanup' 2024-09-12 11:47:22 -07:00
mailinfo.h
mailmap.c Merge branch 'jk/send-email-mailmap' 2024-09-06 10:38:49 -07:00
mailmap.h check-mailmap: add options for additional mailmap sources 2024-08-27 14:51:29 -07:00
Makefile Merge branch 'ps/clar-unit-test' 2024-09-18 18:02:05 -07:00
match-trees.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
match-trees.h
mem-pool.c don't report vsnprintf(3) error as bug 2024-04-21 12:27:07 -07:00
mem-pool.h __attribute__: add a few missing format attributes 2024-06-10 09:16:30 -07:00
merge-blobs.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
merge-blobs.h
merge-ll.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
merge-ll.h merge options: add a conflict style member 2024-03-14 10:08:52 -07:00
merge-ort-wrappers.c merge: fix leaking merge bases 2024-06-11 13:15:08 -07:00
merge-ort-wrappers.h merge: fix leaking merge bases 2024-06-11 13:15:08 -07:00
merge-ort.c merge-ort: unconditionally release attributes index 2024-08-14 10:08:00 -07:00
merge-ort.h Merge branch 'ps/leakfixes-more' 2024-07-08 14:53:10 -07:00
merge-recursive.c merge-recursive: honor diff.algorithm 2024-07-13 18:10:49 -07:00
merge-recursive.h merge-recursive: honor diff.algorithm 2024-07-13 18:10:49 -07:00
merge.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
merge.h
mergesort.h
midx-write.c Merge branch 'ps/leakfixes-part-5' 2024-09-03 09:15:00 -07:00
midx.c pack-bitmap: tag bitmapped packs with their corresponding MIDX 2024-08-27 14:50:26 -07:00
midx.h midx: implement support for writing incremental MIDX chains 2024-08-06 12:01:39 -07:00
name-hash.c name-hash: add index_dir_find() 2024-02-26 15:34:01 -08:00
name-hash.h name-hash: add index_dir_find() 2024-02-26 15:34:01 -08:00
notes-cache.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
notes-cache.h
notes-merge.c Merge branch 'ps/leakfixes-more' 2024-07-08 14:53:10 -07:00
notes-merge.h
notes-utils.c Merge branch 'ps/leakfixes-more' 2024-07-08 14:53:10 -07:00
notes-utils.h commit: fix leaking parents when calling commit_tree_extended() 2024-06-11 13:15:07 -07:00
notes.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
notes.h revision: fix leaking display notes 2024-06-11 13:15:05 -07:00
object-file-convert.c hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
object-file-convert.h object-file-convert: stubs for converting from one object format to another 2023-10-02 14:57:38 -07:00
object-file.c Merge branch 'ps/leakfixes-part-4' 2024-08-23 09:02:33 -07:00
object-file.h
object-name.c Merge branch 'ps/leakfixes-part-4' 2024-08-23 09:02:33 -07:00
object-name.h object-name: free leaking object contexts 2024-06-11 13:15:05 -07:00
object-store-ll.h object-file: add a compat_oid_in parameter to write_object_file_flags 2023-10-02 14:57:39 -07:00
object-store.h
object.c object: fix leaking packfiles when closing object store 2024-08-08 09:22:21 -07:00
object.h Merge branch 'tb/path-filter-fix' 2024-07-08 14:53:10 -07:00
oid-array.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
oid-array.h
oidmap.c
oidmap.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
oidset.c oidset: pass hash algorithm when parsing file 2024-06-14 10:26:34 -07:00
oidset.h oidset: pass hash algorithm when parsing file 2024-06-14 10:26:34 -07:00
oidtree.c global: ensure that object IDs are always padded 2024-06-14 10:26:32 -07:00
oidtree.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
pack-bitmap-write.c Merge branch 'jk/drop-unused-parameters' 2024-08-26 11:32:22 -07:00
pack-bitmap.c pack-bitmap.c: avoid repeated pack_pos_to_offset() during reuse 2024-08-27 14:50:27 -07:00
pack-bitmap.h pack-bitmap: tag bitmapped packs with their corresponding MIDX 2024-08-27 14:50:26 -07:00
pack-check.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
pack-mtimes.c
pack-mtimes.h
pack-objects.c pack-objects: free packing_data in more places 2023-12-14 14:38:07 -08:00
pack-objects.h pack-objects: add --full-name-hash option 2024-09-19 14:43:00 -07:00
pack-revindex.c Merge branch 'ps/use-the-repository' 2024-07-02 09:59:00 -07:00
pack-revindex.h pack-revindex: implement midx_pair_to_pack_pos() 2023-12-14 14:38:08 -08:00
pack-write.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
pack.h
packfile.c midx: implement support for writing incremental MIDX chains 2024-08-06 12:01:39 -07:00
packfile.h midx: implement support for writing incremental MIDX chains 2024-08-06 12:01:39 -07:00
pager.c rebase --exec: respect --quiet 2024-08-21 08:57:51 -07:00
pager.h pager: introduce wait_for_pager 2024-07-25 09:03:00 -07:00
parallel-checkout.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
parallel-checkout.h
parse-options-cb.c Merge branch 'ps/use-the-repository' 2024-07-02 09:59:00 -07:00
parse-options.c parse-options: rearrange long_name matching code 2024-03-03 09:49:22 -08:00
parse-options.h parse-options: cast long name for OPTION_ALIAS 2024-06-07 10:30:53 -07:00
parse.c config: introduce git_config_double() 2024-05-24 11:40:42 -07:00
parse.h config: introduce git_config_double() 2024-05-24 11:40:42 -07:00
patch-delta.c
patch-ids.c treewide: remove unnecessary includes in source files 2023-12-26 12:04:31 -08:00
patch-ids.h
path.c path: hide functions using the_repository by default 2024-08-13 10:01:01 -07:00
path.h path: hide functions using the_repository by default 2024-08-13 10:01:01 -07:00
pathspec.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
pathspec.h Merge branch 'as/pathspec-h-typofix' 2024-07-12 08:41:57 -07:00
pkt-line.c Merge branch 'jx/sideband-chomp-newline-fix' into maint-2.43 2024-02-08 16:22:11 -08:00
pkt-line.h Merge branch 'jx/sideband-chomp-newline-fix' into maint-2.43 2024-02-08 16:22:11 -08:00
preload-index.c parse: separate out parsing functions from config.h 2023-09-29 15:14:57 -07:00
preload-index.h
pretty.c pretty: fix leaking key/value separator buffer 2024-08-22 09:18:04 -07:00
pretty.h Merge branch 'rs/date-mode-pass-by-value' 2024-04-16 14:50:29 -07:00
prio-queue.c
prio-queue.h
progress.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
progress.h
promisor-remote.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
promisor-remote.h config: clarify memory ownership in git_config_string() 2024-05-27 11:20:00 -07:00
prompt.c parse: separate out parsing functions from config.h 2023-09-29 15:14:57 -07:00
prompt.h
protocol-caps.c protocol-caps: use hash algorithm from passed-in repository 2024-06-14 10:26:34 -07:00
protocol-caps.h
protocol.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
protocol.h doc: switch links to https 2023-11-26 10:07:05 +09:00
prune-packed.c
prune-packed.h
pseudo-merge.c Merge branch 'jk/drop-unused-parameters' 2024-08-26 11:32:22 -07:00
pseudo-merge.h pack-bitmap: drop unused parameters from select_pseudo_merges() 2024-08-17 09:44:41 -07:00
quote.c
quote.h
range-diff.c userdiff: fix leaking memory for configured diff drivers 2024-08-14 10:08:01 -07:00
range-diff.h format-patch: run range-diff with larger creation-factor 2024-05-06 11:57:22 -07:00
reachable.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
reachable.h
read-cache-ll.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
read-cache.c Merge branch 'ps/maintenance-detach-fix' 2024-08-26 11:32:20 -07:00
read-cache.h
README.md git-gui: note the new maintainer 2024-05-11 17:22:17 +02:00
rebase-interactive.c Merge branch 'ps/use-the-repository' 2024-07-02 09:59:00 -07:00
rebase-interactive.h rebase -i: pass struct replay_opts to parse_insn_line() 2024-05-30 10:02:56 -07:00
rebase.c parse: separate out parsing functions from config.h 2023-09-29 15:14:57 -07:00
rebase.h
ref-filter.c ref-filter: fix leak with unterminated %(if) atoms 2024-09-10 09:26:13 -07:00
ref-filter.h ref-filter: add ref_format_clear() function 2024-09-09 16:26:11 -07:00
reflog-walk.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
reflog-walk.h date: make DATE_MODE thread-safe 2024-04-05 15:21:14 -07:00
reflog.c Merge branch 'jc/reflog-expire-lookup-commit-fix' into maint-2.46 2024-08-26 11:10:21 -07:00
reflog.h
refs.c Merge branch 'jk/mark-unused-parameters' 2024-08-26 11:32:23 -07:00
refs.h Merge branch 'sj/ref-fsck' 2024-08-16 12:51:51 -07:00
refspec.c Merge branch 'ps/use-the-repository' 2024-07-02 09:59:00 -07:00
refspec.h refspec: remove global tag refspec structure 2024-06-07 10:30:49 -07:00
RelNotes Start preparing for Git 2.46.2 2024-09-16 15:19:05 -07:00
remote-curl.c Merge branch 'jk/remote-wo-url' 2024-07-02 09:59:01 -07:00
remote.c ref-filter: fix leak when formatting %(push:remoteref) 2024-09-09 16:26:10 -07:00
remote.h ref-filter: fix leak when formatting %(push:remoteref) 2024-09-09 16:26:10 -07:00
replace-object.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
replace-object.h
repo-settings.c Merge branch 'tb/path-filter-fix' 2024-07-08 14:53:10 -07:00
repository.c Merge branch 'ps/ref-storage-migration' 2024-06-17 15:55:55 -07:00
repository.h Merge branch 'tb/path-filter-fix' 2024-07-08 14:53:10 -07:00
rerere.c Merge branch 'ps/config-wo-the-repository' 2024-08-23 09:02:34 -07:00
rerere.h
reset.c hooks: remove implicit dependency on the_repository 2024-08-13 10:01:01 -07:00
reset.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
resolve-undo.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
resolve-undo.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
revision.c Merge branch 'jk/free-commit-buffer-of-skipped-commits' into maint-2.46 2024-09-13 15:26:49 -07:00
revision.h revision: optionally record matches with pathspec elements 2024-04-03 14:55:21 -07:00
run-command.c run-command: fix detaching when running auto maintenance 2024-08-16 09:46:26 -07:00
run-command.h run-command: declare the git_shell_path() function globally 2024-07-13 16:23:37 -07:00
sane-ctype.h
scalar.c scalar: add --no-tags option to 'scalar clone' 2024-09-06 14:13:48 -07:00
SECURITY.md
send-pack.c drop trailing newline from warning/error/die messages 2024-09-05 09:07:12 -07:00
send-pack.h
sequencer.c Merge branch 'mt/rebase-x-quiet' 2024-08-28 10:31:26 -07:00
sequencer.h Merge branch 'pw/rebase-i-error-message' into maint-2.45 2024-07-02 09:27:56 -07:00
serve.c drop trailing newline from warning/error/die messages 2024-09-05 09:07:12 -07:00
serve.h
server-info.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
server-info.h
setup.c Merge branch 'jk/mark-unused-parameters' 2024-08-26 11:32:23 -07:00
setup.h refs: convert ref storage format to an enum 2024-06-06 09:04:31 -07:00
sh-i18n--envsubst.c doc: switch links to https 2023-11-26 10:07:05 +09:00
sha1dc_git.c
sha1dc_git.h
shallow.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
shallow.h
shared.mak Makefile: simplify output of the libpath_template 2024-01-31 14:43:00 -08:00
shell.c treewide: remove unnecessary includes in source files 2023-12-26 12:04:31 -08:00
shortlog.h
sideband.c Merge branch 'ps/leakfixes-part-5' 2024-09-03 09:15:00 -07:00
sideband.h
sigchain.c
sigchain.h
simple-ipc.h
sparse-index.c Merge branch 'ds/advice-sparse-index-expansion' 2024-07-16 11:18:56 -07:00
sparse-index.h
split-index.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
split-index.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
stable-qsort.c
statinfo.c Merge branch 'jc/fake-lstat' 2023-12-27 14:52:24 -08:00
statinfo.h cache: add fake_lstat() 2023-09-15 17:08:46 -07:00
strbuf.c Merge branch 'gt/t-hash-unit-test' 2024-06-12 13:37:15 -07:00
strbuf.h strbuf: introduce strbuf_addstrings() to repeatedly add a string 2024-05-29 09:09:39 -07:00
streaming.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
streaming.h
string-list.c
string-list.h
strmap.c
strmap.h
strvec.c strvec: declare the strvec_push_nodup() function globally 2024-07-13 16:23:36 -07:00
strvec.h strvec: declare the strvec_push_nodup() function globally 2024-07-13 16:23:36 -07:00
sub-process.c
sub-process.h
submodule-config.c submodule-config: fix leaking name entry when traversing submodules 2024-08-14 10:07:58 -07:00
submodule-config.h Merge branch 'vd/fsck-submodule-url-test' 2024-01-26 08:54:47 -08:00
submodule.c Merge branch 'ps/config-wo-the-repository' 2024-08-23 09:02:34 -07:00
submodule.h Sync with 2.39.4 2024-04-19 12:38:37 +02:00
symlinks.c
symlinks.h
tag.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
tag.h refs: pass repo when peeling objects 2024-05-17 10:33:39 -07:00
tar.h
tempfile.c lockfile: report when rollback fails 2024-03-07 12:34:13 -08:00
tempfile.h lockfile: report when rollback fails 2024-03-07 12:34:13 -08:00
thread-utils.c
thread-utils.h
tmp-objdir.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
tmp-objdir.h
trace.c doc: switch links to https 2023-11-26 10:07:05 +09:00
trace.h
trace2.c trace2: emit 'def_param' set with 'cmd_name' event 2024-03-07 10:24:34 -08:00
trace2.h __attribute__: trace2_region_enter_printf() is like "printf" 2024-06-10 09:16:19 -07:00
trailer.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
trailer.h Merge branch 'la/hide-trailer-info' 2024-05-23 11:04:27 -07:00
transport-helper.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
transport-internal.h
transport.c Merge branch 'ps/leakfixes-part-5' 2024-09-03 09:15:00 -07:00
transport.h
tree-diff.c hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
tree-walk.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
tree-walk.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00
tree.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
tree.h read_tree(): respect max_allowed_tree_depth 2023-08-31 15:51:08 -07:00
unicode-width.h unicode: update the width tables to Unicode 15.1 2023-09-25 16:17:28 -07:00
unimplemented.sh
unix-socket.c
unix-socket.h
unix-stream-server.c
unix-stream-server.h
unpack-trees.c unpack-trees: clear index when not propagating it 2024-08-14 10:08:00 -07:00
unpack-trees.h tree-walk: drop MAX_TRAVERSE_TREES macro 2023-08-31 15:51:07 -07:00
upload-pack.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
upload-pack.h
url.c hex-ll: separate out non-hash-algo functions 2023-09-29 15:14:56 -07:00
url.h
urlmatch.c hex-ll: separate out non-hash-algo functions 2023-09-29 15:14:56 -07:00
urlmatch.h
usage.c usage: report vsnprintf(3) failure 2024-04-05 15:16:27 -07:00
userdiff.c userdiff: fix leaking memory for configured diff drivers 2024-08-14 10:08:01 -07:00
userdiff.h userdiff: fix leaking memory for configured diff drivers 2024-08-14 10:08:01 -07:00
utf8.c doc: switch links to https 2023-11-26 10:07:05 +09:00
utf8.h doc: switch links to https 2023-11-26 10:07:05 +09:00
varint.c
varint.h
version.c
version.h
versioncmp.c global: prepare for hiding away repo-less config functions 2024-08-13 10:01:05 -07:00
versioncmp.h
walker.c refs: add referent to each_ref_fn 2024-08-09 08:47:34 -07:00
walker.h
wildmatch.c
wildmatch.h
worktree.c Merge branch 'ps/config-wo-the-repository' 2024-08-23 09:02:34 -07:00
worktree.h Merge branch 'jc/worktree-git-path' 2024-06-24 16:39:15 -07:00
wrap-for-bin.sh
wrapper.c don't report vsnprintf(3) error as bug 2024-04-21 12:27:07 -07:00
wrapper.h wrapper: introduce log2u() 2024-09-04 08:03:24 -07:00
write-or-die.c write-or-die: fix the polarity of GIT_FLUSH environment variable 2024-02-13 11:57:28 -08:00
write-or-die.h
ws.c
ws.h
wt-status.c Merge branch 'jc/range-diff-lazy-setup' 2024-09-16 14:22:55 -07:00
wt-status.h status: unify parsing of --untracked= and status.showUntrackedFiles 2024-03-13 10:43:32 -07:00
xdiff-interface.c global: introduce USE_THE_REPOSITORY_VARIABLE macro 2024-06-14 10:26:33 -07:00
xdiff-interface.h hash-ll: merge with "hash.h" 2024-06-14 10:26:33 -07:00

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).

Those wishing to help translate error messages, usage strings, and informational messages (localization, l10n) should see po/README.md (a po file is a Portable Object file that holds the translations).

To subscribe to the list, send an email to git+subscribe@vger.kernel.org (see https://subspace.kernel.org/subscribing.html for details). The mailing list archives are available at https://lore.kernel.org/git/, https://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends "What's cooking" reports to the mailing list; these list the current status of various development topics. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks