Commit graph

290 commits

Author SHA1 Message Date
René Scharfe 0e90673957 use child_process members "args" and "env" directly
Build argument list and environment of child processes by using
struct child_process and populating its members "args" and "env"
directly instead of maintaining separate strvecs and letting
run_command_v_opt() and friends populate these members.  This is
simpler, shorter and slightly more efficient.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-30 14:04:40 -04:00
Junio C Hamano 220604042c Merge branch 'jk/unused-anno-more'
More UNUSED annotation to help using -Wunused option with the
compiler.

* jk/unused-anno-more:
  ll-merge: mark unused parameters in callbacks
  diffcore-pickaxe: mark unused parameters in pickaxe functions
  convert: mark unused parameter in null stream filter
  apply: mark unused parameters in noop error/warning routine
  apply: mark unused parameters in handlers
  date: mark unused parameters in handler functions
  string-list: mark unused callback parameters
  object-file: mark unused parameters in hash_unknown functions
  mark unused parameters in trivial compat functions
  update-index: drop unused argc from do_reupdate()
  submodule--helper: drop unused argc from module_list_compute()
  diffstat_consume(): assert non-zero length
2022-10-27 14:51:52 -07:00
Emily Shaffer c695592850 config: let feature.experimental imply gc.cruftPacks=true
We are interested in exploring whether gc.cruftPacks=true should become
the default value.

To determine whether it is safe to do so, let's encourage more users to
try it out.

Users who have set feature.experimental=true have already volunteered to
try new and possibly-breaking config changes, so let's try this new
default with that set of users.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26 14:39:31 -07:00
Jeff King 1ee3471045 string-list: mark unused callback parameters
String-lists may be used with callbacks for clearing or iteration. These
callbacks need to conform to a particular interface, even though not
every callback needs all of its parameters. Mark the unused ones to make
-Wunused-parameter happy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17 21:24:04 -07:00
Junio C Hamano c68bd3ec22 Merge branch 'rs/gc-pack-refs-simplify'
Code clean-up.

* rs/gc-pack-refs-simplify:
  gc: simplify maintenance_task_pack_refs()
2022-10-11 10:36:12 -07:00
René Scharfe b004c90282 gc: simplify maintenance_task_pack_refs()
Pass a constant string array directly to run_command_v_opt() instead of
copying it into a strvec first.  This shortens the code and avoids heap
allocations.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-05 12:46:27 -07:00
Derrick Stolee 50a044f1e4 gc: replace config subprocesses with API calls
The 'git maintenance [un]register' commands set or unset the multi-
valued maintenance.repo config key with the absolute path of the current
repository. These are set in the global config file.

Instead of calling a subcommand and creating a new process, create the
proper API calls to git_config_set_multivar_in_file_gently(). It
requires loading the filename for the global config file (and erroring
out if now $HOME value is set). We also need to be careful about using
CONFIG_REGEX_NONE when adding the value and using
CONFIG_FLAGS_FIXED_VALUE when removing the value. In both cases, we
check that the value already exists (this check already existed for
'unregister').

Also, remove the transparent translation of the error code from the
config API to the exit code of 'git maintenance'. Instead, use die() to
recover from failures at that level. In the case of 'unregister
--force', allow the CONFIG_NOTHING_SET error code to be a success. This
allows a possible race where another process removes the config value.
The end result is that the config value is not set anymore, so we can
treat this as a success.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-27 09:32:26 -07:00
Derrick Stolee 1ebe6b0297 maintenance: add 'unregister --force'
The 'git maintenance unregister' subcommand has a step that removes the
current repository from the multi-valued maitenance.repo config key.
This fails if the repository is not listed in that key. This makes
running 'git maintenance unregister' twice result in a failure in the
second instance.

This failure exit code is helpful, but its message is not. Add a new
die() message that explicitly calls out the failure due to the
repository not being registered.

In some cases, users may want to run 'git maintenance unregister' just
to make sure that background jobs will not start on this repository, but
they do not want to check to see if it is registered first. Add a new
'--force' option that will siltently succeed if the repository is not
already registered.

Also add an extra test of 'git maintenance unregister' at a point where
there are no registered repositories. This should fail without --force.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-27 09:32:25 -07:00
Junio C Hamano 04cc66fe8c Merge branch 'sg/parse-options-subcommand'
Fix messages incorrectly marked for translation.

* sg/parse-options-subcommand:
  gc: don't translate literal commands
2022-09-21 15:27:03 -07:00
Alex Henrie 8b74492135 gc: don't translate literal commands
The command you type is still "git maintenance" even in other languages.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21 10:43:10 -07:00
Junio C Hamano dd407f1c7c Merge branch 'ab/unused-annotation'
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.

* ab/unused-annotation:
  git-compat-util.h: use "deprecated" for UNUSED variables
  git-compat-util.h: use "UNUSED", not "UNUSED(var)"
2022-09-14 12:56:39 -07:00
Junio C Hamano a6b42ec0c6 Merge branch 'jk/unused-annotation'
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.

* jk/unused-annotation:
  is_path_owned_by_current_uid(): mark "report" parameter as unused
  run-command: mark unused async callback parameters
  mark unused read_tree_recursive() callback parameters
  hashmap: mark unused callback parameters
  config: mark unused callback parameters
  streaming: mark unused virtual method parameters
  transport: mark bundle transport_options as unused
  refs: mark unused virtual method parameters
  refs: mark unused reflog callback parameters
  refs: mark unused each_ref_fn parameters
  git-compat-util: add UNUSED macro
2022-09-14 12:56:39 -07:00
Junio C Hamano 56785a3fad Merge branch 'bc/gc-crontab-fix'
FreeBSD portability fix for "git maintenance" that spawns "crontab"
to schedule tasks.

* bc/gc-crontab-fix:
  gc: use temporary file for editing crontab
2022-09-05 18:33:41 -07:00
Junio C Hamano d528044c83 Merge branch 'sg/parse-options-subcommand'
Introduce the "subcommand" mode to parse-options API and update the
command line parser of Git commands with subcommands.

* sg/parse-options-subcommand: (23 commits)
  remote: run "remote rm" argv through parse_options()
  maintenance: add parse-options boilerplate for subcommands
  pass subcommand "prefix" arguments to parse_options()
  builtin/worktree.c: let parse-options parse subcommands
  builtin/stash.c: let parse-options parse subcommands
  builtin/sparse-checkout.c: let parse-options parse subcommands
  builtin/remote.c: let parse-options parse subcommands
  builtin/reflog.c: let parse-options parse subcommands
  builtin/notes.c: let parse-options parse subcommands
  builtin/multi-pack-index.c: let parse-options parse subcommands
  builtin/hook.c: let parse-options parse subcommands
  builtin/gc.c: let parse-options parse 'git maintenance's subcommands
  builtin/commit-graph.c: let parse-options parse subcommands
  builtin/bundle.c: let parse-options parse subcommands
  parse-options: add support for parsing subcommands
  parse-options: drop leading space from '--git-completion-helper' output
  parse-options: clarify the limitations of PARSE_OPT_NODASH
  parse-options: PARSE_OPT_KEEP_UNKNOWN only applies to --options
  api-parse-options.txt: fix description of OPT_CMDMODE
  t0040-parse-options: test parse_options() with various 'parse_opt_flags'
  ...
2022-09-01 13:40:18 -07:00
Ævar Arnfjörð Bjarmason 5cf88fd8b0 git-compat-util.h: use "UNUSED", not "UNUSED(var)"
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75d (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.

Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.

This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:49:48 -07:00
Junio C Hamano c068a3b8ee Merge branch 'ds/decorate-filter-tweak'
The namespaces used by "log --decorate" from "refs/" hierarchy by
default has been tightened.

* ds/decorate-filter-tweak:
  fetch: use ref_namespaces during prefetch
  maintenance: stop writing log.excludeDecoration
  log: create log.initialDecorationSet=all
  log: add --clear-decorations option
  log: add default decoration filter
  log-tree: use ref_namespaces instead of if/else-if
  refs: use ref_namespaces for replace refs base
  refs: add array of ref namespaces
  t4207: test coloring of grafted decorations
  t4207: modernize test
  refs: allow "HEAD" as decoration filter
2022-08-29 14:55:11 -07:00
brian m. carlson ee69e7884e gc: use temporary file for editing crontab
While cron is specified by POSIX, there are a wide variety of
implementations in use.  "git maintenance" assumes that the
"crontab" command can be fed from its standard input the new
contents and the syntax to do so is not to have any filename
argument, as POSIX describes.  However, on FreeBSD, the cron
implementation requires a file name argument: if the user wants to
edit standard input, they must specify "-".

Unfortunately, POSIX systems do not have to interpret "-" on the
command line of crontab as a request to read from the standard
input.  Blindly adding "-" on the command line would not work as a
general solution.

Since POSIX tells us that cron must accept a file name argument, let's
solve this problem by specifying a temporary file instead.  This will
ensure that we work with the vast majority of implementations.

Note that because delete_tempfile closes the file for us, we should not
call fclose here on the handle, since doing so will introduce a double
free.

Reported-by: Renato Botelho <garga@FreeBSD.org>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-28 15:47:00 -07:00
Jeff King 0d330a53f3 maintenance: add parse-options boilerplate for subcommands
Several of the git-maintenance subcommands don't take any options, so
they don't bother looking at argv at all. This means they'll silently
accept garbage, like:

  $ git maintenance register --foo
  [no output]

  $ git maintenance stop bar
  [no output]

Let's give them the basic boilerplate to detect and handle these cases:

  $ git maintenance register --foo
  error: unknown option `foo'
  usage: git maintenance register

  $ git maintenance stop bar
  usage: git maintenance stop

We could reduce the number of lines of code here a bit with a shared
helper function. But it's worth building out the boilerplate, as it may
serve as the base for adding options later.

Note one complication: maintenance_start() calls directly into
maintenance_register(), so it now needs to pass a plausible argv (we
don't care, but parse_options() is expecting there to at least be an
argv[0] program name). This is an extra line of code, but it eliminates
the need for an explanatory comment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-25 09:43:30 -07:00
Jeff King 63e14ee2d6 refs: mark unused each_ref_fn parameters
Functions used with for_each_ref(), etc, need to conform to the
each_ref_fn interface. But most of them don't need every parameter;
let's annotate the unused ones to quiet -Wunused-parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:54 -07:00
SZEDER Gábor 0350954482 builtin/gc.c: let parse-options parse 'git maintenance's subcommands
'git maintenanze' parses its subcommands with a couple of if
statements.  parse-options has just learned to parse subcommands, so
let's use that facility instead, with the benefits of shorter code,
handling missing or unknown subcommands, and listing subcommands for
Bash completion.

This change makes 'git maintenance' consistent with other commands in
that the help text shown for '-h' goes to standard output, not error,
in the exit code and error message on unknown subcommand, and the
error message on missing subcommand.  There is a test checking these,
which is now updated accordingly.

Note that some of the functions implementing each subcommand don't
accept any parameters, so add the (unused) 'argc', '**argv' and
'*prefix' parameters to make them match the type expected by
parse-options, and thus avoid casting function pointers.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 11:13:15 -07:00
Derrick Stolee 863a8ae97b maintenance: stop writing log.excludeDecoration
This reverts commit 96eaffebbf (maintenance: set
log.excludeDecoration durin prefetch, 2021-01-19).

The previous change created a default decoration filter that does not
include refs/prefetch/, so this modification of the config is no longer
needed.

One issue that can happen from this point on is that users who ran the
prefetch task on previous versions of Git will still have a
log.excludeDecoration value and that will prevent the new default
decoration filter from being active. Thus, when we add the refs/bundle/
namespace as part of the bundle URI feature, those users will see
refs/bundle/ decorations.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-05 14:13:13 -07:00
Ævar Arnfjörð Bjarmason 55916bba0f gc: fix a memory leak
Fix a memory leak in code added in 41abfe15d9 (maintenance: add
pack-refs task, 2021-02-09), we need to call strvec_clear() on the
"struct strvec" that we initialized.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-01 11:43:43 -07:00
Jiang Xin b4eda05d58 i18n: fix mismatched camelCase config variables
Some config variables are combinations of multiple words, and we
typically write them in camelCase forms in manpage and translatable
strings. It's not easy to find mismatches for these camelCase config
variables during code reviews, but occasionally they are identified
during localization translations.

To check for mismatched config variables, I introduced a new feature
in the helper program for localization[^1]. The following mismatched
config variables have been identified by running the helper program,
such as "git-po-helper check-pot".

Lowercase in manpage should use camelCase:

 * Documentation/config/http.txt: http.pinnedpubkey

Lowercase in translable strings should use camelCase:

 * builtin/fast-import.c:  pack.indexversion
 * builtin/gc.c:           gc.logexpiry
 * builtin/index-pack.c:   pack.indexversion
 * builtin/pack-objects.c: pack.indexversion
 * builtin/repack.c:       pack.writebitmaps
 * commit.c:               i18n.commitencoding
 * gpg-interface.c:        user.signingkey
 * http.c:                 http.postbuffer
 * submodule-config.c:     submodule.fetchjobs

Mismatched camelCases, choose the former:

 * Documentation/config/transfer.txt: transfer.credentialsInUrl
   remote.c:                          transfer.credentialsInURL

[^1]: https://github.com/git-l10n/git-po-helper

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-17 10:38:26 -07:00
Junio C Hamano a50036da1a Merge branch 'tb/cruft-packs'
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.

* tb/cruft-packs:
  sha1-file.c: don't freshen cruft packs
  builtin/gc.c: conditionally avoid pruning objects via loose
  builtin/repack.c: add cruft packs to MIDX during geometric repack
  builtin/repack.c: use named flags for existing_packs
  builtin/repack.c: allow configuring cruft pack generation
  builtin/repack.c: support generating a cruft pack
  builtin/pack-objects.c: --cruft with expiration
  reachable: report precise timestamps from objects in cruft packs
  reachable: add options to add_unseen_recent_objects_to_traversal
  builtin/pack-objects.c: --cruft without expiration
  builtin/pack-objects.c: return from create_object_entry()
  t/helper: add 'pack-mtimes' test-tool
  pack-mtimes: support writing pack .mtimes files
  chunk-format.h: extract oid_version()
  pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
  pack-mtimes: support reading .mtimes files
  Documentation/technical: add cruft-packs.txt
2022-06-03 14:30:37 -07:00
Taylor Blau 5b92477f89 builtin/gc.c: conditionally avoid pruning objects via loose
Expose the new `git repack --cruft` mode from `git gc` via a new opt-in
flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding
unreachable objects as loose ones, and instead create a cruft pack and
`.mtimes` file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Junio C Hamano 538dc459a0 Merge branch 'ep/maint-equals-null-cocci'
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano 2b0a58d164 Merge branch 'ep/maint-equals-null-cocci' for maint-2.35
* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-02 10:06:04 -07:00
Junio C Hamano afe8a9070b tree-wide: apply equals-null.cocci
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-02 09:50:37 -07:00
Elia Pinto 7cbbb77173 builtin/gc.c: delete duplicate include
object-store.h is included more than once.

Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-13 22:23:16 +00:00
Junio C Hamano c70bc338e9 Merge branch 'ab/config-based-hooks-2'
More "config-based hooks".

* ab/config-based-hooks-2:
  run-command: remove old run_hook_{le,ve}() hook API
  receive-pack: convert push-to-checkout hook to hook.h
  read-cache: convert post-index-change to use hook.h
  commit: convert {pre-commit,prepare-commit-msg} hook to hook.h
  git-p4: use 'git hook' to run hooks
  send-email: use 'git hook run' for 'sendemail-validate'
  git hook run: add an --ignore-missing flag
  hooks: convert worktree 'post-checkout' hook to hook library
  hooks: convert non-worktree 'post-checkout' hook to hook library
  merge: convert post-merge to use hook.h
  am: convert applypatch-msg to use hook.h
  rebase: convert pre-rebase to use hook.h
  hook API: add a run_hooks_l() wrapper
  am: convert {pre,post}-applypatch to use hook.h
  gc: use hook library for pre-auto-gc hook
  hook API: add a run_hooks() wrapper
  hook: add 'run' subcommand
2022-02-09 14:21:00 -08:00
Emily Shaffer bad62a8cd5 gc: use hook library for pre-auto-gc hook
Move the pre-auto-gc hook away from run-command.h to and over to the
new hook.h library. This uses the new run_hooks() wrapper.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-07 15:19:34 -08:00
Ævar Arnfjörð Bjarmason 24f6e6d626 usage.c + gc: add and use a die_message_errno()
Change the "error: " output when we exit with 128 due to gc.log errors
to use a "fatal: " prefix instead. To do this add a
die_message_errno() a sibling function to the die_errno() added in a
preceding commit.

Before this we'd expect report_last_gc_error() to return -1 from
error_errno() in this case. It already treated a status of 0 and 1
specially. Let's just document that anything that's not 0 or 1 should
be returned.

We could also retain the "ret < 0" behavior here without hardcoding
128 by returning -128, and having the caller do a "return -ret", but I
think this makes more sense, and preserves the path from
die_message*()'s return value to the "return" without hardcoding
"128".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-07 13:25:16 -08:00
Ævar Arnfjörð Bjarmason 0faf84d97d gc: return from cmd_gc(), don't call exit()
A minor code cleanup. Let's "return" from cmd_gc() instead of calling
exit(). See 338abb0f04 (builtins + test helpers: use return instead
of exit() in cmd_*, 2021-06-08) for other such cases.

While we're at it add a \n to separate the variable declaration from
the rest of the code in this block. Both of these changes make a
subsequent change smaller and easier to read.

This change isn't really needed for that subsequent change, but now
someone viewing that future behavior change won't need to wonder why
we're either still calling exit() here, or fixing it while we're at
it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-07 13:25:16 -08:00
Junio C Hamano c1d16cedd4 Merge branch 'ds/no-usable-cron-on-macos'
"git maintenance run" learned to use system supplied scheduler
backend, but cron on macOS turns out to be unusable for this
purpose.

* ds/no-usable-cron-on-macos:
  maintenance: disable cron on macOS
2021-11-10 15:01:20 -08:00
Derrick Stolee 689a2aa719 maintenance: disable cron on macOS
In eba1ba9 (maintenance: `git maintenance run` learned
`--scheduler=<scheduler>`, 2021-09-04), we introduced the ability to
specify a scheduler explicitly. This led to some extra checks around
whether an alternative scheduler was available. This added the
functionality of removing background maintenance from schedulers other
than the one selected.

On macOS, cron is technically available, but running 'crontab' triggers
a UI prompt asking for special permissions. This is the major reason why
launchctl is used as the default scheduler. The is_crontab_available()
method triggers this UI prompt, causing user disruption.

Remove this disruption by using an #ifdef to prevent running crontab
this way on macOS. This has the unfortunate downside that if a user
manually selects cron via the '--scheduler' option, then adjusting the
scheduler later will not remove the schedule from cron. The
'--scheduler' option ignores the is_available checks, which is how we
can get into this situation.

Extract the new check_crontab_process() method to avoid making the
'child' variable unused on macOS. The method is marked MAYBE_UNUSED
because it has no callers on macOS.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-10 11:20:20 -08:00
Glen Choo a897ab7ed1 gc: perform incremental repack when implictly enabled
builtin/gc.c has two ways of checking if multi-pack-index is enabled:
- git_config_get_bool() in incremental_repack_auto_condition()
- the_repository->settings.core_multi_pack_index in
  maintenance_task_incremental_repack()

The two implementations have existed since the incremental-repack task
was introduced in e841a79a13 (maintenance: add incremental-repack auto
condition, 2020-09-25). These two values can diverge because
prepare_repo_settings() enables the feature in the_repository->settings
by default.

In the case where core.multiPackIndex is not set in the config, the auto
condition would fail, causing the incremental-repack task to not be
run. Because we always want to consider the default values, we should
always use the_repository->settings.

Standardize on using the_repository->settings.core_multi_pack_index to
check if multi-pack-index is enabled.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15 14:30:10 -07:00
Junio C Hamano b5866edf97 Merge branch 'ab/gc-remove-unused-call'
Code clean-up.

* ab/gc-remove-unused-call:
  gc: remove unused launchctl_get_uid() call
2021-09-23 13:44:46 -07:00
Junio C Hamano c042ad5ad5 Merge branch 'js/run-command-close-packs'
The run-command API has been updated so that the callers can easily
ask the file descriptors open for packfiles to be closed immediately
before spawning commands that may trigger auto-gc.

* js/run-command-close-packs:
  Close object store closer to spawning child processes
  run_auto_maintenance(): implicitly close the object store
  run-command: offer to close the object store before running
  run-command: prettify the `RUN_COMMAND_*` flags
  pull: release packs before fetching
  commit-graph: when closing the graph, also release the slab
2021-09-20 15:20:45 -07:00
Junio C Hamano ed8794ef7a Merge branch 'lh/systemd-timers'
"git maintenance" scheduler learned to use systemd timers as a
possible backend.

* lh/systemd-timers:
  maintenance: add support for systemd timers on Linux
  maintenance: `git maintenance run` learned `--scheduler=<scheduler>`
  cache.h: Introduce a generic "xdg_config_home_for(…)" function
2021-09-20 15:20:40 -07:00
Ævar Arnfjörð Bjarmason 3218cb753f gc: remove unused launchctl_get_uid() call
When the launchctl_boot_plist() function was added in
a16eb6b1ff (maintenance: skip bootout/bootstrap when plist is
registered, 2021-08-24), an unused call to launchctl_get_uid() was
added along with it. That call appears to have been copy/pasted from
launchctl_boot_plist().

Since we can remove that, we can also get rid of the "result"
variable, whose only purpose was allow for the free() between its
assignment and the return. That pattern also appears to have been
copy/pasted from launchctl_boot_plist().

As the patch shows the returned value from launchctl_get_uid() wasn't
used at all in this function. The launchctl_get_uid() function itself
just calls xstrfmt() and getuid(), neither of which have any subtle
global side-effects, so this removal is safe.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-12 16:47:18 -07:00
Junio C Hamano 02d263277a Merge branch 'ab/gc-log-rephrase'
A pathname in an advice message has been made cut-and-paste ready.

* ab/gc-log-rephrase:
  gc: remove trailing dot from "gc.log" line
2021-09-10 11:46:28 -07:00
Johannes Schindelin c4dee2c085 Close object store closer to spawning child processes
In many cases where we spawned child processes that _may_ trigger a
repack, we explicitly closed the object store first (so that the
`repack` process can delete the `.pack` files, which would otherwise not
be possible on Windows since files cannot be deleted as long as they as
still in use).

Wherever possible, we now use the new `close_object_store` bit of the
`run_command()` API, to delay closing the object store even further.
This makes the code easier to maintain because it is now more obvious
that we only release those file handles because of those child
processes.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 12:56:11 -07:00
Junio C Hamano 4293c057dc Merge branch 'js/maintenance-launchctl-fix'
"git maintenance" scheduler fix for macOS.

* js/maintenance-launchctl-fix:
  maintenance: skip bootout/bootstrap when plist is registered
  maintenance: create `launchctl` configuration using a lock file
2021-09-08 13:30:29 -07:00
Lénaïc Huard b681b191f9 maintenance: add support for systemd timers on Linux
The existing mechanism for scheduling background maintenance is done
through cron. On Linux systems managed by systemd, systemd provides an
alternative to schedule recurring tasks: systemd timers.

The main motivations to implement systemd timers in addition to cron
are:
* cron is optional and Linux systems running systemd might not have it
  installed.
* The execution of `crontab -l` can tell us if cron is installed but not
  if the daemon is actually running.
* With systemd, each service is run in its own cgroup and its logs are
  tagged by the service inside journald. With cron, all scheduled tasks
  are running in the cron daemon cgroup and all the logs of the
  user-scheduled tasks are pretended to belong to the system cron
  service.
  Concretely, a user that doesn’t have access to the system logs won’t
  have access to the log of their own tasks scheduled by cron whereas
  they will have access to the log of their own tasks scheduled by
  systemd timer.
  Although `cron` attempts to send email, that email may go unseen by
  the user because these days, local mailboxes are not heavily used
  anymore.

In order to schedule git maintenance, we need two unit template files:
* ~/.config/systemd/user/git-maintenance@.service
  to define the command to be started by systemd and
* ~/.config/systemd/user/git-maintenance@.timer
  to define the schedule at which the command should be run.

Those units are templates that are parameterized by the frequency.

Based on those templates, 3 timers are started:
* git-maintenance@hourly.timer
* git-maintenance@daily.timer
* git-maintenance@weekly.timer

The command launched by those three timers are the same as with the
other scheduling methods:

/path/to/git for-each-repo --exec-path=/path/to
--config=maintenance.repo maintenance run --schedule=%i

with the full path for git to ensure that the version of git launched
for the scheduled maintenance is the same as the one used to run
`maintenance start`.

The timer unit contains `Persistent=true` so that, if the computer is
powered down when a maintenance task should run, the task will be run
when the computer is back powered on.

Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 10:57:04 -07:00
Lénaïc Huard eba1ba9d32 maintenance: git maintenance run learned --scheduler=<scheduler>
Depending on the system, different schedulers can be used to schedule
the hourly, daily and weekly executions of `git maintenance run`:
* `launchctl` for MacOS,
* `schtasks` for Windows and
* `crontab` for everything else.

`git maintenance run` now has an option to let the end-user explicitly
choose which scheduler he wants to use:
`--scheduler=auto|crontab|launchctl|schtasks`.

When `git maintenance start --scheduler=XXX` is run, it not only
registers `git maintenance run` tasks in the scheduler XXX, it also
removes the `git maintenance run` tasks from all the other schedulers to
ensure we cannot have two schedulers launching concurrent identical
tasks.

The default value is `auto` which chooses a suitable scheduler for the
system.

`git maintenance stop` doesn't have any `--scheduler` parameter because
this command will try to remove the `git maintenance run` tasks from all
the available schedulers.

Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 10:57:04 -07:00
Ævar Arnfjörð Bjarmason b45c172e51 gc: remove trailing dot from "gc.log" line
Remove the trailing dot from the warning we emit about gc.log. It's
common for various terminal UX's to allow the user to select "words",
and by including the trailing dot a user wanting to select the path to
gc.log will need to manually remove the trailing dot.

Such a user would also probably need to adjust the path if it e.g. had
spaces in it, but this should address this very common case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Suggested-by: Jan Judas <snugar.i@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-02 11:22:32 -07:00
Derrick Stolee a16eb6b1ff maintenance: skip bootout/bootstrap when plist is registered
On macOS, we use launchctl to manage the background maintenance
schedule. This uses a set of .plist files to describe the schedule, but
these files are also registered with 'launchctl bootstrap'. If multiple
'git maintenance start' commands run concurrently, then they can collide
replacing these schedule files and registering them with launchctl.

To avoid extra launchctl commands, do a check for the .plist files on
disk and check if they are registered using 'launchctl list <name>'.
This command will return with exit code 0 if it exists, or exit code 113
if it does not.

We can test this behavior using the GIT_TEST_MAINT_SCHEDULER environment
variable.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24 14:16:58 -07:00
Johannes Schindelin bb01122a82 maintenance: create launchctl configuration using a lock file
When two `git maintenance` processes try to write the `.plist` file, we
need to help them with serializing their efforts.

The 150ms time-out value was determined from thin air.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24 14:16:57 -07:00
Johannes Schindelin a03b097d63 Use a better name for the function interpolating paths
It is not immediately clear what `expand_user_path()` means, so let's
rename it to `interpolate_path()`. This also opens the path for
interpolating more than just a home directory.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26 12:17:16 -07:00
Lénaïc Huard c5d0b12a4c maintenance: fix two memory leaks
Fixes two memory leaks when running `git maintenance start` or `git
maintenance stop` in `update_background_schedule`:

$ valgrind --leak-check=full ~/git/bin/git maintenance start
==76584== Memcheck, a memory error detector
==76584== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==76584== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==76584== Command: /home/lenaic/git/bin/git maintenance start
==76584==
==76584==
==76584== HEAP SUMMARY:
==76584==     in use at exit: 34,880 bytes in 252 blocks
==76584==   total heap usage: 820 allocs, 568 frees, 146,414 bytes allocated
==76584==
==76584== 65 bytes in 1 blocks are definitely lost in loss record 17 of 39
==76584==    at 0x483E6AF: malloc (vg_replace_malloc.c:306)
==76584==    by 0x3DC39C: xrealloc (wrapper.c:126)
==76584==    by 0x3992CC: strbuf_grow (strbuf.c:98)
==76584==    by 0x39A473: strbuf_vaddf (strbuf.c:392)
==76584==    by 0x39BC54: xstrvfmt (strbuf.c:979)
==76584==    by 0x39BD2C: xstrfmt (strbuf.c:989)
==76584==    by 0x18451B: update_background_schedule (gc.c:1977)
==76584==    by 0x1846F6: maintenance_start (gc.c:2011)
==76584==    by 0x1847B4: cmd_maintenance (gc.c:2030)
==76584==    by 0x127A2E: run_builtin (git.c:453)
==76584==    by 0x127E81: handle_builtin (git.c:704)
==76584==    by 0x128142: run_argv (git.c:771)
==76584==
==76584== 240 bytes in 1 blocks are definitely lost in loss record 29 of 39
==76584==    at 0x4840D7B: realloc (vg_replace_malloc.c:834)
==76584==    by 0x491CE5D: getdelim (in /usr/lib/libc-2.33.so)
==76584==    by 0x39ADD7: strbuf_getwholeline (strbuf.c:635)
==76584==    by 0x39AF31: strbuf_getdelim (strbuf.c:706)
==76584==    by 0x39B064: strbuf_getline_lf (strbuf.c:727)
==76584==    by 0x184273: crontab_update_schedule (gc.c:1919)
==76584==    by 0x184678: update_background_schedule (gc.c:1997)
==76584==    by 0x1846F6: maintenance_start (gc.c:2011)
==76584==    by 0x1847B4: cmd_maintenance (gc.c:2030)
==76584==    by 0x127A2E: run_builtin (git.c:453)
==76584==    by 0x127E81: handle_builtin (git.c:704)
==76584==    by 0x128142: run_argv (git.c:771)
==76584==
==76584== LEAK SUMMARY:
==76584==    definitely lost: 305 bytes in 2 blocks
==76584==    indirectly lost: 0 bytes in 0 blocks
==76584==      possibly lost: 0 bytes in 0 blocks
==76584==    still reachable: 34,575 bytes in 250 blocks
==76584==         suppressed: 0 bytes in 0 blocks
==76584== Reachable blocks (those to which a pointer was found) are not shown.
==76584== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==76584==
==76584== For lists of detected and suppressed errors, rerun with: -s
==76584== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr>
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-12 07:00:45 +09:00
Derrick Stolee 32f67888d8 maintenance: respect remote.*.skipFetchAll
If a remote has the skipFetchAll setting enabled, then that remote is
not intended for frequent fetching. It makes sense to not fetch that
data during the 'prefetch' maintenance task. Skip that remote in the
iteration without error. The skip_default_update member is initialized
in remote.c:handle_config() as part of initializing the 'struct remote'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-16 13:36:55 -07:00
Derrick Stolee cfd781ea22 maintenance: use 'git fetch --prefetch'
The 'prefetch' maintenance task previously forced the following refspec
for each remote:

	+refs/heads/*:refs/prefetch/<remote>/*

If a user has specified a more strict refspec for the remote, then this
prefetch task downloads more objects than necessary.

The previous change introduced the '--prefetch' option to 'git fetch'
which manipulates the remote's refspec to place all resulting refs into
refs/prefetch/, with further partitioning based on the destinations of
those refspecs.

Update the documentation to be more generic about the destination refs.
Do not mention custom refspecs explicitly, as that does not need to be
highlighted in this documentation. The important part of placing refs in
refs/prefetch/ remains.

Reported-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-16 13:36:55 -07:00
Derrick Stolee a039a1fcf9 maintenance: simplify prefetch logic
The previous logic filled a string list with the names of each remote,
but instead we could simply run the appropriate 'git fetch' data
directly in the remote iterator. Do this for reduced code size, but also
because it sets up an upcoming change to use the remote's refspec. This
data is accessible from the 'struct remote' data that is now accessible
in fetch_remote().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-06 14:23:47 -07:00
Eric Sunshine 26c7974376 maintenance: fix incorrect maintenance.repo path with bare repository
The periodic maintenance tasks configured by `git maintenance start`
invoke `git for-each-repo` to run `git maintenance run` on each path
specified by the multi-value global configuration variable
`maintenance.repo`. Because `git for-each-repo` will likely be run
outside of the repositories which require periodic maintenance, it is
mandatory that the repository paths specified by `maintenance.repo` are
absolute.

Unfortunately, however, `git maintenance register` does nothing to
ensure that the paths it assigns to `maintenance.repo` are indeed
absolute, and may in fact -- especially in the case of a bare repository
-- assign a relative path to `maintenance.repo` instead. Fix this
problem by converting all paths to absolute before assigning them to
`maintenance.repo`.

While at it, also fix `git maintenance unregister` to convert paths to
absolute, as well, in order to ensure that it can correctly remove from
`maintenance.repo` a path assigned via `git maintenance register`.

Reported-by: Clement Moyroud <clement.moyroud@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-23 00:22:45 -08:00
Derrick Stolee acc1c4d5d4 maintenance: incremental strategy runs pack-refs weekly
When the 'maintenance.strategy' config option is set to 'incremental',
a default maintenance schedule is enabled. Add the 'pack-refs' task to
that strategy at the weekly cadence.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-09 23:09:29 -08:00
Derrick Stolee 41abfe15d9 maintenance: add pack-refs task
It is valuable to collect loose refs into a more compressed form. This
is typically the packed-refs file, although this could be the reftable
in the future. Having packed refs can be extremely valuable in repos
with many tags or remote branches that are not modified by the local
user, but still are necessary for other queries.

For instance, with many exploded refs, commands such as

	git describe --tags --exact-match HEAD

can be very slow (multiple seconds). This command in particular is used
by terminal prompts to show when a detatched HEAD is pointing to an
existing tag, so having it be slow causes significant delays for users.

Add a new 'pack-refs' maintenance task. It runs 'git pack-refs --all
--prune' to move loose refs into a packed form. For now, that is the
packed-refs file, but could adjust to other file formats in the future.

This is the first of several sub-tasks of the 'gc' task that could be
extracted to their own tasks. In this process, we should not change the
behavior of the 'gc' task since that remains the default way to keep
repositories maintained. Creating a new task for one of these sub-tasks
only provides more customization options for those choosing to not use
the 'gc' task. It is certainly possible to have both the 'gc' and
'pack-refs' tasks enabled and run regularly. While they may repeat
effort, they do not conflict in a destructive way.

The 'auto_condition' function pointer is left NULL for now. We could
extend this in the future to have a condition check if pack-refs should
be run during 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-09 23:09:24 -08:00
Junio C Hamano 973e20b83f Merge branch 'jk/peel-iterated-oid'
The peel_ref() API has been replaced with peel_iterated_oid().

* jk/peel-iterated-oid:
  refs: switch peel_ref() to peel_iterated_oid()
2021-02-03 15:04:49 -08:00
Junio C Hamano 15bf48b987 Merge branch 'ds/maintenance-prefetch-cleanup'
Test clean-up plus UI improvement by hiding extra refs that
the prefetch task uses from "log --decorate" output.

* ds/maintenance-prefetch-cleanup:
  t7900: clean up some broken refs
  maintenance: set log.excludeDecoration durin prefetch
2021-02-03 15:04:48 -08:00
Junio C Hamano bcaaf972e6 Merge branch 'tb/pack-revindex-api'
Abstract accesses to in-core revindex that allows enumerating
objects stored in a packfile in the order they appear in the pack,
in preparation for introducing an on-disk precomputed revindex.

* tb/pack-revindex-api: (21 commits)
  for_each_object_in_pack(): clarify pack vs index ordering
  pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()'
  pack-revindex: hide the definition of 'revindex_entry'
  pack-revindex: remove unused 'find_revindex_position()'
  pack-revindex: remove unused 'find_pack_revindex()'
  builtin/gc.c: guess the size of the revindex
  for_each_object_in_pack(): convert to new revindex API
  unpack_entry(): convert to new revindex API
  packed_object_info(): convert to new revindex API
  retry_bad_packed_offset(): convert to new revindex API
  get_delta_base_oid(): convert to new revindex API
  rebuild_existing_bitmaps(): convert to new revindex API
  try_partial_reuse(): convert to new revindex API
  get_size_by_pos(): convert to new revindex API
  show_objects_for_type(): convert to new revindex API
  bitmap_position_packfile(): convert to new revindex API
  check_object(): convert to new revindex API
  write_reused_pack_verbatim(): convert to new revindex API
  write_reused_pack_one(): convert to new revindex API
  write_reuse_object(): convert to new revindex API
  ...
2021-01-25 14:19:20 -08:00
Junio C Hamano 58e2ce9112 Merge branch 'ma/more-opaque-lock-file'
Code clean-up.

* ma/more-opaque-lock-file:
  read-cache: try not to peek into `struct {lock_,temp}file`
  refs/files-backend: don't peek into `struct lock_file`
  midx: don't peek into `struct lock_file`
  commit-graph: don't peek into `struct lock_file`
  builtin/gc: don't peek into `struct lock_file`
2021-01-25 14:19:17 -08:00
Jeff King 36a317929b refs: switch peel_ref() to peel_iterated_oid()
The peel_ref() interface is confusing and error-prone:

  - it's typically used by ref iteration callbacks that have both a
    refname and oid. But since they pass only the refname, we may load
    the ref value from the filesystem again. This is inefficient, but
    also means we are open to a race if somebody simultaneously updates
    the ref. E.g., this:

      int some_ref_cb(const char *refname, const struct object_id *oid, ...)
      {
              if (!peel_ref(refname, &peeled))
                      printf("%s peels to %s",
                             oid_to_hex(oid), oid_to_hex(&peeled);
      }

    could print nonsense. It is correct to say "refname peels to..."
    (you may see the "before" value or the "after" value, either of
    which is consistent), but mentioning both oids may be mixing
    before/after values.

    Worse, whether this is possible depends on whether the optimization
    to read from the current iterator value kicks in. So it is actually
    not possible with:

      for_each_ref(some_ref_cb);

    but it _is_ possible with:

      head_ref(some_ref_cb);

    which does not use the iterator mechanism (though in practice, HEAD
    should never peel to anything, so this may not be triggerable).

  - it must take a fully-qualified refname for the read_ref_full() code
    path to work. Yet we routinely pass it partial refnames from
    callbacks to for_each_tag_ref(), etc. This happens to work when
    iterating because there we do not call read_ref_full() at all, and
    only use the passed refname to check if it is the same as the
    iterator. But the requirements for the function parameters are quite
    unclear.

Instead of taking a refname, let's instead take an oid. That fixes both
problems. It's a little funny for a "ref" function not to involve refs
at all. The key thing is that it's optimizing under the hood based on
having access to the ref iterator. So let's change the name to make it
clear why you'd want this function versus just peel_object().

There are two other directions I considered but rejected:

  - we could pass the peel information into the each_ref_fn callback.
    However, we don't know if the caller actually wants it or not. For
    packed-refs, providing it is essentially free. But for loose refs,
    we actually have to peel the object, which would be wasteful in most
    cases. We could likewise pass in a flag to the callback indicating
    whether the peeled information is known, but that complicates those
    callbacks, as they then have to decide whether to manually peel
    themselves. Plus it requires changing the interface of every
    callback, whether they care about peeling or not, and there are many
    of them.

  - we could make a function to return the peeled value of the current
    iterated ref (computing it if necessary), and BUG() otherwise. I.e.:

      int peel_current_iterated_ref(struct object_id *out);

    Each of the current callers is an each_ref_fn callback, so they'd
    mostly be happy. But:

      - we use those callbacks with functions like head_ref(), which do
        not use the iteration code. So we'd need to handle the fallback
        case there, anyway.

      - it's possible that a caller would want to call into generic code
        that sometimes is used during iteration and sometimes not. This
        encapsulates the logic to do the fast thing when possible, and
        fallback when necessary.

The implementation is mostly obvious, but I want to call out a few
things in the patch:

  - the test-tool coverage for peel_ref() is now meaningless, as it all
    collapses to a single peel_object() call (arguably they were pretty
    uninteresting before; the tricky part of that function is the
    fast-path we see during iteration, but these calls didn't trigger
    that). I've just dropped it entirely, though note that some other
    tests relied on the tags we created; I've moved that creation to the
    tests where it matters.

  - we no longer need to take a ref_store parameter, since we'd never
    look up a ref now. We do still rely on a global "current iterator"
    variable which _could_ be kept per-ref-store. But in practice this
    is only useful if there are multiple recursive iterations, at which
    point the more appropriate solution is probably a stack of
    iterators. No caller used the actual ref-store parameter anyway
    (they all call the wrapper that passes the_repository).

  - the original only kicked in the optimization when the "refname"
    pointer matched (i.e., not string comparison). We do likewise with
    the "oid" parameter here, but fall back to doing an actual oideq()
    call. This in theory lets us kick in the optimization more often,
    though in practice no current caller cares. It should never be
    wrong, though (peeling is a property of an object, so two refs
    pointing to the same object would peel identically).

  - the original took care not to touch the peeled out-parameter unless
    we found something to put in it. But no caller cares about this, and
    anyway, it is enforced by peel_object() itself (and even in the
    optimized iterator case, that's where we eventually end up). We can
    shorten the code and avoid an extra copy by just passing the
    out-parameter through the stack.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21 15:51:31 -08:00
Derrick Stolee 96eaffebbf maintenance: set log.excludeDecoration durin prefetch
The 'prefetch' task fetches refs from all remotes and places them in the
refs/prefetch/<remote>/ refspace. As this task is intended to run in the
background, this allows users to keep their local data very close to the
remote servers' data while not updating the users' understanding of the
remote refs in refs/remotes/<remote>/.

However, this can clutter 'git log' decorations with copies of the refs
with the full name 'refs/prefetch/<remote>/<branch>'.

The log.excludeDecoration config option was added in a6be5e67 (log: add
log.excludeDecoration config option, 2020-05-16) for exactly this
purpose.

Ensure we set this only for users that would benefit from it by
assigning it at the beginning of the prefetch task. Other alternatives
would be during 'git maintenance register' or 'git maintenance start',
but those might assign the config even when the prefetch task is
disabled by existing config. Further, users could run 'git maintenance
run --task=prefetch' using their own scripting or scheduling. This
provides the best coverage to automatically update the config when
valuable.

It is improbable, but possible, that users might want to run the
prefetch task _and_ see these refs in their log decorations. This seems
incredibly unlikely to me, but users can always opt-in on a
command-by-command basis using --decorate-refs=refs/prefetch/.

Test that this works in a few cases. In particular, ensure that our
assignment of log.excludeDecoration=refs/prefetch/ is additive to other
existing exclusions. Further, ensure we do not add multiple copies in
multiple runs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-20 18:46:22 -08:00
Junio C Hamano b2ace18759 Merge branch 'ds/maintenance-part-4'
Follow-up on the "maintenance part-3" which introduced scheduled
maintenance tasks to support platforms whose native scheduling
methods are not 'cron'.

* ds/maintenance-part-4:
  maintenance: use Windows scheduled tasks
  maintenance: use launchctl on macOS
  maintenance: include 'cron' details in docs
  maintenance: extract platform-specific scheduling
2021-01-15 21:48:45 -08:00
Taylor Blau 2891b434ac builtin/gc.c: guess the size of the revindex
'estimate_repack_memory()' takes into account the amount of memory
required to load the reverse index in memory by multiplying the assumed
number of objects by the size of the 'revindex_entry' struct.

Prepare for hiding the definition of 'struct revindex_entry' by removing
a 'sizeof()' of that type from outside of pack-revindex.c. Instead,
guess that one off_t and one uint32_t are required per object. Strictly
speaking, this is a worse guess than asking for 'sizeof(struct
revindex_entry)' directly, since the true size of this struct is 16
bytes with padding on the end of the struct in order to align the offset
field.

But, this is an approximation anyway, and it does remove a use of the
'struct revindex_entry' from outside of pack-revindex internals.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13 21:53:47 -08:00
Martin Ågren d4a4976648 builtin/gc: don't peek into struct lock_file
A `struct lock_file` is pretty much just a wrapper around a tempfile.
But it's easy enough to avoid relying on this. Use the wrappers that the
lock file API provides rather than peeking at the temp file or even into
*its* internals.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-06 13:53:32 -08:00
Derrick Stolee 3797a0a7b7 maintenance: use Windows scheduled tasks
Git's background maintenance uses cron by default, but this is not
available on Windows. Instead, integrate with Task Scheduler.

Tasks can be scheduled using the 'schtasks' command. There are several
command-line options that can allow for some advanced scheduling, but
unfortunately these seem to all require authenticating using a password.

Instead, use the "/xml" option to pass an XML file that contains the
configuration for the necessary schedule. These XML files are based on
some that I exported after constructing a schedule in the Task Scheduler
GUI. These options only run background maintenance when the user is
logged in, and more fields are populated with the current username and
SID at run-time by 'schtasks'.

Since the GIT_TEST_MAINT_SCHEDULER environment variable allows us to
specify 'schtasks' as the scheduler, we can test the Windows-specific
logic on other platforms. Thus, add a check that the XML file written
by Git is valid when xmllint exists on the system.

Since we use a temporary file for the XML files sent to 'schtasks', we
prefix the random characters with the frequency so it is easier to
examine the proper file during tests. Instead of an exact match on the
'args' file, we 'grep' for the arguments other than the filename.

There is a deficiency in the current design. Windows has two kinds of
applications: GUI applications that start by "winmain()" and console
applications that start by "main()". Console applications are attached
to a new Console window if they are not already associated with a GUI
application. This means that every hour the scheudled task launches a
command window for the scheduled tasks. Not only is this visually
obtrusive, but it also takes focus from whatever else the user is
doing!

A simple fix would be to insert a GUI application that acts as a shim
between the scheduled task and Git. This is currently possible in Git
for Windows by setting the <Command> tag equal to

  C:\Program Files\Git\git-bash.exe

with options "--hide --no-needs-console --command=cmd\git.exe"
followed by the arguments currently used. Since git-bash.exe is not
included in Windows builds of core Git, I chose to leave out this
feature. My plan is to submit a small patch to Git for Windows that
converts the use of git.exe with this use of git-bash.exe in the
short term. In the long term, we can consider creating this GUI
shim application within core Git, perhaps in contrib/.

Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-05 14:38:02 -08:00
Derrick Stolee 2afe7e3567 maintenance: use launchctl on macOS
The existing mechanism for scheduling background maintenance is done
through cron. The 'crontab -e' command allows updating the schedule
while cron itself runs those commands. While this is technically
supported by macOS, it has some significant deficiencies:

1. Every run of 'crontab -e' must request elevated privileges through
   the user interface. When running 'git maintenance start' from the
   Terminal app, it presents a dialog box saying "Terminal.app would
   like to administer your computer. Administration can include
   modifying passwords, networking, and system settings." This is more
   alarming than what we are hoping to achieve. If this alert had some
   information about how "git" is trying to run "crontab" then we would
   have some reason to believe that this dialog might be fine. However,
   it also doesn't help that some scenarios just leave Git waiting for
   a response without presenting anything to the user. I experienced
   this when executing the command from a Bash terminal view inside
   Visual Studio Code.

2. While cron initializes a user environment enough for "git config
   --global --show-origin" to show the correct config file information,
   it does not set up the environment enough for Git Credential Manager
   Core to load credentials during a 'prefetch' task. My prefetches
   against private repositories required re-authenticating through UI
   pop-ups in a way that should not be required.

The solution is to switch from cron to the Apple-recommended [1]
'launchd' tool.

[1] https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/ScheduledJobs.html

The basics of this tool is that we need to create XML-formatted
"plist" files inside "~/Library/LaunchAgents/" and then use the
'launchctl' tool to make launchd aware of them. The plist files
include all of the scheduling information, along with the command-line
arguments split across an array of <string> tags.

For example, here is my plist file for the weekly scheduled tasks:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict>
<key>Label</key><string>org.git-scm.git.weekly</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/libexec/git-core/git</string>
<string>--exec-path=/usr/local/libexec/git-core</string>
<string>for-each-repo</string>
<string>--config=maintenance.repo</string>
<string>maintenance</string>
<string>run</string>
<string>--schedule=weekly</string>
</array>
<key>StartCalendarInterval</key>
<array>
<dict>
<key>Day</key><integer>0</integer>
<key>Hour</key><integer>0</integer>
<key>Minute</key><integer>0</integer>
</dict>
</array>
</dict>
</plist>

The schedules for the daily and hourly tasks are more complicated
since we need to use an array for the StartCalendarInterval with
an entry for each of the six days other than the 0th day (to avoid
colliding with the weekly task), and each of the 23 hours other
than the 0th hour (to avoid colliding with the daily task).

The "Label" value is currently filled with "org.git-scm.git.X"
where X is the frequency. We need a different plist file for each
frequency.

The launchctl command needs to be aligned with a user id in order
to initialize the command environment. This must be done using
the 'launchctl bootstrap' subcommand. This subcommand is new as
of macOS 10.11, which was released in September 2015. Before that
release the 'launchctl load' subcommand was recommended. The best
source of information on this transition I have seen is available
at [2]. The current design does not preclude a future version that
detects the available fatures of 'launchctl' to use the older
commands. However, it is best to rely on the newest version since
Apple might completely remove the deprecated version on short
notice.

[2] https://babodee.wordpress.com/2016/04/09/launchctl-2-0-syntax/

To remove a schedule, we must run 'launchctl bootout' with a valid
plist file. We also need to 'bootout' a task before the 'bootstrap'
subcommand will succeed, if such a task already exists.

The need for a user id requires us to run 'id -u' which works on
POSIX systems but not Windows. Further, the need for fully-qualitifed
path names including $HOME behaves differently in the Git internals and
the external test suite. The $HOME variable starts with "C:\..." instead
of the "/c/..." that is provided by Git in these subcommands. The test
therefore has a prerequisite that we are not on Windows. The cross-
platform logic still allows us to test the macOS logic on a Linux
machine.

We can verify the commands that were run by 'git maintenance start'
and 'git maintenance stop' by injecting a script that writes the
command-line arguments into GIT_TEST_MAINT_SCHEDULER.

An earlier version of this patch accidentally had an opening
"<dict>" tag when it should have had a closing "</dict>" tag. This
was caught during manual testing with actual 'launchctl' commands,
but we do not want to update developers' tasks when running tests.
It appears that macOS includes the "xmllint" tool which can verify
the XML format. This is useful for any system that might contain
the tool, so use it whenever it is available.

We strive to make these tests work on all platforms, but Windows caused
some headaches. In particular, the value of getuid() called by the C
code is not guaranteed to be the same as `$(id -u)` invoked by a test.
This is because `git.exe` is a native Windows program, whereas the
utility programs run by the test script mostly utilize the MSYS2 runtime,
which emulates a POSIX-like environment. Since the purpose of the test
is to check that the input to the hook is well-formed, the actual user
ID is immaterial, thus we can work around the problem by making the the
test UID-agnostic. Another subtle issue is the $HOME environment
variable being a Windows-style path instead of a Unix-style path. We can
be more flexible here instead of expecting exact path matches.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-05 14:38:02 -08:00
Martin Ågren 66dc0a3625 gc: fix handling of crontab magic markers
On `git maintenance start`, we add a few entries to the user's cron
table. We wrap our entries using two magic markers, "# BEGIN GIT
MAINTENANCE SCHEDULE" and "# END GIT MAINTENANCE SCHEDULE". At a later
`git maintenance stop`, we will go through the table and remove these
lines. Or rather, we will remove the "BEGIN" marker, the "END" marker
and everything between them.

Alas, we have a bug in how we detect the "END" marker: we don't. As we
loop through all the lines of the crontab, if we are in the "old
region", i.e., the region we're aiming to remove, we make an early
`continue` and don't get as far as checking for the "END" marker. Thus,
once we've seen our "BEGIN", we remove everything until the end of the
file.

Rewrite the logic for identifying these markers. There are four cases
that are mutually exclusive: The current line starts a region or it ends
it, or it's firmly within the region, or it's outside of it (and should
be printed).

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21 14:33:08 -08:00
Junio C Hamano f2a75cb312 Merge branch 'rs/maintenance-run-outside-repo'
"git maintenance run/start/stop" needed to be run in a repository
to hold the lockfile they use, but didn't make sure they are
actually in a repository, which has been corrected.

* rs/maintenance-run-outside-repo:
  t7900: fix typo: "test_execpt_success"
  maintenance: fix SEGFAULT when no repository
2020-12-08 15:11:21 -08:00
Junio C Hamano d702cb9e89 Merge branch 'ds/maintenance-part-3'
"git maintenance" command had trouble working in a directory whose
pathname contained an ERE metacharacter like '+'.

* ds/maintenance-part-3:
  maintenance: use 'git config --fixed-value'
2020-12-08 15:11:19 -08:00
Junio C Hamano 1c04cdd424 Merge branch 'ab/gc-keep-base-option'
Fix an option name in "gc" documentation.

* ab/gc-keep-base-option:
  gc: rename keep_base_pack variable for --keep-largest-pack
  gc docs: change --keep-base-pack to --keep-largest-pack
2020-11-30 14:49:43 -08:00
Rafael Silva e72f7defc4 maintenance: fix SEGFAULT when no repository
The "git maintenance run" and "git maintenance start/stop" commands
holds a file-based lock at the .git/maintenance.lock and
.git/schedule.lock respectively. These locks are used to ensure only
one maintenance process is executed at the time as both operations
involves writing data into the git repository.

The path to the lock file is built using
"the_repository->objects->odb->path" that results in SEGFAULT when we
have no repository available as "the_repository->objects->odb" is
set to NULL.

Let's teach maintenance command to use RUN_SETUP option that will
provide the validation and fail when running outside of a repository.
Hence fixing the SEGFAULT for all three operations and making the
behaviour consistent across all subcommands.

Setting the RUN_SETUP also provides the same protection for all
subcommands given that the "register" and "unregister" also requires to
be executed inside a repository.

Furthermore let's remove the local validation implemented by the
"register" and "unregister" as this will not be required anymore with
the new option.

Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-30 13:44:15 -08:00
Derrick Stolee 483a6d9b5d maintenance: use 'git config --fixed-value'
When a repository's leading directories contain regex metacharacters,
the config calls for 'git maintenance register' and 'git maintenance
unregister' are not careful enough. Use the new --fixed-value option
to direct the config machinery to use exact string matches. This is a
more robust option than escaping these arguments in a piecemeal fashion.

For the test, require that we are not running on Windows since the '+'
and '*' characters are not allowed on that filesystem.

Reported-by: Emily Shaffer <emilyshaffer@google.com>
Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-25 15:04:55 -08:00
Derrick Stolee 31345d5545 maintenance: extract platform-specific scheduling
The existing schedule mechanism using 'cron' is supported by POSIX
platforms, but not Windows. It also works slightly differently on
macOS to significant detriment of the user experience. To allow for
new implementations on these platforms, extract a method that
performs the platform-specific scheduling mechanism. This will be
swapped at compile time with new implementations on specialized
platforms.

As we add this generality, rename GIT_TEST_CRONTAB to
GIT_TEST_MAINT_SCHEDULER. Further, this variable is now parsed as
"<scheduler>:<command>" so we can test platform-specific scheduling
logic even when not on the correct platform. By specifying the
<scheduler> in this string, we will be able to test all three sets of
Git logic from a Linux machine.

Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-24 13:02:29 -08:00
Ævar Arnfjörð Bjarmason 793c1464d3 gc: rename keep_base_pack variable for --keep-largest-pack
As noted in an earlier change the keep_base_pack variable name is a
relic from an earlier on-list version of ae4e89e549 ("gc: add
--keep-largest-pack option", 2018-04-15) before it was renamed to
--keep-largest-pack.

Let's change the variable name to avoid that confusion, it's easier to
read the code if there's a 1=1 mapping between the variable name and
option name.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-21 11:39:59 -08:00
René Scharfe a1c74791d5 gc: fix cast in compare_tasks_by_selection()
compare_tasks_by_selection() is used with QSORT and gets passed pointers
to the elements of "static struct maintenance_task tasks[]".  It casts
the *addresses* of these passed pointers to element pointers, though,
and thus effectively compares some unrelated values from the stack.  Fix
the casts to actually compare array elements.

Detected by USan (make SANITIZE=undefined test).

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-18 14:15:58 -08:00
Junio C Hamano 7660da1618 Merge branch 'ds/maintenance-part-3'
Parts of "git maintenance" to ease writing crontab entries (and
other scheduling system configuration) for it.

* ds/maintenance-part-3:
  maintenance: add troubleshooting guide to docs
  maintenance: use 'incremental' strategy by default
  maintenance: create maintenance.strategy config
  maintenance: add start/stop subcommands
  maintenance: add [un]register subcommands
  for-each-repo: run subcommands on configured repos
  maintenance: add --schedule option and config
  maintenance: optionally skip --auto process
2020-11-18 13:32:53 -08:00
Junio C Hamano 902f358555 Merge branch 'rs/clear-commit-marks-in-repo'
Code clean-up.

* rs/clear-commit-marks-in-repo:
  bisect: clear flags in passed repository
  object: allow clear_commit_marks_all to handle any repo
2020-11-11 13:18:37 -08:00
Junio C Hamano 0be2d65132 Merge branch 'ds/maintenance-commit-graph-auto-fix'
Test-coverage enhancement of running commit-graph task "git
maintenance" as needed led to discovery and fix of a bug.

* ds/maintenance-commit-graph-auto-fix:
  maintenance: core.commitGraph=false prevents writes
  maintenance: test commit-graph auto condition
2020-11-02 13:17:39 -08:00
René Scharfe cd8888452c object: allow clear_commit_marks_all to handle any repo
Allow callers to specify the repository to use.  Rename the function to
repo_clear_commit_marks to document its new scope.  No functional change
intended.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 10:46:34 -07:00
Derrick Stolee 61f7a383d3 maintenance: use 'incremental' strategy by default
The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.

To make it simple for users to start background maintenance with a
recommended schedlue, update the 'maintenance.strategy' config option in
both the 'register' and 'start' subcommands. This allows users to
customize beyond the defaults using individual
'maintenance.<task>.schedule' options, but also the user can opt-out of
this strategy using 'maintenance.strategy=none'.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 08:36:42 -07:00
Derrick Stolee a4cb1a2339 maintenance: create maintenance.strategy config
To provide an on-ramp for users to use background maintenance without
several 'git config' commands, create a 'maintenance.strategy' config
option. Currently, the only important value is 'incremental' which
assigns the following schedule:

* gc: never
* prefetch: hourly
* commit-graph: hourly
* loose-objects: daily
* incremental-repack: daily

These tasks are chosen to minimize disruptions to foreground Git
commands and use few compute resources.

The 'maintenance.strategy' is intended as a baseline that can be
customzied further by manually assigning 'maintenance.<task>.enabled'
and 'maintenance.<task>.schedule' config options, which will override
any recommendation from 'maintenance.strategy'. This operates similarly
to config options like 'feature.experimental' which operate as "meta"
config options that change default config values.

This presents a way forward for updating the 'incremental' strategy in
the future or adding new strategies. For example, a potential strategy
could be to include a 'full' strategy that runs the 'gc' task weekly
and no other tasks by default.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 08:36:42 -07:00
Derrick Stolee d334107c5d maintenance: core.commitGraph=false prevents writes
Recently, a user had an issue due to combining
fetch.writeCommitGraph=true with core.commitGraph=false. The root bug
has been resolved by preventing commit-graph writes when
core.commitGraph is disabled. This happens inside the 'git commit-graph
write' command, but we can be more aware of this situation and prevent
that process from ever starting in the 'commit-graph' maintenance task.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-12 12:13:21 -07:00
Derrick Stolee 8f801804be maintenance: test commit-graph auto condition
The auto condition for the commit-graph maintenance task walks refs
looking for commits that are not in the commit-graph file. This was
added in 4ddc79b2 (maintenance: add auto condition for commit-graph
task, 2020-09-17) but was left untested.

The initial goal of this change was to demonstrate the feature works
properly by adding tests. However, there was an off-by-one error that
caused the basic tests around maintenance.commit-graph.auto=1 to fail
when it should work.

The subtlety is that if a ref tip is not in the commit-graph, then we
were not adding that to the total count. In the test, we see that we
have only added one commit since our last commit-graph write, so the
auto condition would say there is nothing to do.

The fix is simple: add the check for the commit-graph position to see
that the tip is not in the commit-graph file before starting our walk.
Since this happens before adding to the DFS stack, we do not need to
clear our (currently empty) commit list.

This does add some extra complexity for the test, because we also want
to verify that the walk along the parents actually does some work. This
means we need to add at least two commits in a row without writing the
commit-graph. However, we also need to make sure no additional refs are
pointing to the middle of this list or else the for_each_ref() in
should_write_commit_graph() might visit these commits as tips instead of
doing a DFS walk. Hence, the last two commits are added with "git
commit" instead of "test_commit".

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08 10:24:40 -07:00
Derrick Stolee 2fec604f8d maintenance: add start/stop subcommands
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.

The schedule is laid out as follows:

  0 1-23 * * *   $cmd maintenance run --schedule=hourly
  0 0    * * 1-6 $cmd maintenance run --schedule=daily
  0 0    * * 0   $cmd maintenance run --schedule=weekly

where $cmd is a properly-qualified 'git for-each-repo' execution:

$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo

where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.

This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.

The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee 0c18b70081 maintenance: add [un]register subcommands
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.

These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.

For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.

The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee b08ff1fee0 maintenance: add --schedule option and config
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.

The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:

	'hourly' > 'daily' > 'weekly'

Use new 'enum schedule_priority' to track these values numerically.

The following cron table would run the scheduled tasks with the correct
frequencies:

  0 1-23 * * *    git -C <repo> maintenance run --schedule=hourly
  0 0    * * 1-6  git -C <repo> maintenance run --schedule=daily
  0 0    * * 0    git -C <repo> maintenance run --schedule=weekly

This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee e841a79a13 maintenance: add incremental-repack auto condition
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.

The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.

Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:05 -07:00
Derrick Stolee a13e3d0ec8 maintenance: auto-size incremental-repack batch
When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.

Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:05 -07:00
Derrick Stolee 52fe41ff1c maintenance: add incremental-repack task
The previous change cleaned up loose objects using the
'loose-objects' that can be run safely in the background. Add a
similar job that performs similar cleanups for pack-files.

One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.

Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).

The 'incremental-repack' task runs the following steps:

1. 'git multi-pack-index write' creates a multi-pack-index file if
   one did not exist, and otherwise will update the multi-pack-index
   with any new pack-files that appeared since the last write. This
   is particularly relevant with the background fetch job.

   When the multi-pack-index sees two copies of the same object, it
   stores the offset data into the newer pack-file. This means that
   some old pack-files could become "unreferenced" which I will use
   to mean "a pack-file that is in the pack-file list of the
   multi-pack-index but none of the objects in the multi-pack-index
   reference a location inside that pack-file."

2. 'git multi-pack-index expire' deletes any unreferenced pack-files
   and updaes the multi-pack-index to drop those pack-files from the
   list. This is safe to do as concurrent Git processes will see the
   multi-pack-index and not open those packs when looking for object
   contents. (Similar to the 'loose-objects' job, there are some Git
   commands that open pack-files regardless of the multi-pack-index,
   but they are rarely used. Further, a user that self-selects to
   use background operations would likely refrain from using those
   commands.)

3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
   of pack-files that are listed in the multi-pack-index and creates
   a new pack-file containing the objects whose offsets are listed
   by the multi-pack-index to be in those objects. The set of pack-
   files is selected greedily by sorting the pack-files by modified
   time and adding a pack-file to the set if its "expected size" is
   smaller than the batch size until the total expected size of the
   selected pack-files is at least the batch size. The "expected
   size" is calculated by taking the size of the pack-file divided
   by the number of objects in the pack-file and multiplied by the
   number of objects from the multi-pack-index with offset in that
   pack-file. The expected size approximates how much data from that
   pack-file will contribute to the resulting pack-file size. The
   intention is that the resulting pack-file will be close in size
   to the provided batch size.

   The next run of the incremental-repack task will delete these
   repacked pack-files during the 'expire' step.

   In this version, the batch size is set to "0" which ignores the
   size restrictions when selecting the pack-files. It instead
   selects all pack-files and repacks all packed objects into a
   single pack-file. This will be updated in the next change, but
   it requires doing some calculations that are better isolated to
   a separate change.

These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29).

These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.

By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 3e220e6069 maintenance: create auto condition for loose-objects
The loose-objects task deletes loose objects that already exist in a
pack-file, then place the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.

The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 252cfb7cb8 maintenance: add loose-objects task
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.

Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:

1. Run 'git prune-packed' to delete any loose objects that exist
   in a pack-file. Concurrent commands will prefer the packed
   version of the object to the loose version. (Of course, there
   are exceptions for commands that specifically care about the
   location of an object. These are rare for a user to run on
   purpose, and we hope a user that has selected background
   maintenance will not be trying to do foreground maintenance.)

2. Run 'git pack-objects' on a batch of loose objects. These
   objects are grouped by scanning the loose object directories in
   lexicographic order until listing all loose objects -or-
   reaching 50,000 objects. This is more than enough if the loose
   objects are created only by a user doing normal development.
   We noticed users with _millions_ of loose objects because VFS
   for Git downloads blobs on-demand when a file read operation
   requires populating a virtual file.

This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 28cb5e66dd maintenance: add prefetch task
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.

Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.

The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.

However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.

When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:

1. --no-tags prevents getting a new tag when a user wants to see
   the new tags appear in their foreground fetches.

2. --refmap= removes the configured refspec which usually updates
   refs/remotes/<remote>/* with the refs advertised by the remote.
   While this looks confusing, this was documented and tested by
   b40a50264a (fetch: document and test --refmap="", 2020-01-21),
   including this sentence in the documentation:

	Providing an empty `<refspec>` to the `--refmap` option
	causes Git to ignore the configured refspecs and rely
	entirely on the refspecs supplied as command-line arguments.

3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
   we can ensure that we actually load the new values somewhere in
   our refspace while not updating refs/heads or refs/remotes. By
   storing these refs here, the commit-graph job will update the
   commit-graph with the commits from these hidden refs.

4. --prune will delete the refs/prefetch/<remote> refs that no
   longer appear on the remote.

5. --no-write-fetch-head prevents updating FETCH_HEAD.

We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paried with --no-show-forced-udpates (through
fetch.showForcedUpdates=false).

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 25914c4fde maintenance: add trace2 regions for task execution
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 4ddc79b2da maintenance: add auto condition for commit-graph task
Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.

This count is controlled by the maintenance.commit-graph.auto config
option.

To compute the count, use a depth-first search starting at each ref, and
leaving markers using the SEEN flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 916d0626c2 maintenance: use pointers to check --auto
The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 65d655b52d maintenance: create maintenance.<task>.enabled config
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee d7514f6ed5 maintenance: take a lock on the objects directory
Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then no maintenance tasks are
attempted. This will become very important later when we implement the
'prefetch' task, as this is our stop-gap from creating a recursive process
loop between 'git fetch' and 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 090511bc0b maintenance: add --task option
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 663b2b1b90 maintenance: add commit-graph task
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00