scalar: convert README.md into a technical design doc

Adapt the content from 'contrib/scalar/README.md' into a design document in
'Documentation/technical/'. In addition to reformatting for asciidoc,
elaborate on the background, purpose, and design choices that went into
Scalar.

Most of this document will persist in 'Documentation/technical/' after
Scalar has been moved out of 'contrib/' and into the root of Git. Until that
time, it will also contain a temporary "Roadmap" section detailing the
remaining series needed to finish the initial version of Scalar. The section
will be removed once Scalar is moved to the repo root, but in the meantime
serves as a guide for readers to keep up with progress on the feature.

Signed-off-by: Victoria Dye <vdye@github.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye 2022-07-12 00:06:07 +00:00 committed by Junio C Hamano
parent f22c95db53
commit 72d3a5da32
2 changed files with 127 additions and 82 deletions

@ -0,0 +1,127 @@
Scalar
======

Scalar is a repository management tool that optimizes Git for use in large
repositories. It accomplishes this by helping users to take advantage of
advanced performance features in Git. Unlike most other Git built-in commands,
Scalar is not executed as a subcommand of 'git'; rather, it is built as a
separate executable containing its own series of subcommands.

Background
----------

Scalar was originally designed as an add-on to Git and implemented as a .NET
Core application. It was created based on lessons learned from the VFS for Git
project (another application aimed at improving the experience of working with
large repositories). As part of its initial implementation, Scalar relied on
custom features in the Microsoft fork of Git that have since been integrated
into core Git:

* partial clone,
* commit graphs,
* multi-pack index,
* sparse checkout (cone mode),
* scheduled background maintenance,
* etc.

With the requisite Git functionality in place and a desire to bring the benefits
of Scalar to the larger Git community, the Scalar application itself was ported
from C# to C and integrated upstream.

Features
--------

Scalar consists of two major pieces of functionality: automatically
configuring built-in Git performance features and managing repository
enlistments.

The Git performance features configured by Scalar (see "Background" for
examples) confer substantial performance benefits to large repositories, but are
either too experimental to enable for all of Git yet, or only benefit large
repositories. As new features are introduced, Scalar should be updated
accordingly to incorporate them. This will prevent the tool from becoming stale
while also providing a path for more easily bringing features to the appropriate
users.
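
One way to keep that feature list from going stale is to drive configuration from a single table, so that adopting a new feature means adding one row. The sketch below is a hypothetical illustration, not Scalar's actual code. The `recommended_setting` struct, the table contents, and `recommended_value()` are assumptions, although each key is a real Git configuration variable. It also assumes a policy in which a value the user has already set is left untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical table of settings a Scalar-like tool might apply to each
 * enlistment. Supporting a new Git feature means adding one row here.
 * The values are illustrative, not Scalar's actual choices.
 */
struct recommended_setting {
	const char *key;
	const char *value;
};

static const struct recommended_setting recommended[] = {
	{ "core.multiPackIndex", "true" },
	{ "core.commitGraph", "true" },
	{ "core.untrackedCache", "true" },
	{ "gc.auto", "0" }, /* disable auto gc; leave it to background maintenance */
	{ NULL, NULL }
};

/*
 * Return the value to write for 'key', or NULL if nothing should be written:
 * either 'key' is not in the table, or the user already set it ('existing'
 * is non-NULL) and this sketch lets the user's choice win.
 *
 * Section and variable names in Git config are case-insensitive, so the
 * lookup uses strcasecmp(). (Subsection names are case-sensitive, but no
 * entry here has one.)
 */
const char *recommended_value(const char *key, const char *existing)
{
	size_t i;

	if (existing)
		return NULL;
	for (i = 0; recommended[i].key; i++)
		if (!strcasecmp(recommended[i].key, key))
			return recommended[i].value;
	return NULL;
}
```

A caller would look up each key's current value in the repository config, pass it as `existing`, and write any returned value with `git config`. Whether a reconfiguration should override user-set values is a policy choice; this sketch picks the conservative option.
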

Enlistments are how Scalar knows which repositories on a user's system should
use Scalar-configured features. This allows it to update performance
settings when new ones are added to the tool, as well as centrally manage
repository maintenance. The enlistment structure (a root directory with a
`src/` subdirectory containing the cloned repository itself) is designed to
encourage users to route build outputs outside of the repository to avoid the
performance-limiting overhead of ignoring those files in Git.
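
To make the layout concrete, here is a minimal sketch, assuming a user may point a command at either the enlistment root or its `src/` directory. The helper `enlistment_paths()` and its rule (strip a trailing `/src` component to find the root) are hypothetical illustrations of the layout described above, not Scalar's implementation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper for the enlistment layout
 *
 *     <root>/        enlistment root (build outputs can live here)
 *     <root>/src/    the cloned repository and its worktree
 *
 * 'path' may name either the root or its "src" directory: if it ends in a
 * "/src" component, that component is stripped to find the root.
 * Writes both paths and returns 0, or returns -1 if 'path' is empty or an
 * output buffer is too small.
 */
int enlistment_paths(const char *path, char *root, size_t root_len,
		     char *worktree, size_t worktree_len)
{
	size_t len = strlen(path);
	int n;

	if (!len)
		return -1;
	/* Treat "a/src/" like "a/src". */
	while (len > 1 && path[len - 1] == '/')
		len--;
	/* Strip a trailing "/src" component (but keep "/src" itself as a root). */
	if (len > 4 && !strncmp(path + len - 4, "/src", 4))
		len -= 4;

	n = snprintf(root, root_len, "%.*s", (int)len, path);
	if (n < 0 || (size_t)n >= root_len)
		return -1;
	/* Avoid a doubled slash when the root is "/" or ends in '/'. */
	n = snprintf(worktree, worktree_len, "%s%ssrc", root,
		     root[len - 1] == '/' ? "" : "/");
	if (n < 0 || (size_t)n >= worktree_len)
		return -1;
	return 0;
}
```

With the worktree confined to `<root>/src`, build tools can write into sibling directories of `src/` under the same root, so Git never has to scan or ignore those outputs.
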

Design
------

Scalar is implemented in C and interacts with Git via a mix of child process
invocations of Git and direct usage of `libgit.a`. Internally, it is structured
much like other built-ins with subcommands (e.g., `git stash`), containing a
`cmd_<subcommand>()` function for each subcommand, routed through a `cmd_main()`
function. Most options are unique to each subcommand, with `scalar` respecting
some "global" `git` options (e.g., `-c` and `-C`).
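
The `cmd_main()`-plus-`cmd_<subcommand>()` structure can be pictured as a table lookup on `argv[1]`. This is a minimal standalone sketch: the handlers are hypothetical stubs (the names `register` and `unregister` match Scalar subcommands mentioned in this commit, but the bodies here only print), and the exit status 129 is assumed to mirror Git's usage-error convention.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stub handlers; a real tool would do the actual work here. */
static int cmd_register(int argc, const char **argv)
{
	(void)argv;
	printf("register: %d argument(s)\n", argc - 1);
	return 0;
}

static int cmd_unregister(int argc, const char **argv)
{
	(void)argv;
	printf("unregister: %d argument(s)\n", argc - 1);
	return 0;
}

struct subcommand {
	const char *name;
	int (*fn)(int argc, const char **argv);
};

static const struct subcommand subcommands[] = {
	{ "register", cmd_register },
	{ "unregister", cmd_unregister },
	{ NULL, NULL },
};

/*
 * Route argv[1] to its handler. The handler sees the subcommand name as its
 * own argv[0], so each subcommand can parse its options independently.
 * 129 is assumed here to mirror Git's exit status for usage errors.
 */
int cmd_main(int argc, const char **argv)
{
	size_t i;

	if (argc < 2) {
		fprintf(stderr, "usage: scalar <subcommand> [<options>]\n");
		return 129;
	}
	for (i = 0; subcommands[i].name; i++)
		if (!strcmp(subcommands[i].name, argv[1]))
			return subcommands[i].fn(argc - 1, argv + 1);
	fprintf(stderr, "scalar: '%s' is not a scalar command\n", argv[1]);
	return 129;
}
```

Because each handler receives its own `argc`/`argv` starting at the subcommand name, per-subcommand option parsing stays local to that handler, which fits the note above that most options are unique to each subcommand.
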

Because `scalar` is not invoked as a Git subcommand (i.e., it is not run as
`git scalar`), it is built and installed as its own executable in the `bin/`
directory, alongside `git`, `git-gui`, etc.

Roadmap
-------

NOTE: this section will be removed once the remaining tasks outlined in this
roadmap are complete.

Scalar is a large enough project that it is being upstreamed incrementally,
living in `contrib/` until it is feature-complete. So far, the following patch
series have been accepted:

- `scalar-the-beginning`: The initial patch series which sets up
`contrib/scalar/` and populates it with a minimal `scalar` command that
demonstrates the fundamental ideas.
- `scalar-c-and-C`: The `scalar` command learns about two options that can be
specified before the command, `-c <key>=<value>` and `-C <directory>`.
- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
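
The `-c` and `-C` options from `scalar-c-and-C` are consumed before the subcommand name is looked up. The sketch below assumes a simplified approach that only records what it sees. `struct global_opts`, `MAX_CONFIG`, and `parse_global_opts()` are hypothetical names; a real implementation would hand the `-c` values to Git's config machinery and change directory for `-C`.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CONFIG 16

/* Hypothetical record of options given before the subcommand name. */
struct global_opts {
	const char *dir;                /* last -C <directory>, or NULL */
	const char *config[MAX_CONFIG]; /* each -c argument, in order */
	int nr_config;
};

/*
 * Consume leading "-C <directory>" and "-c <key>=<value>" pairs (argv[0] is
 * the program name). Returns the index of the subcommand name, or -1 on an
 * unknown option, a missing argument, or too many -c options.
 */
int parse_global_opts(int argc, const char **argv, struct global_opts *opts)
{
	int i = 1;

	memset(opts, 0, sizeof(*opts));
	while (i < argc && argv[i][0] == '-') {
		int is_dir = !strcmp(argv[i], "-C");
		int is_config = !strcmp(argv[i], "-c");

		if (!is_dir && !is_config) {
			fprintf(stderr, "scalar: unknown option '%s'\n", argv[i]);
			return -1;
		}
		if (i + 1 >= argc) {
			fprintf(stderr, "scalar: '%s' requires an argument\n", argv[i]);
			return -1;
		}
		if (is_dir) {
			/*
			 * A real implementation would chdir() right away, so that
			 * successive relative -C paths compose as with "git -C";
			 * this sketch only remembers the last one.
			 */
			opts->dir = argv[i + 1];
		} else if (opts->nr_config < MAX_CONFIG) {
			/* Recorded verbatim for the config layer to apply later. */
			opts->config[opts->nr_config++] = argv[i + 1];
		} else {
			fprintf(stderr, "scalar: too many -c options\n");
			return -1;
		}
		i += 2;
	}
	return i;
}
```

For `scalar -C /work/big -c core.fsmonitor=true diagnose`, the function returns 5, the index of `diagnose`, which the caller would then dispatch as the subcommand.
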

Roughly speaking (and subject to change), the following series are needed to
"finish" this initial version of Scalar:

- Finish Scalar features: Enable the built-in FSMonitor in Scalar enlistments
and implement `scalar help`. At the end of this series, Scalar should be
feature-complete from the perspective of a user.
- Generalize features not specific to Scalar: In the spirit of making Scalar
configure only what is needed for large repo performance, move common
utilities into other parts of Git. Some of this will be internal-only, but one
major change will be generalizing `scalar diagnose` for use with any Git
repository.
- Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
`git`, including updates to build and install it with the rest of Git. This
change will incorporate Scalar into the Git CI and test framework, as well as
expand regression and performance testing to ensure the tool is stable.

Finally, there are two additional patch series that exist in Microsoft's fork of
Git, but there is no current plan to upstream them. There are some interesting
ideas there, but the implementation is too specific to Azure Repos and/or VFS
for Git to be of much help in general.

These patch series still exist mainly because Azure Repos supports the GVFS
protocol rather than partial clone, while Git is focused on improving partial
clone:

- `scalar-with-gvfs`: The primary purpose of this patch series is to support
existing Scalar users whose repositories are hosted in Azure Repos (which does
not support Git's partial clones, but supports its predecessor, the GVFS
protocol, which is used by Scalar to emulate the partial clone).
+
Since the GVFS protocol will never be supported by core Git, this patch series
will remain in Microsoft's fork of Git.
- `run-scalar-functional-tests`: The Scalar project developed a comprehensive
set of integration tests (or, "Functional Tests"). They are the sole remaining
part of the original C#-based Scalar project, and this patch series adds a
GitHub workflow that runs them all.
+
Since the tests partially depend on features that are only provided in the
`scalar-with-gvfs` patch series, this patch series cannot be upstreamed.

@ -1,82 +0,0 @@
# Scalar - an opinionated repository management tool
Scalar is an add-on to Git that helps users take advantage of advanced
performance features in Git. It was originally implemented in C# using .NET
Core, based on the learnings from the VFS for Git project, and most of the
techniques developed by the Scalar project have been integrated into core Git
already:
* partial clone,
* commit graphs,
* multi-pack index,
* sparse checkout (cone mode),
* scheduled background maintenance,
* etc.

This directory contains the remaining parts of Scalar that are not (yet) in
core Git.
## Roadmap
The idea is to populate this directory via incremental patch series and
eventually move to a top-level directory next to `gitk-git/` and to `git-gui/`. The
current plan involves the following patch series:
- `scalar-the-beginning`: The initial patch series which sets up
`contrib/scalar/` and populates it with a minimal `scalar` command that
demonstrates the fundamental ideas.
- `scalar-c-and-C`: The `scalar` command learns about two options that can be
specified before the command, `-c <key>=<value>` and `-C <directory>`.
- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
- `scalar-and-builtin-fsmonitor`: The built-in FSMonitor is enabled in `scalar
register` and in `scalar clone`, for an enormous performance boost when
working in large worktrees. This patch series necessarily depends on Jeff
Hostetler's FSMonitor patch series being integrated into Git.
- `scalar-gentler-config-locking`: Scalar enlistments are registered in the
user's Git config. This is usually not a problem because it is rare for a
user to register an enlistment. However, in Scalar's functional tests, many
Scalar enlistments are created in parallel, which can lead to lock
contention. This patch series works around that problem by retrying to lock
the config file in a gentle fashion.
- `scalar-extra-docs`: Add extensive documentation that was written for the
original Scalar project (all subject to discussion, of course).
- `optionally-install-scalar`: Now that Scalar is feature (and documentation)
complete and is verified in CI builds, let's offer to install it.
- `move-scalar-to-toplevel`: Now that Scalar is complete, let's move it next to
`gitk-git/` and to `git-gui/`, making it a top-level command.
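
The `scalar-gentler-config-locking` entry above describes retrying a config lock instead of failing on the first conflict. Git's lockfile API already offers a timeout-taking variant, `hold_lock_file_for_update_timeout()`; the standalone POSIX sketch below only illustrates the general retry-with-backoff idea and is not the code from that series. The function name `lock_gently()`, the doubling delay, and the 1-second cap are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: create 'lock_path' exclusively, retrying while another
 * process holds it. The delay starts at 1 ms and doubles, capped at 1000 ms;
 * the function gives up once the total time slept reaches 'timeout_ms'.
 * Returns an open file descriptor, or -1 with errno set (EEXIST on timeout).
 * The caller later renames or unlinks the lock file to release it.
 */
int lock_gently(const char *lock_path, long timeout_ms)
{
	long waited_ms = 0, delay_ms = 1;

	for (;;) {
		struct timespec ts;
		int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);

		/* Success, a real error, or out of time: stop retrying. */
		if (fd >= 0 || errno != EEXIST || waited_ms >= timeout_ms)
			return fd;

		ts.tv_sec = delay_ms / 1000;
		ts.tv_nsec = (delay_ms % 1000) * 1000000L;
		nanosleep(&ts, NULL);
		waited_ms += delay_ms;
		delay_ms = delay_ms * 2 > 1000 ? 1000 : delay_ms * 2;
	}
}
```

Doubling the delay keeps the first retries fast when contention is brief, while avoiding a busy loop when many parallel processes, as in the functional tests, compete for the same file.
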

The following two patch series exist in Microsoft's fork of Git and are
publicly available. There is no current plan to upstream them, not because I
want to withhold these patches, but because I don't think the Git community is
interested in these patches.
There are some interesting ideas there, but the implementation is too specific
to Azure Repos and/or VFS for Git to be of much help in general (and also: my
colleagues tried to upstream some patches already and the enthusiasm for
integrating things related to Azure Repos and VFS for Git can be summarized in
very, very few words).

These patch series still exist mainly because Azure Repos supports the GVFS
protocol rather than partial clone, while Git is focused on improving partial
clone:
- `scalar-with-gvfs`: The primary purpose of this patch series is to support
existing Scalar users whose repositories are hosted in Azure Repos (which
does not support Git's partial clones, but supports its predecessor, the GVFS
protocol, which is used by Scalar to emulate the partial clone).
Since the GVFS protocol will never be supported by core Git, this patch
series will remain in Microsoft's fork of Git.
- `run-scalar-functional-tests`: The Scalar project developed a comprehensive
set of integration tests (or, "Functional Tests"). They are the sole remaining
part of the original C#-based Scalar project, and this patch series adds a
GitHub workflow that runs them all.
Since the tests partially depend on features that are only provided in the
`scalar-with-gvfs` patch series, this patch series cannot be upstreamed.