aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/technical
diff options
context:
space:
mode:
authorVictoria Dye <vdye@github.com>2022-07-12 00:06:07 +0000
committerJunio C Hamano <gitster@pobox.com>2022-07-18 11:03:56 -0700
commit72d3a5da32a095abe1fdb7534a6c7e0451ae3021 (patch)
tree97ea1a9e6dd8e068ba068f420103fe727887e627 /Documentation/technical
parentf22c95db53fd77087fd85ebdf76fd57daae727d8 (diff)
downloadgit-72d3a5da32a095abe1fdb7534a6c7e0451ae3021.tar.gz
scalar: convert README.md into a technical design doc
Adapt the content from 'contrib/scalar/README.md' into a design document in 'Documentation/technical/'. In addition to reformatting for asciidoc, elaborate on the background, purpose, and design choices that went into Scalar. Most of this document will persist in the 'Documentation/technical/' after Scalar has been moved out of 'contrib/' and into the root of Git. Until that time, it will also contain a temporary "Roadmap" section detailing the remaining series needed to finish the initial version of Scalar. The section will be removed once Scalar is moved to the repo root, but in the meantime serves as a guide for readers to keep up with progress on the feature. Signed-off-by: Victoria Dye <vdye@github.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation/technical')
-rw-r--r--Documentation/technical/scalar.txt127
1 files changed, 127 insertions, 0 deletions
diff --git a/Documentation/technical/scalar.txt b/Documentation/technical/scalar.txt
new file mode 100644
index 0000000000..08bc09c225
--- /dev/null
+++ b/Documentation/technical/scalar.txt
@@ -0,0 +1,127 @@
+Scalar
+======
+
+Scalar is a repository management tool that optimizes Git for use in large
+repositories. It accomplishes this by helping users to take advantage of
+advanced performance features in Git. Unlike most other Git built-in commands,
+Scalar is not executed as a subcommand of 'git'; rather, it is built as a
+separate executable containing its own series of subcommands.
+
+Background
+----------
+
+Scalar was originally designed as an add-on to Git and implemented as a .NET
+Core application. It was created based on the learnings from the VFS for Git
+project (another application aimed at improving the experience of working with
+large repositories). As part of its initial implementation, Scalar relied on
+custom features in the Microsoft fork of Git that have since been integrated
+into core Git:
+
+* partial clone,
+* commit graphs,
+* multi-pack index,
+* sparse checkout (cone mode),
+* scheduled background maintenance,
+* etc
+
+With the requisite Git functionality in place and a desire to bring the benefits
+of Scalar to the larger Git community, the Scalar application itself was ported
+from C# to C and integrated upstream.
+
+Features
+--------
+
+Scalar is comprised of two major pieces of functionality: automatically
+configuring built-in Git performance features and managing repository
+enlistments.
+
+The Git performance features configured by Scalar (see "Background" for
+examples) confer substantial performance benefits to large repositories, but are
+either too experimental to enable for all of Git yet, or only benefit large
+repositories. As new features are introduced, Scalar should be updated
+accordingly to incorporate them. This will prevent the tool from becoming stale
+while also providing a path for more easily bringing features to the appropriate
+users.
+
+Enlistments are how Scalar knows which repositories on a user's system should
+utilize Scalar-configured features. This allows it to update performance
+settings when new ones are added to the tool, as well as centrally manage
+repository maintenance. The enlistment structure - a root directory with a
+`src/` subdirectory containing the cloned repository itself - is designed to
+encourage users to route build outputs outside of the repository to avoid the
+performance-limiting overhead of ignoring those files in Git.
+
+Design
+------
+
+Scalar is implemented in C and interacts with Git via a mix of child process
+invocations of Git and direct usage of `libgit.a`. Internally, it is structured
+much like other built-ins with subcommands (e.g., `git stash`), containing a
+`cmd_<subcommand>()` function for each subcommand, routed through a `cmd_main()`
+function. Most options are unique to each subcommand, with `scalar` respecting
+some "global" `git` options (e.g., `-c` and `-C`).
+
+Because `scalar` is not invoked as a Git subcommand (like `git scalar`), it is
+built and installed as its own executable in the `bin/` directory, alongside
+`git`, `git-gui`, etc.
+
+Roadmap
+-------
+
+NOTE: this section will be removed once the remaining tasks outlined in this
+roadmap are complete.
+
+Scalar is a large enough project that it is being upstreamed incrementally,
+living in `contrib/` until it is feature-complete. So far, the following patch
+series have been accepted:
+
+- `scalar-the-beginning`: The initial patch series which sets up
+ `contrib/scalar/` and populates it with a minimal `scalar` command that
+ demonstrates the fundamental ideas.
+
+- `scalar-c-and-C`: The `scalar` command learns about two options that can be
+ specified before the command, `-c <key>=<value>` and `-C <directory>`.
+
+- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
+
+Roughly speaking (and subject to change), the following series are needed to
+"finish" this initial version of Scalar:
+
+- Finish Scalar features: Enable the built-in FSMonitor in Scalar enlistments
+ and implement `scalar help`. At the end of this series, Scalar should be
+ feature-complete from the perspective of a user.
+
+- Generalize features not specific to Scalar: In the spirit of making Scalar
+ configure only what is needed for large repo performance, move common
+ utilities into other parts of Git. Some of this will be internal-only, but one
+ major change will be generalizing `scalar diagnose` for use with any Git
+ repository.
+
+- Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
+ `git`, including updates to build and install it with the rest of Git. This
+ change will incorporate Scalar into the Git CI and test framework, as well as
+ expand regression and performance testing to ensure the tool is stable.
+
+Finally, there are two additional patch series that exist in Microsoft's fork of
+Git, but there is no current plan to upstream them. There are some interesting
+ideas there, but the implementation is too specific to Azure Repos and/or VFS
+for Git to be of much help in general.
+
+These still exist mainly because the GVFS protocol is what Azure Repos has
+instead of partial clone, while Git is focused on improving partial clone:
+
+- `scalar-with-gvfs`: The primary purpose of this patch series is to support
+ existing Scalar users whose repositories are hosted in Azure Repos (which does
+ not support Git's partial clones, but supports its predecessor, the GVFS
+ protocol, which is used by Scalar to emulate the partial clone).
+
+ Since the GVFS protocol will never be supported by core Git, this patch series
+ will remain in Microsoft's fork of Git.
+
+- `run-scalar-functional-tests`: The Scalar project developed a quite
+ comprehensive set of integration tests (or, "Functional Tests"). They are the
+ sole remaining part of the original C#-based Scalar project, and this patch
+ adds a GitHub workflow that runs them all.
+
+ Since the tests partially depend on features that are only provided in the
+ `scalar-with-gvfs` patch series, this patch cannot be upstreamed.