aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/git-maintenance.txt
diff options
context:
space:
mode:
authorDerrick Stolee <dstolee@microsoft.com>2020-09-25 12:33:32 +0000
committerJunio C Hamano <gitster@pobox.com>2020-09-25 10:53:04 -0700
commit252cfb7cb86a33f7740c799748ee6c586931c7bc (patch)
tree448b62e3fa3b920d3da7c4484be34dbc95b56f00 /Documentation/git-maintenance.txt
parent28cb5e66ddaed47aa8789fa326fec0339831b80b (diff)
downloadgit-252cfb7cb86a33f7740c799748ee6c586931c7bc.tar.gz
maintenance: add loose-objects task
One goal of background maintenance jobs is to allow a user to disable auto-gc (gc.auto=0) but keep their repository in a clean state. Without any cleanup, loose objects will clutter the object database and slow operations. In addition, the loose objects will take up extra space because they are not stored with deltas against similar objects. Create a 'loose-objects' task for the 'git maintenance run' command. This helps clean up loose objects without disrupting concurrent Git commands using the following sequence of events: 1. Run 'git prune-packed' to delete any loose objects that exist in a pack-file. Concurrent commands will prefer the packed version of the object to the loose version. (Of course, there are exceptions for commands that specifically care about the location of an object. These are rare for a user to run on purpose, and we hope a user that has selected background maintenance will not be trying to do foreground maintenance.) 2. Run 'git pack-objects' on a batch of loose objects. These objects are grouped by scanning the loose object directories in lexicographic order until listing all loose objects -or- reaching 50,000 objects. This is more than enough if the loose objects are created only by a user doing normal development. We noticed users with _millions_ of loose objects because VFS for Git downloads blobs on-demand when a file read operation requires populating a virtual file. This step is based on a similar step in Scalar [1] and VFS for Git. [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation/git-maintenance.txt')
-rw-r--r--Documentation/git-maintenance.txt15
1 files changed, 15 insertions, 0 deletions
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 12668fccf7..fc95eb594f 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -70,6 +70,21 @@ gc::
be disruptive in some situations, as it deletes stale data. See
linkgit:git-gc[1] for more details on garbage collection in Git.
+loose-objects::
+ The `loose-objects` job cleans up loose objects and places them into
+ pack-files. In order to prevent race conditions with concurrent Git
+ commands, it follows a two-step process. First, it deletes any loose
+ objects that already exist in a pack-file; concurrent Git processes
+ will examine the pack-file for the object data instead of the loose
+ object. Second, it creates a new pack-file (starting with "loose-")
+ containing a batch of loose objects. The batch size is limited to 50
+ thousand objects to prevent the job from taking too long on a
+ repository with many loose objects. The `gc` task writes unreachable
+ objects as loose objects to be cleaned up by a later step only if
+ they are not re-added to a pack-file; for this reason it is not
+ advisable to enable both the `loose-objects` and `gc` tasks at the
+ same time.
+
OPTIONS
-------
--auto::