| field | value | date |
|---|---|---|
| author | Ævar Arnfjörð Bjarmason <avarab@gmail.com> | 2022-08-04 18:28:40 +0200 |
| committer | Junio C Hamano <gitster@pobox.com> | 2022-08-04 14:12:24 -0700 |
| commit | 6b6029dd1d347ce2c83a43afc275c5641b514ab4 | |
| tree | 919f0fb176d535d9f45e57b0dbdccb412addb66d /Documentation/technical | |
| parent | 977c47b46d4d4e5b25afd548c1bd6c108afad632 | |
docs: move cruft pack docs to gitformat-pack
Integrate the cruft packs documentation initially added in
3d89a8c1180 (Documentation/technical: add cruft-packs.txt, 2022-05-20)
to the newly created "gitformat-pack" documentation.
Like the "bitmap-format" added before it in
0d4455a3ab0 (documentation: add documentation for the bitmap format,
2013-11-14), the "cruft-packs" were documented in their own file.
As the diff move detection will show, there is no change to
"Documentation/technical/cruft-packs.txt" here except to move it, and
to "indent" the existing sections by adding an extra "=" to them.
We could similarly convert the "bitmap-format.txt", but let's leave it
for now due to a conflict with the in-flight ac/bitmap-lookup-table
series.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
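The "indent" step described in the message amounts to demoting each AsciiDoc section title by one level, so that the file's "= Cruft packs" title becomes a "==" section inside gitformat-pack. The sketch below is a hypothetical illustration only: the commit does not say how the edit was made, and `demote_sections` is an invented helper that assumes a section title is a line of one or more "=" followed by whitespace and text.

```python
import re

# Assumed title syntax: one or more "=" at the start of a line, then
# whitespace, then title text. A block delimiter such as "====" has no
# title text after it, so it does not match and is left unchanged.
SECTION_TITLE = re.compile(r"^(=+[ \t]+\S)", re.MULTILINE)


def demote_sections(text: str) -> str:
    """Prefix every AsciiDoc section title with one extra "=".

    Hypothetical helper: it does not skip listing or literal blocks, so a
    line starting with "= " inside such a block would be rewritten too.
    """
    return SECTION_TITLE.sub(r"=\1", text)


if __name__ == "__main__":
    print(demote_sections("= Cruft packs\n\n== Background\n"))
```

Because the match requires title text after the "=" run, the sketch leaves delimiter lines untouched. It would need block awareness before it could be trusted on arbitrary AsciiDoc input.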
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/cruft-packs.txt | 123 |
1 files changed, 0 insertions, 123 deletions
```diff
diff --git a/Documentation/technical/cruft-packs.txt b/Documentation/technical/cruft-packs.txt
deleted file mode 100644
index d81f3a8982..0000000000
--- a/Documentation/technical/cruft-packs.txt
+++ /dev/null
@@ -1,123 +0,0 @@
-= Cruft packs
-
-The cruft packs feature offer an alternative to Git's traditional mechanism of
-removing unreachable objects. This document provides an overview of Git's
-pruning mechanism, and how a cruft pack can be used instead to accomplish the
-same.
-
-== Background
-
-To remove unreachable objects from your repository, Git offers `git repack -Ad`
-(see linkgit:git-repack[1]). Quoting from the documentation:
-
-[quote]
-[...] unreachable objects in a previous pack become loose, unpacked objects,
-instead of being left in the old pack. [...] loose unreachable objects will be
-pruned according to normal expiry rules with the next 'git gc' invocation.
-
-Unreachable objects aren't removed immediately, since doing so could race with
-an incoming push which may reference an object which is about to be deleted.
-Instead, those unreachable objects are stored as loose objects and stay that way
-until they are older than the expiration window, at which point they are removed
-by linkgit:git-prune[1].
-
-Git must store these unreachable objects loose in order to keep track of their
-per-object mtimes. If these unreachable objects were written into one big pack,
-then either freshening that pack (because an object contained within it was
-re-written) or creating a new pack of unreachable objects would cause the pack's
-mtime to get updated, and the objects within it would never leave the expiration
-window. Instead, objects are stored loose in order to keep track of the
-individual object mtimes and avoid a situation where all cruft objects are
-freshened at once.
-
-This can lead to undesirable situations when a repository contains many
-unreachable objects which have not yet left the grace period. Having large
-directories in the shards of `.git/objects` can lead to decreased performance in
-the repository. But given enough unreachable objects, this can lead to inode
-starvation and degrade the performance of the whole system. Since we
-can never pack those objects, these repositories often take up a large amount of
-disk space, since we can only zlib compress them, but not store them in delta
-chains.
-
-== Cruft packs
-
-A cruft pack eliminates the need for storing unreachable objects in a loose
-state by including the per-object mtimes in a separate file alongside a single
-pack containing all loose objects.
-
-A cruft pack is written by `git repack --cruft` when generating a new pack.
-linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
-is a classic all-into-one repack, meaning that everything in the resulting pack is
-reachable, and everything else is unreachable. Once written, the `--cruft`
-option instructs `git repack` to generate another pack containing only objects
-not packed in the previous step (which equates to packing all unreachable
-objects together). This progresses as follows:
-
-  1. Enumerate every object, marking any object which is (a) not contained in a
-  kept-pack, and (b) whose mtime is within the grace period as a traversal
-  tip.
-
-  2. Perform a reachability traversal based on the tips gathered in the previous
-  step, adding every object along the way to the pack.
-
-  3. Write the pack out, along with a `.mtimes` file that records the per-object
-  timestamps.
-
-This mode is invoked internally by linkgit:git-repack[1] when instructed to
-write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
-of packs which will not be deleted by the repack; in other words, they contain
-all of the repository's reachable objects.
-
-When a repository already has a cruft pack, `git repack --cruft` typically only
-adds objects to it. An exception to this is when `git repack` is given the
-`--cruft-expiration` option, which allows the generated cruft pack to omit
-expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
-later on.
-
-It is linkgit:git-gc[1] that is typically responsible for removing expired
-unreachable objects.
-
-== Caution for mixed-version environments
-
-Repositories that have cruft packs in them will continue to work with any older
-version of Git. Note, however, that previous versions of Git which do not
-understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
-all of the objects in it. In other words, do not expect older (pre-cruft pack)
-versions of Git to interpret or even read the contents of the `.mtimes` file.
-
-Note that having mixed versions of Git GC-ing the same repository can lead to
-unreachable objects never being completely pruned. This can happen under the
-following circumstances:
-
-  - An older version of Git running GC explodes the contents of an existing
-    cruft pack loose, using the cruft pack's mtime.
-  - A newer version running GC collects those loose objects into a cruft pack,
-    where the .mtime file reflects the loose object's actual mtimes, but the
-    cruft pack mtime is "now".
-
-Repeating this process will lead to unreachable objects not getting pruned as a
-result of repeatedly resetting the objects' mtimes to the present time.
-
-If you are GC-ing repositories in a mixed version environment, consider omitting
-the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
-leaving the `gc.cruftPacks` configuration unset until all writers understand
-cruft packs.
-
-== Alternatives
-
-Notable alternatives to this design include:
-
-  - The location of the per-object mtime data, and
-  - Storing unreachable objects in multiple cruft packs.
-
-On the location of mtime data, a new auxiliary file tied to the pack was chosen
-to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
-support for optional chunks of data, it may make sense to consolidate the
-`.mtimes` format into the `.idx` itself.
-
-Storing unreachable objects among multiple cruft packs (e.g., creating a new
-cruft pack during each repacking operation including only unreachable objects
-which aren't already stored in an earlier cruft pack) is significantly more
-complicated to construct, and so aren't pursued here. The obvious drawback to
-the current implementation is that the entire cruft pack must be re-written from
-scratch.
```