author | Ævar Arnfjörð Bjarmason <avarab@gmail.com> | 2022-08-04 18:28:37 +0200 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2022-08-04 14:12:23 -0700 |
commit | 00d3e8d7dd9ece6fe89dafff384ff32444754211 (patch) | |
tree | c3a1170e077b9aaa504c7b910e738899615031ef /Documentation/technical | |
parent | 5db921054e685a4dbaeb622acda53d6a154e947f (diff) | |
docs: move index format docs to man section 5
Continue the move of existing Documentation/technical/* protocol and
file-format documentation into our main documentation space by moving
the index format documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/index-format.txt | 406 |
1 files changed, 0 insertions, 406 deletions
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
deleted file mode 100644
index 65da0daaa5..0000000000
--- a/Documentation/technical/index-format.txt
+++ /dev/null
@@ -1,406 +0,0 @@
-Git index format
-================
-
-== The Git index file has the following format
-
-  All binary numbers are in network byte order.
-  In a repository using the traditional SHA-1, checksums and object IDs
-  (object names) mentioned below are all computed using SHA-1. Similarly,
-  in SHA-256 repositories, these values are computed using SHA-256.
-  Version 2 is described here unless stated otherwise.
-
-   - A 12-byte header consisting of
-
-     4-byte signature:
-       The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
-
-     4-byte version number:
-       The current supported versions are 2, 3 and 4.
-
-     32-bit number of index entries.
-
-   - A number of sorted index entries (see below).
-
-   - Extensions
-
-     Extensions are identified by signature. Optional extensions can
-     be ignored if Git does not understand them.
-
-     Git currently supports cache tree and resolve undo extensions.
-
-     4-byte extension signature. If the first byte is 'A'..'Z' the
-     extension is optional and can be ignored.
-
-     32-bit size of the extension
-
-     Extension data
-
-   - Hash checksum over the content of the index file before this checksum.
-
-== Index entry
-
-  Index entries are sorted in ascending order on the name field,
-  interpreted as a string of unsigned bytes (i.e. memcmp() order, no
-  localization, no special casing of directory separator '/'). Entries
-  with the same name are sorted by their stage field.
-
-  An index entry typically represents a file. However, if sparse-checkout
-  is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the
-  `extensions.sparseIndex` extension is enabled, then the index may
-  contain entries for directories outside of the sparse-checkout definition.
-  These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and
-  the path ends in a directory separator.
-
-  32-bit ctime seconds, the last time a file's metadata changed
-    this is stat(2) data
-
-  32-bit ctime nanosecond fractions
-    this is stat(2) data
-
-  32-bit mtime seconds, the last time a file's data changed
-    this is stat(2) data
-
-  32-bit mtime nanosecond fractions
-    this is stat(2) data
-
-  32-bit dev
-    this is stat(2) data
-
-  32-bit ino
-    this is stat(2) data
-
-  32-bit mode, split into (high to low bits)
-
-    4-bit object type
-      valid values in binary are 1000 (regular file), 1010 (symbolic link)
-      and 1110 (gitlink)
-
-    3-bit unused
-
-    9-bit unix permission. Only 0755 and 0644 are valid for regular files.
-    Symbolic links and gitlinks have value 0 in this field.
-
-  32-bit uid
-    this is stat(2) data
-
-  32-bit gid
-    this is stat(2) data
-
-  32-bit file size
-    This is the on-disk size from stat(2), truncated to 32-bit.
-
-  Object name for the represented object
-
-  A 16-bit 'flags' field split into (high to low bits)
-
-    1-bit assume-valid flag
-
-    1-bit extended flag (must be zero in version 2)
-
-    2-bit stage (during merge)
-
-    12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
-    is stored in this field.
-
-  (Version 3 or later) A 16-bit field, only applicable if the
-  "extended flag" above is 1, split into (high to low bits).
-
-    1-bit reserved for future
-
-    1-bit skip-worktree flag (used by sparse checkout)
-
-    1-bit intent-to-add flag (used by "git add -N")
-
-    13-bit unused, must be zero
-
-  Entry path name (variable length) relative to top level directory
-    (without leading slash). '/' is used as path separator. The special
-    path components ".", ".." and ".git" (without quotes) are disallowed.
-    Trailing slash is also disallowed.
-
-    The exact encoding is undefined, but the '.' and '/' characters
-    are encoded in 7-bit ASCII and the encoding cannot contain a NUL
-    byte (iow, this is a UNIX pathname).
-
-  (Version 4) In version 4, the entry path name is prefix-compressed
-    relative to the path name for the previous entry (the very first
-    entry is encoded as if the path name for the previous entry is an
-    empty string). At the beginning of an entry, an integer N in the
-    variable width encoding (the same encoding as the offset is encoded
-    for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
-    by a NUL-terminated string S. Removing N bytes from the end of the
-    path name for the previous entry, and replacing it with the string S
-    yields the path name for this entry.
-
-  1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
-  while keeping the name NUL-terminated.
-
-  (Version 4) In version 4, the padding after the pathname does not
-  exist.
-
-  Interpretation of index entries in split index mode is completely
-  different. See below for details.
-
-== Extensions
-
-=== Cache tree
-
-  Since the index does not record entries for directories, the cache
-  entries cannot describe tree objects that already exist in the object
-  database for regions of the index that are unchanged from an existing
-  commit. The cache tree extension stores a recursive tree structure that
-  describes the trees that already exist and completely match sections of
-  the cache entries. This speeds up tree object generation from the index
-  for a new commit by only computing the trees that are "new" to that
-  commit. It also assists when comparing the index to another tree, such
-  as `HEAD^{tree}`, since sections of the index can be skipped when a tree
-  comparison demonstrates equality.
-
-  The recursive tree structure uses nodes that store a number of cache
-  entries, a list of subnodes, and an object ID (OID). The OID references
-  the existing tree for that node, if it is known to exist. The subnodes
-  correspond to subdirectories that themselves have cache tree nodes. The
-  number of cache entries corresponds to the number of cache entries in
-  the index that describe paths within that tree's directory.
-
-  The extension tracks the full directory structure in the cache tree
-  extension, but this is generally smaller than the full cache entry list.
-
-  When a path is updated in index, Git invalidates all nodes of the
-  recursive cache tree corresponding to the parent directories of that
-  path. We store these tree nodes as being "invalid" by using "-1" as the
-  number of cache entries. Invalid nodes still store a span of index
-  entries, allowing Git to focus its efforts when reconstructing a full
-  cache tree.
-
-  The signature for this extension is { 'T', 'R', 'E', 'E' }.
-
-  A series of entries fill the entire extension; each of which
-  consists of:
-
-  - NUL-terminated path component (relative to its parent directory);
-
-  - ASCII decimal number of entries in the index that is covered by the
-    tree this entry represents (entry_count);
-
-  - A space (ASCII 32);
-
-  - ASCII decimal number that represents the number of subtrees this
-    tree has;
-
-  - A newline (ASCII 10); and
-
-  - Object name for the object that would result from writing this span
-    of index as a tree.
-
-  An entry can be in an invalidated state and is represented by having
-  a negative number in the entry_count field. In this case, there is no
-  object name and the next entry starts immediately after the newline.
-  When writing an invalid entry, -1 should always be used as entry_count.
-
-  The entries are written out in the top-down, depth-first order. The
-  first entry represents the root level of the repository, followed by the
-  first subtree--let's call this A--of the root level (with its name
-  relative to the root level), followed by the first subtree of A (with
-  its name relative to A), and so on. The specified number of subtrees
-  indicates when the current level of the recursive stack is complete.
-
-=== Resolve undo
-
-  A conflict is represented in the index as a set of higher stage entries.
-  When a conflict is resolved (e.g. with "git add path"), these higher
-  stage entries will be removed and a stage-0 entry with proper resolution
-  is added.
-
-  When these higher stage entries are removed, they are saved in the
-  resolve undo extension, so that conflicts can be recreated (e.g. with
-  "git checkout -m"), in case users want to redo a conflict resolution
-  from scratch.
-
-  The signature for this extension is { 'R', 'E', 'U', 'C' }.
-
-  A series of entries fill the entire extension; each of which
-  consists of:
-
-  - NUL-terminated pathname the entry describes (relative to the root of
-    the repository, i.e. full pathname);
-
-  - Three NUL-terminated ASCII octal numbers, entry mode of entries in
-    stage 1 to 3 (a missing stage is represented by "0" in this field);
-    and
-
-  - At most three object names of the entry in stages from 1 to 3
-    (nothing is written for a missing stage).
-
-=== Split index
-
-  In split index mode, the majority of index entries could be stored
-  in a separate file. This extension records the changes to be made on
-  top of that to produce the final index.
-
-  The signature for this extension is { 'l', 'i', 'n', 'k' }.
-
-  The extension consists of:
-
-  - Hash of the shared index file. The shared index file path
-    is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
-    index does not require a shared index file.
-
-  - An ewah-encoded delete bitmap, each bit represents an entry in the
-    shared index. If a bit is set, its corresponding entry in the
-    shared index will be removed from the final index. Note, because
-    a delete operation changes index entry positions, but we do need
-    original positions in replace phase, it's best to just mark
-    entries for removal, then do a mass deletion after replacement.
-
-  - An ewah-encoded replace bitmap, each bit represents an entry in
-    the shared index. If a bit is set, its corresponding entry in the
-    shared index will be replaced with an entry in this index
-    file. All replaced entries are stored in sorted order in this
-    index. The first "1" bit in the replace bitmap corresponds to the
-    first index entry, the second "1" bit to the second entry and so
-    on. Replaced entries may have empty path names to save space.
-
-  The remaining index entries after replaced ones will be added to the
-  final index. These added entries are also sorted by entry name then
-  stage.
-
-== Untracked cache
-
-  Untracked cache saves the untracked file list and necessary data to
-  verify the cache. The signature for this extension is { 'U', 'N',
-  'T', 'R' }.
-
-  The extension starts with
-
-  - A sequence of NUL-terminated strings, preceded by the size of the
-    sequence in variable width encoding. Each string describes the
-    environment where the cache can be used.
-
-  - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
-    ctime field until "file size".
-
-  - Stat data of core.excludesFile
-
-  - 32-bit dir_flags (see struct dir_struct)
-
-  - Hash of $GIT_DIR/info/exclude. A null hash means the file
-    does not exist.
-
-  - Hash of core.excludesFile. A null hash means the file does
-    not exist.
-
-  - NUL-terminated string of per-dir exclude file name. This usually
-    is ".gitignore".
-
-  - The number of following directory blocks, variable width
-    encoding. If this number is zero, the extension ends here with a
-    following NUL.
-
-  - A number of directory blocks in depth-first-search order, each
-    consists of
-
-    - The number of untracked entries, variable width encoding.
-
-    - The number of sub-directory blocks, variable width encoding.
-
-    - The directory name terminated by NUL.
-
-    - A number of untracked file/dir names terminated by NUL.
-
-The remaining data of each directory block is grouped by type:
-
-  - An ewah bitmap, the n-th bit marks whether the n-th directory has
-    valid untracked cache entries.
-
-  - An ewah bitmap, the n-th bit records "check-only" bit of
-    read_directory_recursive() for the n-th directory.
-
-  - An ewah bitmap, the n-th bit indicates whether hash and stat data
-    is valid for the n-th directory and exists in the next data.
-
-  - An array of stat data. The n-th data corresponds with the n-th
-    "one" bit in the previous ewah bitmap.
-
-  - An array of hashes. The n-th hash corresponds with the n-th "one" bit
-    in the previous ewah bitmap.
-
-  - One NUL.
-
-== File System Monitor cache
-
-  The file system monitor cache tracks files for which the core.fsmonitor
-  hook has told us about changes. The signature for this extension is
-  { 'F', 'S', 'M', 'N' }.
-
-  The extension starts with
-
-  - 32-bit version number: the current supported versions are 1 and 2.
-
-  - (Version 1)
-    64-bit time: the extension data reflects all changes through the given
-    time which is stored as the nanoseconds elapsed since midnight,
-    January 1, 1970.
-
-  - (Version 2)
-    A null terminated string: an opaque token defined by the file system
-    monitor application. The extension data reflects all changes relative
-    to that token.
-
-  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
-
-  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
-    is not CE_FSMONITOR_VALID.
-
-== End of Index Entry
-
-  The End of Index Entry (EOIE) is used to locate the end of the variable
-  length index entries and the beginning of the extensions. Code can take
-  advantage of this to quickly locate the index extensions without having
-  to parse through all of the index entries.
-
-  Because it must be able to be loaded before the variable length cache
-  entries and other index extensions, this extension must be written last.
-  The signature for this extension is { 'E', 'O', 'I', 'E' }.
-
-  The extension consists of:
-
-  - 32-bit offset to the end of the index entries
-
-  - Hash over the extension types and their sizes (but not
-    their contents). E.g. if we have "TREE" extension that is N-bytes
-    long, "REUC" extension that is M-bytes long, followed by "EOIE",
-    then the hash would be:
-
-    Hash("TREE" + <binary representation of N> +
-         "REUC" + <binary representation of M>)
-
-== Index Entry Offset Table
-
-  The Index Entry Offset Table (IEOT) is used to help address the CPU
-  cost of loading the index by enabling multi-threading the process of
-  converting cache entries from the on-disk format to the in-memory format.
-  The signature for this extension is { 'I', 'E', 'O', 'T' }.
-
-  The extension consists of:
-
-  - 32-bit version (currently 1)
-
-  - A number of index offset entries each consisting of:
-
-    - 32-bit offset from the beginning of the file to the first cache entry
-      in this block of entries.
-
-    - 32-bit count of cache entries in this block
-
-== Sparse Directory Entries
-
-  When using sparse-checkout in cone mode, some entire directories within
-  the index can be summarized by pointing to a tree object instead of the
-  entire expanded list of paths within that tree. An index containing such
-  entries is a "sparse index". Index format versions 4 and less were not
-  implemented with such entries in mind. Thus, for these versions, an
-  index containing sparse directory entries will include this extension
-  with signature { 's', 'd', 'i', 'r' }. Like the split-index extension,
-  tools should avoid interacting with a sparse index unless they understand
-  this extension.
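The header and version 2 entry layout described in the removed text can be sketched in Python. This is a minimal illustration under stated assumptions, not Git's implementation. The names `parse_index_v2` and `checksum_ok` are hypothetical. The sketch assumes a version 2 index, so it does not handle extended flags, version 4 prefix compression or split index mode. It stops at the first byte after the entries and does not decode any extension.

```python
import hashlib
import struct

HEADER = struct.Struct(">4sII")      # signature, version, number of entries
STAT_FIELDS = struct.Struct(">10I")  # ctime s/ns, mtime s/ns, dev, ino, mode, uid, gid, size


def parse_index_v2(data, hash_len=20):
    """Return (entries, offset of the first byte after the entries).

    Hypothetical helper: handles version 2 only; hash_len is 20 for
    SHA-1 repositories (32 would be needed for SHA-256).
    """
    signature, version, count = HEADER.unpack_from(data, 0)
    if signature != b"DIRC":
        raise ValueError("bad signature, not a Git index")
    if version != 2:
        raise ValueError("this sketch handles version 2 only")

    entries = []
    pos = HEADER.size
    for _ in range(count):
        start = pos
        stat = STAT_FIELDS.unpack_from(data, pos)
        pos += STAT_FIELDS.size
        oid = data[pos:pos + hash_len]
        pos += hash_len
        (flags,) = struct.unpack_from(">H", data, pos)
        pos += 2
        end = data.index(b"\0", pos)  # path names cannot contain NUL
        mode = stat[6]
        entries.append({
            "path": data[pos:end],
            "oid": oid.hex(),
            "object_type": (mode >> 12) & 0xF,   # 4-bit object type
            "permissions": mode & 0o777,         # 9-bit unix permission
            "size": stat[9],
            "assume_valid": bool(flags & 0x8000),
            "stage": (flags >> 12) & 0x3,
            "name_length": flags & 0xFFF,
        })
        # 1-8 NUL bytes pad the entry to a multiple of eight bytes.
        pos = start + ((end - start) // 8 + 1) * 8
    return entries, pos


def checksum_ok(data, hash_len=20):
    """Compare the trailing hash with a hash of everything before it."""
    algo = "sha1" if hash_len == 20 else "sha256"
    return hashlib.new(algo, data[:-hash_len]).digest() == data[-hash_len:]
```

Given the bytes of `$GIT_DIR/index` from a SHA-1 repository that uses index version 2, `parse_index_v2` should return one dictionary per entry, and the returned offset marks where the extensions (if any) begin.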