Age | Commit message (Collapse) | Author | Files | Lines |
|
* maint-2.44: (41 commits)
Git 2.44.1
Git 2.43.4
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
...
|
|
* maint-2.43: (40 commits)
Git 2.43.4
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
...
|
|
* maint-2.42: (39 commits)
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
has_dir_name(): do not get confused by characters < '/'
...
|
|
* maint-2.41: (38 commits)
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
has_dir_name(): do not get confused by characters < '/'
docs: document security issues around untrusted .git dirs
...
|
|
* maint-2.40: (39 commits)
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
has_dir_name(): do not get confused by characters < '/'
docs: document security issues around untrusted .git dirs
upload-pack: disable lazy-fetching by default
...
|
|
* maint-2.39: (38 commits)
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
has_dir_name(): do not get confused by characters < '/'
docs: document security issues around untrusted .git dirs
upload-pack: disable lazy-fetching by default
fetch/clone: detect dubious ownership of local repositories
...
|
|
There is a bug in directory/file ("D/F") conflict checking optimization:
It assumes that such a conflict cannot happen if a newly added entry's
path is lexicgraphically "greater than" the last already-existing index
entry _and_ contains a directory separator that comes strictly after the
common prefix (`len > len_eq_offset`).
This assumption is incorrect, though: `a-` sorts _between_ `a` and
`a/b`, their common prefix is `a`, the slash comes after the common
prefix, and there is still a file/directory conflict.
Let's re-design this logic, taking these facts into consideration:
- It is impossible for a file to sort after another file with whose
directory it conflicts because the trailing NUL byte is always smaller
than any other character.
- Since there are quite a number of ASCII characters that sort before
the slash (e.g. `-`, `.`, the space character), looking at the last
already-existing index entry is not enough to determine whether there
is a D/F conflict when the first character different from the
existing last index entry's path is a slash.
If it is not a slash, there cannot be a file/directory conflict.
And if the existing index entry's first different character is a
slash, it also cannot be a file/directory conflict because the
optimization requires the newly-added entry's path to sort _after_ the
existing entry's, and the conflicting file's path would not.
So let's fall back to the regular binary search whenever the newly-added
item's path differs in a slash character. If it does not, and it sorts
after the last index entry, there is no D/F conflict and the new index
entry can be safely appended.
This fix also nicely simplifies the logic and makes it much easier to
reason about, while the impact on performance should be negligible:
After this fix, the optimization will be skipped only when index
entry's paths differ in a slash and a space, `!`, `"`, `#`, `$`,
`%`, `&`, `'`, | ( `)`, `*`, `+`, `,`, `-`, or `.`, which should
be a rare situation.
Signed-off-by: Filip Hejsek <filip.hejsek@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
|
|
Unlike "git add" and other end-user facing commands, where it is
diagnosed as an error to give a pathspec with an element that does
not match any path, the diff machinery does not care if some
elements of the pathspec do not match. Given that the diff
machinery is heavily used in pathspec-limited "git log" machinery,
and it is common for a path to come and go while traversing the
project history, this is usually a good thing.
However, in some cases we would want to know if all the pathspec
elements matched. For example, "git add -u <pathspec>" internally
uses the machinery used by "git diff-files" to decide contents from
what paths to add to the index, and as an end-user facing command,
"git add -u" would want to report an unmatched pathspec element.
Add a new .ps_matched member next to the .prune_data member in
"struct rev_info" so that we can optionally keep track of the use of
.prune_data pathspec elements that can be inspected by the caller.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove unused header "#include".
* en/header-cleanup:
treewide: remove unnecessary includes in source files
treewide: add direct includes currently only pulled in transitively
trace2/tr2_tls.h: remove unnecessary include
submodule-config.h: remove unnecessary include
pkt-line.h: remove unnecessary include
line-log.h: remove unnecessary include
http.h: remove unnecessary include
fsmonitor--daemon.h: remove unnecessary includes
blame.h: remove unnecessary includes
archive.h: remove unnecessary include
treewide: remove unnecessary includes in source files
treewide: remove unnecessary includes from header files
|
|
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
At times, we may already know that a path represented by a
cache_entry ce has no changes via some out-of-line means, like
fsmonitor, and yet need the control to go through a codepath that
requires us to have "struct stat" obtained by lstat() on the path,
for various purposes (e.g. "ie_match_stat()" wants cached stat-info
is still current wrt "struct stat", "diff" wants to know st_mode).
The callers of lstat() on a tracked file, when its cache_entry knows
it is up-to-date, can instead call this helper to pretend that it
called lstat() by faking the "struct stat" information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Further shuffling of declarations across header files to streamline
file dependencies.
* cw/compat-util-header-cleanup:
git-compat-util: move alloc macros to git-compat-util.h
treewide: remove unnecessary includes for wrapper.h
kwset: move translation table from ctype
sane-ctype.h: create header for sane-ctype macros
git-compat-util: move wrapper.c funcs to its header
git-compat-util: move strbuf.c funcs to its header
|
|
A few places failed to differenciate the case where the index is
truly empty (nothing added) and we haven't yet read from the
on-disk index file, which have been corrected.
* js/empty-index-fixes:
commit -a -m: allow the top-level tree to become empty again
split-index: accept that a base index can be empty
do_read_index(): always mark index as initialized unless erroring out
|
|
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We are about to fix an ancient bug where `do_read_index()` pretended
that the index was not initialized when there are no index entries.
Before the `index_state` structure gained the `initialized` flag in
913e0e99b6a (unpack_trees(): protect the handcrafted in-core index from
read_cache(), 2008-08-23), that was the best we could do (even if it was
incorrect: it is totally possible to read a Git index file that contains
no index entries).
This pattern was repeated also in 998330ac2e7 (read-cache: look for
shared index files next to the index, too, 2021-08-26), which we fix
here by _not_ mistaking an empty base index for a missing
`sharedindex.*` file.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 913e0e99b6a (unpack_trees(): protect the handcrafted in-core index
from read_cache(), 2008-08-23) a flag was introduced into the
`index_state` structure to indicate whether it had been initialized (or
more correctly: read and parsed).
There was one code path that was not handled, though: when the index
file does not yet exist (but the `must_exist` parameter is set to 0 to
indicate that that's okay). In this instance, Git wants to go forward
with a new, pristine Git index, almost as if the file had existed and
contained no index entries or extensions.
Since Git wants to handle this situation the same as if an "empty" Git
index file existed, let's set the `initialized` flag also in that case.
This is necessary to prepare for fixing the bug where the condition
`cache_nr == 0` is incorrectly used as an indicator that the index was
already read, and the condition `initialized != 0` needs to be used
instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This also made it clear that several .c files that depended upon path.h
were missing a #include for it; add the missing includes while at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h. Also move some related inline
functions from cache.h to read-cache.h. The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We already have a preload-index.c file; move the declarations for the
functions in that file into a new preload-index.h. These were
previously split between cache.h and repository.h.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These functions do not depend upon struct cache_entry or struct
index_state in any way, and it seems more logical to break them out into
this file, especially since statinfo.h already has the struct stat_data
declaration.
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function add_files_to_cache(), plus associated helper functions,
were defined in builtin/add.c, but also shared with builtin/checkout.c
and builtin/commit.c. Move these shared functions to read-cache.c.
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function overlay_tree_on_index(), plus associated helper functions,
were defined in builtin/ls-files.c, but also shared with
builtin/commit.c. Move these shared functions to read-cache.c.
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
reftable: ensure git-compat-util.h is the first (indirect) include
diff.h: reduce unnecessary includes
object-store.h: reduce unnecessary includes
commit.h: reduce unnecessary includes
fsmonitor: reduce includes of cache.h
cache.h: remove unnecessary headers
treewide: remove cache.h inclusion due to previous changes
cache,tree: move basic name compare functions from read-cache to tree
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
hash-ll.h: split out of hash.h to remove dependency on repository.h
tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
dir.h: move DTYPE defines from cache.h
versioncmp.h: move declarations for versioncmp.c functions from cache.h
ws.h: move declarations for ws.c functions from cache.h
match-trees.h: move declarations for match-trees.c functions from cache.h
pkt-line.h: move declarations for pkt-line.c functions from cache.h
base85.h: move declarations for base85.c functions from cache.h
copy.h: move declarations for copy.c functions from cache.h
server-info.h: move declarations for server-info.c functions from cache.h
packfile.h: move pack_window and pack_entry from cache.h
...
|
|
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
None of base_name_compare(), df_name_compare(), or name_compare()
depended upon a cache_entry or index_state in any way. By moving these
functions to tree.h, half a dozen other files can stop depending upon
cache.h (though that change will be made in a later commit).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since cmp_cache_name_compare() was comparing cache_entry structs, it
was associated with the cache rather than with trees. Move the
function. As a side effect, we can make cache_name_stage_compare()
static as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h. This made it more difficult
to find which files could remove a dependence on cache.h. Make C files
explicitly include trace.h or trace2.h if they are using them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.
* en/header-split-cleanup:
csum-file.h: remove unnecessary inclusion of cache.h
write-or-die.h: move declarations for write-or-die.c functions from cache.h
treewide: remove cache.h inclusion due to setup.h changes
setup.h: move declarations for setup.c functions from cache.h
treewide: remove cache.h inclusion due to environment.h changes
environment.h: move declarations for environment.c functions from cache.h
treewide: remove unnecessary includes of cache.h
wrapper.h: move declarations for wrapper.c functions from cache.h
path.h: move function declarations for path.c functions from cache.h
cache.h: remove expand_user_path()
abspath.h: move absolute path functions from cache.h
environment: move comment_line_char from cache.h
treewide: remove unnecessary cache.h inclusion from several sources
treewide: remove unnecessary inclusion of gettext.h
treewide: be explicit about dependence on gettext.h
treewide: remove unnecessary cache.h inclusion from a few headers
|
|
Code clean-up around the use of the_repository.
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
|
|
The index files can become corrupt under certain conditions when
the split-index feature is in use, especially together with
fsmonitor, which have been corrected.
* js/split-index-fixes:
unpack-trees: take care to propagate the split-index flag
fsmonitor: avoid overriding `cache_changed` bits
split-index; stop abusing the `base_oid` to strip the "link" extension
split-index & fsmonitor: demonstrate a bug
|
|
en/header-split-cache-h
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
|
|
As can easily be seen from grepping in our sources, we had these uses
of "the_repository" in various library code in cases where the
function in question was already getting a "struct repository *"
argument. Let's use that argument instead.
Out of these changes only the changes to "cache-tree.c",
"commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly
applied before the migration away from the "repo_*()" wrapper macros
in the preceding commits.
The rest aren't new, as we'd previously implicitly refer to
"the_repository", but it's now more obvious that we were doing the
wrong thing all along, and should have used the parameter instead.
The change to change "get_index_format_default(the_repository)" in
"read-cache.c" to use the "r" variable instead should arguably have
been part of [1], or in the subsequent cleanup in [2]. Let's do it
here, as can be seen from the initial code in [3] it's not important
that we use "the_repository" there, but would prefer to always use the
current repository.
This change excludes the "the_repository" use in "upload-pack.c"'s
upload_pack_advertise(), as the in-flight [4] makes that change.
1. ee1f0c242ef (read-cache: add index.skipHash config option,
2023-01-06)
2. 6269f8eaad0 (treewide: always have a valid "index_state.repo"
member, 2023-01-17)
3. 7211b9e7534 (repo-settings: consolidate some config settings,
2019-08-13)
4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the part of "the_repository.pending.cocci" pertaining to
"object-store.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the part of "the_repository.pending.cocci" pertaining to
"cache.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When a split-index is in effect, the `$GIT_DIR/index` file needs to
contain a "link" extension that contains all the information about the
split-index, including the information about the shared index.
However, in some cases Git needs to suppress writing that "link"
extension (i.e. to fall back to writing a full index) even if the
in-memory index structure _has_ a `split_index` configured. This is the
case e.g. when "too many not shared" index entries exist.
In such instances, the current code sets the `base_oid` field of said
`split_index` structure to all-zero to indicate that `do_write_index()`
should skip writing the "link" extension.
This can lead to problems later on, when the in-memory index is still
used to perform other operations and eventually wants to write a
split-index, detects the presence of the `split_index` and reuses that,
too (under the assumption that it has been initialized correctly and
still has a non-null `base_oid`).
Let's stop zeroing out the `base_oid` to indicate that the "link"
extension should not be written.
One might be tempted to simply call `discard_split_index()` instead,
under the assumption that Git decided to write a non-split index and
therefore the `split_index` structure might no longer be wanted.
However, that is not possible because that would release index entries
in `split_index->base` that are likely to still be in use. Therefore we
cannot do that.
The next best thing we _can_ do is to introduce a bit field to indicate
specifically which index extensions (not) to write. So that's what we do
here.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Type fixes.
* rs/size-t-fixes:
pack-objects: use strcspn(3) in name_cmp_len()
read-cache: use size_t for {base,df}_name_compare()
|
|
Support names of any length in base_name_compare() and df_name_compare()
by using size_t for their length parameters. They pass the length on to
memcmp(3), which also takes it as a size_t.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the "repo" member was added to "the_index" in [1] the
repo_read_index() was made to populate it, but the unpopulated
"the_index" variable didn't get the same treatment.
Let's do that in initialize_the_repository() when we set it up, and
likewise for all of the current callers initialized an empty "struct
index_state".
This simplifies code that needs to deal with "the_index" or a custom
"struct index_state", we no longer need to second-guess this part of
the "index_state" deep in the stack. A recent example of such
second-guessing is the "istate->repo ? istate->repo : the_repository"
code in [2]. We can now simply use "istate->repo".
We're doing this by making use of the INDEX_STATE_INIT() macro (and
corresponding function) added in [3], which now have mandatory "repo"
arguments.
Because we now call index_state_init() in repository.c's
initialize_the_repository() we don't need to handle the case where we
have a "repo->index" whose "repo" member doesn't match the "repo"
we're setting up, i.e. the "Complete the double-reference" code in
repo_read_index() being altered here. That logic was originally added
in [1], and was working around the lack of what we now have in
initialize_the_repository().
For "fsmonitor-settings.c" we can remove the initialization of a NULL
"r" argument to "the_repository". This was added back in [4], and was
needed at the time for callers that would pass us the "r" from an
"istate->repo". Before this change such a change to
"fsmonitor-settings.c" would segfault all over the test suite (e.g. in
t0002-gitfile.sh).
This change has wider eventual implications for
"fsmonitor-settings.c". The reason the other lazy loading behavior in
it is required (starting with "if (!r->settings.fsmonitor) ..." is
because of the previously passed "r" being "NULL".
I have other local changes on top of this which move its configuration
reading to "prepare_repo_settings()" in "repo-settings.c", as we could
now start to rely on it being called for our "r". But let's leave all
of that for now, and narrowly remove this particular part of the
lazy-loading.
1. 1fd9ae517c4 (repository: add repo reference to index_state,
2021-01-23)
2. ee1f0c242ef (read-cache: add index.skipHash config option,
2023-01-06)
3. 2f6b1eb794e (cache API: add a "INDEX_STATE_INIT" macro/function,
add release_index(), 2023-01-12)
4. 1e0ea5c4316 (fsmonitor: config settings are repository-specific,
2022-03-25)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ab/cache-api-cleanup:
cache API: add a "INDEX_STATE_INIT" macro/function, add release_index()
read-cache.c: refactor set_new_index_sparsity() for subsequent commit
sparse-index API: BUG() out on NULL ensure_full_index()
sparse-index.c: expand_to_path() can assume non-NULL "istate"
builtin/difftool.c: { 0 }-initialize rather than using memset()
|
|
Introduce an optional configuration to allow the trailing hash that
protects the index file from bit flipping.
* ds/omit-trailing-hash-in-index:
features: feature.manyFiles implies fast index writes
test-lib-functions: add helper for trailing hash
read-cache: add index.skipHash config option
hashfile: allow skipping the hash function
|
|
Hopefully in some not so distant future, we'll get advantages from always
initializing the "repo" member of the "struct index_state". To make
that easier let's introduce an initialization macro & function.
The various ad-hoc initialization of the structure can then be changed
over to it, and we can remove the various "0" assignments in
discard_index() in favor of calling index_state_init() at the end.
While not strictly necessary, let's also change the CALLOC_ARRAY() of
various "struct index_state *" to use an ALLOC_ARRAY() followed by
index_state_init() instead.
We're then adding the release_index() function and converting some
callers (including some of these allocations) over to it if they
either won't need to use their "struct index_state" again, or are just
about to call index_state_init().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor code added to set_new_index_sparsity() in [1] to eliminate
indentation resulting from putting the body of his function within the
"if" block. Let's instead return early if we have no
istate->repo. This trivial change makes the subsequent commit's diff
smaller.
1. 491df5f679f (read-cache: set sparsity when index is new, 2022-05-10)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The recent addition of the index.skipHash config option allows index
writes to speed up by skipping the hash computation for the trailing
checksum. This is particularly critical for repositories with many files
at HEAD, so add this config option to two cases where users in that
scenario may opt-in to such behavior:
1. The feature.manyFiles config option enables some options that are
helpful for repositories with many files at HEAD.
2. 'scalar register' and 'scalar reconfigure' set config options that
optimize for large repositories.
In both of these cases, set index.skipHash=true to gain this
speedup. Add tests that demonstrate the proper way that
index.skipHash=true can override feature.manyFiles=true.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous change allowed skipping the hashing portion of the
hashwrite API, using it instead as a buffered write API. Disabling the
hashwrite can be particularly helpful when the write operation is in a
critical path.
One such critical path is the writing of the index. This operation is so
critical that the sparse index was created specifically to reduce the
size of the index to make these writes (and reads) faster.
This trade-off between file stability at rest and write-time performance
is not easy to balance. The index is an interesting case for a couple
reasons:
1. Writes block users. Writing the index takes place in many user-
blocking foreground operations. The speed improvement directly
impacts their use. Other file formats are typically written in the
background (commit-graph, multi-pack-index) or are super-critical to
correctness (pack-files).
2. Index files are short lived. It is rare that a user leaves an index
for a long time with many staged changes. Outside of staged changes,
the index can be completely destroyed and rewritten with minimal
impact to the user.
Following a similar approach to one used in the microsoft/git fork [1],
add a new config option (index.skipHash) that allows disabling this
hashing during the index write. The cost is that we can no longer
validate the contents for corruption-at-rest using the trailing hash.
[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201
We load this config from the repository config given by istate->repo,
with a fallback to the_repository if it is not set.
While older Git versions will not recognize the null hash as a special
case, the file format itself is still being met in terms of its
structure. Using this null hash will still allow Git operations to
function across older versions.
The one exception is 'git fsck' which checks the hash of the index file.
This used to be a check on every index read, but was split out to just
the index in a33fc72fe91 (read-cache: force_verify_index_checksum,
2017-04-14) and released first in Git 2.13.0. Document the versions that
relaxed these restrictions, with the optimistic expectation that this
change will be included in Git 2.40.0.
Here, we disable this check if the trailing hash is all zeroes. We add a
warning to the config option that this may cause undesirable behavior
with older Git versions.
As a quick comparison, I tested 'git update-index --force-write' with
and without index.skipHash=true on a copy of the Linux kernel
repository.
Benchmark 1: with hash
Time (mean ± σ): 46.3 ms ± 13.8 ms [User: 34.3 ms, System: 11.9 ms]
Range (min … max): 34.3 ms … 79.1 ms 82 runs
Benchmark 2: without hash
Time (mean ± σ): 26.0 ms ± 7.9 ms [User: 11.8 ms, System: 14.2 ms]
Range (min … max): 16.3 ms … 42.0 ms 69 runs
Summary
'without hash' ran
1.78 ± 0.76 times faster than 'with hash'
These performance benefits are substantial enough to allow users the
ability to opt-in to this feature, even with the potential confusion
with older 'git fsck' versions.
Test this new config option, both at a command-line level and within a
submodule. The confirmation is currently limited to confirm that 'git
fsck' does not complain about the index. Future updates will make this
test more robust.
It is critical that this test is placed before the test_index_version
tests, since those tests obliterate the .git/config file and hence lose
the setting from GIT_TEST_DEFAULT_HASH, if set.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Various leak fixes.
* ab/various-leak-fixes:
built-ins: use free() not UNLEAK() if trivial, rm dead code
revert: fix parse_options_concat() leak
cherry-pick: free "struct replay_opts" members
rebase: don't leak on "--abort"
connected.c: free the "struct packed_git"
sequencer.c: fix "opts->strategy" leak in read_strategy_opts()
ls-files: fix a --with-tree memory leak
revision API: call graph_clear() in release_revisions()
unpack-file: fix ancient leak in create_temp_file()
built-ins & libs & helpers: add/move destructors, fix leaks
dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache"
read-cache.c: clear and free "sparse_checkout_patterns"
commit: discard partial cache before (re-)reading it
{reset,merge}: call discard_index() before returning
tests: mark tests as passing with SANITIZE=leak
|
|
The "sparse_checkout_patterns" member was added to the "struct
index_state" in 836e25c51b2 (sparse-checkout: hold pattern list in
index, 2021-03-30), but wasn't added to discard_index(). Let's do
that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
|
|
The discard_index() function has not returned non-zero since
7a51ed66f65 (Make on-disk index representation separate from in-core
one, 2008-01-14), but we've had various code in-tree still acting as
though that might be the case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The process for reading the index into memory from disk is to first read its
contents into a single memory-mapped file buffer (type 'char *'), then
sequentially convert each on-disk index entry into a corresponding incore
'cache_entry'. To access the contents of the on-disk entry for processing, a
moving pointer within the memory-mapped file is cast to type 'struct
ondisk_cache_entry *'.
In index v4, the entries in the on-disk index file are written *without*
aligning their first byte to a 4-byte boundary; entries are a variable
length (depending on the entry name and whether or not extended flags are
used). As a result, casting the 'char *' buffer pointer to 'struct
ondisk_cache_entry *' then accessing its contents in a 'SANITIZE=undefined'
build can trigger the following error:
read-cache.c:1886:46: runtime error: member access within misaligned
address <address> for type 'struct ondisk_cache_entry', which requires 4
byte alignment
Avoid this error by reading fields directly from the 'char *' buffer, using
the 'offsetof' individual fields in 'struct ondisk_cache_entry'.
Additionally, add documentation describing why the new approach avoids the
misaligned address error, as well as advice on how to improve the
implementation in the future.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fixes to sparse index compatibility work for "reset" and "checkout"
commands.
source: <pull.1312.v3.git.1659985672.gitgitgadget@gmail.com>
* vd/sparse-reset-checkout-fixes:
unpack-trees: unpack new trees as sparse directories
cache.h: create 'index_name_pos_sparse()'
oneway_diff: handle removed sparse directories
checkout: fix nested sparse directory diff in sparse index
|
|
Fix for a bug that makes write-tree to fail to write out a
non-existent index as a tree, introduced in 2.37.
source: <20220722212232.833188-1-martin.agren@gmail.com>
* tk/untracked-cache-with-uall:
read-cache: make `do_read_index()` always set up `istate->repo`
|
|
Add 'index_name_pos_sparse()', which behaves the same as 'index_name_pos()',
except that it does not expand a sparse index to search for an entry inside
a sparse directory.
'index_entry_exists()' was originally implemented in 20ec2d034c (reset: make
sparse-aware (except --mixed), 2021-11-29) as an alternative to
'index_name_pos()' to allow callers to search for an index entry without
expanding a sparse index. However, that particular use case only required
knowing whether the requested entry existed, so 'index_entry_exists()' does
not return the index positioning information provided by 'index_name_pos()'.
This patch implements 'index_name_pos_sparse()' to accommodate callers that
need the positioning information of 'index_name_pos()', but do not want to
expand the index.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If there is no index file, e.g., because the repository has just been
created, we return zero early (unless `must_exist` makes us die
instead.)
This early return means we do not set up `istate->repo`. With
`core.untrackedCache=true`, the recent e6a653554b ("untracked-cache:
support '--untracked-files=all' if configured", 2022-03-31) will
eventually pass down `istate->repo` as a null pointer to
`repo_config_get_string()`, causing a segmentation fault.
If we do hit this early return, set up `istate->repo` similar to when we
actually read the index.
Reported-by: Joey Hess <id@joeyh.name>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 998330ac2e7c (read-cache: look for shared index files next to the
index, too, 2021-08-26), we added code that allocates memory to store
the base path of a shared index, but we never released that memory.
Reported by Coverity.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove redundant copying (with index v3 and older) or possible
over-reading beyond end of mmapped memory (with index v4) has been
corrected.
* zh/read-cache-copy-name-entry-fix:
read-cache.c: reduce unnecessary cache entry name copying
|
|
575fa8a3 (read-cache: read data in a hash-independent way,
2019-02-19) added a new code to copy from the on-disk data into the
name member of the in-core cache entry, which is already done
immediately after that in a way that takes prefix-compression into
account.
Remove this code, as it is not just unnecessary, but also can be
reading beyond the on-disk data, when we are copying very long
prefix string from the previous entry.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: rewrote the log message with Réne's findings]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"sparse-checkout" learns to work well with the sparse-index
feature.
* ds/sparse-sparse-checkout:
sparse-checkout: integrate with sparse index
p2000: add test for 'git sparse-checkout [add|set]'
sparse-index: complete partial expansion
sparse-index: partially expand directories
sparse-checkout: --no-sparse-index needs a full index
cache-tree: implement cache_tree_find_path()
sparse-index: introduce partially-sparse indexes
sparse-index: create expand_index()
t1092: stress test 'git sparse-checkout set'
t1092: refactor 'sparse-index contents' test
|
|
A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.
For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be a an 'enum sparse_index_mode' with three modes:
* INDEX_EXPANDED (0): No sparse directories exist. This is always the
case for repositories that do not use cone-mode sparse-checkout.
* INDEX_COLLAPSED: Sparse directories may exist. Files outside the
sparse-checkout cone are reduced to sparse directory entries whenever
possible.
* INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file
entries outside the sparse-checkout cone may exist. Running
convert_to_sparse() may further reduce those files to sparse directory
entries.
The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
INDEX_EXPANDED mode but to actually do the necessary work when in
INDEX_PARTIALLY_SPARSE mode.
The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the index read in 'do_read_index()' does not exist on-disk, mark the
index "sparse" if the executing command does not require a full index and
sparse index is otherwise enabled.
Some commands (such as 'git stash -u') implicitly create a new index (when
the 'GIT_INDEX_FILE' variable points to a non-existent file) and perform
some operation on it. However, when this index is created, it isn't created
with the same sparsity settings as the repo index. As a result, while these
indexes may be sparse during the operation, they are always expanded before
being written to disk. We can avoid that expansion by defaulting the index
to "sparse", in which case it will only be expanded if the full index is
needed.
Note that the function 'set_new_index_sparsity()' is created despite having
only a single caller because additional callers will be added in a
subsequent patch.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git mv" failed to refresh the cached stat information for the
entry it moved.
* vd/mv-refresh-stat:
mv: refresh stat info for moved entry
|
|
Update the stat info of the moved index entry in 'rename_index_entry_at()'
if the entry is up-to-date with the index. Internally, 'git mv' uses
'rename_index_entry_at()' to move the source index entry to the destination.
However, it directly copies the stat info of the original cache entry, which
will not reflect the 'ctime' of the file renaming operation that happened as
part of the move. If a file is otherwise up-to-date with the index, that
difference in 'ctime' will make the entry appear out-of-date until the next
index-refreshing operation (e.g., 'git status').
Some commands, such as 'git reset', use the cached stat information to
determine whether a file is up-to-date; if this information is incorrect,
the command will fail when it should pass. In order to ensure a moved entry
is evaluated as 'up-to-date' when appropriate, refresh the destination index
entry's stat info in 'git mv' if and only if the file is up-to-date.
Note that the test added in 't7001-mv.sh' requires a "sleep 1" to ensure the
'ctime' of the file creation will be definitively older than the 'ctime' of
the renamed file in 'git mv'.
Reported-by: Maximilian Reichel <reichemn@icloud.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Replace core.fsyncObjectFiles with two new configuration variables,
core.fsync and core.fsyncMethod.
* ns/core-fsyncmethod:
core.fsync: documentation and user-friendly aggregate options
core.fsync: new option to harden the index
core.fsync: add configuration parsing
core.fsync: introduce granular fsync control infrastructure
core.fsyncmethod: add writeout-only mode
wrapper: make inclusion of Windows csprng header tightly scoped
|
|
Object-file API shuffling.
* ab/object-file-api-updates:
object-file API: pass an enum to read_object_with_reference()
object-file.c: add a literal version of write_object_file_prepare()
object-file API: have hash_object_file() take "enum object_type"
object API: rename hash_object_file_literally() to write_*()
object-file API: split up and simplify check_object_signature()
object API users + docs: check <0, not !0 with check_object_signature()
object API docs: move check_object_signature() docs to cache.h
object API: correct "buf" v.s. "map" mismatch in *.c and *.h
object-file API: have write_object_file() take "enum object_type"
object-file API: add a format_object_header() function
object-file API: return "void", not "int" from hash_object_file()
object-file.c: split up declaration of unrelated variables
|
|
This commit introduces the new ability for the user to harden
the index. In the event of a system crash, the index must be
durable for the user to actually find a file that has been added
to the repo and then deleted from the working tree.
We use the presence of the COMMIT_LOCK flag and absence of the
alternate_index_output as a proxy for determining whether we're
updating the persistent index of the repo or some temporary
index. We don't sync these temporary indexes.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This commit introduces the infrastructure for the core.fsync
configuration knob. The repository components we want to sync
are identified by flags so that we can turn on or off syncing
for specific components.
If core.fsyncObjectFiles is set and the core.fsync configuration
also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any
loose objects. This picks the strictest data integrity behavior
if core.fsync and core.fsyncObjectFiles are set to conflicting values.
This change introduces the currently unused fsync_component
helper, which will be used by a later patch that adds fsyncing to
the refs backend.
Actual configuration and documentation of the fsync components
list are in other patches in the series to separate review of
the underlying mechanism from the policy of how it's configured.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the write_object_file() function to take an "enum object_type"
instead of a "const char *type". Its callers either passed
{commit,tree,blob,tag}_type and can pass the corresponding OBJ_* type
instead, or were hardcoding strings like "blob".
This avoids the back & forth fragility where the callers of
write_object_file() would have the enum type, and convert it
themselves via type_name(). We do have to now do that conversion
ourselves before calling write_object_file_prepare(), but those
codepaths will be similarly adjusted in subsequent commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git update-index", "git checkout-index", and "git clean" are
taught to work better with the sparse checkout feature.
* vd/sparse-clean-etc:
update-index: reduce scope of index expansion in do_reupdate
update-index: integrate with sparse index
update-index: add tests for sparse-checkout compatibility
checkout-index: integrate with sparse index
checkout-index: add --ignore-skip-worktree-bits option
checkout-index: expand sparse checkout compatibility tests
clean: integrate with sparse index
reset: reorder wildcard pathspec conditions
reset: fix validation in sparse index test
|
|
Mark in various places in the code that the sparse index and the
split index features are mutually incompatible.
* js/sparse-vs-split-index:
split-index: it really is incompatible with the sparse index
t1091: disable split index
sparse-index: sparse index is disallowed when split index is active
|
|
More "config-based hooks".
* ab/config-based-hooks-2:
run-command: remove old run_hook_{le,ve}() hook API
receive-pack: convert push-to-checkout hook to hook.h
read-cache: convert post-index-change to use hook.h
commit: convert {pre-commit,prepare-commit-msg} hook to hook.h
git-p4: use 'git hook' to run hooks
send-email: use 'git hook run' for 'sendemail-validate'
git hook run: add an --ignore-missing flag
hooks: convert worktree 'post-checkout' hook to hook library
hooks: convert non-worktree 'post-checkout' hook to hook library
merge: convert post-merge to use hook.h
am: convert applypatch-msg to use hook.h
rebase: convert pre-rebase to use hook.h
hook API: add a run_hooks_l() wrapper
am: convert {pre,post}-applypatch to use hook.h
gc: use hook library for pre-auto-gc hook
hook API: add a run_hooks() wrapper
hook: add 'run' subcommand
|
|
... at least for now. So let's error out if we are even trying to
initialize the split index when the index is sparse, or when trying to
write the split index extension for a sparse index.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Enable use of the sparse index with `update-index`. Most variations of
`update-index` work without explicitly expanding the index or making any
other updates in or outside of `update-index.c`.
The one usage requiring additional changes is `--cacheinfo`; if a file
inside a sparse directory was specified, the index would not be expanded
until after the cache tree is invalidated, leading to a mismatch between the
index and cache tree. This scenario is handled by rearranging
`add_index_entry_with_check`, allowing `index_name_stage_pos` to expand the
index *before* attempting to invalidate the relevant cache tree path,
avoiding cache tree/index corruption.
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the post-index-change hook away from run-command.h to and over to
the new hook.h library.
This removes the last direct user of "run_hook_ve()" outside of
run-command.c ("run_hook_le()" still uses it). So we can make the
function static now. A subsequent commit will remove this code
entirely when "run_hook_le()" itself goes away.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
'git update-index --refresh' and '--really-refresh' should force writing
of the index file if racy timestamps have been encountered, as
'git status' already does [1].
Note that calling 'git update-index --refresh' still does not guarantee
that there will be no more racy timestamps afterwards (the same holds
true for 'git status'):
- calling 'git update-index --refresh' immediately after touching and
adding a file may still leave racy timestamps if all three operations
occur within the racy-tolerance (usually 1 second unless USE_NSEC has
been defined)
- calling 'git update-index --refresh' for timestamps which are set into
the future will leave them racy
To guarantee that such racy timestamps will be resolved would require to
wait until the system clock has passed beyond these timestamps and only
then write the index file. Especially for future timestamps, this does
not seem feasible because of possibly long delays/hangs.
[1] https://lore.kernel.org/git/d3dd805c-7c1d-30a9-6574-a7bfcb7fc013@syntevo.com/
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Various operating modes of "git reset" have been made to work
better with the sparse index.
* vd/sparse-reset:
unpack-trees: improve performance of next_cache_entry
reset: make --mixed sparse-aware
reset: make sparse-aware (except --mixed)
reset: integrate with sparse index
reset: expand test coverage for sparse checkouts
sparse-index: update command for expand/collapse test
reset: preserve skip-worktree bit in mixed reset
reset: rename is_missing to !is_in_reset_tree
|
|
Ensure that the sparseness of the in-core index matches the
index.sparse configuration specified by the repository immediately
after the on-disk index file is read.
* vd/sparse-sparsity-fix-on-read:
sparse-index: update do_read_index to ensure correct sparsity
sparse-index: add ensure_correct_sparsity function
sparse-index: avoid unnecessary cache tree clearing
test-read-cache.c: prepare_repo_settings after config init
|
|
Remove `ensure_full_index` guard on `prime_cache_tree` and update
`prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
must determine whether a directory entry is sparse or not by searching for
it in the index (*without* expanding the index). If a matching sparse
directory index entry is found, no subtrees are added to the cache tree
entry and the entry count is set to 1 (representing the sparse directory
itself). Otherwise, the tree is assumed to not be sparse and its subtrees
are recursively added to the cache tree.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Unless `command_requires_full_index` forces index expansion, ensure in-core
index sparsity matches config settings on read by calling
`ensure_correct_sparsity`. This makes the behavior of the in-core index more
consistent between different methods of updating sparsity: manually changing
the `index.sparse` config setting vs. executing
`git sparse-checkout --[no-]sparse-index init`
Although index sparsity is normally updated with `git sparse-checkout init`,
ensuring correct sparsity after a manual `index.sparse` change has some
practical benefits:
1. It allows for command-by-command sparsity toggling with
`-c index.sparse=<true|false>`, e.g. when troubleshooting issues with the
sparse index.
2. It prevents users from experiencing abnormal slowness after setting
`index.sparse` to `true` due to use of a full index in all commands until
the on-disk index is updated.
Helped-by: Junio C Hamano <gitster@pobox.com>
Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop "git add --dry-run" from creating new blob and tree objects.
* rs/add-dry-run-without-objects:
add: don't write objects with --dry-run
|
|
Recent sparse-index work broke safety against attempts to add paths
with trailing slashes to the index, which has been corrected.
* rs/make-verify-path-really-verify-again:
read-cache: let verify_path() reject trailing dir separators again
read-cache: add verify_path_internal()
t3905: show failure to ignore sub-repo
|
|
When the option --dry-run/-n is given, "git add" doesn't change the
index, but still writes out new object files. Only hash the latter
without writing instead to make the run as dry as possible.
Use this opportunity to also make the hash_flags variable unsigned,
to match the index_path() parameter it is used as.
Reported-by: git.mexon@spamgourmet.com
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Test updates.
* sg/test-split-index-fix:
read-cache: fix GIT_TEST_SPLIT_INDEX
tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
read-cache: look for shared index files next to the index, too
t1600-index: disable GIT_TEST_SPLIT_INDEX
t1600-index: don't run git commands upstream of a pipe
t1600-index: remove unnecessary redirection
|
|
6e773527b6 (sparse-index: convert from full to sparse, 2021-03-30) made
verify_path() accept trailing directory separators for directories,
which is necessary for sparse directory entries. This clemency causes
"git stash" to stumble over sub-repositories, though, and there may be
more unintended side-effects.
Avoid them by restoring the old verify_path() behavior and accepting
trailing directory separators only in places that are supposed to handle
sparse directory entries.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Turn verify_path() into an internal function that distinguishes between
valid paths and those with trailing directory separators and rename it
to verify_path_internal(). Provide a wrapper with the old behavior
under the old name. No functional change intended. The new function
will be used in the next patch.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code cleanup.
* ab/repo-settings-cleanup:
repository.h: don't use a mix of int and bitfields
repo-settings.c: simplify the setup
read-cache & fetch-negotiator: check "enum" values in switch()
environment.c: remove test-specific "ignore_untracked..." variable
wrapper.c: add x{un,}setenv(), and use xsetenv() in environment.c
|
|
Simplify the setup code in repo-settings.c in various ways, making the
code shorter, easier to read, and requiring fewer hacks to do the same
thing as it did before:
Since 7211b9e7534 (repo-settings: consolidate some config settings,
2019-08-13) we have memset() the whole "settings" structure to -1 in
prepare_repo_settings(), and subsequently relied on the -1 value.
Most of the fields did not need to be initialized to -1, and because
we were doing that we had the enum labels "UNTRACKED_CACHE_UNSET" and
"FETCH_NEGOTIATION_UNSET" purely to reflect the resulting state
created this memset() in prepare_repo_settings(). No other code used
or relied on them, more on that below.
For the rest most of the subsequent "are we -1, then read xyz" can
simply be removed by re-arranging what we read first. E.g. when
setting the "index.version" setting we should have first read
"feature.experimental", so that it (and "feature.manyfiles") can
provide a default for our "index.version".
Instead the code setting it, added when "feature.manyFiles"[1] was
created, was using the UPDATE_DEFAULT_BOOL() macro added in an earlier
commit[2]. That macro is now gone, since it was only needed for this
pattern of reading things in the wrong order.
This also fixes an (admittedly obscure) logic error where we'd
conflate an explicit "-1" value in the config with our own earlier
memset() -1.
We can also remove the UPDATE_DEFAULT_BOOL() wrapper added in
[3]. Using it is redundant to simply using the return value from
repo_config_get_bool(), which is non-zero if the provided key exists
in the config.
Details on edge cases relating to the memset() to -1, continued from
"more on that below" above:
* UNTRACKED_CACHE_KEEP:
In [4] the "unset" and "keep" handling for core.untrackedCache was
consolidated. But it while we understand the "keep" value, we don't
handle it differently than the case of any other unknown value.
So let's retain UNTRACKED_CACHE_KEEP and remove the
UNTRACKED_CACHE_UNSET setting (which was always implicitly
UNTRACKED_CACHE_KEEP before). We don't need to inform any code
after prepare_repo_settings() that the setting was "unset", as far
as anyone else is concerned it's core.untrackedCache=keep. if
"core.untrackedcache" isn't present in the config.
* FETCH_NEGOTIATION_UNSET & FETCH_NEGOTIATION_NONE:
Since these two two enum fields added in [5] don't rely on the
memzero() setting them to "-1" anymore we don't have to provide
them with explicit values.
1. c6cc4c5afd2 (repo-settings: create feature.manyFiles setting,
2019-08-13)
2. 31b1de6a09b (commit-graph: turn on commit-graph by default,
2019-08-13)
3. 31b1de6a09b (commit-graph: turn on commit-graph by default,
2019-08-13)
4. ad0fb659993 (repo-settings: parse core.untrackedCache,
2019-08-13)
5. aaf633c2ad1 (repo-settings: create feature.experimental setting,
2019-08-13)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change tweak_untracked_cache() in "read-cache.c" to use a switch() to
have the compiler assert that we checked all possible values in the
"enum untracked_cache_setting" type, and likewise remove the "default"
case in fetch_negotiator_init() in favor of checking for
"FETCH_NEGOTIATION_UNSET" and "FETCH_NEGOTIATION_NONE".
As will be discussed in a subsequent we'll only ever have either of
these set to FETCH_NEGOTIATION_NONE, FETCH_NEGOTIATION_UNSET and
UNTRACKED_CACHE_UNSET within the prepare_repo_settings() function
itself. In preparation for fixing that code let's add a BUG() here to
mark this as unreachable code.
See ad0fb659993 (repo-settings: parse core.untrackedCache, 2019-08-13)
for when the "unset" and "keep" handling for core.untrackedCache was
consolidated, and aaf633c2ad1 (repo-settings: create
feature.experimental setting, 2019-08-13) for the addition of the
"default" pattern in "fetch-negotiator.c".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the
split index feature and trigger index splitting (mostly) randomly.
Alas, this has been broken since 6e37c8ed3c (read-cache.c: fix writing
"link" index ext with null base oid, 2019-02-13), and
GIT_TEST_SPLIT_INDEX=1 hasn't triggered any index splitting since
then.
This patch makes GIT_TEST_SPLIT_INDEX work again, though it doesn't
restore the pre-6e37c8ed3c behavior. To understand the bug, the fix,
and the behavior change we first have to look at how
GIT_TEST_SPLIT_INDEX used to work before 6e37c8ed3c:
There are two places where we check the value of
GIT_TEST_SPLIT_INDEX, and before 6e37c8ed3c they worked like this:
1) In the lower-level do_write_index(), where, if
GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index().
This call merely allocates and zero-initializes
'istate->split_index', but does nothing else (i.e. doesn't fill
the base/shared index with cache entries, doesn't actually
write a shared index file, etc.). Pertinent to this issue, the
hash of the base index remains all zeroed out.
2) In the higher-level write_locked_index(), but only when
'istate->split_index' has already been initialized. Then, if
GIT_TEST_SPLIT_INDEX is enabled, it randomly sets the flag that
triggers index splitting later in this function. This
randomness comes from the first byte of the hash of the base
index via an 'if ((first_byte & 15) < 6)' condition.
However, if 'istate->split_index' hasn't been initialized (i.e.
it is still NULL), then write_locked_index() just calls
do_write_locked_index(), which internally calls the above
mentioned do_write_index().
This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index
splitting randomly, the first two index writes were always
deterministic (though I suspect this was unintentional):
- The initial index write never splits the index.
During the first index write write_locked_index() is called with
'istate->split_index' still uninitialized, so the check in 2) is
not executed. It still calls do_write_index(), though, which
then executes the check in 1). The resulting all zero base
index hash then leads to the 'link' extension being written to
'.git/index', though a shared index file is not written:
$ rm .git/index
$ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
$ test-tool dump-split-index .git/index
own c6ef71168597caec8553c83d9d0048f1ef416170
base 0000000000000000000000000000000000000000
100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file
replacements:
deletions:
$ ls -l .git/sharedindex.*
ls: cannot access '.git/sharedindex.*': No such file or directory
- The second index write always splits the index.
When the index written in the previous point is read,
'istate->split_index' is initialized because of the presence of
the 'link' extension. So during the second write
write_locked_index() does run the check in 2), and the first
byte of the all zero base index hash always fulfills the
randomness condition, which in turn always triggers the index
splitting.
- Subsequent index writes will find the 'link' extension with a
real non-zero base index hash, so from then on the check in 2)
is executed and the first byte of the base index hash is as
random as it gets (coming from the SHA-1 of index data including
timestamps and inodes...).
All this worked until 6e37c8ed3c came along, and stopped writing the
'link' extension if the hash of the base index was all zero:
$ rm .git/index
$ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
$ test-tool dump-split-index .git/index
own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7
not a split index
$ ls -l .git/sharedindex.*
ls: cannot access '.git/sharedindex.*': No such file or directory
So, since the first index write with GIT_TEST_SPLIT_INDEX=1 doesn't
write a 'link' extension, in the second index write
'istate->split_index' remains uninitialized, and the check in 2) is
not executed, and ultimately the index is never split.
Fix this by modifying write_locked_index() to make sure to check
GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still
uninitialized, and initialize it if necessary. The check for
GIT_TEST_SPLIT_INDEX and separate init_split_index() call in
do_write_index() thus becomes unnecessary, so remove it. Furthermore,
add a test to 't1700-split-index.sh' to make sure that
GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the
index splitting on the first index write, because after that it will
be random).
Note that this change does not restore the pre-6e37c8ed3c behaviour,
as it will deterministically split the index already on the first
index write. Since GIT_TEST_SPLIT_INDEX is purely a developer aid,
there is no backwards compatibility issue here. The new behaviour did
trigger test failures in 't0003-attributes.sh' and 't1600-index.sh',
though, which have been fixed in preparatory patches in this series.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When reading a split index git always looks for its referenced shared
base index in the gitdir of the current repository, even when reading
an alternate index specified via GIT_INDEX_FILE, and even when that
alternate index file is the "main" '.git/index' file of an other
repository. However, if that split index and its referenced shared
index files were written by a git command running entirely in that
other repository, then, naturally, the shared index file is written to
that other repository's gitdir. Consequently, a git command
attempting to read that shared index file while running in a different
repository won't be able find it and will error out.
I'm not sure in what use case it is necessary to read the index of one
repository by a git command running in a different repository, but it
is certainly possible to do so, and in fact the test 'bare repository:
check that --cached honors index' in 't0003-attributes.sh' does
exactly that. If GIT_TEST_SPLIT_INDEX=1 were to split the index in
just the right moment [1], then this test would indeed fail, because
the referenced shared index file could not be found.
Let's look for the referenced shared index file not only in the gitdir
of the current directory, but, if the shared index is not there, right
next to the split index as well.
[1] We haven't seen this issue trigger a failure in t0003 yet,
because:
- While GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index
splitting randomly, the first index write has always been
deterministic and it has never split the index.
- That alternate index file in the other repository is written
only once in the entire test script, so it's never split.
However, the next patch will fix GIT_TEST_SPLIT_INDEX, and while
doing so it will slightly change its behavior to always split the
index already on the first index write, and t0003 would always
fail without this patch.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The convert_to_sparse() method checks for the GIT_TEST_SPARSE_INDEX
environment variable or the "index.sparse" config setting before
converting the index to a sparse one. This is for ease of use since all
current consumers are preparing to compress the index before writing it
to disk. If these settings are not enabled, then convert_to_sparse()
silently returns without doing anything.
We will add a consumer in the next change that wants to use the sparse
index as an in-memory data structure, regardless of whether the on-disk
format should be sparse.
To that end, create the SPARSE_INDEX_MEMORY_ONLY flag that will skip
these config checks when enabled. All current consumers are modified to
pass '0' in the new 'flags' parameter.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Leak plugging.
* ah/plugleaks:
reset: clear_unpack_trees_porcelain to plug leak
builtin/rebase: fix options.strategy memory lifecycle
builtin/merge: free found_ref when done
builtin/mv: free or UNLEAK multiple pointers at end of cmd_mv
convert: release strbuf to avoid leak
read-cache: call diff_setup_done to avoid leak
ref-filter: also free head for ATOM_HEAD to avoid leak
diffcore-rename: move old_dir/new_dir definition to plug leak
builtin/for-each-repo: remove unnecessary argv copy to plug leak
builtin/submodule--helper: release unused strbuf to avoid leak
environment: move strbuf into block to plug leak
fmt-merge-msg: free newly allocated temporary strings when done
|
|
"git read-tree" had a codepath where blobs are fetched one-by-one
from the promisor remote, which has been corrected to fetch in bulk.
* jt/bulk-prefetch:
cache-tree: prefetch in partial clone read-tree
unpack-trees: refactor prefetching code
|
|
"git status" codepath learned to work with sparsely populated index
without hydrating it fully.
* ds/status-with-sparse-index:
t1092: document bad sparse-checkout behavior
fsmonitor: integrate with sparse index
wt-status: expand added sparse directory entries
status: use sparse-index throughout
status: skip sparse-checkout percentage with sparse-index
diff-lib: handle index diffs with sparse dirs
dir.c: accept a directory as part of cone-mode patterns
unpack-trees: unpack sparse directory entries
unpack-trees: rename unpack_nondirectories()
unpack-trees: compare sparse directories correctly
unpack-trees: preserve cache_bottom
t1092: add tests for status/add and sparse files
t1092: expand repository data shape
t1092: replace incorrect 'echo' with 'cat'
sparse-index: include EXTENDED flag when expanding
sparse-index: skip indexes with unmerged entries
|
|
repo_diff_setup() calls through to diff.c's static prep_parse_options(),
which in turn allocates a new array into diff_opts.parseopts.
diff_setup_done() is responsible for freeing that array, and has the
benefit of verifying diff_opts too - hence we add a call to
diff_setup_done() to avoid leaking parseopts.
Output from the leak as found while running t0090 with LSAN:
Direct leak of 7120 byte(s) in 1 object(s) allocated from:
#0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0xa8bf89 in do_xmalloc wrapper.c:41:8
#2 0x7a7bae in prep_parse_options diff.c:5636:2
#3 0x7a7bae in repo_diff_setup diff.c:4611:2
#4 0x93716c in repo_index_has_changes read-cache.c:2518:3
#5 0x872233 in unclean merge-ort-wrappers.c:12:14
#6 0x872233 in merge_ort_recursive merge-ort-wrappers.c:53:6
#7 0x5d5b11 in try_merge_strategy builtin/merge.c:752:12
#8 0x5d0b6b in cmd_merge builtin/merge.c:1666:9
#9 0x4ce83e in run_builtin git.c:475:11
#10 0x4ccafe in handle_builtin git.c:729:3
#11 0x4cb01c in run_argv git.c:818:4
#12 0x4cb01c in cmd_main git.c:949:19
#13 0x6bdc2d in main common-main.c:52:11
#14 0x7f551eb51349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 7120 byte(s) leaked in 1 allocation(s)
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor the prefetching code in unpack-trees.c into its own function,
because it will be used elsewhere in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Error message update.
* ew/mmap-failures:
xmmap: inform Linux users of tuning knobs on ENOMEM
|
|
By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().
In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.
This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.
The performance test p2000-sparse-checkout-operations.sh demonstrates
these improvements:
Test HEAD~1 HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3) 0.31(0.30+0.05) 0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4) 0.31(0.29+0.07) 0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3) 2.35(2.28+0.10) 0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4) 2.35(2.24+0.15) 0.05(0.04+0.06) -97.9%
Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the peformance
gains we are expecting by using a sparse index.
Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* ab/progress-cleanup:
read-cache.c: don't guard calls to progress.c API
|
|
Linux users may benefit from additional information on how to
avoid ENOMEM from mmap despite the system having enough RAM to
accomodate them. We can't reliably unmap pack windows to work
around the issue since malloc and other library routines may
mmap without our knowledge.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Don't guard the calls to the progress.c API with "if (progress)". The
API itself will check this. This doesn't change any behavior, but
makes this code consistent with the rest of the codebase.
See ae9af12287b (status: show progress bar if refreshing the index
takes too long, 2018-09-15) for the commit that added the pattern
we're changing here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These methods were marked as MAYBE_UNUSED in the previous change to
avoid a complicated diff. Delete them entirely, since we now use the
hashfile API instead of this custom hashing code.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The do_write_index() method in read-cache.c has its own hashing logic
and buffering mechanism. Specifically, the ce_write() method was
introduced by 4990aadc (Speed up index file writing by chunking it
nicely, 2005-04-20) and similar mechanisms were introduced a few months
later in c38138cd (git-pack-objects: write the pack files with a SHA1
csum, 2005-06-26). Based on the timing, in the early days of the Git
codebase, I figured that these roughly equivalent code paths were never
unified only because it got lost in the shuffle. The hashfile API has
since been used extensively in other file formats, such as pack-indexes,
multi-pack-indexes, and commit-graphs. Therefore, it seems prudent to
unify the index writing code to use the same mechanism.
I discovered this disparity while trying to create a new index format
that uses the chunk-format API. That API uses a hashfile as its base, so
it is incompatible with the custom code in read-cache.c.
This rewrite is rather straightforward. It replaces all writes to the
temporary file with writes to the hashfile struct. This takes care of
many of the direct interactions with the_hash_algo.
There are still some git_hash_ctx uses remaining: the extension headers
are hashed for use in the End of Index Entries (EOIE) extension. This
use of the git_hash_ctx is left as-is. There are multiple reasons to not
use a hashfile here, including the fact that the data is not actually
writing to a file, just a hash computation. These hashes do not block
our adoption of the chunk-format API in a future change to the index, so
leave it as-is.
The internals of the algorithms are mostly identical. Previously, the
hashfile API used a smaller 8KB buffer instead of the 128KB buffer from
read-cache.c. The previous change already unified these sizes.
There is one subtle point: we do not pass the CSUM_FSYNC to the
finalize_hashfile() method, which differs from most consumers of the
hashfile API. The extra fsync() call indicated by this flag causes a
significant peformance degradation that is noticeable for quick
commands that write the index, such as "git add". Other consumers can
absorb this cost with their more complicated data structure
organization, and further writing structures such as pack-files and
commit-graphs is rarely in the critical path for common user
interactions.
Some static methods become orphaned in this diff, so I marked them as
MAYBE_UNUSED. The diff is much harder to read if they are deleted during
this change. Instead, they will be deleted in the following change.
In addition to the test suite passing, I computed indexes using the
previous binaries and the binaries compiled after this change, and found
the index data to be exactly equal. Finally, I did extensive performance
testing of "git update-index --force-write" on repos of various sizes,
including one with over 2 million paths at HEAD. These tests
demonstrated less than 1% difference in behavior. As expected, the
performance should be considered unchanged. The previous changes to
increase the hashfile buffer size from 8K to 128K ensured this change
would not create a peformance regression.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The final part of "parallel checkout".
* mt/parallel-checkout-part-3:
ci: run test round with parallel-checkout enabled
parallel-checkout: add tests related to .gitattributes
t0028: extract encoding helpers to lib-encoding.sh
parallel-checkout: add tests related to path collisions
parallel-checkout: add tests for basic operations
checkout-index: add parallel checkout support
builtin/checkout.c: complete parallel checkout support
make_transient_cache_entry(): optionally alloc from mem_pool
|
|
SHA-256 transition.
* bc/hash-transition-interop-part-1:
hex: print objects using the hash algorithm member
hex: default to the_hash_algo on zero algorithm value
builtin/pack-objects: avoid using struct object_id for pack hash
commit-graph: don't store file hashes as struct object_id
builtin/show-index: set the algorithm for object IDs
hash: provide per-algorithm null OIDs
hash: set, copy, and use algo field in struct object_id
builtin/pack-redundant: avoid casting buffers to struct object_id
Use the final_oid_fn to finalize hashing of object IDs
hash: add a function to finalize object IDs
http-push: set algorithm when reading object ID
Always use oidread to read into struct object_id
hash: add an algo member to struct object_id
|
|
"git add" and "git rm" learned not to touch those paths that are
outside of sparse checkout.
* mt/add-rm-in-sparse-checkout:
rm: honor sparse checkout patterns
add: warn when asked to update SKIP_WORKTREE entries
refresh_index(): add flag to ignore SKIP_WORKTREE entries
pathspec: allow to ignore SKIP_WORKTREE entries on index matching
add: make --chmod and --renormalize honor sparse checkouts
t3705: add tests for `git add` in sparse checkouts
add: include magic part of pathspec on --refresh error
|
|
Cygwin pathname handling fix.
* ad/cygwin-no-backslashes-in-paths:
cygwin: disallow backslashes in file names
|
|
Allow make_transient_cache_entry() to optionally receive a mem_pool
struct in which it should allocate the entry. This will be used in the
following patch, to store some transient entries which should persist
until parallel checkout finishes.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.
* ds/sparse-index-protections: (47 commits)
name-hash: use expand_to_path()
sparse-index: expand_to_path()
name-hash: don't add directories to name_hash
revision: ensure full index
resolve-undo: ensure full index
read-cache: ensure full index
pathspec: ensure full index
merge-recursive: ensure full index
entry: ensure full index
dir: ensure full index
update-index: ensure full index
stash: ensure full index
rm: ensure full index
merge-index: ensure full index
ls-files: ensure full index
grep: ensure full index
fsck: ensure full index
difftool: ensure full index
commit: ensure full index
checkout: ensure full index
...
|
|
The backslash character is not a valid part of a file name on Windows.
If, in Windows, Git attempts to write a file that has a backslash
character in the filename, it will be incorrectly interpreted as a
directory separator.
This caused CVE-2019-1354 in MinGW, as this behaviour can be manipulated
to cause the checkout to write to files it ought not write to, such as
adding code to the .git/hooks directory. This was fixed by e1d911dd4c
(mingw: disallow backslash characters in tree objects' file names,
2019-09-12). However, the vulnerability also exists in Cygwin: while
Cygwin mostly provides a POSIX-like path system, it will still interpret
a backslash as a directory separator.
To avoid this vulnerability, CVE-2021-29468, extend the previous fix to
also apply to Cygwin.
Similarly, extend the test case added by the previous version of the
commit. The test suite doesn't have an easy way to say "run this test
if in MinGW or Cygwin", so add a new test prerequisite that covers both.
As well as checking behaviour in the presence of paths containing
backslashes, the existing test also checks behaviour in the presence of
paths that differ only by the presence of a trailing ".". MinGW follows
normal Windows application behaviour and treats them as the same path,
but Cygwin more closely emulates *nix systems (at the expense of
compatibility with native Windows applications) and will create and
distinguish between such paths. Gate the relevant bit of that test
accordingly.
Reported-by: RyotaK <security@ryotak.me>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the future, we'll want oidread to automatically set the hash
algorithm member for an object ID we read into it, so ensure we use
oidread instead of hashcpy everywhere we're copying a hash value into a
struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Callers to index_name_pos() or index_name_stage_pos() have a specific
path in mind. If that happens to be a path with an ancestor being a
sparse-directory entry, it can lead to unexpected results.
In the case that we did not find the requested path, check to see if the
position _before_ the inserted position is a sparse directory entry that
matches the initial segment of the input path (including the directory
separator at the end of the directory name). If so, then expand the
index to be a full index and search again. This expansion will only
happen once per index read.
Future enhancements could be more careful to expand only the necessary
sparse directory entry, but then we would have a special "not fully
sparse, but also not fully expanded" mode that could affect writing the
index to file. Since this only occurs if a specific file is requested
outside of the sparse checkout definition, this is unlikely to be a
common situation.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.
This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
refresh_index() doesn't update SKIP_WORKTREE entries, but it still
matches them against the given pathspecs, marks the matches on the
seen[] array, check if unmerged, etc. In the following patch, one caller
will need refresh_index() to ignore SKIP_WORKTREE entries entirely, so
add a flag that implements this behavior.
While we are here, also realign the REFRESH_* flags and convert the hex
values to the more natural bit shift format, which makes it easier to
spot holes.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If we have a full index, then we can convert it to a sparse index by
replacing directories outside of the sparse cone with sparse directory
entries. The convert_to_sparse() method does this, when the situation is
appropriate.
For now, we avoid converting the index to a sparse index if:
1. the index is split.
2. the index is already sparse.
3. sparse-checkout is disabled.
4. sparse-checkout does not use cone mode.
Finally, we currently limit the conversion to when the
GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git
config will be added in a later change.
The trickiest thing about this conversion is that we might not be able
to mark a directory as a sparse directory just because it is outside the
sparse cone. There might be unmerged files within that directory, so we
need to look for those. Also, if there is some strange reason why a file
is not marked with CE_SKIP_WORKTREE, then we should give up on
converting that directory. There is still hope that some of its
subdirectories might be able to convert to sparse, so we keep looking
deeper.
The conversion process is assisted by the cache-tree extension. This is
calculated from the full index if it does not already exist. We then
abandon the cache-tree as it no longer applies to the newly-sparse
index. Thus, this cache-tree will be recalculated in every
sparse-full-sparse round-trip until we integrate the cache-tree
extension with the sparse index.
Some Git commands use the index after writing it. For example, 'git add'
will update the index, then write it to disk, then read its entries to
report information. To keep the in-memory index in a full state after
writing, we re-expand it to a full one after the write. This is wasteful
for commands that only write the index and do not read from it again,
but that is only the case until we make those commands "sparse aware."
We can compare the behavior of the sparse-index in
t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1
when operating on the 'sparse-index' repo. We can also compare the two
sparse repos directly, such as comparing their indexes (when expanded to
full in the case of the 'sparse-index' repo). We also verify that the
index is actually populated with sparse directory entries.
The 'checkout and reset (mixed)' test is marked for failure when
comparing a sparse repo to a full repo, but we can compare the two
sparse-checkout cases directly to ensure that we are not changing the
behavior when using a sparse index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The index format does not currently allow for sparse directory entries.
This violates some expectations that older versions of Git or
third-party tools might not understand. We need an indicator inside the
index file to warn these tools to not interact with a sparse index
unless they are aware of sparse directory entries.
Add a new _required_ index extension, 'sdir', that indicates that the
index may contain sparse directory entries. This allows us to continue
to use the differences in index formats 2, 3, and 4 before we create a
new index version 5 in a later change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We will mark an in-memory index_state as having sparse directory entries
with the sparse_index bit. These currently cannot exist, but we will add
a mechanism for collapsing a full index to a sparse one in a later
change. That will happen at write time, so we must first allow parsing
the format before writing it.
Commands or methods that require a full index in order to operate can
call ensure_full_index() to expand that index in-memory. This requires
parsing trees using that index's repository.
Sparse directory entries have a specific 'ce_mode' value. The macro
S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type.
This ce_mode is not possible with the existing index formats, so we don't
also verify all properties of a sparse-directory entry, which are:
1. ce->ce_mode == 0040000
2. ce->flags & CE_SKIP_WORKTREE is true
3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
4. ce->oid references a tree object.
These are all semi-enforced in ensure_full_index() to some extent. Any
deviation will cause a warning at minimum or a failure in the worst
case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
CALLOC_ARRAY() macro replaces many uses of xcalloc().
* rs/calloc-array:
cocci: allow xcalloc(1, size)
use CALLOC_ARRAY
git-compat-util.h: drop trailing semicolon from macro definition
|
|
The data structure used by fsmonitor interface was not properly
duplicated during an in-core merge, leading to use-after-free etc.
* js/fsmonitor-unpack-fix:
fsmonitor: do not forget to release the token in `discard_index()`
fsmonitor: fix memory corruption in some corner cases
|
|
In 56c6910028a (fsmonitor: change last update timestamp on the
index_state to opaque token, 2020-01-07), we forgot to adjust
`discard_index()` to release the "last-update" token: it is no longer a
64-bit number, but a free-form string that has been allocated.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Raise the buffer size used when writing the index file out from
(obviously too small) 8kB to (clearly sufficiently large) 128kB.
* ns/raise-write-index-buffer-size:
read-cache: make the index write buffer size 128K
|
|
Preliminary changes to fsmonitor integration.
* jh/fsmonitor-prework:
fsmonitor: refactor initialization of fsmonitor_last_update token
fsmonitor: allow all entries for a folder to be invalidated
fsmonitor: log FSMN token when reading and writing the index
fsmonitor: log invocation of FSMonitor hook to trace2
read-cache: log the number of scanned files to trace2
read-cache: log the number of lstat calls to trace2
preload-index: log the number of lstat calls to trace2
p7519: add trace logging during perf test
p7519: move watchman cleanup earlier in the test
p7519: fix watchman watch-list test on Windows
p7519: do not rely on "xargs -d" in test
|
|
Writing an index 8K at a time invokes the OS filesystem and caching code
very frequently, introducing noticeable overhead while writing large
indexes. When experimenting with different write buffer sizes on Windows
writing the Windows OS repo index (260MB), most of the benefit came by
bumping the index write buffer size to 64K. I picked 128K to ensure that
we're past the knee of the curve.
With this change, the time under do_write_index for an index with 3M
files goes from ~1.02s to ~0.72s.
Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Report the number of files in the working directory that were read and
their hashes verified in `refresh_index()`.
FSMonitor improves the performance of commands like `git status` by
avoiding scanning the disk for changed files. Let's measure this.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Report the total number of calls made to lstat() inside of refresh_index().
FSMonitor improves the performance of commands like `git status` by
avoiding scanning the disk for changed files. This can be seen in
`refresh_index()`. Let's measure this.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Similar to the previous commits, try to avoid peeking into the `struct
lock_file`. We also have some `struct tempfile`s -- let's avoid looking
into those as well.
Note that `do_write_index()` takes a tempfile and that when we call it,
we either have a tempfile which we can easily hand down, or we have a
lock file, from which we need to somehow obtain the internal tempfile.
So we need to leave that one instance of peeking-into. Nevertheless,
this commit leaves us not relying on exactly how the path of the
tempfile / lock file is stored internally.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
44c7e1a7e0 (mem-pool: use more standard initialization and finalization,
2020-08-15) moved the allocation of the mem-pool structure to callers.
It also added an allocation to load_cache_entries_threaded(), but for an
unrelated mem-pool. Fix that by allocating the correct one instead --
the one that is initialized two lines later.
Reported-by: Sandor Bodo-Merle <sbodomerle@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A typical memory type, such as strbuf, hashmap, or string_list can be
stored on the stack or embedded within another structure. mem_pool
cannot be, because of how mem_pool_init() and mem_pool_discard() are
written. mem_pool_init() does essentially the following (simplified
for purposes of explanation here):
void mem_pool_init(struct mem_pool **pool...)
{
*pool = xcalloc(1, sizeof(*pool));
It seems weird to require that mem_pools can only be accessed through a
pointer. It also seems slightly dangerous: unlike strbuf_release() or
strbuf_reset() or string_list_clear(), all of which put the data
structure into a state where it can be re-used after the call,
mem_pool_discard(pool) will leave pool pointing at free'd memory.
read-cache (and split-index) are the only current users of mem_pools,
and they haven't fallen into a use-after-free mistake here, but it seems
likely to be problematic for future users especially since several of
the current callers of mem_pool_init() will only call it when the
mem_pool* is not already allocated (i.e. is NULL).
This type of mechanism also prevents finding synchronization
points where one can free existing memory and then resume more
operations. It would be natural at such points to run something like
mem_pool_discard(pool...);
and, if necessary,
mem_pool_init(&pool...);
and then carry on continuing to use the pool. However, this fails badly
if several objects had a copy of the value of pool from before these
commands; in such a case, those objects won't get the updated value of
pool that mem_pool_init() overwrites pool with and they'll all instead
be reading and writing from free'd memory.
Modify mem_pool_init()/mem_pool_discard() to behave more like
strbuf_init()/strbuf_release()
or
string_list_init()/string_list_clear()
In particular: (1) make mem_pool_init() just take a mem_pool* and have
it only worry about allocating struct mp_blocks, not the struct mem_pool
itself, (2) make mem_pool_discard() free the memory that the pool was
responsible for, but leave it in a state where it can be used to
allocate more memory afterward (without the need to call mem_pool_init()
again).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
has_dir_name() has some optimizations for the case where entries are
added to an index in the correct order. They kick in if the new entry
sorts after the last one. One of them exits early if the last entry has
a longer name than the directory of the new entry. Here's its comment:
/*
* The directory prefix lines up with part of
* a longer file or directory name, but sorts
* after it, so this sub-directory cannot
* collide with a file.
*
* last: xxx/yy-file (because '-' sorts before '/')
* this: xxx/yy/abc
*/
However, a file named xxx/yy would be sorted before xxx/yy-file because
'-' sorts after NUL, so the length check against the last entry is not
sufficient to rule out a collision. Remove it.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Further tweak to a "no backslash in indexed paths" for Windows port
we applied earlier.
* js/mingw-loosen-overstrict-tree-entry-checks:
mingw: safeguard better against backslashes in file names
|
|
In 224c7d70fa1 (mingw: only test index entries for backslashes, not tree
entries, 2019-12-31), we relaxed the check for backslashes in tree
entries to check only index entries.
However, the code change was incorrect: it was added to
`add_index_entry_with_check()`, not to `add_index_entry()`, so under
certain circumstances it was possible to side-step the protection.
Besides, the description of that commit purported that all index entries
would be checked when in fact they were only checked when being added to
the index (there are code paths that do not do that, constructing
"transient" index entries).
In any case, it was pointed out in one insightful review at
https://github.com/git-for-windows/git/pull/2437#issuecomment-566771835
that it would be a much better idea to teach `verify_path()` to perform
the check for a backslash. This is safer, even if it comes with two
notable drawbacks:
- `verify_path()` cannot say _what_ is wrong with the path, therefore
the user will no longer be told that there was a backslash in the
path, only that the path was invalid.
- The `git apply` command also calls the `verify_path()` function, and
might have been able to handle Windows-style paths (i.e. with
backslashes instead of forward slashes). This will no longer be
possible unless the user (temporarily) sets `core.protectNTFS=false`.
Note that `git add <windows-path>` will _still_ work because
`normalize_path_copy_len()` will convert the backslashes to forward
slashes before hitting the code path that creates an index entry.
The clear advantage is that `verify_path()`'s purpose is to check the
validity of the file name, therefore we naturally tap into all the code
paths that need safeguarding, also implicitly into future code paths.
The benefits of that approach outweigh the downsides, so let's move the
check from `add_index_entry_with_check()` to `verify_path()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
An earlier update to Git for Windows declared that a tree object is
invalid if it has a path component with backslash in it, which was
overly strict, which has been corrected. The only protection the
Windows users need is to prevent such path (or any path that their
filesystem cannot check out) from entering the index.
* js/mingw-loosen-overstrict-tree-entry-checks:
mingw: only test index entries for backslashes, not tree entries
|
|
During a clone of a repository that contained a file with a backslash in
its name in the past, as of v2.24.1(2), Git for Windows prints errors
like this:
error: filename in tree entry contains backslash: '\'
The idea is to prevent Git from even trying to write files with
backslashes in their file names: while these characters are valid in
file names on other platforms, on Windows it is interpreted as directory
separator (which would obviously lead to ambiguities, e.g. when there is
a file `a\b` and there is also a file `a/b`).
Arguably, this is the wrong layer for that error: As long as the user
never checks out the files whose names contain backslashes, there should
not be any problem in the first place.
So let's loosen the requirements: we now leave tree entries with
backslashes in their file names alone, but we do require any entries
that are added to the Git index to contain no backslashes on Windows.
Note: just as before, the check is guarded by `core.protectNTFS` (to
allow overriding the check by toggling that config setting), and it
is _only_ performed on Windows, as the backslash is not a directory
separator elsewhere, even when writing to NTFS-formatted volumes.
An alternative approach would be to try to prevent creating files with
backslashes in their file names. However, that comes with its own set of
problems. For example, `git config -f C:\ProgramData\Git\config ...` is
a very valid way to specify a custom config location, and we obviously
do _not_ want to prevent that. Therefore, the approach chosen in this
patch would appear to be better.
This addresses https://github.com/git-for-windows/git/issues/2435
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
* maint-2.23: (44 commits)
Git 2.23.1
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
...
|
|
* maint-2.22: (43 commits)
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
...
|
|
* maint-2.21: (42 commits)
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
...
|
|
* maint-2.20: (36 commits)
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
...
|
|
* maint-2.19: (34 commits)
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
...
|
|
* maint-2.18: (33 commits)
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
...
|
|
* maint-2.17: (32 commits)
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
...
|
|
* maint-2.16: (31 commits)
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
...
|
|
* maint-2.15: (29 commits)
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
...
|
|
* maint-2.14: (28 commits)
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
test-path-utils: offer to run a protectNTFS/protectHFS benchmark
...
|
|
When creating a directory on Windows whose path ends in a space or a
period (or chains thereof), the Win32 API "helpfully" trims those. For
example, `mkdir("abc ");` will return success, but actually create a
directory called `abc` instead.
This stems back to the DOS days, when all file names had exactly 8
characters plus exactly 3 characters for the file extension, and the
only way to have shorter names was by padding with spaces.
Sadly, this "helpful" behavior is a bit inconsistent: after a successful
`mkdir("abc ");`, a `mkdir("abc /def")` will actually _fail_ (because
the directory `abc ` does not actually exist).
Even if it would work, we now have a serious problem because a Git
repository could contain directories `abc` and `abc `, and on Windows,
they would be "merged" unintentionally.
As these paths are illegal on Windows, anyway, let's disallow any
accesses to such paths on that Operating System.
For practical reasons, this behavior is still guarded by the
config setting `core.protectNTFS`: it is possible (and at least two
regression tests make use of it) to create commits without involving the
worktree. In such a scenario, it is of course possible -- even on
Windows -- to create such file names.
Among other consequences, this patch disallows submodules' paths to end
in spaces on Windows (which would formerly have confused Git enough to
try to write into incorrect paths, anyway).
While this patch does not fix a vulnerability on its own, it prevents an
attack vector that was exploited in demonstrations of a number of
recently-fixed security bugs.
The regression test added to `t/t7417-submodule-path-url.sh` reflects
that attack vector.
Note that we have to adjust the test case "prevent git~1 squatting on
Windows" in `t/t7415-submodule-names.sh` because of a very subtle issue.
It tries to clone two submodules whose names differ only in a trailing
period character, and as a consequence their git directories differ in
the same way. Previously, when Git tried to clone the second submodule,
it thought that the git directory already existed (because on Windows,
when you create a directory with the name `b.` it actually creates `b`),
but with this patch, the first submodule's clone will fail because of
the illegal name of the git directory. Therefore, when cloning the
second submodule, Git will take a different code path: a fresh clone
(without an existing git directory). Both code paths fail to clone the
second submodule, both because the the corresponding worktree directory
exists and is not empty, but the error messages are worded differently.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
|
|
The config setting `core.protectNTFS` is specifically designed to work
not only on Windows, but anywhere, to allow for repositories hosted on,
say, Linux servers to be protected against NTFS-specific attack vectors.
As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated
paths (but does not do the same for paths separated by forward slashes),
under the assumption that the backslash might not be a valid directory
separator on the _current_ Operating System.
However, the two callers, `verify_path()` and `fsck_tree()`, are
supposed to feed only individual path segments to the `is_ntfs_dotgit()`
function.
This causes a lot of duplicate scanning (and very inefficient scanning,
too, as the inner loop of `is_ntfs_dotgit()` was optimized for
readability rather than for speed.
Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of
splitting the paths by backslashes as directory separators on the
callers of said function.
Consequently, the `verify_path()` function, which already splits the
path by directory separators, now treats backslashes as directory
separators _explicitly_ when `core.protectNTFS` is turned on, even on
platforms where the backslash is _not_ a directory separator.
Note that we have to repeat some code in `verify_path()`: if the
backslash is not a directory separator on the current Operating System,
we want to allow file names like `\`, but we _do_ want to disallow paths
that are clearly intended to cause harm when the repository is cloned on
Windows.
The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now
needs to look for backslashes in tree entries' names specifically when
`core.protectNTFS` is turned on. While it would be tempting to
completely disallow backslashes in that case (much like `fsck` reports
names containing forward slashes as "full paths"), this would be
overzealous: when `core.protectNTFS` is turned on in a non-Windows
setup, backslashes are perfectly valid characters in file names while we
_still_ want to disallow tree entries that are clearly designed to
exploit NTFS-specific behavior.
This simplification will make subsequent changes easier to implement,
such as turning `core.protectNTFS` on by default (not only on Windows)
or protecting against attack vectors involving NTFS Alternate Data
Streams.
Incidentally, this change allows for catching malicious repositories
that contain tree entries of the form `dir\.gitmodules` already on the
server side rather than only on the client side (and previously only on
Windows): in contrast to `is_ntfs_dotgit()`, the
`is_ntfs_dotgitmodules()` function already expects the caller to split
the paths by directory separators.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
|
|
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
CI updates.
* js/azure-pipelines-msvc:
ci: also build and test with MS Visual Studio on Azure Pipelines
ci: really use shallow clones on Azure Pipelines
tests: let --immediate and --write-junit-xml play well together
test-tool run-command: learn to run (parts of) the testsuite
vcxproj: include more generated files
vcxproj: only copy `git-remote-http.exe` once it was built
msvc: work around a bug in GetEnvironmentVariable()
msvc: handle DEVELOPER=1
msvc: ignore some libraries when linking
compat/win32/path-utils.h: add #include guards
winansi: use FLEX_ARRAY to avoid compiler warning
msvc: avoid using minus operator on unsigned types
push: do not pretend to return `int` from `die_push_simple()`
|
|
"git stash" learned to write refreshed index back to disk.
* tg/stash-refresh-index:
stash: make sure to write refreshed cache
merge: use refresh_and_write_cache
factor out refresh_and_write_cache function
|
|
MSVC complains about this with `-Wall`, which can be taken as a sign
that this is indeed a real bug. The symptom is:
C4146: unary minus operator applied to unsigned type, result
still unsigned
Let's avoid this warning in the minimal way, e.g. writing `-1 -
<unsigned value>` instead of `-<unsigned value> - 1`.
Note that the change in the `estimate_cache_size()` function is
needed because MSVC considers the "return type" of the `sizeof()`
operator to be `size_t`, i.e. unsigned, and therefore it cannot be
negated using the unary minus operator.
Even worse, that arithmetic is doing extra work, in vain. We want to
calculate the entry extra cache size as the difference between the
size of the `cache_entry` structure minus the size of the
`ondisk_cache_entry` structure, padded to the appropriate alignment
boundary.
To that end, we start by assigning that difference to the `per_entry`
variable, and then abuse the `len` parameter of the
`align_padding_size()` macro to take the negative size of the ondisk
entry size. Essentially, we try to avoid passing the already calculated
difference to that macro by passing the operands of that difference
instead, when the macro expects operands of an addition:
#define align_padding_size(size, len) \
((size + (len) + 8) & ~7) - (size + len)
Currently, we pass A and -B to that macro instead of passing A - B and
0, where A - B is already stored in the `per_entry` variable, ready to
be used.
This is neither necessary, nor intuitive. Let's fix this, and have code
that is both easier to read and that also does not trigger MSVC's
warning.
While at it, we take care of reporting overflows (which are unlikely,
but hey, defensive programming is good!).
We _also_ take pains of casting the unsigned value to signed: otherwise,
the signed operand (i.e. the `-1`) would be cast to unsigned before
doing the arithmetic.
Helped-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Getting the lock for the index, refreshing it and then writing it is a
pattern that happens more than once throughout the codebase, and isn't
trivial to get right. Factor out the refresh_and_write_cache function
from builtin/am.c to read-cache.c, so it can be re-used in other
places in a subsequent commit.
Note that we return different error codes for failing to refresh the
cache, and failing to write the index. The current caller only cares
about failing to write the index. However for other callers we're
going to convert in subsequent patches we will need this distinction.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A mechanism to affect the default setting for a (related) group of
configuration variables is introduced.
* ds/feature-macros:
repo-settings: create feature.experimental setting
repo-settings: create feature.manyFiles setting
repo-settings: parse core.untrackedCache
commit-graph: turn on commit-graph by default
t6501: use 'git gc' in quiet mode
repo-settings: consolidate some config settings
|
|
The core.untrackedCache config setting is slightly complicated,
so clarify its use and centralize its parsing into the repo
settings.
The default value is "keep" (returned as -1), which persists the
untracked cache if it exists.
If the value is set as "false" (returned as 0), then remove the
untracked cache if it exists.
If the value is set as "true" (returned as 1), then write the
untracked cache and persist it.
Instead of relying on magic values of -1, 0, and 1, split these
options into an enum. This allows the use of "-1" as a
default value. After parsing the config options, if the value is
unset we can initialize it to UNTRACKED_CACHE_KEEP.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are a few important config settings that are not loaded
during git_default_config. These are instead loaded on-demand.
Centralize these config options to a single scan, and store
all of the values in a repo_settings struct. The values for
each setting are initialized as negative to indicate "unset".
This centralization will be particularly important in a later
change to introduce "meta" config settings that change the
defaults for these config settings.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up to avoid signed integer overlaps during binary search.
* rs/avoid-overflow-in-midpoint-computation:
cleanup: fix possible overflow errors in binary search, part 2
|
|
Clean-up an error codepath.
* vn/xmmap-gently:
read-cache.c: do not die if mmap fails
|
|
Clean-up an error codepath.
* vn/xmmap-gently:
read-cache.c: do not die if mmap fails
|
|
do_read_index() mmaps the index, or tries to die with an error message
on failure. It should call xmmap_gently(), which returns MAP_FAILED,
rather than xmmap(), which dies with its own error message.
An easy way to cause this mmap to fail is by setting $GIT_INDEX_FILE to
a path to a directory and then invoking any command that reads from the
index.
Signed-off-by: Varun Naik <vcnaik94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up to avoid signed integer overlaps during binary search.
* rs/avoid-overflow-in-midpoint-computation:
cleanup: fix possible overflow errors in binary search, part 2
|
|
The data collected by fsmonitor was not properly written back to
the on-disk index file, breaking t7519 tests occasionally, which
has been corrected.
* js/fsmonitor-unflake:
mark_fsmonitor_valid(): mark the index as changed if needed
fill_stat_cache_info(): prepare for an fsmonitor fix
|
|
* jk/unused-params-final-batch:
verify-commit: simplify parameters to run_gpg_verify()
show-branch: drop unused parameter from show_independent()
rev-list: drop unused void pointer from finish_commit()
remove_all_fetch_refspecs(): drop unused "remote" parameter
receive-pack: drop unused "commands" from prepare_shallow_update()
pack-objects: drop unused rev_info parameters
name-rev: drop unused parameters from is_better_name()
mktree: drop unused length parameter
wt-status: drop unused status parameter
read-cache: drop unused parameter from threaded load
clone: drop dest parameter from copy_alternates()
submodule: drop unused prefix parameter from some functions
builtin: consistently pass cmd_* prefix to parse_options
cmd_{read,write}_tree: rename "unused" variable that is used
|
|
Calculating the sum of two array indexes to find the midpoint between
them can overflow, i.e. code like this is unsafe for big arrays:
mid = (first + last) >> 1;
Make sure the intermediate value stays within the boundaries instead,
like this:
mid = first + ((last - first) >> 1);
The loop condition of the binary search makes sure that 'last' is
always greater than 'first', so this is safe as long as 'first' is
not negative. And that can be verified easily using the pre-context
of each change, except for name-hash.c, so add an assertion to that
effect there.
The unsafe calculations were found with:
git grep '(.*+.*) *>> *1'
This is a continuation of 19716b21a4 (cleanup: fix possible overflow
errors in binary search, 2017-10-08).
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Without this bug fix, t7519's four "status doesn't detect unreported
modifications" test cases would fail occasionally (and, oddly enough,
*a lot* more frequently on Windows).
The reason is that these test cases intentionally use the side effect of
`git status` to re-write the index if any updates were detected: they
first clean the worktree, run `git status` to update the index as well
as show the output to the casual reader, then make the worktree dirty
again and expect no changes to reported if running with a mocked
fsmonitor hook.
The problem with this strategy was that the index was written during
said `git status` on the clean worktree for the *wrong* reason: not
because the index was marked as changed (it wasn't), but because the
recorded mtimes were racy with the index' own mtime.
As the mtime granularity on Windows is 100 nanoseconds (see e.g.
https://docs.microsoft.com/en-us/windows/desktop/SysInfo/file-times),
the mtimes of the files are often enough *not* racy with the index', so
that that `git status` call currently does not always update the index
(including the fsmonitor extension), causing the test case to fail.
The obvious fix: if we change *any* index entry's `CE_FSMONITOR_VALID`
flag, we should also mark the index as changed. That will cause the
index to be written upon `git status`, *including* an updated fsmonitor
extension.
Side note: Even though the reader might think that the t7519 issue
should be *much* more prevalent on Linux, given that the ext4 filesystem
(that seems to be used by every Linux distribution) stores mtimes in
nanosecond precision. However, ext4 uses `current_kernel_time()` (see
https://unix.stackexchange.com/questions/11599#comment762968_11599; it
is *amazingly* hard to find any proper source of information about such
ext4 questions) whose accuracy seems to depend on many factors but is
safely worse than the 100-nanosecond granularity of NTFS (again, it is
*horribly* hard to find anything remotely authoritative about this
question). So it seems that the racy index condition that hid the bug
fixed by this patch simply is a lot more likely on Linux than on
Windows. But not impossible ;-)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We will need to pass down the `struct index_state` to
`mark_fsmonitor_valid()` for an upcoming bug fix, and this here function
calls that there function, so we need to extend the signature of
`fill_stat_cache_info()` first.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The fsmonitor interface got out of sync after the in-core index
file gets discarded, which has been corrected.
* js/fsmonitor-refresh-after-discarding-index:
fsmonitor: force a refresh after the index was discarded
fsmonitor: demonstrate that it is not refreshed after discard_index()
|
|
A few embarrassing bugfixes.
* jh/trace2:
trace2: fix up a missing "leave" entry point
trace2: fix incorrect function pointer check
|
|
The load_cache_entries_threaded() function takes a src_offset parameter
that it doesn't use. This has been there since its inception in
77ff1127a4 (read-cache: load cache entries on worker threads,
2018-10-10).
Digging on the mailing list, that parameter was part of an earlier
iteration of the series[1], but became unnecessary when the code
switched to using the IEOT extension.
[1] https://public-inbox.org/git/20180906210227.54368-5-benpeart@microsoft.com/
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix a trivial bug that's been here since the shared/do_write_index
tracing was added in 42fee7a388 ("trace2:data: add trace2
instrumentation to index read/write", 2019-02-22). We should have
enter/leave points, not two enter/enter points. This resulted in an
"enter" event without a corresponding "leave" event.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Running "git add" on a repository created inside the current
repository is an explicit indication that the user wants to add it
as a submodule, but when the HEAD of the inner repository is on an
unborn branch, it cannot be added as a submodule. Worse, the files
in its working tree can be added as if they are a part of the outer
repository, which is not what the user wants. These problems are
being addressed.
* km/empty-repo-is-still-a-repo:
add: error appropriately on repository with no commits
dir: do not traverse repositories with no commits
submodule: refuse to add repository with no commits
|
|
With this change, the `index_state` struct becomes the new home for the
flag that says whether the fsmonitor hook has been run, i.e. it is now
per-index.
It also gets re-set when the index is discarded, fixing the bug
demonstrated by the "test_expect_failure" test added in the preceding
commit. In that case fsmonitor-enabled Git would miss updates under
certain circumstances, see that preceding commit for details.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Conversion from unsigned char[20] to struct object_id continues.
* bc/hash-transition-16: (35 commits)
gitweb: make hash size independent
Git.pm: make hash size independent
read-cache: read data in a hash-independent way
dir: make untracked cache extension hash size independent
builtin/difftool: use parse_oid_hex
refspec: make hash size independent
archive: convert struct archiver_args to object_id
builtin/get-tar-commit-id: make hash size independent
get-tar-commit-id: parse comment record
hash: add a function to lookup hash algorithm by length
remote-curl: make hash size independent
http: replace sha1_to_hex
http: compute hash of downloaded objects using the_hash_algo
http: replace hard-coded constant with the_hash_algo
http-walker: replace sha1_to_hex
http-push: remove remaining uses of sha1_to_hex
http-backend: allow 64-character hex names
http-push: convert to use the_hash_algo
builtin/pull: make hash-size independent
builtin/am: make hash size independent
...
|
|
A new hook "post-index-change" is called when the on-disk index
file changes, which can help e.g. a virtualized working tree
implementation.
* bp/post-index-change-hook:
read-cache: add post-index-change hook
|
|
The previous commit made 'git add' abort when given a repository that
doesn't have a commit checked out. However, the output upon failure
isn't appropriate:
% git add repo
warning: adding embedded git repository: repo
hint: You've added another git repository inside your current repository.
hint: [...]
error: unable to index file 'repo/'
fatal: adding files failed
The hint doesn't apply in this case, and the error message doesn't
tell the user why 'repo' couldn't be added to the index.
Provide better output by teaching add_to_index() to error when given a
git directory where HEAD can't be resolved. To avoid the embedded
repository warning and hint, call check_embedded_repo() only after
add_file_to_index() succeeds because, in general, its output doesn't
make sense if adding to the index fails.
Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Index entries are structured with a variety of fields up front, followed
by a hash and one or two flags fields. Because the hash field is stored
in the middle of the structure, it's difficult to use one fixed-size
structure that easily allows access to the hash and flags fields.
Adjust the structure to hold the maximum amount of data that may be
needed using a member called "data" and read and write this field
independently in the various places that need to read and write the
structure.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A more structured way to obtain execution trace has been added.
* jh/trace2:
trace2: add for_each macros to clang-format
trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh
trace2:data: add subverb for rebase
trace2:data: add subverb to reset command
trace2:data: add subverb to checkout command
trace2:data: pack-objects: add trace2 regions
trace2:data: add trace2 instrumentation to index read/write
trace2:data: add trace2 hook classification
trace2:data: add trace2 transport child classification
trace2:data: add trace2 sub-process classification
trace2:data: add editor/pager child classification
trace2:data: add trace2 regions to wt-status
trace2: collect Windows-specific process information
trace2: create new combined trace facility
trace2: Documentation/technical/api-trace2.txt
|
|
Split-index fix.
* nd/split-index-null-base-fix:
read-cache.c: fix writing "link" index ext with null base oid
|
|
"git checkout --no-overlay" can be used to trigger a new mode of
checking out paths out of the tree-ish, that allows paths that
match the pathspec that are in the current index and working tree
and are not in the tree-ish.
* tg/checkout-no-overlay:
revert "checkout: introduce checkout.overlayMode config"
checkout: introduce checkout.overlayMode config
checkout: introduce --{,no-}overlay option
checkout: factor out mark_cache_entry_for_checkout function
checkout: clarify comment
read-cache: add invalidate parameter to remove_marked_cache_entries
entry: support CE_WT_REMOVE flag in checkout_entry
entry: factor out unlink_entry function
move worktree tests to t24*
|
|
Add trace2 events to measure reading and writing the index.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a post-index-change hook that is invoked after the index is written in
do_write_locked_index().
This hook is meant primarily for notification, and cannot affect
the outcome of git commands that trigger the index write.
The hook is passed a flag to indicate whether the working directory was
updated or not and a flag indicating if a skip-worktree bit could have
changed. These flags enable the hook to optimize its response to the
index change notification.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since commit 7db118303a (unpack_trees: fix breakage when o->src_index !=
o->dst_index - 2018-04-23) and changes in merge code to use separate
index_state for source and destination, when doing a merge with split
index activated, we may run into this line in unpack_trees():
o->result.split_index = init_split_index(&o->result);
This is by itself not wrong. But this split index information is not
fully populated (and it's only so when move_cache_to_base_index() is
called, aka force splitting the index, or loading index_state from a
file). Both "base_oid" and "base" in this case remain null.
So when writing the main index down, we link to this index with null
oid (default value after init_split_index()), which also means "no split
index" internally. This triggers an incorrect base index refresh:
warning: could not freshen shared index '.../sharedindex.0{40}'
This patch makes sure we will not refresh null base_oid (because the
file is never there). It also makes sure not to write "link" extension
with null base_oid in the first place (no point having it at
all). Read code already has protection against null base_oid.
There is also another side fix in remove_split_index() that causes a
crash when doing "git update-index --no-split-index" when base_oid in
the index file is null. In this case we will not load
istate->split_index->base but we dereference it anyway and are rewarded
with a segfault. This should not happen anymore, but it's still wrong to
dereference a potential NULL pointer, especially when we do check for
NULL pointer in the next code.
Reported-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A hotfix to an incomplete fix made earlier.
* jk/add-ignore-errors-bit-assignment-fix:
add_to_index(): convert forgotten HASH_RENORMALIZE check
|
|
The assumption to work on the single "in-core index" instance has
been reduced from the library-ish part of the codebase.
* nd/the-index-final:
cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
read-cache.c: remove the_* from index_has_changes()
merge-recursive.c: remove implicit dependency on the_repository
merge-recursive.c: remove implicit dependency on the_index
sha1-name.c: remove implicit dependency on the_index
read-cache.c: replace update_index_if_able with repo_&
read-cache.c: kill read_index()
checkout: avoid the_index when possible
repository.c: replace hold_locked_index() with repo_hold_locked_index()
notes-utils.c: remove the_repository references
grep: use grep_opt->repo instead of explict repo argument
|
|
Commit 9e5da3d055 (add: use separate ADD_CACHE_RENORMALIZE flag,
2019-01-17) switched out using HASH_RENORMALIZE in our flags field for a
new ADD_CACHE_RENORMALIZE flag. However, it forgot to convert one of the
checks for HASH_RENORMALIZE into the new flag, which totally broke "git
add --renormalize".
To make matters even more confusing, the resulting code would racily
pass the tests! The forgotten check was responsible for defeating the
up-to-date check of the index entry. That meant that "git add
--renormalize" would refuse to renormalize things that appeared
stat-clean. But most of the time the test commands run fast enough that
the file mtime is the same as the index mtime. And thus we err on the
conservative side and re-hash the file, which is what "--renormalize"
would have wanted.
But if you're unlucky and cross that one-second boundary between writing
the file and writing the index (which is more likely to happen on a slow
or heavily-loaded system), then the file appears stat-clean. And
"--renormalize" would effectively do nothing.
The fix is straightforward: convert this check to use the right flag.
Noticed-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git add --ignore-errors" did not work as advertised and instead
worked as an unintended synonym for "git add --renormalize", which
has been fixed.
* jk/add-ignore-errors-bit-assignment-fix:
add: use separate ADD_CACHE_RENORMALIZE flag
|
|
By default, index compat macros are off from now on, because they
could hide the_index dependency.
Only those in builtin can use it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 9472935d81 (add: introduce "--renormalize", 2017-11-16) taught
git-add to pass HASH_RENORMALIZE to add_to_index(), which then passes
the flag along to index_path(). However, the flags taken by
add_to_index() and the ones taken by index_path() are distinct
namespaces. We cannot take HASH_* flags in add_to_index(), because they
overlap with the ADD_CACHE_* flags we already take (in this case,
HASH_RENORMALIZE conflicts with ADD_CACHE_IGNORE_ERRORS).
We can solve this by adding a new ADD_CACHE_RENORMALIZE flag, and using
it to set HASH_RENORMALIZE within add_to_index(). In order to make it
clear that these two flags come from distinct sets, let's also change
the name "newflags" in the function to "hash_flags".
Reported-by: Dmitriy Smirnov <dmitriy.smirnov@jetbrains.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|