Age | Commit message (Collapse) | Author | Files | Lines |
|
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.
* eb/hash-transition: (30 commits)
t1016-compatObjectFormat: add tests to verify the conversion between objects
t1006: test oid compatibility with cat-file
t1006: rename sha1 to oid
test-lib: compute the compatibility hash so tests may use it
builtin/ls-tree: let the oid determine the output algorithm
object-file: handle compat objects in check_object_signature
tree-walk: init_tree_desc take an oid to get the hash algorithm
builtin/cat-file: let the oid determine the output algorithm
rev-parse: add an --output-object-format parameter
repository: implement extensions.compatObjectFormat
object-file: update object_info_extended to reencode objects
object-file-convert: convert commits that embed signed tags
object-file-convert: convert commit objects when writing
object-file-convert: don't leak when converting tag objects
object-file-convert: convert tag objects when writing
object-file-convert: add a function to convert trees between algorithms
object: factor out parse_mode out of fast-import and tree-walk into in object.h
cache: add a function to read an OID of a specific algorithm
tag: sign both hashes
commit: export add_header_signature to support handling signatures on tags
...
|
|
There is an error message in that function to report a missing tree; In
contrast to three other, similar error messages, it is not marked for
translation yet.
Mark it for translation, and while at it, make the error message
consistent with the others by enclosing the SHA in parentheses.
This requires a change to t6030 which expects the previous format of the
commit message. Theoretically, this could present problems with existing
scripts that use `git bisect` and parse its output (because Git does not
provide other means for callers to discern between error conditions).
However, this is unlikely to matter in practice because the most common
course of action to deal with fatal corruptions is to report the error
message to the user and exit, rather than trying to do something with
the reported SHA of the missing tree.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.
Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.
Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.
Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repositories hash algorithm.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
builtin/fast-import.c and tree-walk.c have almost identical version of
get_mode. The two functions started out the same but have diverged
slightly. The version in fast-import changed mode to a uint16_t to
save memory. The version in tree-walk started erroring if no mode was
present.
As far as I can tell both of these changes are valid for both of the
callers, so add the both changes and place the common parsing helper
in object.h
Rename the helper from get_mode to parse_mode so it does not
conflict with another helper named get_mode in diff-no-index.c
This will be used shortly in a new helper decode_tree_entry_raw
which is used to compute cmpatibility objects as part of
the sha256 transition.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The tree-walk.c code walks trees recursively, and may run out of stack
space. The easiest way to see this is with git-archive; on my 64-bit
Linux system it runs out of stack trying to generate a tarfile with a
tree depth of 13,772.
I've picked 4100 as the depth for our "big" test. I ran it with a much
higher value to confirm that we do get a segfault without this patch.
But really anything over 4096 is sufficient for its stated purpose,
which is to find out if our default limit of 4096 is low enough to
prevent segfaults on all platforms. Keeping it small saves us time on
the test setup.
The tree-walk code that's touched here underlies unpack_trees(), so this
protects any programs which use it, not just git-archive (but archive is
easy to test, and was what alerted me to this issue in a real-world
case).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "error" variable in traverse_trees() shadows the global error()
function (meaning we can't call error() from here). Let's call the local
variable "ret" instead, which matches the idiom in other functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The traverse_trees() and traverse_trees_recursive() functions call each
other recursively. In a deep tree, this can result in running out of
stack space and crashing.
There's obviously going to be some limit here based on available stack,
but the problem is exacerbated by a few large structs, many of which we
over-allocate. For example, in traverse_trees() we store a name_entry
and tree_desc_x per tree, both of which contain an object_id (which is
now 32 bytes). And we allocate 8 of them (from MAX_TRAVERSE_TREES), even
though many traversals will only look at 1 or 2.
Interestingly, we used to allocate these on the heap, prior to
8dd40c0472 (traverse_trees(): use stack array for name entries,
2020-01-30). That commit was trying to simplify away allocation size
computations, and naively assumed that the sizes were small enough not
to matter. And they don't in normal cases, but on my stock Debian system
I see a crash running "git archive" on a tree with ~3600 entries.
That's deep enough we wouldn't see it in practice, but probably shallow
enough that we'd prefer not to make it a hard limit. Especially because
other systems may have even smaller stacks.
We can replace these stack variables with a few malloc invocations. This
reduces the stack sizes for the two functions from 1128 and 752 bytes,
respectively, down to 40 and 92 bytes. That allows a depth of ~13000 on
my machine (the improvement isn't in linear proportion because my
numbers don't count the size of parameters and other function overhead).
The possible downsides are:
1. We now have to remember to free(). But both functions have an easy
single exit (and already had to clean up other bits anyway).
2. The extra malloc()/free() overhead might be measurable. I tested
this by setting up a 3000-depth tree with a single blob and running
"git archive" on it. After switching to the heap, it consistently
runs 2-3% faster! Presumably this is because the 1K+ of wasted
stack space penalized memory caches.
On a more real-world case like linux.git, the speed difference isn't
measurable at all, simply because most trees aren't that deep and
there's so much other work going on (like accessing the objects
themselves). So the improvement I saw should be taken as evidence that
we're not making anything worse, but isn't really that interesting on
its own. The main motivation here is that we're now less likely to run
out of stack space and crash.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code simplification.
* jc/tree-walk-drop-base-offset:
tree-walk: drop unused base_offset from do_match()
tree-walk: lose base_offset that is never used in tree_entry_interesting
|
|
The tree-walk.c:do_match() function takes base_offset but just like
tree_entry_interesting() we dealt with earlier, nobody passes a
value other than 0 in it. Get rid of the parameter to avoid having
to worry about potential bugs lurking unexercised.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The tree_entry_interesting() function takes base_offset, allowing
its callers to potentially pass a non-zero number to skip the early
part of the path string.
The feature is never exercised and we do not even know what bugs are
lurking there, as all callers pass 0 to the parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h. This made it more difficult
to find which files could remove a dependence on cache.h. Make C files
explicitly include trace.h or trace2.h if they are using them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When using init_tree_desc() and tree_entry() to iterate over a tree, we
always canonicalize the modes coming out of the tree. This is a good
thing to prevent bugs or oddities in normal code paths, but it's
counter-productive for tools like fsck that want to see the exact
contents.
We can address this by adding an option to avoid the extra
canonicalization. A few notes on the implementation:
- I've attached the new option to the tree_desc struct itself. The
actual code change is in decode_tree_entry(), which is in turn
called by the public update_tree_entry(), tree_entry(), and
init_tree_desc() functions, plus their "gently" counterparts.
By letting it ride along in the struct, we can avoid changing the
signature of those functions, which are called many times. Plus it's
conceptually simpler: you really want a particular iteration of a
tree to be "raw" or not, rather than individual calls.
- We still have to set the new option somewhere. The struct is
initialized by init_tree_desc(). I added the new flags field only to
the "gently" version. That avoids disturbing the much more numerous
non-gentle callers, and it makes sense that anybody being careful
about looking at raw modes would also be careful about bogus trees
(i.e., the caller will be something like fsck in the first place).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the read_object_with_reference() function to take an "enum
object_type". It was not prepared to handle an arbitrary "const
char *type", as it was itself calling type_from_string().
Let's change the only caller that passes in user data to use
type_from_string(), and convert the rest to use e.g. "OBJ_TREE"
instead of "tree_type".
The "cat-file" caller is not on the codepath that
handles"--allow-unknown", so the type_from_string() there is safe. Its
use of type_from_string() doesn't functionally differ from that of the
pre-image.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the future, we'll want oidread to automatically set the hash
algorithm member for an object ID we read into it, so ensure we use
oidread instead of hashcpy everywhere we're copying a hash value into a
struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The traverse_trees() method recursively walks through trees, but also
prunes the tree-walk based on a callback. Some callers, such as
unpack_trees(), are quite complicated and can have wildly different
performance between two different commands.
Create constants that count these values and then report the results at
the end of a process. These counts are cumulative across multiple "root"
instances of traverse_trees(), but they provide reproducible values for
demonstrating improvements to the pruning algorithm when possible.
This change is modeled after a similar statistics reporting in 42e50e78
(revision.c: add trace2 stats around Bloom filter usage, 2020-04-06).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Submodules should be handled the same as regular directories with
respect to the presence of a trailing slash, i.e. commands like:
git diff rev1 rev2 -- $path
git rev-list HEAD -- $path
should produce the same output whether $path is 'submod' or 'submod/'.
This has been fixed in commit 74b4f7f277 (tree-walk.c: ignore trailing
slash on submodule in tree_entry_interesting(), 2014-01-23).
Unfortunately, that commit had the unintended side effect to handle
'submod/anything' the same as 'submod' and 'submod/' as well, e.g.:
$ git log --oneline --name-only -- sha1collisiondetection/whatever
4125f78222 sha1dc: update from upstream
sha1collisiondetection
07a20f569b Makefile: fix unaligned loads in sha1dc with UBSan
sha1collisiondetection
23e37f8e9d sha1dc: update from upstream
sha1collisiondetection
86cfd61e6b sha1dc: optionally use sha1collisiondetection as a submodule
sha1collisiondetection
Fix this by rejecting submodules as partial pathnames when their
trailing slash is followed by anything.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The unpack-trees API depends on the tree-walk API. But we've recently
introduced a dependency in tree-walk.c on MAX_UNPACK_TREES, which
doesn't otherwise care about unpack-trees at all.
Let's break that dependency by reversing the constants: we'll introduce
a new MAX_TRAVERSE_TREES which belongs to the tree-walk API. And then we
can define MAX_UNPACK_TREES in terms of that (since unpack-trees cannot
possibly work with more trees than it can traverse at once via
tree-walk).
The value for both will remain at 8. This is somewhat arbitrary and
probably more than is necessary, per ca885a4fe6 (read-tree() and
unpack_trees(): use consistent limit, 2008-03-13), but there's not
really any pressing need to reduce it.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We heap-allocate our arrays of name_entry structs, etc, with one entry
per tree we're asked to traverse. The code does a raw multiplication in
the xmalloc() call, which I find when auditing for integer overflows
during allocation.
We could "fix" this by using ALLOC_ARRAY() instead. But as it turns out,
the maximum size of these arrays is limited at compile time:
- merge_trees() always passes in 3 trees
- unpack_trees() and its brethren never pass in more than
MAX_UNPACK_TREES
So we can simplify even further by just using a stack array and bounding
it with MAX_UNPACK_TREES. There should be no concern with overflowing
the stack, since MAX_UNPACK_TREES is only 8 and the structs themselves
are small.
Note that since we're replacing xcalloc(), we have to move one of the
NULL initializations into a loop.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
An earlier update to Git for Windows declared that a tree object is
invalid if it has a path component with backslash in it, which was
overly strict, which has been corrected. The only protection the
Windows users need is to prevent such path (or any path that their
filesystem cannot check out) from entering the index.
* js/mingw-loosen-overstrict-tree-entry-checks:
mingw: only test index entries for backslashes, not tree entries
|
|
During a clone of a repository that contained a file with a backslash in
its name in the past, as of v2.24.1(2), Git for Windows prints errors
like this:
error: filename in tree entry contains backslash: '\'
The idea is to prevent Git from even trying to write files with
backslashes in their file names: while these characters are valid in
file names on other platforms, on Windows it is interpreted as directory
separator (which would obviously lead to ambiguities, e.g. when there is
a file `a\b` and there is also a file `a/b`).
Arguably, this is the wrong layer for that error: As long as the user
never checks out the files whose names contain backslashes, there should
not be any problem in the first place.
So let's loosen the requirements: we now leave tree entries with
backslashes in their file names alone, but we do require any entries
that are added to the Git index to contain no backslashes on Windows.
Note: just as before, the check is guarded by `core.protectNTFS` (to
allow overriding the check by toggling that config setting), and it
is _only_ performed on Windows, as the backslash is not a directory
separator elsewhere, even when writing to NTFS-formatted volumes.
An alternative approach would be to try to prevent creating files with
backslashes in their file names. However, that comes with its own set of
problems. For example, `git config -f C:\ProgramData\Git\config ...` is
a very valid way to specify a custom config location, and we obviously
do _not_ want to prevent that. Therefore, the approach chosen in this
patch would appear to be better.
This addresses https://github.com/git-for-windows/git/issues/2435
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
Doc update for the mailing list archiving and nntp service.
* jk/lore-is-the-archive:
doc: replace public-inbox links with lore.kernel.org
doc: recommend lore.kernel.org over public-inbox.org
|
|
* maint-2.23: (44 commits)
Git 2.23.1
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
...
|
|
* maint-2.22: (43 commits)
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
...
|
|
* maint-2.21: (42 commits)
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
...
|
|
* maint-2.20: (36 commits)
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
...
|
|
* maint-2.19: (34 commits)
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
...
|
|
* maint-2.18: (33 commits)
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
...
|
|
* maint-2.17: (32 commits)
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
...
|
|
* maint-2.15: (29 commits)
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
...
|
|
* maint-2.14: (28 commits)
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
test-path-utils: offer to run a protectNTFS/protectHFS benchmark
...
|
|
The backslash character is not a valid part of a file name on Windows.
Hence it is dangerous to allow writing files that were unpacked from
tree objects, when the stored file name contains a backslash character:
it will be misinterpreted as directory separator.
This not only causes ambiguity when a tree contains a blob `a\b` and a
tree `a` that contains a blob `b`, but it also can be used as part of an
attack vector to side-step the careful protections against writing into
the `.git/` directory during a clone of a maliciously-crafted
repository.
Let's prevent that, addressing CVE-2019-1354.
Note: we guard against backslash characters in tree objects' file names
_only_ on Windows (because on other platforms, even on those where NTFS
volumes can be mounted, the backslash character is _not_ a directory
separator), and _only_ when `core.protectNTFS = true` (because users
might need to generate tree objects for other platforms, of course
without touching the worktree, e.g. using `git update-index
--cacheinfo`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
|
|
Since we're now recommending lore.kernel.org (and because the
public-inbox.org domain might eventually go away), let's update our
internal references to use it, too. That future-proofs our references,
and sets the example we want people to follow.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Codepaths to walk tree objects have been audited for integer
overflows and hardened.
* jk/tree-walk-overflow:
tree-walk: harden make_traverse_path() length computations
tree-walk: add a strbuf wrapper for make_traverse_path()
tree-walk: accept a raw length for traverse_path_len()
tree-walk: use size_t consistently
tree-walk: drop oid from traverse_info
setup_traverse_info(): stop copying oid
|
|
The make_traverse_path() function isn't very careful about checking its
output buffer boundaries. In fact, it doesn't even _know_ the size of
the buffer it's writing to, and just assumes that the caller used
traverse_path_len() correctly. And even then we assume that our
traverse_info.pathlen components are all correct, and just blindly write
into the buffer.
Let's improve this situation a bit:
- have the caller pass in their allocated buffer length, which we'll
check against our own computations
- check for integer underflow as we do our backwards-insertion of
pathnames into the buffer
- check that we do not run out items in our list to traverse before
we've filled the expected number of bytes
None of these should be triggerable in practice (especially since our
switch to size_t everywhere in a previous commit), but it doesn't hurt
to check our assumptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
All but one of the callers of make_traverse_path() allocate a new heap
buffer to store the path. Let's give them an easy way to write to a
strbuf, which saves them from computing the length themselves (which is
especially tricky when they want to add to the path). It will also make
it easier for us to change the make_traverse_path() interface in a
future patch to improve its bounds-checking.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We store and manipulate the cumulative traverse_info.pathlen as an
"int", which can overflow when we are fed ridiculously long pathnames
(e.g., ones at the edge of 2GB or 4GB, even if the individual tree entry
names are smaller than that). The results can be confusing, though
after some prodding I was not able to use this integer overflow to cause
an under-allocated buffer.
Let's consistently use size_t to generate and store these, and make
sure our addition doesn't overflow.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As the previous commit shows, the presence of an oid in each level of
the traverse_info is confusing and ultimately not necessary. Let's drop
it to make it clear that it will not always be set (as well as convince
us that it's unused, and let the compiler catch any merges with other
branches that do add new uses).
Since the oid is part of name_entry, we'll actually stop embedding a
name_entry entirely, and instead just separately hold the pathname, its
length, and the mode.
This makes the resulting code slightly more verbose as we have to pass
those elements around individually. But it also makes it more clear what
each code path is going to use (and in most of the paths, we really only
care about the pathname itself).
A few of these conversions are noisier than they need to be, as they
also take the opportunity to rename "len" to "namelen" for clarity
(especially where we also have "pathlen" or "ce_len" alongside).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We assume that if setup_traverse_info() is passed a non-empty "base"
string, that string is pointing into a tree object and we can read the
object oid by skipping past the trailing NUL.
As it turns out, this is not true for either of the two calls, and we
may end up reading garbage bytes:
1. In git-merge-tree, our base string is either empty (in which case
we'd never run this code), or it comes from our traverse_path()
helper. The latter overallocates a buffer by the_hash_algo->rawsz
bytes, but then fills it with only make_traverse_path(), leaving
those extra bytes uninitialized (but part of a legitimate heap
buffer).
2. In unpack_trees(), we pass o->prefix, which is some arbitrary
string from the caller. In "git read-tree --prefix=foo", for
instance, it will point to the command-line parameter, and we'll
read 20 bytes past the end of the string.
Interestingly, tools like ASan do not detect (2) because the process
argv is part of a big pre-allocated buffer. So we're reading trash, but
it's trash that's probably part of the next argument, or the
environment.
You can convince it to fail by putting something like this at the
beginning of common-main.c's main() function:
{
int i;
for (i = 0; i < argc; i++)
argv[i] = xstrdup_or_null(argv[i]);
}
That puts the arguments into their own heap buffers, so running:
make SANITIZE=address test
will find problems when "read-tree --prefix" is used (e.g., in t3030).
Doubly interesting, even with the hackery above, this does not fail
prior to ea82b2a085 (tree-walk: store object_id in a separate member,
2019-01-15). That commit switched setup_traverse_info() to actually
copying the hash, rather than simply pointing to it. That pointer was
always pointing to garbage memory, but that commit started actually
dereferencing the bytes, which is what triggers ASan.
That also implies that nobody actually cares about reading these oid
bytes anyway (or at least no path covered by our tests). And manual
inspection of the code backs that up (I'll follow this patch with some
cleanups that show definitively this is the case, but they're quite
invasive, so it's worth doing this fix on its own).
So let's drop the bogus hashcpy(), along with the confusing oversizing
in merge-tree.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While at there, clean up the_repo usage in builtin/merge-tree.c a tiny
bit.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
struct diff_filespec defines mode to be an 'unsigned short'. Several
other places in the API which we'd like to interact with using a
diff_filespec used a plain unsigned (or unsigned int). This caused
problems when taking addresses, so switch to unsigned short.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git cat-file --batch" reported a dangling symbolic link by
mistake, when it wanted to report that a given name is ambiguous.
* dt/cat-file-batch-ambiguous:
t1512: test ambiguous cat-file --batch and --batch-output
Do not print 'dangling' for cat-file in case of ambiguity
|
|
The code to walk tree objects has been taught that we may be
working with object names that are not computed with SHA-1.
* bc/tree-walk-oid:
cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
tree-walk: store object_id in a separate member
match-trees: use hashcpy to splice trees
match-trees: compute buffer offset correctly when splicing
tree-walk: copy object ID before use
|
|
The return values -1 and -2 from get_oid could mean two different
things, depending on whether they were from an enum returned by
get_tree_entry_follow_symlinks, or from a different code path. This
caused 'dangling' to be printed from a git cat-file in the case of an
ambiguous (-2) result.
Unify the results of get_oid* and get_tree_entry_follow_symlinks to be
one common type, with unambiguous values.
Signed-off-by: David Turner <novalis@novalis.org>
Reported-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.
Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a future commit, the pointer returned by tree_entry_extract will
point into the struct tree_desc, causing its lifetime to be bound to
that of the struct tree_desc itself. To ensure this code path keeps
working, copy the object_id into a local variable so that it lives long
enough.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This lets us use :(attr) with "git grep <tree-ish>" or "git log".
:(attr) requires another round of checking before we can declare that
a path is matched. This is done after path matching since we have lots
of optimization to take a shortcut when things don't match.
Note that if :(attr) is present, we can't return
all_entries_interesting / all_entries_not_interesting anymore because
we can't be certain about that. Not until match_pathspec_attrs() can
tell us "yes all these paths satisfy :(attr)".
Second note. Even though we walk a specific tree, we use attributes
from _worktree_ (or falling back to the index), not from .gitattributes
files on that tree. This by itself is not necessarily wrong, but the
user just have to be aware of this.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In order to support :(attr) when matching pathspec on a tree,
tree_entry_interesting() needs to take an index (because
git_check_attr() needs it). This is the preparation step for it. This
also makes it clearer what index we fall back to when looking up
attributes during an unpack-trees operation: the source index.
This also fixes revs->pruning.repo initialization that should have
been done in 2abf350385 (revision.c: remove implicit dependency on
the_index - 2018-09-21). Without it, skip_uninteresting() will
dereference a NULL pointer through this call chain
get_revision(revs)
get_revision_internal
get_revision_1
try_to_simplify_commit
rev_compare_tree
diff_tree_oid(..., &revs->pruning)
ll_diff_tree_oid
diff_tree_paths
ll_diff_tree
skip_uninteresting
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Pathspec matching against a tree object were buggy when negative
pathspec elements were involved, which has been fixed.
* nd/tree-walk-path-exclusion:
tree-walk.c: fix overoptimistic inclusion in :(exclude) matching
|
|
tree_entry_interesting() is used for matching pathspec on a tree. The
interesting thing about this function is that, because the tree
entries are known to be sorted, this function can return more than
just "yes, matched" and "no, not matched". It can also say "yes, this
entry is matched and so is the remaining entries in the tree".
This is where I made a mistake when matching exclude pathspec. For
exclude pathspec, we do matching twice, one with positive patterns and
one with negative ones, then a rule table is applied to determine the
final "include or exclude" result. Note that "matched" does not
necessarily mean include. For negative patterns, "matched" means
exclude.
This particular rule is too eager to include everything. Rule 8 says
that "if all entries are positively matched" and the current entry is
not negatively matched (i.e. not excluded), then all entries are
positively matched and therefore included. But this is not true. If
the _current_ entry is not negatively matched, it does not mean the
next one will not be and we cannot conclude right away that all
remaining entries are positively matched and can be included.
Rules 8 and 18 are now updated to be less eager. We conclude that the
current entry is positively matched and included. But we say nothing
about remaining entries. tree_entry_interesting() will be called again
for those entries where we will determine entries individually.
Reported-by: Christophe Bliard <christophe.bliard@trux.info>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Conversion from uchar[40] to struct object_id continues.
* bc/object-id:
pretty: switch hard-coded constants to the_hash_algo
sha1-file: convert constants to uses of the_hash_algo
log-tree: switch GIT_SHA1_HEXSZ to the_hash_algo->hexsz
diff: switch GIT_SHA1_HEXSZ to use the_hash_algo
builtin/merge-recursive: make hash independent
builtin/merge: switch to use the_hash_algo
builtin/fmt-merge-msg: make hash independent
builtin/update-index: simplify parsing of cacheinfo
builtin/update-index: convert to using the_hash_algo
refs/files-backend: use the_hash_algo for writing refs
sha1-name: use the_hash_algo when parsing object names
strbuf: allocate space with GIT_MAX_HEXSZ
commit: express tree entry constants in terms of the_hash_algo
hex: switch to using the_hash_algo
tree-walk: replace hard-coded constants with the_hash_algo
cache: update object ID functions for the_hash_algo
|
|
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.
* sb/object-store-grafts:
commit: allow lookup_commit_graft to handle arbitrary repositories
commit: allow prepare_commit_graft to handle arbitrary repositories
shallow: migrate shallow information into the object parser
path.c: migrate global git_path_* to take a repository argument
cache: convert get_graft_file to handle arbitrary repositories
commit: convert read_graft_file to handle arbitrary repositories
commit: convert register_commit_graft to handle arbitrary repositories
commit: convert commit_graft_pos() to handle arbitrary repositories
shallow: add repository argument to is_repository_shallow
shallow: add repository argument to check_shallow_file_for_update
shallow: add repository argument to register_shallow
shallow: add repository argument to set_alternate_shallow_file
commit: add repository argument to lookup_commit_graft
commit: add repository argument to prepare_commit_graft
commit: add repository argument to read_graft_file
commit: add repository argument to register_commit_graft
commit: add repository argument to commit_graft_pos
object: move grafts to object parser
object-store: move object access functions to object-store.h
|
|
Remove the hard-coded 20-based values and replace them with uses of
the_hash_algo.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since the only caller of this function already uses struct object_id,
update get_tree_entry_follow_symlinks to use it in parameters and
internally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Use the_hash_algo to look up the length of our current hash instead of
hard-coding the value 20.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert read_object_with_reference to take pointers to struct object_id.
Update the internals of the function accordingly.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert get_tree_entry and find_tree_entry to take pointers to struct
object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert the internals of this function to use struct object_id. This is
one of the last remaining callers of read_sha1_file_extended that has
not been converted yet.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
An v2.12-era regression in pathspec match logic, which made it look
into submodule tree even when it is not desired, has been fixed.
* bw/pathspec-match-submodule-boundary:
pathspec: only match across submodule boundaries when requested
|
|
Commit 74ed43711fd (grep: enable recurse-submodules to work on <tree>
objects, 2016-12-16) taught 'tree_entry_interesting()' to be able to
match across submodule boundaries in the presence of wildcards. This is
done by performing literal matching up to the first wildcard and then
punting to the submodule itself to perform more accurate pattern
matching. Instead of introducing a new flag to request this behavior,
commit 74ed43711fd overloaded the already existing 'recursive' flag in
'struct pathspec' to request this behavior.
This leads to a bug where whenever any other caller has the 'recursive'
flag set as well as a pathspec with wildcards that all submodules will
be indicated as matches. One simple example of this is:
git init repo
cd repo
git init submodule
git -C submodule commit -m initial --allow-empty
touch "[bracket]"
git add "[bracket]"
git commit -m bracket
git add submodule
git commit -m submodule
git rev-list HEAD -- "[bracket]"
Fix this by introducing the new flag 'recurse_submodules' in 'struct
pathspec' and using this flag to determine if matches should be allowed
to cross submodule boundaries.
This fixes https://github.com/git-for-windows/git/issues/1371.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Many codepaths have been updated to squelch -Wsign-compare
warnings.
* rj/no-sign-compare:
ALLOC_GROW: avoid -Wsign-compare warnings
cache.h: hex2chr() - avoid -Wsign-compare warnings
commit-slab.h: avoid -Wsign-compare warnings
git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
|
|
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
All callers of fill_tree_descriptor() have been converted to object_id
already, so convert that function as well. As a nice side-effect we get
rid of NULL checks in tree-diff.c, as fill_tree_descriptor() already
does them for us.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The result from "git diff" that compares two blobs, e.g. "git diff
$commit1:$path $commit2:$path", used to be shown with the full
object name as given on the command line, but it is more natural to
use the $path in the output and use it to look up .gitattributes.
* jk/diff-blob:
diff: use blob path for blob/file diffs
diff: use pending "path" if it is available
diff: use the word "path" instead of "name" for blobs
diff: pass whole pending entry in blobinfo
handle_revision_arg: record paths for pending objects
handle_revision_arg: record modes for "a..b" endpoints
t4063: add tests of direct blob diffs
get_sha1_with_context: dynamically allocate oc->path
get_sha1_with_context: always initialize oc->symlink_path
sha1_name: consistently refer to object_context as "oc"
handle_revision_arg: add handle_dotdot() helper
handle_revision_arg: hoist ".." check out of range parsing
handle_revision_arg: stop using "dotdot" as a generic pointer
handle_revision_arg: simplify commit reference lookups
handle_revision_arg: reset "dotdot" consistently
|
|
The get_sha1_with_context() function zeroes out the
oc->symlink_path strbuf, but doesn't use strbuf_init() to
set up the usual invariants (like pointing to the slopbuf).
We don't actually write to the oc->symlink_path strbuf
unless we call get_tree_entry_follow_symlinks(), and that
function does initialize it. However, readers may still look
at the zero'd strbuf.
In practice this isn't a triggerable bug. The only caller
that looks at it only does so when the mode we found is 0.
This doesn't happen for non-tree-entries (where we return
S_IFINVALID). A broken tree entry could have a mode of 0,
but canon_mode() quietly rewrites that into S_IFGITLINK.
So the "0" mode should only come up when we did indeed find
a symlink.
This is mostly just an accident of how the code happens to
work, though. Let's future-proof ourselves to make sure the
strbuf is properly initialized for all calls (it's only a
few struct member assignments, not a heap allocation).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.
When grep is provided with a <tree> object, the name of the object is
prefixed to all output. In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of it's output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD` from:
HEAD:file
<commit sha1 of submodule>:sub/file
to:
HEAD:file
HEAD:sub/file
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Instead of dying when fsck hits a malformed tree object, log the error
like any other and continue. Now fsck can tell the user which tree is
bad, too.
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the tree-walker runs into an error, it just calls
die(), and the message is always "corrupt tree file".
However, we are actually covering several cases here; let's
give the user a hint about what happened.
Let's also avoid using the word "corrupt", which makes it
seem like the data bit-rotted on disk. Our sha1 check would
already have found that. These errors are ones of data that
is malformed in the first place.
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In traverse_trees, we generate the complete traverse path for a
traverse_info. Later, in do_compare_entry, we used to go do a bunch
of work to compare the traverse_info to a cache_entry's name without
computing that path. But since we already have that path, we don't
need to do all that work. Instead, we can just put the generated
path into the traverse_info, and do the comparison more directly.
We copy the path because prune_traversal might mutate `base`. This
doesn't happen in any codepaths where do_compare_entry is called,
but it's better to be safe.
This makes git checkout much faster -- about 25% on Twitter's
monorepo. Deeper directory trees are likely to benefit more than
shallower ones.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new function, get_tree_entry_follow_symlinks, to tree-walk.[ch].
The function is not yet used. It will be used to implement git
cat-file --batch --follow-symlinks.
The function locates an object by path, following symlinks in the
repository. If the symlinks lead outside the repository, the function
reports this to the caller.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We often represent our strings as a counted string, i.e. a pair of
the pointer to the beginning of the string and its length, and the
string may not be NUL terminated to that length.
To compare a pair of such counted strings, unpack-trees.c and
read-cache.c implement their own name_compare() functions
identically. In addition, the cache_name_compare() function in
read-cache.c is nearly identical. The only difference is when one
string is the prefix of the other string, in which case
name_compare() returns -1/+1 to show which one is longer, and
cache_name_compare() returns the difference of the lengths to show
the same information.
Unify these three functions by using the implementation from
cache_name_compare(). This does not make any difference to the
existing and future callers, as they must be paying attention only
to the sign of the returned value (and not the magnitude) because
the original implementations of these two functions return values
returned by memcmp(3) when the one string is not a prefix of the
other string, and the only thing memcmp(3) guarantees its callers is
the sign of the returned value, not the magnitude.
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This continues 4651ece8 (Switch over tree descriptors to contain a
pre-parsed entry) and moves the only rest computational part
mode = canon_mode(mode)
from tree_entry_extract() to tree entry decode phase - to
decode_tree_entry().
The reason to do it, is that canon_mode() is at least 2 conditional
jumps for regular files, and that could be noticeable should canon_mode()
be invoked several times.
That does not matter for current Git codebase, where typical tree
traversal is
while (t->size) {
sha1 = tree_entry_extract(t, &path, &mode);
...
update_tree_entry(t);
}
i.e. we do t -> sha1,path.mode "extraction" only once per entry. In such
cases, it does not matter performance-wise, where that mode
canonicalization is done - either once in tree_entry_extract(), or once
in decode_tree_entry() called by update_tree_entry() - it is
approximately the same.
But for future code, which could need to work with several tree_desc's
in parallel, it could be handy to operate on tree_desc descriptors, and
do "extracts" only when needed, or at all, access only relevant part of
it through structure fields directly.
And for such situations, having canon_mode() be done once in decode
phase is better - we won't need to pay the performance price of 2 extra
conditional jumps on every t->mode access.
So let's move mode canonicalization to decode_tree_entry(). That was the
final bit. Now after tree entry is decoded, it is fully ready and could
be accessed either directly via field, or through tree_entry_extract()
which this time got really "totally trivial".
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* as/tree-walk-fix-aggressive-short-cut:
tree_entry_interesting: match against all pathspecs
|
|
The current basedir compare aborts early in order to avoid futile
recursive searches. However, a match may still be found by another
pathspec. This can cause an error while checking out files from a branch
when using multiple pathspecs:
$ git checkout master -- 'a/*.txt' 'b/*.txt'
error: pathspec 'a/*.txt' did not match any file(s) known to git.
Signed-off-by: Andy Spencer <andy753421@gmail.com>
Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We do ignore trailing slash on a directory, so pathspec "abc/" matches
directory "abc". A submodule is also a directory. Apply the same logic
to it. This makes "git log submodule-path" and "git log submodule-path/"
produce the same output.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
|
|
The variable name "ret" sounds like the variable to be returned, but
since e6c111b4 we return error, and it is misleading.
As this variable tells us which trees in t[] array were used in the
callback function, so that this caller can know the entries in which
of the trees need advancing, "trees_used" is a better name.
Also the assignment to 0 was removed at the start of the function as
well after the "if (interesting)" block. Those are unneeded as that
variable is set to the callback return value any time we enter the
"if (interesting)" block, so we'd overwrite old values anyway.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
:(glob)path differs from plain pathspec that it uses wildmatch with
WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The
difference lies in how '*' (and '**') is processed.
With the introduction of :(glob) and :(literal) and their global
options --[no]glob-pathspecs, the user can:
- make everything literal by default via --noglob-pathspecs
--literal-pathspecs cannot be used for this purpose as it
disables _all_ pathspec magic.
- individually turn on globbing with :(glob)
- make everything globbing by default via --glob-pathspecs
- individually turn off globbing with :(literal)
The implication behind this is, there is no way to gain the default
matching behavior (i.e. fnmatch without FNM_PATHNAME). You either get
new globbing or literal. The old fnmatch behavior is considered
deprecated and discouraged to use.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
GUARD_PATHSPEC() marks pathspec-sensitive code, basically all those
that touch anything in 'struct pathspec' except fields "nr" and
"original". GUARD_PATHSPEC() is not supposed to fail. It's mainly to
help the designers catch unsupported codepaths.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
match_pathspec_depth() and tree_entry_interesting() check max_depth
field in order to support "git grep --max-depth". The feature
activation is tied to "recursive" field, which led to some unwanted
activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec
matching in unpack_trees - 2012-01-15).
This patch decouples the activation from "recursive" field, puts it in
"magic" field instead. This makes sure that only "git grep" can
activate this feature. And because parse_pathspec knows when the
feature is not used, it does not need to sort pathspec (required for
max_depth to work correctly). A small win for non-grep cases.
Even though a new magic flag is introduced, no magic syntax is. The
magic can be only enabled by parse_pathspec() caller. We might someday
want to support ":(maxdepth:10)src." It all depends on actual use
cases.
max_depth feature cannot be enabled via init_pathspec() anymore. But
that's ok because init_pathspec() is on its way to /dev/null.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Currently we treat "*.c" and "path/to/*.c" the same way. Which means
we check all possible paths in repo against "path/to/*.c". One could
see that "path/elsewhere/foo.c" obviously cannot match "path/to/*.c"
and we only need to check all paths _inside_ "path/to/" against that
pattern.
This patch checks the leading fixed part of a pathspec against base
directory and exit early if possible. We could even optimize further
in "path/to/something*.c" case (i.e. check the fixed part against
name_entry as well) but that's more complicated and probably does not
gain us much.
-O2 build on linux-2.6, without and with this patch respectively:
$ time git rev-list --quiet HEAD -- 'drivers/*.c'
real 1m9.484s
user 1m9.128s
sys 0m0.181s
$ time ~/w/git/git rev-list --quiet HEAD -- 'drivers/*.c'
real 0m15.710s
user 0m15.564s
sys 0m0.107s
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When a pattern contains only a single asterisk as wildcard,
e.g. "foo*bar", after literally comparing the leading part "foo" with
the string, we can compare the tail of the string and make sure it
matches "bar", instead of running fnmatch() on "*bar" against the
remainder of the string.
-O2 build on linux-2.6, without the patch:
$ time git rev-list --quiet HEAD -- '*.c'
real 0m40.770s
user 0m40.290s
sys 0m0.256s
With the patch
$ time ~/w/git/git rev-list --quiet HEAD -- '*.c'
real 0m34.288s
user 0m33.997s
sys 0m0.205s
The above command is not supposed to be widely popular. It's chosen
because it exercises pathspec matching a lot. The point is it cuts
down matching time for popular patterns like *.c, which could be used
as pathspec in other places.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We mark pathspec with wildcards with the field use_wildcard. We
could do better by saving the length of the non-wildcard part, which
can be used for optimizations such as f9f6e2c (exclude: do strcmp as
much as possible before fnmatch - 2012-06-07).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* nd/tree-walk-enum-cleanup:
tree-walk: use enum interesting instead of integer
|
|
Commit d688cf0 (tree_entry_interesting(): give meaningful names to
return values - 2011-10-24) converts most of the tree_entry_interesting
values to the new enum, except "never_interesting". This completes the
conversion.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It's actually unlimited recursion if wildcards are active regardless
--max-depth
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is a basic code hygiene to avoid magic constants that are unnamed.
Besides, this helps extending the value later on for "interesting, but
cannot decide if the entry truely matches yet" (ie. prefix matches)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We know we will find nothing.
This incidentally squelches false warning from gcc about potentially
uninitialized usage of t.entry fields. For an empty tree, it is true that
init_tree_desc() does not call decode_tree_entry() and the tree_desc is
left uninitialized, but find_tree_entry() only calls tree_entry_extract()
that uses the tree_desc while it has more things to read from the tree, so
the uninitialized t.entry fields are never used in such a case anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
tree_entry_len() does not simply take two random arguments and return
a tree length. The two pointers must point to a tree item structure,
or struct name_entry. Passing random pointers will return incorrect
value.
Force callers to pass struct name_entry instead of two pointers (with
hope that they don't manually construct struct name_entry themselves)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* dm/tree-walk:
tree-walk: micro-optimization in tree_entry_interesting
tree-walk: drop unused parameter from match_dir_prefix
|
|
In particular, gcc complains as follows:
CC tree-walk.o
tree-walk.c: In function `traverse_trees':
tree-walk.c:347: warning: 'e' might be used uninitialized in this \
function
CC builtin/revert.o
builtin/revert.c: In function `verify_opt_mutually_compatible':
builtin/revert.c:113: warning: 'opt2' might be used uninitialized in \
this function
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the case of a wide breadth top-level tree (~2400 entries, all trees
in this case), we can see a noticeable cost in the profiler calling
strncmp() here. Most of the time we are at the base level of the
repository, so base is "" and baselen == 0, which means we will always
test true. Break out this one tiny case so we can short circuit the
strncmp() call.
Test cases are as follows. packages.git is the Arch Linux git-svn clone
of the packages repository which has the characteristics above.
Commands:
[1] packages.git, /usr/bin/time git log >/dev/null
[2] packages.git, /usr/bin/time git log -- autogen/trunk pacman/trunk wget/trunk >/dev/null
[3] linux.git, /usr/bin/time git log >/dev/null
[4] linux.git, /usr/bin/time git log -- drivers/ata drivers/uio tools >/dev/null
Results:
before after %faster
[1] 2.56 2.55 0.4%
[2] 51.82 48.66 6.5%
[3] 5.58 5.61 -0.5%
[4] 1.55 1.51 0.2%
The takeaway here is this doesn't matter in many operations, but it does
for a certain style of repository and operation where it nets a 6.5%
measured improvement. The other changes are likely not significant by
reasonable statistics methods.
Note: the measured improvement when originally submitted was ~11% (43 to
38 secs) for operation [2]. At the time, the repository had 117220
commits; it now has 137537 commits.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The traverse_trees() machinery is primarily meant for merging two (or
more) trees, and because a merge is a full tree operation, it doesn't
support any pruning with pathspec.
Since d1f2d7e (Make run_diff_index() use unpack_trees(), not read_tree(),
2008-01-19), however, we use unpack_trees() to traverse_trees() callchain
to perform "diff-index", which could waste a lot of work traversing trees
outside the user-supplied pathspec, only to discard at the blob comparison
level in diff-lib.c::oneway_diff() which is way too late.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As the point of the last change is to allow use of strings as
literals no matter what characters are in them, "has_wildcard"
does not match what we use this field for anymore.
It is used to decide if the wildcard matching should be used, so
rename it to match the usage better.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If base is already matched, skip that part when calling
fnmatch(). This happens quite often if users start a command from
worktree's subdirectory and prefix is usually prepended to all
pathspecs.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
never_interesting optimization is disabled if there is any wildcard
pathspec, even if it only matches exactly on trees.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Suppose we have two pathspecs 'a' and 'a/b' (both are dirs) and depth
limit 1. In current code, pathspecs are checked in input order. When
'a/b' is checked against pathspec 'a', it fails depth limit and
therefore is excluded, although it should match 'a/b' pathspec.
This patch reorders all pathspecs alphabetically, then teaches
tree_entry_interesting() to check against the deepest pathspec first,
so depth limit of a shallower pathspec won't affect a deeper one.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is needed to replace pathspec_matches() in builtin/grep.c.
max_depth == -1 means infinite depth. Depth limit is only effective
when pathspec.recursive == 1. When pathspec.recursive == 0, the
behavior depends on match functions: non-recursive for
tree_entry_interesting() and recursive for match_pathspec{,_depth}
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In traversing trees, a full path is splitted into two parts: base
directory and entry. They are however quite often concatenated
whenever a full path is needed. Current code allocates a new buffer,
do two memcpy(), use it, then release.
Instead this patch turns "base" to a writable, extendable buffer. When
a concatenation is needed, the callee only needs to append "entry" to
base, use it, then truncate the entry out again. "base" must remain
unchanged before and after entering a function.
This avoids quite a bit of malloc() and memcpy().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When an error is encountered, it calls add_rejected_file() which either
- directly displays the error message and stops if in plumbing mode
(i.e. if show_all_errors is not initialized at 1)
- or stores it so that it will be displayed at the end with display_error_msgs(),
Storing the files by error type permits to have a list of files for
which there is the same error instead of having a serie of almost
identical errors.
As each bind_overlap error combines a file and an old file, a list cannot be
done, therefore, theses errors are not stored but directly displayed.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-1.6.6:
fix minor memory leak in get_tree_entry()
|
|
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
traverse_trees() is supposed to call its callback with all the matching
entries from the given trees. The current algorithm keeps a pointer to
each of the tree being traversed, and feeds the entry with the earliest
name to the callback.
This breaks down if the trees being traversed looks like this:
A B
t-1 t
t-2 u
t/a v
When we are currently looking at an entry "t-1" in tree A, and tree B has
returned "t", feeding "t" from the B and not feeding anything from A, only
because "t-1" sorts later than "t", will miss an entry for a subtree "t"
behind the current entry in tree A.
This introduces extended_entry_extract() helper function that gives what
name is expected from the tree, and implements a mechanism to look-ahead
in the tree object using it, to make sure such a case is handled sanely.
Traversal in tree A in the above example will first return "t" to match
that of B, and then the next request for an entry to A then returns "t-1".
This roughly corresponds to what Linus's "prepare for one-entry lookahead"
wanted to do, but because this does implement look ahead, t6035 and one more
test in t1012 reveal that the approach would not work without adjusting the
side that walks the index in unpack_trees() as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we make the "root" tree-walk info entry have a pathname in it, we
need to have a ->prev pointer so that compare_entry will actually notice
and traverse into the root.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This makes the traverse_trees() entry comparator routine use the more
relaxed form of name comparison that considers files and directories
with the same name identical.
We pass in a separate mask for just the directory entries, so that the
callback routine can decide (if it wants to) to only handle one or the
other type, but generally most (all?) users are expected to really want
to see the case of a name 'foo' showing up in one tree as a file and in
another as a directory at the same time.
In particular, moving 'unpack_trees()' over to use this tree traversal
mechanism requires this.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This allows the callback to return an error value, but it can also
specify which of the tree entries that it actually used up by returning
a positive mask value.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This makes the calling convention a bit less obvious, but a lot more
flexible. Instead of allocating and extending a new 'base' string, we
just link the top-most name into a linked list of the 'info' structure
when traversing a subdirectory, and we can generate the basename by
following the list.
Perhaps even more importantly, the linked list of info structures also
gives us a place to naturally save off other information than just the
directory name.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The current code can access memory outside of the tree buffer in the
case of malformed tree entries.
This patch prevents this by:
* The rest of the buffer must be at least 24 bytes (at least 1 byte
mode, 1 blank, at least one byte path name, 1 NUL, 20 bytes sha1).
* Check that the last NUL (21 bytes before the end) is present.
This ensures that strlen() and get_mode() calls stay within the
buffer.
* The mode may not be empty. We have only to reject a blank at the
begin, as the rest is handled by if (c < '0' || c > '7').
* The blank is ensured by get_mode().
* The path must contain at least one character.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This makes the tree descriptor contain a "struct name_entry" as part of
it, and it gets filled in so that it always contains a valid entry. On
some benchmarks, it improves performance by up to 15%.
That makes tree entry "extract" trivial, and means that we only actually
need to decode each tree entry just once: we decode the first one when
we initialize the tree descriptor, and each subsequent one when doing
"update_tree_entry()". In particular, this means that we don't need to
do strlen() both at extract time _and_ at update time.
Finally, it also allows more sharing of code (entry_extract(), that
wanted a "struct name_entry", just got totally trivial, along with the
"tree_entry()" function).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
This removes slightly more lines than it adds, but the real reason for
doing this is that future optimizations will require more setup of the
tree descriptor, and so we want to do it in one place.
Also renamed the "desc.buf" field to "desc.buffer" just to trigger
compiler errors for old-style manual initializations, making sure I
didn't miss anything.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Since we have the "tree_entry_len()" helper function these days, and
don't need to do a full strlen(), there's no point in saving the path
length - it's just redundant information.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
This is a micro-optimization that grew out of the mailing list discussion
about "strlen()" showing up in profiles.
We used to pass regular C strings around to the low-level tree walking
routines, and while this worked fine, it meant that we needed to call
strlen() on strings that the caller always actually knew the size of
anyway.
So pass the length of the string down wih the string, and avoid
unnecessary calls to strlen(). Also, when extracting a pathname from a
tree entry, use "tree_entry_len()" instead of strlen(), since the length
of the pathname is directly calculable from the decoded tree entry itself
without having to actually do another strlen().
This shaves off another ~5-10% from some loads that are very tree
intensive (notably doing commit filtering by a pathspec).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>"
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
This means that
git show HEAD:
will now return HEAD^{tree}, which is logically consistent with
git show HEAD:Documentation
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
The variable named entry is allocated using malloc() and then
forgotten, it being shadowed by an automatic variable of the
same name. Fixing the array size at 3 worked so far because
the only caller of traverse_trees() needed only as much
entries. Simply remove the shadowing varaible and we're able
to traverse more than three trees and save stack space at the
same time!
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.
A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*. This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.
[jc: Splitted the patch to "master" part, to be followed by a
patch for merge-recursive.c which is not in "master" yet.
Fixed the cast in the latter hunk to combine-diff.c which was
wrong in the original.
Also converted ones left-over in combine-diff.c, diff-lib.c and
upload-pack.c ]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in
various ways. Usually the strategy that required the least changes was used.
Signed-off-by: Florian Forster <octo@verplant.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".
It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.
This allows tree traversal with
struct tree_desc desc;
struct name_entry entry;
desc.buf = tree->buffer;
desc.size = tree->size;
while (tree_entry(&desc, &entry) {
... use "entry.{path, sha1, mode, pathlen}" ...
}
which is not only shorter than writing it out in full, it's hopefully less
error prone too.
[ It's actually a tad faster too - we don't need to recalculate the entry
pathlength in both extract and update, but need to do it only once.
Also, some callers can avoid doing a "strlen()" on the result, since
it's returned as part of the name_entry structure.
However, by now we're talking just 1% speedup on "git-rev-list --objects
--all", and we're definitely at the point where tree walking is no
longer the issue any more. ]
NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.
We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Doing an oprofile run on the result of my git rev-list memory leak fixes
and tree parsing cleanups, I was surprised by the third-highest entry
being
samples % image name app name symbol name
179751 2.7163 libc-2.4.so libc-2.4.so _IO_vfscanf@@GLIBC_2.4
where that 2.7% is actually more than 5% of one CPU, because this was run
on a dual CPU setup with the other CPU just being idle.
That seems to all be from the use of 'sscanf(tree, "%o", &mode)' for the
tree buffer parsing.
So do the trivial octal parsing by hand, which also gives us where the
first space in the string is (and thus where the pathname starts) so we
can get rid of the "strchr(tree, ' ')" call too.
This brings the "git rev-list --all --objects" time down from 63 seconds
to 55 seconds on the historical kernel archive for me, so it's quite
noticeable - tree parsing is a lot of what we end up doing when following
all the objects.
[ I also see a 5% speedup on a full "git fsck-objects" on the current
kernel archive, so that sscanf() really does seem to have hurt our
performance by a surprising amount ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
This replaces occurences of "blob", "commit", "tag", and "tree",
where they're really used as type specifiers, which we already
have defined global constants for.
Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Introduce tree-walk.[ch] and move "struct tree_desc" and
associated functions from various places.
Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
move it to cache.h. This macro returns the canonicalized
st_mode value in the host byte order for files, symlinks and
directories -- to be compared with a tree_desc entry.
create_ce_mode(mode) in cache.h is similar but is intended to be
used for index entries (so it does not work for directories) and
returns the value in the network byte order.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|