path: root/diffcore-pickaxe.c
Age  Commit message  Author  Files  Lines
2023-06-21diff.h: remove unnecessary include of oidset.hElijah Newren1-0/+1
This also made it clear that several .c files depended upon various things that oidset included, but had omitted the direct #include for those headers. Add those now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23pretty.h: move has_non_ascii() declaration from commit.hElijah Newren1-2/+2
The function is defined in pretty.c, so this moves the declaration to a more logical place. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17diffcore-pickaxe: mark unused parameters in pickaxe functionsJeff King1-2/+2
We have a virtual pickaxe_fn for handling -G versus -S pickaxe options. They need to take the same set of parameters, but of course they care about different ones (e.g., a regex -G will never use a kwset). Mark the unused ones to appease -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
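The idea of two matchers sharing one function-pointer signature, with each marking the parameters it does not need, can be sketched as follows. This is a minimal illustration, not the code in diffcore-pickaxe.c: SKETCH_UNUSED, sketch_match_fn and the two matchers are hypothetical names, and the attribute assumes a GCC-compatible compiler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an "unused parameter" annotation. */
#if defined(__GNUC__)
#define SKETCH_UNUSED __attribute__((unused))
#else
#define SKETCH_UNUSED
#endif

/* One signature shared by every matcher, so callers can hold either. */
typedef int (*sketch_match_fn)(const char *text, const char *needle,
			       unsigned int min_count);

/* Counting matcher: uses all three parameters. */
int sketch_count_at_least(const char *text, const char *needle,
			  unsigned int min_count)
{
	unsigned int n = 0;
	size_t step = strlen(needle);
	const char *p;

	if (!step)
		return 0;
	for (p = strstr(text, needle); p; p = strstr(p + step, needle))
		n++;
	return n >= min_count;
}

/* Presence matcher: has no use for min_count, so it is marked unused. */
int sketch_appears(const char *text, const char *needle,
		   unsigned int min_count SKETCH_UNUSED)
{
	return strstr(text, needle) != NULL;
}

/* Dispatch through the shared pointer type. */
int sketch_run(sketch_match_fn fn, const char *text, const char *needle,
	       unsigned int min_count)
{
	return fn(text, needle, min_count);
}
```

Keeping the parameter in the signature (rather than dropping it) is what lets both functions be assigned to the same pointer type; the annotation only silences -Wunused-parameter.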
2021-05-11xdiff-interface: replace discard_hunk_line() with a flagÆvar Arnfjörð Bjarmason1-1/+2
Remove the dummy discard_hunk_line() function added in 3b40a090fd4 (diff: avoid generating unused hunk header lines, 2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for use along with the two existing and similar XDL_EMIT_* flags. Unlike the recently amended xdiff_emit_line_fn interface which'll be called in a loop in xdl_emit_diff(), the hunk header is only emitted once. It makes more sense to pass this as a flag than provide a dummy callback because that function may be able to skip doing certain work if it knows the caller is doing nothing with the hunk header. It would be possible to do so in the case of -U0 now, but the benefit of doing so is so small that I haven't bothered. But this leaves the door open to that, and more importantly makes the API use more intuitive. The reason we're putting a flag in the gap between 1<<0 and 1<<2 is that the old 1<<1 flag was removed in 907681e940d (xdiff: drop XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11pickaxe -G: don't special-case create/deleteÆvar Arnfjörð Bjarmason1-11/+1
Instead of special-casing creations and deletions let's just generate a diff for them. This logic of not running a diff under -G if we don't have both sides dates back to the original implementation of -S in 52e9578985f ([PATCH] Introducing software archaeologist's tool "pickaxe"., 2005-05-21).

In the case of -S we were not working with the xdiff interface and needed to do this, but when -G was implemented in f506b8e8b5f (git log/diff: add -G<regexp> that greps in the patch text, 2010-08-23) this logic was diligently copied over.

But as the performance test added earlier in this series shows, this does not make much of a difference. With:

    time GIT_TEST_LONG= GIT_PERF_REPEAT_COUNT=10 GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3' ./run origin/next HEAD~ HEAD -- p4209-pickaxe.sh

With the HEAD~ commit being the preceding "pickaxe -G: terminate early on matching lines" we get these results. Note that it's only the -G codepaths that are relevant to this change:

    Test                                                                     origin/next      HEAD~                  HEAD
    --------------------------------------------------------------------------------------------------------------------------------------
    4209.1: git log -S'int main' <limit-rev>..                               0.35(0.32+0.03)  0.35(0.33+0.02) +0.0%  0.35(0.30+0.05) +0.0%
    4209.2: git log -S'æ' <limit-rev>..                                      0.46(0.42+0.04)  0.46(0.41+0.05) +0.0%  0.46(0.42+0.04) +0.0%
    4209.3: git log --pickaxe-regex -S'(int|void|null)' <limit-rev>..        0.65(0.62+0.02)  0.64(0.61+0.02) -1.5%  0.64(0.60+0.04) -1.5%
    4209.4: git log --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..         0.52(0.45+0.06)  0.52(0.50+0.01) +0.0%  0.54(0.47+0.04) +3.8%
    4209.5: git log --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..      0.39(0.34+0.05)  0.39(0.34+0.04) +0.0%  0.39(0.36+0.03) +0.0%
    4209.6: git log -G'(int|void|null)' <limit-rev>..                        0.60(0.55+0.04)  0.58(0.54+0.03) -3.3%  0.58(0.49+0.08) -3.3%
    4209.7: git log -G'if *\([^ ]+ & ' <limit-rev>..                         0.61(0.52+0.06)  0.59(0.53+0.05) -3.3%  0.59(0.54+0.05) -3.3%
    4209.8: git log -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                      0.61(0.51+0.07)  0.58(0.54+0.04) -4.9%  0.57(0.51+0.06) -6.6%
    4209.9: git log -i -S'int main' <limit-rev>..                            0.36(0.31+0.04)  0.36(0.34+0.02) +0.0%  0.35(0.32+0.03) -2.8%
    4209.10: git log -i -S'æ' <limit-rev>..                                  0.36(0.33+0.03)  0.39(0.34+0.01) +8.3%  0.36(0.32+0.03) +0.0%
    4209.11: git log -i --pickaxe-regex -S'(int|void|null)' <limit-rev>..    0.83(0.77+0.05)  0.82(0.77+0.05) -1.2%  0.80(0.75+0.04) -3.6%
    4209.12: git log -i --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..     0.67(0.61+0.03)  0.64(0.61+0.03) -4.5%  0.63(0.61+0.02) -6.0%
    4209.13: git log -i --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..  0.40(0.37+0.02)  0.40(0.37+0.03) +0.0%  0.40(0.36+0.04) +0.0%
    4209.14: git log -i -G'(int|void|null)' <limit-rev>..                    0.58(0.51+0.07)  0.59(0.52+0.06) +1.7%  0.58(0.52+0.05) +0.0%
    4209.15: git log -i -G'if *\([^ ]+ & ' <limit-rev>..                     0.60(0.54+0.05)  0.60(0.54+0.06) +0.0%  0.60(0.56+0.03) +0.0%
    4209.16: git log -i -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                  0.58(0.51+0.06)  0.57(0.52+0.05) -1.7%  0.60(0.48+0.09) +3.4%

This small simplification really doesn't buy us much now, but I've got plans to both convert the pickaxe code to using a PCREv2 backend[1] and to implement additional pickaxe modes to do custom searches through the diff[2]. Always having the diff available under -G is going to help to simplify both of those changes.

1. https://lore.kernel.org/git/20210203032811.14979-22-avarab@gmail.com/
2. https://lore.kernel.org/git/20190424152215.16251-3-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11pickaxe -G: terminate early on matching linesÆvar Arnfjörð Bjarmason1-11/+19
Solve a long-standing problem with "git log -Grx": we would find e.g. "+ str" in the diff and note that we had a "hit", but xdiff would diligently continue to generate and spew the rest of the diff at us. This makes use of a new "early return" xdiff interface added by preceding commits. The TODO item (or, the NEEDSWORK comment) has been there since "git log -G" was implemented. See f506b8e8b5f (git log/diff: add -G<regexp> that greps in the patch text, 2010-08-23). But now with the support added in the preceding changes to the xdiff-interface we can return early. Let's assert the behavior of that new early-return xdiff-interface by having a BUG() call here to die if it ever starts handing us needless work again. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11xdiff-interface: prepare for allowing early returnÆvar Arnfjörð Bjarmason1-3/+4
Change the function prototype of xdiff_emit_line_fn to return an "int" instead of "void". Change all of those functions to "return 0", nothing checks those return values yet, and no behavior is being changed. In subsequent commits the interface will be changed to allow early return via this new return value. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11pickaxe -S: slightly optimize contains()Ævar Arnfjörð Bjarmason1-3/+10
The "log -S<pat>" switch counts occurrences of <pat> in the pre-image and post-image of a change. As soon as we know we had e.g. 1 before and 2 now we can stop; we don't need to keep counting past 2. With this change a diff between A and B may have different performance characteristics than between B and A. That's OK in this case, since we'll emit the same output, and the effect is to make one of them better. I'm picking a check of "one" first on the assumption that it's a more common case to have files grow over time than not. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
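A minimal sketch of the counting shortcut, assuming plain NUL-terminated strings and strstr() in place of the kwset/regex machinery. The function names are hypothetical; only the "count the first side fully, then cap the second side at c1 + 1" idea is taken from the message above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of needle in hay, giving up as soon
 * as `limit` of them have been seen (limit == 0 means "count them all").
 */
unsigned int sketch_count_upto(const char *hay, const char *needle,
			       unsigned int limit)
{
	unsigned int cnt = 0;
	size_t step = strlen(needle);

	if (!step)
		return 0;
	while ((hay = strstr(hay, needle)) != NULL) {
		cnt++;
		if (limit && cnt == limit)
			break;
		hay += step;
	}
	return cnt;
}

/* Returns 1 when the two sides hold a different number of occurrences. */
int sketch_counts_differ(const char *pre, const char *post,
			 const char *needle)
{
	unsigned int c1 = sketch_count_upto(pre, needle, 0);
	/* Seeing c1 + 1 hits already proves the counts differ. */
	unsigned int c2 = sketch_count_upto(post, needle, c1 + 1);

	return c1 != c2;
}
```

The result is the same as comparing two full counts; only the work done on the second side shrinks, which is why swapping the sides can change the running time but not the answer.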
2021-05-11pickaxe: rename variables in has_changes() for brevityÆvar Arnfjörð Bjarmason1-3/+3
Rename the {one,two}_contains variables to c{1,2}. This will make a follow-up change easier to read. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11pickaxe -S: support content with NULs under --pickaxe-regexÆvar Arnfjörð Bjarmason1-2/+2
Fix a bug in the matching routine powering -S<rx> --pickaxe-regex so that we won't abort early on content that has NULs in it. We've had a hard requirement on REG_STARTEND since 2f8952250a8 (regex: add regexec_buf() that can work on a non NUL-terminated string, 2016-09-21), but this sanity check dates back to d01d8c67828 (Support for pickaxe matching regular expressions, 2006-03-29). It wasn't needed anymore and, as the now-passing test shows, was actively getting in our way. Since we always require REG_STARTEND support we do not need to stop at NULs. If we are dealing with a haystack with a NUL in it, the needle may be behind that NUL. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11pickaxe: assert that we must have a needle under -G or -SÆvar Arnfjörð Bjarmason1-3/+3
Assert early in diffcore_pickaxe() that we've got a needle to work with under -G and -S. This code is redundant to the check -G and -S get from parse-options.c's get_arg(), which I'm adding a test for. This check dates back to e1b161161d (diffcore-pickaxe: fix infinite loop on zero-length needle, 2007-01-25) when "git log -S" could send this code into an infinite loop. It was then later refactored in 8fa4b09fb1 (pickaxe: hoist empty needle check, 2012-10-28) into its current form, but it seemingly wasn't noticed that in the meantime a move to the parse-options.c API in dea007fb4c (diff: parse separate options like -S foo, 2010-08-05) had made it redundant. Let's retain some of the paranoia here with a BUG(), but there's no need to be checking this in the pickaxe_match() inner loop. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11pickaxe: refactor function selection in diffcore-pickaxe()Ævar Arnfjörð Bjarmason1-2/+21
It's hard to read this codepath at a glance and reason about exactly what combination of -G and -S will compile either regexes or kwset, and whether we'll then dispatch to "diff_grep" or "has_changes". Then in the "--find-object" case we aren't using the callback function, but were previously passing down "has_changes". Refactor this code to exhaustively check "opts", it's now more obvious what callback function (or none) we want under what mode. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
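An exhaustive check of the mode bits, in the spirit of the refactoring described above, might look like the sketch below. All SK_* names, the enums and sketch_select() are hypothetical, and the mapping is simplified (for instance it ignores the case-insensitive non-ASCII case that also needs a regex).

```c
#include <stddef.h>

/* Hypothetical option bits; not the real pickaxe_opts values. */
#define SK_KIND_S	(1u << 0)
#define SK_KIND_G	(1u << 1)
#define SK_KIND_OBJFIND	(1u << 2)
#define SK_REGEX	(1u << 3)
#define SK_KIND_MASK	(SK_KIND_S | SK_KIND_G | SK_KIND_OBJFIND)

enum sk_engine { SK_ENGINE_NONE, SK_ENGINE_REGEX, SK_ENGINE_KWSET };
enum sk_callback { SK_CB_NONE, SK_CB_DIFF_GREP, SK_CB_HAS_CHANGES };

struct sk_plan {
	enum sk_engine engine;
	enum sk_callback cb;
};

/* Fill *plan for exactly one search kind; -1 for anything else. */
int sketch_select(unsigned int opts, struct sk_plan *plan)
{
	switch (opts & SK_KIND_MASK) {
	case SK_KIND_G:
		/* grep through the patch text: always a regex */
		plan->engine = SK_ENGINE_REGEX;
		plan->cb = SK_CB_DIFF_GREP;
		return 0;
	case SK_KIND_S:
		/* count occurrences: regex only when asked for */
		plan->engine = (opts & SK_REGEX) ? SK_ENGINE_REGEX
						 : SK_ENGINE_KWSET;
		plan->cb = SK_CB_HAS_CHANGES;
		return 0;
	case SK_KIND_OBJFIND:
		/* object lookup: no text matcher, no callback */
		plan->engine = SK_ENGINE_NONE;
		plan->cb = SK_CB_NONE;
		return 0;
	default:
		/* no kind, or several kinds at once */
		return -1;
	}
}
```

Switching on the masked kind makes every supported mode one visible case, and mixed or missing kinds fall into a single rejected default instead of silently picking a callback.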
2021-05-11pickaxe/style: consolidate declarations and assignmentsÆvar Arnfjörð Bjarmason1-7/+3
Refactor contains() to do its assignments at the same time that it does its declarations. This code could have been refactored in ef90ab66e8e (pickaxe: use textconv for -S counting, 2012-10-28) when a function call between the declarations and assignments was removed. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14Merge branch 'tb/log-G-binary'Junio C Hamano1-0/+6
"git log -G<regex>" looked for a hunk in the "git log -p" patch output that contained a string that matches the given pattern. Optimize this code to ignore binary files, which by default will not show any hunk that would match any pattern (unless textconv or the --text option is in effect, that is). * tb/log-G-binary: log -G: ignore binary files
2019-01-04Merge branch 'nd/the-index'Junio C Hamano1-2/+2
More codepaths become aware of working with in-core repository instance other than the default "the_repository".

* nd/the-index: (22 commits)
  rebase-interactive.c: remove the_repository references
  rerere.c: remove the_repository references
  pack-*.c: remove the_repository references
  pack-check.c: remove the_repository references
  notes-cache.c: remove the_repository references
  line-log.c: remove the_repository reference
  diff-lib.c: remove the_repository references
  delta-islands.c: remove the_repository references
  cache-tree.c: remove the_repository references
  bundle.c: remove the_repository references
  branch.c: remove the_repository reference
  bisect.c: remove the_repository reference
  blame.c: remove implicit dependency the_repository
  sequencer.c: remove implicit dependency on the_repository
  sequencer.c: remove implicit dependency on the_index
  transport.c: remove implicit dependency on the_index
  notes-merge.c: remove implicit dependency the_repository
  notes-merge.c: remove implicit dependency on the_index
  list-objects.c: reduce the_repository references
  list-objects-filter.c: remove implicit dependency on the_index
  ...
2018-12-26log -G: ignore binary filesThomas Braun1-0/+6
The -G<regex> option of log looks for the differences whose patch text contains added/removed lines that match regex.

Currently -G also looks into patches of binary files (whose patch format, according to [1], is binary as well).

This has a couple of issues:

- It makes the pickaxe search slow. In a proprietary repository of the author with only ~5500 commits and a total .git size of ~300MB searching takes ~13 seconds

    $time git log -Gwave > /dev/null

    real    0m13,241s
    user    0m12,596s
    sys     0m0,644s

  whereas when we ignore binary files with this patch it takes ~4s

    $time ~/devel/git/git log -Gwave > /dev/null

    real    0m3,713s
    user    0m3,608s
    sys     0m0,105s

  which is a speedup of roughly 3.5x.

- The internally used algorithm for generating patch text is based on xdiff, and it states in [1]

  > The output format of the binary patch file is proprietary
  > (and binary) and it is basically a collection of copy and insert
  > commands [..]

  which means that the current format could change once the internal algorithm is changed, as the format is not standardized. In addition the git binary patch format used for preparing patches for git apply is *different* from the xdiff format, as can be seen by comparing

    git log -p -a

      commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
      Author: A U Thor <author@example.com>
      Date:   Thu Apr 7 15:14:13 2005 -0700

          modify binary file

      diff --git a/data.bin b/data.bin
      index f414c84..edfeb6f 100644
      --- a/data.bin
      +++ b/data.bin
      @@ -1,2 +1,4 @@
       a
       a^@a
      +a
      +a^@a

  with

    git log --binary

      commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
      Author: A U Thor <author@example.com>
      Date:   Thu Apr 7 15:14:13 2005 -0700

          modify binary file

      diff --git a/data.bin b/data.bin
      index f414c84bd3aa25fa07836bb1fb73db784635e24b..edfeb6f501[..]
      GIT binary patch
      literal 12
      QcmYe~N@Pgn0zx1O01)N^ZvX%Q

      literal 6
      NcmYe~N@Pgn0ssWg0XP5v

  which seems unexpected.

To resolve these issues this patch makes -G<regex> ignore binary files by default. Textconv filters are supported, and so is -a/--text for getting the old and broken behaviour back.

The -S<block of text> option of log looks for differences that change the number of occurrences of the specified block of text (i.e. addition/deletion) in a file. As we want to keep the current behaviour, add a test to ensure it stays that way.

[1]: http://www.xmailserver.org/xdiff.html

Signed-off-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13Merge branch 'jk/xdiff-interface'Junio C Hamano1-1/+2
The interface into "xdiff" library used to discover the offset and size of a generated patch hunk by first formatting it into the textual hunk header "@@ -n,m +k,l @@" and then parsing the numbers out. A new interface has been introduced to allow callers a more direct access to them.

* jk/xdiff-interface:
  xdiff-interface: drop parse_hunk_header()
  range-diff: use a hunk callback
  diff: convert --check to use a hunk callback
  combine-diff: use an xdiff hunk callback
  diff: use hunk callback for word-diff
  diff: discard hunk headers for patch-ids earlier
  diff: avoid generating unused hunk header lines
  xdiff-interface: provide a separate consume callback for hunks
  xdiff: provide a separate emit callback for hunks
2018-11-12notes-cache.c: remove the_repository referencesNguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05diff: avoid generating unused hunk header linesJeff King1-1/+2
Some callers of xdi_diff_outf() do not look at the generated hunk header lines at all. By plugging in a no-op hunk callback, this tells xdiff not to even bother formatting them. This patch introduces a stock no-op callback and uses it with a few callers whose line callbacks explicitly ignore hunk headers (because they look only for +/- lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-02xdiff-interface: provide a separate consume callback for hunksJeff King1-1/+1
The previous commit taught xdiff to optionally provide the hunk header data to a specialized callback. But most users of xdiff actually use our more convenient xdi_diff_outf() helper, which ensures that our callbacks are always fed whole lines. Let's plumb the special hunk-callback through this interface, too. It will follow the same rule as xdiff when the hunk callback is NULL (i.e., continue to pass a stringified hunk header to the line callback). Since we add NULL to each caller, there should be no behavior change yet. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21userdiff.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-2/+2
[jc: squashed in missing forward decl in userdiff.h found by Ramsay] Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21diff.c: remove the_index dependency in textconv() functionsNguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21regex: do not call `regfree()` if compilation failsMartin Ågren1-1/+0
It is apparently undefined behavior to call `regfree()` on a regex where `regcomp()` failed. The language in [1] is a bit muddy, at least to me, but the clearest hint is this (`preg` is the `regex_t *`): Upon successful completion, the regcomp() function shall return 0. Otherwise, it shall return an integer value indicating an error as described in <regex.h>, and the content of preg is undefined. Funnily enough, there is also the `regerror()` function which should be given a pointer to such a "failed" `regex_t` -- the content of which would supposedly be undefined -- and which may investigate it to come up with a detailed error message. In any case, the example in that document shows how `regfree()` is not called after `regcomp()` fails. We have quite a few users of this API and most get this right. These three users do not. Several implementations can handle this just fine [2] and these code paths supposedly have not wreaked havoc or we'd have heard about it. (These are all in code paths where git got bad input and is just about to die anyway.) But let's just avoid the issue altogether. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html [2] https://www.redhat.com/archives/libvir-list/2013-September/msg00262.html Researched-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
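The rule from the message above (report the error, but never regfree() a regex_t whose regcomp() failed) can be sketched with plain POSIX <regex.h> calls. The helper names sketch_compile() and sketch_matches() are hypothetical; this is not the code that was patched.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Compile `pattern` into *re. On failure, describe the error in errbuf
 * and return -1 WITHOUT calling regfree(): the contents of *re are
 * undefined at that point.
 */
int sketch_compile(regex_t *re, const char *pattern, int cflags,
		   char *errbuf, size_t errlen)
{
	int err = regcomp(re, pattern, cflags);

	if (err) {
		regerror(err, re, errbuf, errlen);
		return -1;
	}
	return 0;
}

/* Returns 1 on match, 0 on no match, -1 when the pattern is invalid. */
int sketch_matches(const char *pattern, const char *text)
{
	regex_t re;
	char errbuf[256];
	int hit;

	if (sketch_compile(&re, pattern, REG_EXTENDED | REG_NOSUB,
			   errbuf, sizeof(errbuf)))
		return -1; /* nothing to free */
	hit = !regexec(&re, text, 0, NULL, 0);
	regfree(&re); /* only reached after a successful regcomp() */
	return hit;
}
```

Keeping regfree() on the success path only means the cleanup never touches a structure whose state POSIX leaves unspecified.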
2018-01-04diff: properly error out when combining multiple pickaxe optionsStefan Beller1-1/+0
In f506b8e8b5 (git log/diff: add -G<regexp> that greps in the patch text, 2010-08-23) we were hesitant to check if the user requests both -S and -G at the same time. Now that the pickaxe family also offers --find-object, which looks slightly more different than the former two, let's add a check that those are not used at the same time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-04diffcore: add a pickaxe option to find a specific blobStefan Beller1-18/+27
Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs, such that `git describe <blob-id>` gives a description as '<commit-ish>:<path>'. This was implemented at [2]; as seen by the sheer number of responses (>110), it turns out this is tricky to get right. The hard part to get right is picking the correct 'commit-ish' as that could be the commit that (re-)introduced the blob or the commit that removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this patch implements. Teach the diff machinery another flag for restricting the information to what is shown. For example:

    $ ./git log --oneline --find-object=v2.0.0:Makefile
    b2feb64309 Revert the whole "ask curl-config" topic for now
    47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 appeared in v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b. The reason these commits both occur prior to v2.0.0 is evil merges, which are not found using this new mechanism.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-04diff: migrate diff_flags.pickaxe_ignore_case to a pickaxe_opts bitStefan Beller1-3/+3
Currently flags for pickaxing are found in different places. Unify the flags into the `pickaxe_opts` field, which will contain any pickaxe related flags. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01diff: make struct diff_flags members lowercaseBrandon Williams1-4/+4
Now that the flags stored in struct diff_flags are being accessed directly and not through macros, change all struct members from being uppercase to lowercase. This conversion is done using the following semantic patch:

    @@
    expression E;
    @@
    - E.RECURSIVE
    + E.recursive
    @@
    expression E;
    @@
    - E.TREE_IN_RECURSIVE
    + E.tree_in_recursive
    @@
    expression E;
    @@
    - E.BINARY
    + E.binary
    @@
    expression E;
    @@
    - E.TEXT
    + E.text
    @@
    expression E;
    @@
    - E.FULL_INDEX
    + E.full_index
    @@
    expression E;
    @@
    - E.SILENT_ON_REMOVE
    + E.silent_on_remove
    @@
    expression E;
    @@
    - E.FIND_COPIES_HARDER
    + E.find_copies_harder
    @@
    expression E;
    @@
    - E.FOLLOW_RENAMES
    + E.follow_renames
    @@
    expression E;
    @@
    - E.RENAME_EMPTY
    + E.rename_empty
    @@
    expression E;
    @@
    - E.HAS_CHANGES
    + E.has_changes
    @@
    expression E;
    @@
    - E.QUICK
    + E.quick
    @@
    expression E;
    @@
    - E.NO_INDEX
    + E.no_index
    @@
    expression E;
    @@
    - E.ALLOW_EXTERNAL
    + E.allow_external
    @@
    expression E;
    @@
    - E.EXIT_WITH_STATUS
    + E.exit_with_status
    @@
    expression E;
    @@
    - E.REVERSE_DIFF
    + E.reverse_diff
    @@
    expression E;
    @@
    - E.CHECK_FAILED
    + E.check_failed
    @@
    expression E;
    @@
    - E.RELATIVE_NAME
    + E.relative_name
    @@
    expression E;
    @@
    - E.IGNORE_SUBMODULES
    + E.ignore_submodules
    @@
    expression E;
    @@
    - E.DIRSTAT_CUMULATIVE
    + E.dirstat_cumulative
    @@
    expression E;
    @@
    - E.DIRSTAT_BY_FILE
    + E.dirstat_by_file
    @@
    expression E;
    @@
    - E.ALLOW_TEXTCONV
    + E.allow_textconv
    @@
    expression E;
    @@
    - E.TEXTCONV_SET_VIA_CMDLINE
    + E.textconv_set_via_cmdline
    @@
    expression E;
    @@
    - E.DIFF_FROM_CONTENTS
    + E.diff_from_contents
    @@
    expression E;
    @@
    - E.DIRTY_SUBMODULES
    + E.dirty_submodules
    @@
    expression E;
    @@
    - E.IGNORE_UNTRACKED_IN_SUBMODULES
    + E.ignore_untracked_in_submodules
    @@
    expression E;
    @@
    - E.IGNORE_DIRTY_SUBMODULES
    + E.ignore_dirty_submodules
    @@
    expression E;
    @@
    - E.OVERRIDE_SUBMODULE_CONFIG
    + E.override_submodule_config
    @@
    expression E;
    @@
    - E.DIRSTAT_BY_LINE
    + E.dirstat_by_line
    @@
    expression E;
    @@
    - E.FUNCCONTEXT
    + E.funccontext
    @@
    expression E;
    @@
    - E.PICKAXE_IGNORE_CASE
    + E.pickaxe_ignore_case
    @@
    expression E;
    @@
    - E.DEFAULT_FOLLOW_RENAMES
    + E.default_follow_renames

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01diff: remove DIFF_OPT_TST macroBrandon Williams1-4/+4
Remove the `DIFF_OPT_TST` macro and instead access the flags directly. This conversion is done using the following semantic patch:

    @@
    expression E;
    identifier fld;
    @@
    - DIFF_OPT_TST(&E, fld)
    + E.flags.fld

    @@
    type T;
    T *ptr;
    identifier fld;
    @@
    - DIFF_OPT_TST(ptr, fld)
    + ptr->flags.fld

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-24Merge branch 'js/regexec-buf'Junio C Hamano1-2/+5
Fix for potential segv introduced in v2.11.0 and later (also v2.10.2).

* js/regexec-buf:
  pickaxe: fix segfault with '-S<...> --pickaxe-regex'
2017-03-18pickaxe: fix segfault with '-S<...> --pickaxe-regex'SZEDER Gábor1-2/+5
'git {log,diff,...} -S<...> --pickaxe-regex' can segfault as a result of out-of-bounds memory reads. diffcore-pickaxe.c:contains() looks for all matches of the given regex in a buffer in a loop, advancing the buffer pointer to the end of the last match in each iteration. When we switched to REG_STARTEND in b7d36ffca (regex: use regexec_buf(), 2016-09-21), we started passing the size of that buffer to the regexp engine, too. Unfortunately, this buffer size is never updated on subsequent iterations, and as the buffer pointer advances on each iteration, this "bufptr+bufsize" points past the end of the buffer. This results in segmentation fault, if that memory can't be accessed. In case of 'git log' it can also result in erroneously listed commits, if the memory past the end of buffer is accessible and happens to contain data matching the regex. Reduce the buffer size on each iteration as the buffer pointer is advanced, thus maintaining the correct end of buffer location. Furthermore, make sure that the buffer pointer is not dereferenced in the control flow statements when we already reached the end of the buffer. The new test is flaky, I've never seen it fail on my Linux box even without the fix, but this is expected according to db5dfa3 (regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails, 2016-09-21). However, it did fail on Travis CI with the first (and incomplete) version of the fix, and based on that commit message I would expect the new test without the fix to fail most of the time on Windows. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
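The fix described above (shrink the remaining size whenever the buffer pointer advances) can be sketched with regexec() and the REG_STARTEND extension directly. This assumes a libc that provides REG_STARTEND, such as glibc or the BSDs; sketch_count_matches() is a hypothetical name and the loop is an illustration rather than the patched function.

```c
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "this sketch assumes the REG_STARTEND extension"
#endif

/*
 * Count matches of `re` in the sz-byte buffer `data`, which need not be
 * NUL-terminated. data and sz move together, so data + sz always names
 * the true end of the buffer.
 */
unsigned int sketch_count_matches(regex_t *re, const char *data, size_t sz)
{
	unsigned int cnt = 0;
	int flags = 0;
	regmatch_t m;

	while (sz) {
		m.rm_so = 0;
		m.rm_eo = (regoff_t)sz; /* search window: [data, data + sz) */
		if (regexec(re, data, 1, &m, flags | REG_STARTEND))
			break;
		cnt++;
		flags |= REG_NOTBOL; /* later windows do not start a line */
		data += m.rm_eo;
		sz -= (size_t)m.rm_eo; /* keep the end of buffer fixed */
		if (sz && m.rm_so == m.rm_eo) {
			/* empty match: step one byte to make progress */
			data++;
			sz--;
		}
	}
	return cnt;
}
```

Because the window length is recomputed from sz on every iteration, the engine is never told about bytes past the original end, and an embedded NUL does not end the search.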
2016-09-26Merge branch 'js/regexec-buf'Junio C Hamano1-10/+8
Some codepaths in "git diff" used regexec(3) on a buffer that was mmap(2)ed, which may not have a terminating NUL, leading to a read beyond the end of the mapped region. This was fixed by introducing a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND extension.

* js/regexec-buf:
  regex: use regexec_buf()
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
2016-09-21regex: use regexec_buf()Johannes Schindelin1-10/+8
The new regexec_buf() function operates on buffers with an explicitly specified length, rather than NUL-terminated strings. We need to use this function whenever the buffer we want to pass to regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated). Note: the original motivation for this patch was to fix a bug where `git diff -G <regex>` would crash. This patch converts more callers, though, some of which allocated to construct NUL-terminated strings, or worse, modified buffers to temporarily insert NULs while calling regexec(3). By converting them to use regexec_buf(), the code has become much cleaner. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01diffcore-pickaxe: support case insensitive match on non-asciiNguyễn Thái Ngọc Duy1-0/+11
Similar to the "grep -F -i" case, we can't use kws on icase search outside ascii range, so we quote the string and pass it to regcomp as a basic regexp and let regex engine deal with case sensitivity. The new test is put in t7812 instead of t4209-log-pickaxe because lib-gettext.sh might cause problems elsewhere, probably. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01diffcore-pickaxe: Add regcomp_or_die()Nguyễn Thái Ngọc Duy1-9/+13
There's another regcomp code block coming in this function that needs the same error handling. This function can help avoid duplicating error handling code. Helped-by: Jeff King <peff@peff.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28react to errors in xdi_diffJeff King1-2/+2
When we call into xdiff to perform a diff, we generally lose the return code completely. Typically by ignoring the return of our xdi_diff wrapper, but sometimes we even propagate that return value up and then ignore it later. This can lead to us silently producing incorrect diffs (e.g., "git log" might produce no output at all, not even a diff header, for a content-level diff). In practice this does not happen very often, because the typical reason for xdiff to report failure is that its malloc() failed (it uses straight malloc, and not our xmalloc wrapper). But it could also happen when xdiff triggers one of our callbacks, which returns an error (e.g., outf() in builtin/rerere.c tries to report a write failure in this way). And the next patch also plans to add more failure modes. Let's notice an error return from xdiff and react appropriately. In most of the diff.c code, we can simply die(), which matches the surrounding code (e.g., that is what we do if we fail to load a file for diffing in the first place). This is not that elegant, but we are probably better off dying to let the user know there was a problem, rather than simply generating bogus output. We could also just die() directly in xdi_diff, but the callers typically have a bit more context, and can provide a better message (and if we do later decide to pass errors up, we're one step closer to doing so). There is one interesting case, which is in diff_grep(). Here if we cannot generate the diff, there is nothing to match, and we silently return "no hits". This is actually what the existing code does already, but we make it a little more explicit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24pickaxe: simplify kwset loop in contains()René Scharfe1-5/+2
Inlining the variable "found" actually makes the code shorter and easier to read. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24pickaxe: call strlen only when necessary in diffcore_pickaxe_count()René Scharfe1-2/+1
We need to determine the search term's length only when fixed-string matching is used; regular expression compilation takes a NUL-terminated string directly. Only call strlen() in the former case. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24pickaxe: move pickaxe() after pickaxe_match()René Scharfe1-41/+38
pickaxe() calls pickaxe_match(); moving the definition of the former after the latter allows us to do without an explicit function declaration. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24pickaxe: merge diffcore_pickaxe_grep() and diffcore_pickaxe_count() into diffcore_pickaxe()René Scharfe1-37/+7
diffcore_pickaxe_count() initializes the regular expression or kwset for the search term, calls pickaxe() with the callback has_changes() and cleans up afterwards. diffcore_pickaxe_grep() does the same, only it doesn't support kwset and uses the callback diff_grep() instead. Merge the two functions to form the new diffcore_pickaxe() and thus get rid of the duplicate regex setup and cleanup code. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24pickaxe: honor -i when used with -S and --pickaxe-regexRené Scharfe1-1/+4
accccde4 (pickaxe: allow -i to search in patch case-insensitively) allowed case-insensitive matching for -G and -S, but for the latter only if fixed string matching is used. Allow it for -S and regular expression matching as well to make the support complete. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
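A minimal sketch of what honoring -i means for a POSIX regex, assuming the case-insensitivity request simply adds REG_ICASE to the compile flags. The helper name regex_hit() and its ignore_case parameter are hypothetical and are not taken from diffcore-pickaxe.c; only regcomp(), regexec() and regfree() are real POSIX calls.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if text matches pattern, 0 if it does
 * not, and -1 if the pattern fails to compile.
 */
int regex_hit(const char *pattern, const char *text, int ignore_case)
{
	regex_t re;
	int cflags = REG_EXTENDED | REG_NEWLINE;
	int hit;

	if (ignore_case)
		cflags |= REG_ICASE;	/* the effect of -i on the regex */
	if (regcomp(&re, pattern, cflags))
		return -1;
	hit = !regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return hit;
}
```

With ignore_case set, the pattern "foo+" matches "FOOO"; without it, the same call reports no match.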
2013-07-12Merge branch 'rs/pickaxe-simplify'Junio C Hamano1-7/+4
* rs/pickaxe-simplify: diffcore-pickaxe: simplify has_changes and contains
2013-07-07diffcore-pickaxe: simplify has_changes and containsRené Scharfe1-7/+4
Halve the number of callsites of contains() to two using temporary variables, simplifying the code. While at it, get rid of the diff_options parameter, which became unused with 8fa4b09f. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03diffcore-pickaxe: make error messages more consistentRamkumar Ramachandra1-2/+2
Currently, diffcore-pickaxe reports two distinct errors for the same user error: $ git log --pickaxe-regex -S'\1' fatal: invalid pickaxe regex: Invalid back reference $ git log -G'\1' fatal: invalid log-grep regex: Invalid back reference This "log-grep" was only an internal name for the -G feature during development, and invites confusion with "git log --grep=<pattern>". Change the error messages to say "invalid regex". Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-05diffcore-pickaxe: unify code for log -S/-GJeff King1-69/+49
The logic flow of has_changes() used for "log -S" and that of diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean up and report the result. The only difference between the two is how the "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each become such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
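The shape of this refactoring can be sketched as shared scaffolding that delegates only the "inspect" step to a callback. Everything below is a simplified illustration under stated assumptions: struct side, inspect_fn, match_pair() and presence_differs() are hypothetical names, the early-outs only approximate the checks described above, and textconv and cleanup are left out entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one side of a filepair. */
struct side {
	const char *data;	/* NULL when the file is absent on this side */
};

/* The "inspect" step: returns non-zero when the pair is a hit. */
typedef int (*inspect_fn)(const char *one, const char *two,
			  const char *needle);

/* Shared scaffolding: early returns first, then delegate to the callback. */
int match_pair(const struct side *one, const struct side *two,
	       const char *needle, inspect_fn fn)
{
	if (!needle || !*needle)
		return 0;	/* no pickaxe limit given */
	if (!one->data && !two->data)
		return 0;	/* nothing on either side */
	if (one->data && two->data && !strcmp(one->data, two->data))
		return 0;	/* identical content cannot differ */
	return fn(one->data ? one->data : "",
		  two->data ? two->data : "", needle);
}

/* Simplified example callback: the needle is present on one side only. */
int presence_differs(const char *one, const char *two, const char *needle)
{
	return !strstr(one, needle) != !strstr(two, needle);
}
```

A second callback with the same signature could implement a different inspection without touching match_pair(), which is the point of the unification.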
2013-04-05diffcore-pickaxe: fix leaks in "log -S<block>" and "log -G<pattern>"Junio C Hamano1-5/+7
The diff_grep() and has_changes() functions had early return codepaths for unmerged filepairs, which simply returned 0. When we taught textconv filter to them, one was ignored and continued to return early without freeing the result filtered by textconv, and the other had a failed attempt to fix, which allowed the planned return value 0 to be overwritten by a bogus call to contains(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-05diffcore-pickaxe: port optimization from has_changes() to diff_grep()Junio C Hamano1-1/+6
These two functions are called in the same codeflow to implement "log -S<block>" and "log -G<pattern>", respectively, but the latter lacked two obvious optimizations the former implemented, namely:
- When a pickaxe limit is not given at all, they should return without wasting any cycle;
- When both sides of the filepair are the same, and the same textconv conversion applies to them, return early, as there will be no interesting differences between the two anyway.
Also release the filespec data once the processing is done (this is not about leaking memory--it is about releasing data we finished looking at as early as possible). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-05diffcore-pickaxe: respect --no-textconvSimon Ruderich1-4/+8
git log -S doesn't respect --no-textconv: $ echo '*.txt diff=wrong' > .gitattributes $ git -c diff.wrong.textconv='xxx' log --no-textconv -Sfoo error: cannot run xxx: No such file or directory fatal: unable to read files to diff Reported-by: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-04diffcore-pickaxe: remove fill_one()Jeff King1-20/+10
fill_one is _almost_ identical to just calling fill_textconv; the exception is that for the !DIFF_FILE_VALID case, fill_textconv gives us an empty buffer rather than a NULL one. Since we currently use the NULL pointer as a signal that the file is not present on one side of the diff, we must now switch to using DIFF_FILE_VALID to make the same check. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-04diffcore-pickaxe: remove unnecessary call to get_textconv()Simon Ruderich1-9/+14
The fill_one() function is responsible for finding and filling the textconv filter as necessary, and is called by diff_grep() function that implements "git log -G<pattern>". The has_changes() function that implements "git log -S<block>" calls get_textconv() for two sides being compared, before it checks to see if it was asked to perform the pickaxe limiting. Move the code around to avoid this wastage. After has_changes() calls get_textconv() to obtain textconv for both sides, fill_one() is called to use them. By adding get_textconv() to diff_grep() and relieving fill_one() of responsibility to find the textconv filter, we can avoid calling get_textconv() twice in has_changes(). With this change it's also no longer necessary for fill_one() to modify the textconv argument, therefore pass a pointer instead of a pointer to a pointer. Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-28pickaxe: use textconv for -S countingJeff King1-17/+39
We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net>
2012-10-28pickaxe: hoist empty needle checkJeff King1-2/+3
If we are given an empty pickaxe needle like "git log -S ''", it is impossible for us to find anything (because no matter what the content, the count will always be 0). We currently check this at the lowest level of contains(). Let's hoist the logic much earlier to has_changes(), so that it is simpler to return our answer before loading any blob data. Signed-off-by: Jeff King <peff@peff.net>
2012-10-28diff_grep: use textconv buffers for add/deleted filesJeff King1-2/+2
If you use "-G" to grep a diff, we will apply a configured textconv filter to the data before generating the diff. However, if the diff is an addition or deletion, we do not bother running the diff at all, and just look for the token in the added (or removed) content. This works because we know that the diff must contain every line of content. However, while we used the textconv-derived buffers in the regular diff, we accidentally passed the original unmodified buffers to regexec when checking the added or removed content. This could lead to an incorrect answer. Worse, in some cases we might have a textconv buffer but no original buffer (e.g., if we pulled the textconv data from cache, or if we reused a working tree file when generating it). In that case, we could actually feed NULL to regexec and segfault. Reported-by: Peter Oberndorfer <kumbayo84@arcor.de> Signed-off-by: Jeff King <peff@peff.net>
2012-02-28pickaxe: allow -i to search in patch case-insensitivelyJunio C Hamano1-2/+7
"git log -S<string>" is a useful way to find the last commit in the codebase that touched the <string>. As it was designed to be used by a porcelain script to dig the history starting from a block of text that appears in the starting commit, it never had to look for anything but an exact match. When used by an end user who wants to look for the last commit that removed a string (e.g. name of a variable) that he vaguely remembers, however, it is useful to support case insensitive match. When given the "--regexp-ignore-case" (or "-i") option, which originally was designed to affect case sensitivity of the search done in the commit log part, e.g. "log --grep", the matches made with -S/-G pickaxe search are done case insensitively now. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07pickaxe: factor out pickaxeRené Scharfe1-67/+43
Move the duplicate diff queue loop into its own function that accepts a match function: has_changes() for -S and diff_grep() for -G. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07pickaxe: give diff_grep the same signature as has_changesRené Scharfe1-3/+4
Change diff_grep() to match the signature of has_changes() as a preparation for the next patch that will use function pointers to the two. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07pickaxe: pass diff_options to contains and has_changesRené Scharfe1-14/+14
Remove the unused parameter needle from contains() and has_changes(). Also replace the parameter len with a pointer to the diff_options. We can use its member pickaxe to check if the needle is an empty string and use the kwsmatch structure to find out the length of the match instead. This change is done as a preparation to unify the signatures of has_changes() and diff_grep(), which will be used in the patch after the next one to factor out common code. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07pickaxe: factor out has_changesRené Scharfe1-36/+21
Move duplicate if/else construct into its own helper function. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07pickaxe: plug regex/kws leakRené Scharfe1-6/+7
With -S... --pickaxe-all, free the regex or the kws before returning even if we found a match. Also get rid of the variable has_changes, as we can simply break out of the loop. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07pickaxe: plug regex leakRené Scharfe1-7/+6
With -G... --pickaxe-all, free the regex before returning even if we found a match. Also get rid of the variable has_changes, as we can simply break out of the loop. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07pickaxe: plug diff filespec leak with empty needleRené Scharfe1-2/+2
Check first for the unlikely case of an empty needle string and only then populate the filespec, lest we leak it. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-20Use kwset in pickaxeFredrik Kuivinen1-11/+23
Benchmarks in the hot cache case:

before:
$ perf stat --repeat=5 git log -Sqwerty
Performance counter stats for 'git log -Sqwerty' (5 runs):
      47,092,744  cache-misses        #    2.825 M/sec   ( +-  1.607% )
     123,368,389  cache-references    #    7.400 M/sec   ( +-  0.812% )
     330,040,998  branch-misses       #    3.134 %       ( +-  0.257% )
  10,530,896,750  branches            #  631.663 M/sec   ( +-  0.121% )
  62,037,201,030  instructions        #    1.399 IPC     ( +-  0.142% )
  44,331,294,321  cycles              # 2659.073 M/sec   ( +-  0.326% )
          96,794  page-faults         #    0.006 M/sec   ( +- 11.952% )
              25  CPU-migrations      #    0.000 M/sec   ( +- 25.266% )
           1,424  context-switches    #    0.000 M/sec   ( +-  0.540% )
    16671.708650  task-clock-msecs    #    0.997 CPUs    ( +-  0.343% )
    16.728692052  seconds time elapsed                   ( +-  0.344% )

after:
$ perf stat --repeat=5 git log -Sqwerty
Performance counter stats for 'git log -Sqwerty' (5 runs):
      51,385,522  cache-misses        #    4.619 M/sec   ( +-  0.565% )
     129,177,880  cache-references    #   11.611 M/sec   ( +-  0.219% )
     319,222,775  branch-misses       #    6.946 %       ( +-  0.134% )
   4,595,913,233  branches            #  413.086 M/sec   ( +-  0.112% )
  31,395,042,533  instructions        #    1.062 IPC     ( +-  0.129% )
  29,558,348,598  cycles              # 2656.740 M/sec   ( +-  0.204% )
          93,224  page-faults         #    0.008 M/sec   ( +-  4.487% )
              19  CPU-migrations      #    0.000 M/sec   ( +- 10.425% )
             950  context-switches    #    0.000 M/sec   ( +-  0.360% )
    11125.796039  task-clock-msecs    #    0.997 CPUs    ( +-  0.239% )
    11.164216599  seconds time elapsed                   ( +-  0.240% )

So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06diffcore-pickaxe.c: a void function shouldn't try to return somethingBrandon Casey1-2/+2
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06Merge branch 'maint'Junio C Hamano1-2/+1
* maint: Documentation/git-clone: describe --mirror more verbosely do not depend on signed integer overflow work around buggy S_ISxxx(m) implementations xdiff: cast arguments for ctype functions to unsigned char init: plug tiny one-time memory leak diffcore-pickaxe.c: remove unnecessary curly braces t3020 (ls-files-error-unmatch): remove stray '1' from end of file setup: make sure git dir path is in a permanent buffer environment.c: remove unused variable git-svn: fix processing of decorated commit hashes git-svn: check_cherry_pick should exclude commits already in our history Documentation/git-svn: discourage "noMetadata"
2010-10-05diffcore-pickaxe.c: remove unnecessary curly bracesBrandon Casey1-2/+1
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-31git log/diff: add -G<regexp> that greps in the patch textJunio C Hamano1-1/+149
Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regex" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of a larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running the "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-31diff: pass the entire diff-options to diffcore_pickaxe()Junio C Hamano1-1/+3
That would make it easier to give enhanced feature to the pickaxe transformation. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-07Add a macro DIFF_QUEUE_CLEAR.Bo Yang1-2/+1
Refactor the diff_queue_struct code; this macro helps to reset the structure. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-17pickaxe: count regex matches only onceRené Scharfe1-2/+4
When --pickaxe-regex is used, forward past the end of matches instead of advancing to the byte after their start. This way matches count only once, even if the regular expression matches their tail -- like in the fixed-string fork of the code. E.g.: /.*/ used to count the number of bytes instead of the number of lines. /aa/ resulted in a count of two in "aaa" instead of one. Also document the fact that regexec() needs a NUL-terminated string as its second argument by adding an assert(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
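The counting rule described above can be sketched as follows. This is an illustration rather than the actual contains() from diffcore-pickaxe.c: the names count_regex_matches() and count_pattern() are hypothetical, and the compile flags (REG_EXTENDED | REG_NEWLINE) are an assumption. The loop advances by rm_eo (the end of the match) rather than to the byte after rm_so, so a match is counted once even if the pattern also matches its tail.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count matches of a compiled regex, skipping past the end of each one. */
unsigned int count_regex_matches(const regex_t *re, const char *data)
{
	unsigned int count = 0;
	regmatch_t m;
	int flags = 0;

	assert(data != NULL);	/* regexec() needs a NUL-terminated string */

	while (*data && !regexec(re, data, 1, &m, flags)) {
		flags |= REG_NOTBOL;	/* later calls do not start a line */
		data += m.rm_eo;	/* forward past the end of the match */
		if (*data && m.rm_so == m.rm_eo)
			data++;		/* empty match: force progress */
		count++;
	}
	return count;
}

/* Convenience wrapper: compile, count, free. Returns -1 on a bad pattern. */
int count_pattern(const char *pattern, const char *data)
{
	regex_t re;
	int count;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE))
		return -1;
	count = (int)count_regex_matches(&re, data);
	regfree(&re);
	return count;
}
```

For the /aa/ example from the message, count_pattern("aa", "aaa") yields one match, because the search resumes at the final "a" instead of at the second one.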
2009-03-02diffcore-pickaxe: use memmem()René Scharfe1-10/+8
Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown:

$ STRING='Ensure that the real time constraints are schedulable.'
$ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null

On Ubuntu 8.10 x64, before (v1.6.2-rc2):
8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps
And with the patch:
1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps

On Fedora 10 x64, before:
8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps
And with the patch:
1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps

On Windows Vista x64, before:
real 0m9.204s user 0m0.000s sys 0m0.000s
And with the patch:
real 0m8.470s user 0m0.000s sys 0m0.000s

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07War on whitespaceJunio C Hamano1-1/+1
This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The result still passes the tests, and the build result in the Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-05-07diff -S: release the image after looking for needle in itJunio C Hamano1-0/+1
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-25diffcore-pickaxe: fix infinite loop on zero-length needleJeff King1-0/+2
The "contains" algorithm runs into an infinite loop if the needle string has zero length. The loop could be modified to handle this, but it makes more sense to simply have an empty needle return no matches. Thus, a command like git log -S produces no output. We place the check at the top of the function so that we get the same results with or without --pickaxe-regex. Note that until now, git log -S --pickaxe-regex would match everything, not nothing. Arguably, an empty pickaxe string should simply produce an error message; however, this is still a useful assertion to add to the algorithm at this layer of the code. Noticed by Bill Lear. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20simplify inclusion of system header files.Junio C Hamano1-2/+0
This is a mechanical clean-up of the way *.c files include system header files. (1) sources under compat/, platform sha-1 implementations, and xdelta code are exempt from the following rules; (2) the first #include must be "git-compat-util.h" or one of our own header files that includes it first (e.g. config.h, builtin.h, pkt-line.h); (3) system headers that are included in "git-compat-util.h" need not be included in individual C source files; (4) "git-compat-util.h" does not have to include subsystem specific header files (e.g. expat.h). Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04On some platforms, certain headers need to be included before regex.hJohannes Schindelin1-2/+2
Happily, these are already included in cache.h, which is included anyway... so: change the order of includes. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04Support for pickaxe matching regular expressionsPetr Baudis1-16/+50
git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz>
2005-07-23[PATCH] diffcore-pickaxe: switch to "counting" behaviour.Junio C Hamano1-6/+17
Instead of finding an old/new pair in which one side has the specified string and the other side does not, find an old/new pair that contains the specified string as a substring a different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
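A minimal sketch of the "counting" rule, assuming NUL-terminated text and a plain strstr() scan. The names count_occurrences() and counts_differ() are hypothetical and do not appear in the source; the empty-needle guard follows the zero-length needle fix recorded elsewhere in this log.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in a NUL-terminated text. */
unsigned int count_occurrences(const char *haystack, const char *needle)
{
	size_t len = strlen(needle);
	unsigned int count = 0;
	const char *p;

	if (!len)
		return 0;	/* an empty needle never counts as a hit */
	for (p = strstr(haystack, needle); p; p = strstr(p + len, needle))
		count++;
	return count;
}

/* The pair is interesting only when the two sides' counts differ. */
int counts_differ(const char *old_text, const char *new_text,
		  const char *needle)
{
	return count_occurrences(old_text, needle) !=
	       count_occurrences(new_text, needle);
}
```

This also shows the limitation the message mentions: if the old side has two "static" declarations and the new side has two different ones, both counts are 2 and counts_differ() reports no change.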
2005-05-29[PATCH] Do not include unused header files.Junio C Hamano1-1/+0
Some source files were including "delta.h" without actually needing it. Remove them. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29[PATCH] Optimize diff-tree -[CM] --stdinJunio C Hamano1-1/+1
This attempts to optimize "diff-tree -[CM] --stdin", which compares successive tree pairs. This optimization does not make much sense for other commands in the diff-* brothers. When reading from --stdin and using rename/copy detection, the patch makes diff-tree read the current index file first. This is done to reuse the optimization used by diff-cache in the non-cached case. The similarity estimator can avoid expanding a blob if the index says what is in the work tree has an exact copy of that blob already expanded. Another optimization the patch makes is to check only file sizes first to terminate similarity estimation early. In order for this to work, it needs a way to tell the size of the blob without expanding it. Since an obvious way of doing it, which is to keep all the blobs previously used in the memory, is too costly, it does so by keeping the filesize for each object it has already seen in memory. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29[PATCH] Add --pickaxe-all to diff-* brothers.Junio C Hamano1-20/+57
When --pickaxe-all is given in addition to -S, pickaxe shows the entire diffs contained in the changeset, not just the diffs for the filepair that touched the sought-after string. This is useful to see the changes in context. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29[PATCH] Introduce diff_free_filepair() function.Junio C Hamano1-1/+1
This introduces a new function to free a common data structure, and plugs some leaks. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-23[PATCH] Performance fix for pickaxe.Junio C Hamano1-1/+2
The pickaxe was expanding the blobs and searching in them even when it should have already known that both sides are the same. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-23[PATCH] Rename/copy detection fix.Junio C Hamano1-1/+1
The rename/copy detection logic in earlier round was only good enough to show patch output and discussion on the mailing list about the diff-raw format updates revealed many problems with it. This patch fixes all the ones known to me, without making things I want to do later impossible, mostly related to patch reordering. (1) Earlier rename/copy detector determined which one is rename and which one is copy too early, which made it impossible to later introduce diffcore transformers to reorder patches. This patch fixes it by moving that logic to the very end of the processing. (2) Earlier output routine diff_flush() was pruning all the "no-change" entries indiscriminately. This was done due to my false assumption that one of the requirements in the diff-raw output was not to show such an entry (which resulted in my incorrect comment about "diff-helper never being able to be equivalent to built-in diff driver"). My special thanks go to Linus for correcting me about this. When we produce diff-raw output, for the downstream to be able to tell renames from copies, sometimes it _is_ necessary to output "no-change" entries, and this patch adds diffcore_prune() function for doing it. (3) Earlier diff_filepair structure was trying to be not too specific about rename/copy operations, but the purpose of the structure was to record one or two paths, which _was_ indeed about rename/copy. This patch discards xfrm_msg field which was trying to be generic for this wrong reason, and introduces a couple of fields (rename_score and rename_rank) that are explicitly specific to rename/copy logic. One thing to note is that the information in a single diff_filepair structure _still_ does not distinguish renames from copies, and it is deliberately so. This is to allow patches to be reordered in later stages. (4) This patch also adds some tests about diff-raw format output and makes sure that necessary "no-change" entries appear on the output.
Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-22[PATCH] Diffcore updates.Junio C Hamano1-8/+7
This moves the path selection logic from individual programs to a new diffcore transformer (diff-tree still needs to have its own for performance reasons). Also the header printing code in diff-tree was tweaked not to produce anything when pickaxe is in effect and there is nothing interesting to report. An interesting example is the following in the GIT archive itself: $ git-whatchanged -p -C -S'or something in a real script' Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-21[PATCH] The diff-raw format updates.Junio C Hamano1-3/+3
Update the diff-raw format as Linus and I discussed, except that it does not use sequence of underscore '_' letters to express nonexistence. All '0' mode is used for that purpose instead. The new diff-raw format can express rename/copy, and the earlier restriction that -M and -C _must_ be used with the patch format output is no longer necessary. The patch makes -M and -C flags independent of -p flag, so you need to say git-whatchanged -M -p to get the diff/patch format. Updated are both documentations and tests. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-21[PATCH] Prepare diffcore interface for diff-tree header suppression.Junio C Hamano1-1/+2
This does not actually suppress the extra headers when pickaxe is used, but prepares enough support for diff-tree to implement it. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-21[PATCH] Introducing software archaeologist's tool "pickaxe".Junio C Hamano1-0/+56
This steals the "pickaxe" feature from JIT and makes it available to the bare Plumbing layer. From the command line, the user gives a string he is interested in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In a real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after the "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>