Commit graph

58 commits

Author SHA1 Message Date
Junio C Hamano 0efeb5ca12 Merge branch 'js/regexec-buf'
Fix for potential segv introduced in v2.11.0 and later (also
v2.10.2).

* js/regexec-buf:
  pickaxe: fix segfault with '-S<...> --pickaxe-regex'
2017-03-24 13:07:35 -07:00
SZEDER Gábor f53c5de29c pickaxe: fix segfault with '-S<...> --pickaxe-regex'
'git {log,diff,...} -S<...> --pickaxe-regex' can segfault as a result
of out-of-bounds memory reads.

diffcore-pickaxe.c:contains() looks for all matches of the given regex
in a buffer in a loop, advancing the buffer pointer to the end of the
last match in each iteration.  When we switched to REG_STARTEND in
b7d36ffca (regex: use regexec_buf(), 2016-09-21), we started passing
the size of that buffer to the regexp engine, too.  Unfortunately,
this buffer size is never updated on subsequent iterations, and as the
buffer pointer advances on each iteration, this "bufptr+bufsize"
points past the end of the buffer.  This results in a segmentation
fault if that memory can't be accessed.  In case of 'git log' it can
also result in erroneously listed commits, if the memory past the end
of buffer is accessible and happens to contain data matching the
regex.

Reduce the buffer size on each iteration as the buffer pointer is
advanced, thus maintaining the correct end of buffer location.
Furthermore, make sure that the buffer pointer is not dereferenced in
the control flow statements when we already reached the end of the
buffer.

The new test is flaky: I've never seen it fail on my Linux box even
without the fix, but this is expected according to db5dfa3 (regex:
-G<pattern> feeds a non NUL-terminated string to regexec() and fails,
2016-09-21).  However, it did fail on Travis CI with the first (and
incomplete) version of the fix, and based on that commit message I
would expect the new test without the fix to fail most of the time on
Windows.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-18 12:22:33 -07:00
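The fix described in f53c5de29c can be sketched roughly as follows. This is a minimal illustration, not the actual diffcore-pickaxe.c code: count_matches_buf() is a hypothetical name, the local regexec_buf() only mimics the helper named in the log, and the sketch assumes a libc whose regexec() honours the non-standard REG_STARTEND flag (glibc and the BSDs do).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "this sketch assumes a regexec() that supports REG_STARTEND"
#endif

/* Stand-in for the regexec_buf() helper mentioned above: match within
 * the first `size` bytes of `buf`, which need not be NUL-terminated. */
static int regexec_buf(const regex_t *preg, const char *buf, size_t size,
		       size_t nmatch, regmatch_t pmatch[], int eflags)
{
	assert(nmatch > 0 && pmatch);
	pmatch[0].rm_so = 0;
	pmatch[0].rm_eo = (regoff_t)size;
	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
}

/* Hypothetical helper: count non-overlapping matches in buf[0..size). */
static unsigned int count_matches_buf(const regex_t *preg,
				      const char *buf, size_t size)
{
	regmatch_t m;
	unsigned int cnt = 0;
	int flags = 0;

	while (size > 0 && !regexec_buf(preg, buf, size, 1, &m, flags)) {
		flags |= REG_NOTBOL;
		/* Advance the pointer AND shrink the size by the same amount,
		 * so that buf + size keeps pointing at the real end. */
		buf += m.rm_eo;
		size -= (size_t)m.rm_eo;
		/* Step over an empty match so the loop always makes progress. */
		if (size > 0 && m.rm_so == m.rm_eo) {
			buf++;
			size--;
		}
		cnt++;
	}
	return cnt;
}
```

Forgetting the `size -= ...` line reproduces the bug: the pointer moves forward while the length stays the same, so the regex engine is told it may read past the real end of the buffer.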
Junio C Hamano 6a67695268 Merge branch 'js/regexec-buf'
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region.  This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.

* js/regexec-buf:
  regex: use regexec_buf()
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
2016-09-26 16:09:19 -07:00
Johannes Schindelin b7d36ffca0 regex: use regexec_buf()
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G <regex>` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3).  By converting them to use regexec_buf(), the code has
become much cleaner.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 13:56:15 -07:00
Nguyễn Thái Ngọc Duy b51a9c1479 diffcore-pickaxe: support case insensitive match on non-ascii
Similar to the "grep -F -i" case, we can't use kws on icase search
outside ascii range, so we quote the string and pass it to regcomp as
a basic regexp and let regex engine deal with case sensitivity.

The new test is put in t7812 instead of t4209-log-pickaxe because
lib-gettext.sh might cause problems elsewhere, probably.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:57 -07:00
Nguyễn Thái Ngọc Duy 3d5b23a362 diffcore-pickaxe: Add regcomp_or_die()
There's another regcomp code block coming in this function that needs
the same error handling. This function can help avoid duplicating
error handling code.

Helped-by: Jeff King <peff@peff.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 12:44:57 -07:00
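A helper of the kind 3d5b23a362 describes might look roughly like the sketch below. It is an illustration under assumptions, not the code from the commit: Git's own die() is replaced by fprintf() plus exit() so the example is self-contained, and the "invalid regex" wording is borrowed from the message of 276b22d333 in this log.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/* Compile `needle` or terminate the program with a readable message,
 * so that callers do not have to repeat the error handling. */
static void regcomp_or_die(regex_t *regex, const char *needle, int cflags)
{
	int err;

	assert(regex && needle);
	err = regcomp(regex, needle, cflags);
	if (err) {
		char errbuf[1024];

		/* Translate the numeric error code into text. */
		regerror(err, regex, errbuf, sizeof(errbuf));
		fprintf(stderr, "fatal: invalid regex: %s\n", errbuf);
		exit(128);
	}
}
```

A caller can then write `regcomp_or_die(&re, pattern, REG_EXTENDED | REG_NEWLINE);` and use `re` directly, since the function only returns on success.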
Jeff King 3efb988098 react to errors in xdi_diff
When we call into xdiff to perform a diff, we generally lose
the return code completely. Typically by ignoring the return
of our xdi_diff wrapper, but sometimes we even propagate
that return value up and then ignore it later.  This can
lead to us silently producing incorrect diffs (e.g., "git
log" might produce no output at all, not even a diff header,
for a content-level diff).

In practice this does not happen very often, because the
typical reason for xdiff to report failure is that
malloc() failed (it uses straight malloc, and not our
xmalloc wrapper).  But it could also happen when xdiff
triggers one of our callbacks, which returns an error (e.g.,
outf() in builtin/rerere.c tries to report a write failure
in this way). And the next patch also plans to add more
failure modes.

Let's notice an error return from xdiff and react
appropriately. In most of the diff.c code, we can simply
die(), which matches the surrounding code (e.g., that is
what we do if we fail to load a file for diffing in the
first place). This is not that elegant, but we are probably
better off dying to let the user know there was a problem,
rather than simply generating bogus output.

We could also just die() directly in xdi_diff, but the
callers typically have a bit more context, and can provide a
better message (and if we do later decide to pass errors up,
we're one step closer to doing so).

There is one interesting case, which is in diff_grep(). Here
if we cannot generate the diff, there is nothing to match,
and we silently return "no hits". This is actually what the
existing code does already, but we make it a little more
explicit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 14:57:10 -07:00
René Scharfe e4aab50475 pickaxe: simplify kwset loop in contains()
Inlining the variable "found" actually makes the code shorter and
easier to read.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24 15:13:17 -07:00
René Scharfe 542b2aa2c9 pickaxe: call strlen only when necessary in diffcore_pickaxe_count()
We need to determine the search term's length only when fixed-string
matching is used; regular expression compilation takes a NUL-terminated
string directly.  Only call strlen() in the former case.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24 15:13:17 -07:00
René Scharfe 3753bd1f69 pickaxe: move pickaxe() after pickaxe_match()
pickaxe() calls pickaxe_match(); moving the definition of the former
after the latter allows us to do without an explicit function
declaration.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24 15:13:10 -07:00
René Scharfe 63b52afaa8 pickaxe: merge diffcore_pickaxe_grep() and diffcore_pickaxe_count() into diffcore_pickaxe()
diffcore_pickaxe_count() initializes the regular expression or kwset for
the search term, calls pickaxe() with the callback has_changes() and
cleans up afterwards.  diffcore_pickaxe_grep() does the same, only it
doesn't support kwset and uses the callback diff_grep() instead.  Merge
the two functions to form the new diffcore_pickaxe() and thus get rid of
the duplicate regex setup and cleanup code.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24 15:12:45 -07:00
René Scharfe 218c45a45c pickaxe: honor -i when used with -S and --pickaxe-regex
accccde4 (pickaxe: allow -i to search in patch case-insensitively)
allowed case-insensitive matching for -G and -S, but for the latter
only if fixed string matching is used.  Allow it for -S and regular
expression matching as well to make the support complete.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-24 15:12:45 -07:00
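With POSIX regcomp(), honouring -i for a regex search comes down to adding REG_ICASE to the compile flags. The sketch below shows the idea; regex_matches() and its `ignore_case` parameter are hypothetical names used for illustration and do not appear in the Git source.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: does `text` match `pattern`?  `ignore_case`
 * plays the role of the -i option.  Returns 1 on a match, 0 on no
 * match, and -1 if the pattern does not compile. */
static int regex_matches(const char *pattern, const char *text,
			 int ignore_case)
{
	regex_t re;
	int cflags = REG_EXTENDED | REG_NEWLINE;
	int hit;

	assert(pattern && text);
	if (ignore_case)
		cflags |= REG_ICASE;
	if (regcomp(&re, pattern, cflags))
		return -1;
	hit = !regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return hit;
}
```

For example, `regex_matches("frotz", "int FROTZ;", 0)` reports no match, while the same call with `ignore_case` set to 1 reports a match.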
Junio C Hamano d5a3897f94 Merge branch 'rs/pickaxe-simplify'
* rs/pickaxe-simplify:
  diffcore-pickaxe: simplify has_changes and contains
2013-07-12 12:04:17 -07:00
René Scharfe 3bdb5b9f1f diffcore-pickaxe: simplify has_changes and contains
Halve the number of callsites of contains() to two using temporary
variables, simplifying the code.  While at it, get rid of the
diff_options parameter, which became unused with 8fa4b09f.

Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-07 10:24:11 -07:00
Ramkumar Ramachandra 276b22d333 diffcore-pickaxe: make error messages more consistent
Currently, diffcore-pickaxe reports two distinct errors for the same
user error:

    $ git log --pickaxe-regex -S'\1'
    fatal: invalid pickaxe regex: Invalid back reference

    $ git log -G'\1'
    fatal: invalid log-grep regex: Invalid back reference

This "log-grep" was only an internal name for the -G feature during
development, and invites confusion with "git log --grep=<pattern>".

Change the error messages to say "invalid regex".

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03 10:50:22 -07:00
Jeff King 61690bf4a1 diffcore-pickaxe: unify code for log -S/-G
The logic flow of has_changes() used for "log -S" and diff_grep()
used for "log -G" are essentially the same.  See if we have both
sides that could be different in any interesting way, slurp the
contents in core, possibly after applying textconv, inspect the
contents, clean-up and report the result.  The only difference
between the two is how "inspect" step works.

Unify this codeflow in a helper, pickaxe_match(), which takes a
callback function that implements the specific "inspect" step.

After removing the common scaffolding code from the existing
has_changes() and diff_grep(), each becomes a callback
function suitable for passing to pickaxe_match().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-05 10:31:09 -07:00
Junio C Hamano 88ff684dd5 diffcore-pickaxe: fix leaks in "log -S<block>" and "log -G<pattern>"
The diff_grep() and has_changes() functions had early return
codepaths for unmerged filepairs, which simply returned 0.  When we
taught textconv filter to them, one was ignored and continued to
return early without freeing the result filtered by textconv, and
the other had a failed attempt to fix, which allowed the planned
return value 0 to be overwritten by a bogus call to contains().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-05 10:31:09 -07:00
Junio C Hamano ebb7226258 diffcore-pickaxe: port optimization from has_changes() to diff_grep()
These two functions are called in the same codeflow to implement
"log -S<block>" and "log -G<pattern>", respectively, but the latter
lacked two obvious optimizations the former implemented, namely:

 - When a pickaxe limit is not given at all, they should return
   without wasting any cycle;

 - When both sides of the filepair are the same, and the same
   textconv conversion applies to them, return early, as there will be
   no interesting differences between the two anyway.

Also release the filespec data once the processing is done (this is
not about leaking memory--it is about releasing data we finished
looking at as early as possible).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-05 10:31:09 -07:00
Simon Ruderich a8f6109428 diffcore-pickaxe: respect --no-textconv
git log -S doesn't respect --no-textconv:

    $ echo '*.txt diff=wrong' > .gitattributes
    $ git -c diff.wrong.textconv='xxx' log --no-textconv -Sfoo
    error: cannot run xxx: No such file or directory
    fatal: unable to read files to diff

Reported-by: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
Signed-off-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-05 10:30:44 -07:00
Jeff King 7cdb9b42c3 diffcore-pickaxe: remove fill_one()
fill_one is _almost_ identical to just calling fill_textconv; the
exception is that for the !DIFF_FILE_VALID case, fill_textconv gives us
an empty buffer rather than a NULL one. Since we currently use the NULL
pointer as a signal that the file is not present on one side of the
diff, we must now switch to using DIFF_FILE_VALID to make the same
check.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-04 20:33:19 -07:00
Simon Ruderich bc6158981b diffcore-pickaxe: remove unnecessary call to get_textconv()
The fill_one() function is responsible for finding and filling the
textconv filter as necessary, and is called by diff_grep() function
that implements "git log -G<pattern>".

The has_changes() function that implements "git log -S<block>" calls
get_textconv() for two sides being compared, before it checks to see
if it was asked to perform the pickaxe limiting.  Move the code
around to avoid this wastage.

After has_changes() calls get_textconv() to obtain textconv for both
sides, fill_one() is called to use them.

By adding get_textconv() to diff_grep() and relieving fill_one() of
responsibility to find the textconv filter, we can avoid calling
get_textconv() twice in has_changes().

With this change it's also no longer necessary for fill_one() to
modify the textconv argument, therefore pass a pointer instead of a
pointer to a pointer.

Signed-off-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-04 20:33:19 -07:00
Jeff King ef90ab66e8 pickaxe: use textconv for -S counting
We currently just look at raw blob data when using "-S" to
pickaxe. This is mostly historical, as pickaxe predates the
textconv feature. If the user has bothered to define a
textconv filter, it is more likely that their search string will be
on the textconv output, as that is what they will see in the
diff (and we do not even provide a mechanism for them to
search for binary needles that contain NUL characters).

This patch teaches "-S" to use textconv, just as we
already do for "-G".

Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 08:48:17 -04:00
Jeff King 8fa4b09fb1 pickaxe: hoist empty needle check
If we are given an empty pickaxe needle like "git log -S ''",
it is impossible for us to find anything (because no matter
what the content, the count will always be 0). We currently
check this at the lowest level of contains(). Let's hoist
the logic much earlier to has_changes(), so that it is
simpler to return our answer before loading any blob data.

Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 08:48:09 -04:00
Jeff King b1c2f57db3 diff_grep: use textconv buffers for add/deleted files
If you use "-G" to grep a diff, we will apply a configured
textconv filter to the data before generating the diff.
However, if the diff is an addition or deletion, we do not
bother running the diff at all, and just look for the token
in the added (or removed) content. This works because we
know that the diff must contain every line of content.

However, while we used the textconv-derived buffers in the
regular diff, we accidentally passed the original unmodified
buffers to regexec when checking the added or removed
content. This could lead to an incorrect answer.

Worse, in some cases we might have a textconv buffer but no
original buffer (e.g., if we pulled the textconv data from
cache, or if we reused a working tree file when generating
it). In that case, we could actually feed NULL to regexec
and segfault.

Reported-by: Peter Oberndorfer <kumbayo84@arcor.de>
Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 07:59:44 -04:00
Junio C Hamano accccde483 pickaxe: allow -i to search in patch case-insensitively
"git log -S<string>" is a useful way to find the last commit in the
codebase that touched the <string>. As it was designed to be used by a
porcelain script to dig the history starting from a block of text that
appears in the starting commit, it never had to look for anything but an
exact match.

When used by an end user who wants to look for the last commit that
removed a string (e.g. name of a variable) that he vaguely remembers,
however, it is useful to support case insensitive match.

When given the "--regexp-ignore-case" (or "-i") option, which originally
was designed to affect case sensitivity of the search done in the commit
log part, e.g. "log --grep", the matches made with -S/-G pickaxe search are
now done case insensitively.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-28 16:15:29 -08:00
René Scharfe 8a94151d61 pickaxe: factor out pickaxe
Move the duplicate diff queue loop into its own function that accepts
a match function: has_changes() for -S and diff_grep() for -G.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07 15:46:14 -07:00
René Scharfe db99cb7000 pickaxe: give diff_grep the same signature as has_changes
Change diff_grep() to match the signature of has_changes() as a
preparation for the next patch that will use function pointers to
the two.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07 15:46:14 -07:00
René Scharfe 5d176fb6b6 pickaxe: pass diff_options to contains and has_changes
Remove the unused parameter needle from contains() and has_changes().

Also replace the parameter len with a pointer to the diff_options.  We
can use its member pickaxe to check if the needle is an empty string
and use the kwsmatch structure to find out the length of the match
instead.

This change is done as a preparation to unify the signatures of
has_changes() and diff_grep(), which will be used in the patch after
the next one to factor out common code.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07 15:46:13 -07:00
René Scharfe 15dafaf80d pickaxe: factor out has_changes
Move duplicate if/else construct into its own helper function.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07 15:46:13 -07:00
René Scharfe 8e854b00d8 pickaxe: plug regex/kws leak
With -S... --pickaxe-all, free the regex or the kws before returning
even if we found a match.  Also get rid of the variable has_changes,
as we can simply break out of the loop.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07 15:46:13 -07:00
René Scharfe 2b5f07f16c pickaxe: plug regex leak
With -G... --pickaxe-all, free the regex before returning even if we
found a match.  Also get rid of the variable has_changes, as we can
simply break out of the loop.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07 15:46:13 -07:00
René Scharfe 05ac978495 pickaxe: plug diff filespec leak with empty needle
Check first for the unlikely case of an empty needle string and only
then populate the filespec, lest we leak it.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-07 15:46:12 -07:00
Fredrik Kuivinen b95c5ada99 Use kwset in pickaxe
Benchmarks in the hot cache case:

before:
$ perf stat --repeat=5 git log -Sqwerty

Performance counter stats for 'git log -Sqwerty' (5 runs):

       47,092,744 cache-misses             #      2.825 M/sec   ( +-   1.607% )
      123,368,389 cache-references         #      7.400 M/sec   ( +-   0.812% )
      330,040,998 branch-misses            #      3.134 %       ( +-   0.257% )
   10,530,896,750 branches                 #    631.663 M/sec   ( +-   0.121% )
   62,037,201,030 instructions             #      1.399 IPC     ( +-   0.142% )
   44,331,294,321 cycles                   #   2659.073 M/sec   ( +-   0.326% )
           96,794 page-faults              #      0.006 M/sec   ( +-  11.952% )
               25 CPU-migrations           #      0.000 M/sec   ( +-  25.266% )
            1,424 context-switches         #      0.000 M/sec   ( +-   0.540% )
     16671.708650 task-clock-msecs         #      0.997 CPUs    ( +-   0.343% )

      16.728692052  seconds time elapsed   ( +-   0.344% )

after:
$ perf stat --repeat=5 git log -Sqwerty

Performance counter stats for 'git log -Sqwerty' (5 runs):

       51,385,522 cache-misses             #      4.619 M/sec   ( +-   0.565% )
      129,177,880 cache-references         #     11.611 M/sec   ( +-   0.219% )
      319,222,775 branch-misses            #      6.946 %       ( +-   0.134% )
    4,595,913,233 branches                 #    413.086 M/sec   ( +-   0.112% )
   31,395,042,533 instructions             #      1.062 IPC     ( +-   0.129% )
   29,558,348,598 cycles                   #   2656.740 M/sec   ( +-   0.204% )
           93,224 page-faults              #      0.008 M/sec   ( +-   4.487% )
               19 CPU-migrations           #      0.000 M/sec   ( +-  10.425% )
              950 context-switches         #      0.000 M/sec   ( +-   0.360% )
     11125.796039 task-clock-msecs         #      0.997 CPUs    ( +-   0.239% )

      11.164216599  seconds time elapsed   ( +-   0.240% )

So the kwset code is about 33% faster.

Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-20 22:33:57 -07:00
Brandon Casey 8520913cc5 diffcore-pickaxe.c: a void function shouldn't try to return something
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06 13:45:18 -07:00
Junio C Hamano 90215bf300 Merge branch 'maint'
* maint:
  Documentation/git-clone: describe --mirror more verbosely
  do not depend on signed integer overflow
  work around buggy S_ISxxx(m) implementations
  xdiff: cast arguments for ctype functions to unsigned char
  init: plug tiny one-time memory leak
  diffcore-pickaxe.c: remove unnecessary curly braces
  t3020 (ls-files-error-unmatch): remove stray '1' from end of file
  setup: make sure git dir path is in a permanent buffer
  environment.c: remove unused variable
  git-svn: fix processing of decorated commit hashes
  git-svn: check_cherry_pick should exclude commits already in our history
  Documentation/git-svn: discourage "noMetadata"
2010-10-06 12:10:02 -07:00
Brandon Casey 95ae69b95b diffcore-pickaxe.c: remove unnecessary curly braces
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-05 08:41:58 -07:00
Junio C Hamano f506b8e8b5 git log/diff: add -G<regexp> that greps in the patch text
Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regex" to the
"git diff" family of commands.  This limits the diff queue to filepairs
whose patch text actually has an added or a deleted line that matches the
given regexp.  Unlike "-S<regexp>", changing other parts of the line that
has a substring that matches the given regexp IS counted as a change, as
such a change would appear as one deletion followed by one addition in a
patch text.

Unlike -S (pickaxe) that is intended to be used to quickly detect a commit
that changes the number of occurrences of hits between the preimage and
the postimage to serve as a part of larger toolchain, this is meant to be
used as the top-level Porcelain feature.

The implementation unfortunately has to run "diff" twice if you are
running "log" family of commands to produce patches in the final output
(e.g. "git log -p" or "git format-patch").  I think we _could_ cache the
result in-core if we wanted to, but that would require larger surgery to
the diffcore machinery (i.e. adding an extra pointer in the filepair
structure to keep a pointer to a strbuf around, stuff the textual diff to
the strbuf inside diffgrep_consume(), and make use of it in later stages
when it is available) and it may not be worth it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-31 14:30:29 -07:00
Junio C Hamano 382f013bc4 diff: pass the entire diff-options to diffcore_pickaxe()
That would make it easier to give enhanced feature to the
pickaxe transformation.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-31 14:30:28 -07:00
Bo Yang 9ca5df9061 Add a macro DIFF_QUEUE_CLEAR.
Refactor the diff_queue_struct code; this macro helps
to reset the structure.

Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-07 09:34:27 -07:00
René Scharfe 50fd6997c6 pickaxe: count regex matches only once
When --pickaxe-regex is used, forward past the end of matches instead of
advancing to the byte after their start.  This way matches count only
once, even if the regular expression matches their tail -- like in the
fixed-string fork of the code.

E.g.: /.*/ used to count the number of bytes instead of the number of
lines.  /aa/ resulted in a count of two in "aaa" instead of one.

Also document the fact that regexec() needs a NUL-terminated string as
its second argument by adding an assert().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-17 15:10:12 -07:00
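The difference between the two ways of advancing can be sketched as follows. count_regex() and its `skip_whole_match` switch are hypothetical, introduced only to put the old and new behaviour side by side; the sketch works on a NUL-terminated string, as regexec() requires.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical helper: count matches of `preg` in the NUL-terminated
 * string `data`.  With skip_whole_match == 0 the scan resumes one byte
 * after the start of each match (the old behaviour described above);
 * otherwise it resumes at the end of the match. */
static unsigned int count_regex(const regex_t *preg, const char *data,
				int skip_whole_match)
{
	regmatch_t m;
	unsigned int cnt = 0;
	int flags = 0;

	assert(preg && data);
	while (*data && !regexec(preg, data, 1, &m, flags)) {
		flags |= REG_NOTBOL;
		if (skip_whole_match) {
			data += m.rm_eo;
			/* Step over an empty match to guarantee progress. */
			if (*data && m.rm_so == m.rm_eo)
				data++;
		} else {
			data += m.rm_so;
			if (*data)
				data++;
		}
		cnt++;
	}
	return cnt;
}
```

With the pattern /aa/ and the input "aaa", the old stepping counts two matches because the second one overlaps the tail of the first, while skipping the whole match counts one.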
René Scharfe ce163c793d diffcore-pickaxe: use memmem()
Use memmem() instead of open-coding it.  The system libraries usually have a
much faster version than the memcmp()-loop here.  Even our own fall-back in
compat/, which is used on Windows, is slightly faster.

The following commands were run in a Linux kernel repository and timed, the
best of five results is shown:

  $ STRING='Ensure that the real time constraints are schedulable.'
  $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null

On Ubuntu 8.10 x64, before (v1.6.2-rc2):

  8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+30952minor)pagefaults 0swaps

And with the patch:

  1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+30645minor)pagefaults 0swaps

On Fedora 10 x64, before:

  8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+29268minor)pagefaults 0swaps

And with the patch:

  1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+32253minor)pagefaults 0swaps

On Windows Vista x64, before:

  real    0m9.204s
  user    0m0.000s
  sys     0m0.000s

And with the patch:

  real    0m8.470s
  user    0m0.000s
  sys     0m0.000s

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-02 18:28:03 -08:00
Junio C Hamano a6080a0a44 War on whitespace
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time.  There are a few files that need
to have trailing whitespaces (most notably, test vectors).  The result
still passes the tests, and the build result in the Documentation/ area is unchanged.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07 00:04:01 -07:00
Junio C Hamano a0cb94006c diff -S: release the image after looking for needle in it
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-07 15:54:32 -07:00
Jeff King e1b161161d diffcore-pickaxe: fix infinite loop on zero-length needle
The "contains" algorithm runs into an infinite loop if the needle string
has zero length. The loop could be modified to handle this, but it makes
more sense to simply have an empty needle return no matches. Thus, a
command like
  git log -S
produces no output.

We place the check at the top of the function so that we get the same
results with or without --pickaxe-regex. Note that until now,
  git log -S --pickaxe-regex
would match everything, not nothing.

Arguably, an empty pickaxe string should simply produce an error
message; however, this is still a useful assertion to add to the
algorithm at this layer of the code.

Noticed by Bill Lear.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-25 21:17:19 -08:00
Junio C Hamano 85023577a8 simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header files that includes it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 09:51:35 -08:00
Johannes Schindelin 46b8dec038 On some platforms, certain headers need to be included before regex.h
Happily, these are already included in cache.h, which is included anyway...
so: change the order of includes.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04 17:14:06 -07:00
Petr Baudis d01d8c6782 Support for pickaxe matching regular expressions
git-diff-* --pickaxe-regex will change the -S pickaxe to match
POSIX extended regular expressions instead of fixed strings.

The regex.h library is a rather stupid interface and I like pcre too, but
with any luck it will be everywhere we will want to run Git on, it being
POSIX.2 and all. I'm not sure if we can expect platforms like AIX to
conform to POSIX.2 or if win32 has regex.h. We might add a flag to
Makefile if there is a portability trouble potential.

Signed-off-by: Petr Baudis <pasky@suse.cz>
2006-04-04 13:44:15 -07:00
Junio C Hamano 2002eed6c9 [PATCH] diffcore-pickaxe: switch to "counting" behaviour.
Instead of finding an old/new pair where one side has the specified
string and the other side does not, find an old/new pair whose sides
contain the specified string as a substring a different
number of times.  This would still not catch a case where you
introduce two static variable declarations and remove two static
function definitions from a file with -S"static", but would make
it behave a bit more intuitively.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-23 20:27:49 -07:00
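The "counting" rule can be illustrated with a small sketch. The function names below are hypothetical and the example works on NUL-terminated strings for brevity, whereas the real code operates on blob contents; it is meant only to show the comparison of occurrence counts.

```c
#include <assert.h>
#include <string.h>

/* Count non-overlapping occurrences of `needle` in `haystack`. */
static unsigned int count_occurrences(const char *haystack,
				      const char *needle)
{
	size_t len;
	unsigned int cnt = 0;

	assert(haystack && needle);
	len = strlen(needle);
	if (!len)
		return 0; /* an empty needle would never advance the scan */
	while ((haystack = strstr(haystack, needle)) != NULL) {
		haystack += len;
		cnt++;
	}
	return cnt;
}

/* A filepair is interesting when the two sides disagree on the count. */
static int counts_differ(const char *old_text, const char *new_text,
			 const char *needle)
{
	return count_occurrences(old_text, needle) !=
	       count_occurrences(new_text, needle);
}
```

This also shows the limitation the message mentions: replacing two static function definitions with two static variable declarations leaves the count of "static" at two on both sides, so counts_differ() reports no change.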
Junio C Hamano 844e6e4d58 [PATCH] Do not include unused header files.
Some source files were including "delta.h" without actually
needing it.  Remove them.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29 11:42:29 -07:00
Junio C Hamano f0c6b2a2fd [PATCH] Optimize diff-tree -[CM] --stdin
This attempts to optimize "diff-tree -[CM] --stdin", which
compares successive tree pairs.  This optimization does not
make much sense for other commands in the diff-* brothers.

When reading from --stdin and using rename/copy detection, the
patch makes diff-tree read the current index file first.
This is done to reuse the optimization used by diff-cache in the
non-cached case.  Similarity estimator can avoid expanding a
blob if the index says what is in the work tree has an exact
copy of that blob already expanded.

Another optimization the patch makes is to check only file sizes
first to terminate similarity estimation early.  In order for
this to work, it needs a way to tell the size of the blob
without expanding it.  Since an obvious way of doing it, which
is to keep all the blobs previously used in the memory, is too
costly, it does so by keeping the filesize for each object it
has already seen in memory.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29 11:17:44 -07:00