Ævar Arnfjörð Bjarmason 94da9193a6 grep: add support for PCRE v2
Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. It therefore makes sense to use it via entirely
new functions, rather than e.g. trying to have one
compile_pcre_pattern() that would call either the PCRE v1 or v2
functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.
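
To make the knob rules concrete, here is a hypothetical C-preprocessor
rendering of them. It is a sketch, not git's actual check: it assumes
each knob reaches the compiler as a -DUSE_LIBPCRE... define, and
pcre_backend() is an illustrative helper, not a git function.

```c
/* Hypothetical rendering of the build knobs; assumes each knob is
 * passed to the compiler as a -DUSE_LIBPCRE... define. */

/* USE_LIBPCRE is accepted as a synonym for USE_LIBPCRE1. */
#if defined(USE_LIBPCRE) && !defined(USE_LIBPCRE1)
#define USE_LIBPCRE1 1
#endif

/* Asking for both major versions is a compile-time error. */
#if defined(USE_LIBPCRE1) && defined(USE_LIBPCRE2)
#error "set USE_LIBPCRE1 (or its alias USE_LIBPCRE) or USE_LIBPCRE2, not both"
#endif

/* Report which PCRE major version this build selected. */
static const char *pcre_backend(void)
{
#if defined(USE_LIBPCRE2)
	return "PCRE v2";
#elif defined(USE_LIBPCRE1)
	return "PCRE v1";
#else
	return "none";
#endif
}
```

Compiling this with both -DUSE_LIBPCRE1 and -DUSE_LIBPCRE2 (or with
-DUSE_LIBPCRE and -DUSE_LIBPCRE2) stops at the #error, while
-DUSE_LIBPCRE alone selects PCRE v1.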

With earlier patches to enable JIT for PCRE v1, the performance of
the release versions of both libraries is almost exactly the same,
with PCRE v2 being around 1% slower.

However, after I reported this to the pcre-dev mailing list[2], I got
a lot of help with the API use from Zoltán Herczeg, who subsequently
optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both libraries, with both them and git compiled
with -O3, and the test run against linux.git, gives the following
results (only the /perl/ tests are shown):

    $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~5 HEAD~ HEAD p7820-grep-engines.sh
    [...]
    Test                                            HEAD~5            HEAD~                    HEAD
    -----------------------------------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.31(1.10+0.48)   0.21(0.35+0.56) -32.3%   0.21(0.34+0.55) -32.3%
    7820.7: perl grep '^how to'                     0.56(2.70+0.40)   0.24(0.64+0.52) -57.1%   0.20(0.28+0.60) -64.3%
    7820.11: perl grep '[how] to'                   0.56(2.66+0.38)   0.29(0.95+0.45) -48.2%   0.23(0.45+0.54) -58.9%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       1.02(5.77+0.42)   0.31(1.02+0.54) -69.6%   0.23(0.50+0.54) -77.5%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.38(1.57+0.42)   0.27(0.85+0.46) -28.9%   0.21(0.33+0.57) -44.7%

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Here HEAD~5 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See my previous commits
mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of reading, here is a separate run comparing just HEAD~
(PCRE v1 with JIT) against HEAD (PCRE v2), again with only the /perl/
tests shown:

    [...]
    Test                                            HEAD~             HEAD
    ----------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.21(0.42+0.52)   0.21(0.31+0.58) +0.0%
    7820.7: perl grep '^how to'                     0.25(0.65+0.50)   0.20(0.31+0.57) -20.0%
    7820.11: perl grep '[how] to'                   0.30(0.90+0.50)   0.23(0.46+0.53) -23.3%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.30(1.19+0.38)   0.23(0.51+0.51) -23.3%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.27(0.84+0.48)   0.21(0.34+0.57) -22.2%

I.e. the two are either neck and neck or, more often, PCRE v2 pulls
ahead; when it does, it takes around 20% less wall-clock time.
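
The percentages in both tables can be reproduced from the wall-clock
column alone (the figure outside the parentheses; the parenthesised
pair appears to be user+system CPU seconds). A minimal sketch of that
arithmetic, assuming the change is measured against the leftmost
column and rounded half away from zero to one decimal; pct_change()
and round1() are illustrative names:

```c
/* Relative change of `after` against `baseline`, in percent.
 * Negative means the newer build took less wall-clock time. */
static double pct_change(double baseline, double after)
{
	return 100.0 * (after - baseline) / baseline;
}

/* Round to one decimal place, half away from zero, without libm:
 * add +/-0.5 to the scaled value, then truncate toward zero. */
static double round1(double x)
{
	long scaled = (long)(x * 10.0 + (x < 0 ? -0.5 : 0.5));

	return scaled / 10.0;
}
```

For example, pct_change(0.25, 0.20) rounds to -20.0, matching the
7820.7 row of the second table. Note that a 20% drop in wall-clock
time corresponds to 1 / 0.8 = 1.25 times the throughput.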

A brief note on thread safety: as noted in pcre2api(3) and
pcre2jit(3), the compiled pattern can be shared between threads, but
some of the JIT context cannot. However, the grep threading support
does all pattern and JIT compilation separately for each thread, so
this code doesn't need to concern itself with thread safety.
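
The arrangement described above, where every worker gets its own
separately compiled pattern so that nothing compiled is shared, can be
sketched as follows. This is an illustration with stated
substitutions: it uses POSIX <regex.h> as a stand-in for PCRE v2 (so
there is no JIT state at all), and struct worker and grep_threaded()
are hypothetical names, not git's grep threading code.

```c
#include <pthread.h>
#include <regex.h>
#include <stddef.h>

#define MAX_WORKERS 8

/* One worker's state; the compiled pattern is private to it. */
struct worker {
	regex_t re;           /* this worker's own compiled copy */
	const char **lines;   /* this worker's slice of the input */
	size_t nr_lines;
	size_t matches;       /* written only by this worker */
};

static void *worker_run(void *arg)
{
	struct worker *w = arg;

	w->matches = 0;
	for (size_t i = 0; i < w->nr_lines; i++)
		if (regexec(&w->re, w->lines[i], 0, NULL, 0) == 0)
			w->matches++;
	return NULL;
}

/* Count lines matching `pattern` (POSIX ERE) using nr_workers threads.
 * Returns the total, or -1 on a bad pattern or worker count. */
static long grep_threaded(const char *pattern, const char **lines,
			  size_t nr_lines, size_t nr_workers)
{
	struct worker ws[MAX_WORKERS];
	pthread_t tids[MAX_WORKERS];
	int started[MAX_WORKERS] = {0};
	size_t chunk, t;
	long total = 0;

	if (nr_workers == 0 || nr_workers > MAX_WORKERS)
		return -1;
	chunk = (nr_lines + nr_workers - 1) / nr_workers;

	/* Compile one private copy per worker before starting any thread. */
	for (t = 0; t < nr_workers; t++) {
		if (regcomp(&ws[t].re, pattern, REG_EXTENDED | REG_NOSUB)) {
			while (t--)   /* free the copies compiled so far */
				regfree(&ws[t].re);
			return -1;
		}
	}

	for (t = 0; t < nr_workers; t++) {
		size_t start = t * chunk < nr_lines ? t * chunk : nr_lines;
		size_t end = start + chunk < nr_lines ? start + chunk : nr_lines;

		ws[t].lines = lines + start;
		ws[t].nr_lines = end - start;
		/* Fall back to running inline if a thread cannot be created. */
		started[t] = pthread_create(&tids[t], NULL, worker_run, &ws[t]) == 0;
		if (!started[t])
			worker_run(&ws[t]);
	}

	for (t = 0; t < nr_workers; t++) {
		if (started[t])
			pthread_join(tids[t], NULL);
		total += (long)ws[t].matches;
		regfree(&ws[t].re);
	}
	return total;
}
```

The per-worker copies cost some memory and repeated compilation, but
in exchange the matching loop needs no locking around the pattern.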

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns that commit did (which were discussed on the list at the
time), e.g. mocking up types with typedef instead of ifdef-ing them
out when USE_LIBPCRE2 isn't defined. This adds some trivial memory use
to the program, but makes the code look nicer.
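
A minimal sketch of the typedef approach, assuming an 8-bit code unit
width when USE_LIBPCRE2 is defined; struct pattern and its helpers are
hypothetical and much simpler than git's grep.h. Without the library,
the placeholder typedefs let the struct keep its PCRE v2 members (the
"trivial memory use" is a few always-NULL pointers), and code that
only passes those pointers around needs no #ifdef of its own.

```c
#include <stddef.h>

#ifdef USE_LIBPCRE2
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#else
/* Placeholder types: only ever used as NULL pointers in this build. */
typedef int pcre2_code;
typedef int pcre2_match_data;
typedef int pcre2_compile_context;
#endif

/* Hypothetical, simplified pattern record. The PCRE v2 members exist
 * in every build, so code touching them needs no #ifdef. */
struct pattern {
	const char *source;
	pcre2_code *pcre2_pattern;
	pcre2_match_data *match_data;
	pcre2_compile_context *compile_ctx;
};

static void pattern_init(struct pattern *p, const char *source)
{
	p->source = source;
	p->pcre2_pattern = NULL;
	p->match_data = NULL;
	p->compile_ctx = NULL;
}

/* True once a PCRE v2 compile step has filled in the pattern. */
static int pattern_is_pcre2_compiled(const struct pattern *p)
{
	return p->pcre2_pattern != NULL;
}
```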

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 08:29:05 +09:00
block-sha1
builtin grep: skip pthreads overhead when using one thread 2017-05-26 12:59:05 +09:00
ci Merge branch 'rg/a-the-typo' 2017-05-04 16:26:47 +09:00
compat
contrib Merge branch 'jk/complete-checkout-sans-dwim-remote' 2017-05-01 14:14:41 +09:00
Documentation log: add -P as a synonym for --perl-regexp 2017-05-26 12:59:05 +09:00
ewah Merge branch 'jk/ewah-use-right-type-in-sizeof' into maint 2017-03-21 15:03:24 -07:00
git-gui
gitk-git
gitweb
mergetools
perl
po Merge branch 'master' of git://github.com/nafmo/git-l10n-sv 2017-05-09 22:12:34 +08:00
ppc
refs Merge branch 'mh/separate-ref-cache' 2017-04-26 15:39:13 +09:00
sha1dc sha1dc: avoid CPP macro collisions 2017-03-26 15:34:44 -07:00
t grep: add support for PCRE v2 2017-06-02 08:29:05 +09:00
templates
vcs-svn
xdiff
.gitattributes
.gitignore
.mailmap Git 2.12.2 2017-03-24 13:31:01 -07:00
.travis.yml Merge branch 'ls/travis-stricter-linux32-builds' 2017-05-01 14:14:44 +09:00
abspath.c prefix_filename: simplify windows #ifdef 2017-03-21 11:18:41 -07:00
aclocal.m4
advice.c
advice.h
alias.c
alloc.c
apply.c prefix_filename: return newly allocated string 2017-03-21 11:18:41 -07:00
apply.h
archive-tar.c
archive-zip.c
archive.c
archive.h
argv-array.c
argv-array.h
attr.c pathspec: allow querying for attributes 2017-03-13 15:28:54 -07:00
attr.h pathspec: allow querying for attributes 2017-03-13 15:28:54 -07:00
base85.c
bisect.c Merge branch 'jk/war-on-git-path' 2017-04-26 15:39:08 +09:00
bisect.h
blob.c
blob.h
branch.c create_branch: use xstrfmt for reflog message 2017-03-30 14:59:50 -07:00
branch.h
builtin.h
bulk-checkin.c encode_in_pack_object_header: respect output buffer length 2017-03-24 12:34:07 -07:00
bulk-checkin.h
bundle.c
bundle.h
cache-tree.c
cache-tree.h
cache.h Merge branch 'jh/add-index-entry-optim' 2017-04-26 15:39:07 +09:00
check-builtins.sh
check-racy.c
check_bindir
color.c
color.h
column.c
column.h
combine-diff.c Merge branch 'bc/object-id' 2017-04-19 21:37:13 -07:00
command-list.txt
commit-slab.h
commit.c Merge branch 'rs/commit-parsing-optim' into maint 2017-03-21 15:03:29 -07:00
commit.h Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
common-main.c
config.c Merge branch 'nd/conditional-config-in-early-config' 2017-04-26 15:39:05 +09:00
config.mak.in
config.mak.uname grep: un-break building with PCRE >= 8.32 without --enable-jit 2017-06-02 08:29:05 +09:00
configure.ac grep: add support for PCRE v2 2017-06-02 08:29:05 +09:00
connect.c Merge branch 'sf/putty-w-args' 2017-04-26 15:39:10 +09:00
connect.h
connected.c
connected.h
convert.c
convert.h
copy.c
COPYING
credential-cache--daemon.c
credential-cache.c Merge branch 'nd/conditional-config-include' 2017-04-23 22:07:46 -07:00
credential-store.c path.c: and an option to call real_path() in expand_user_path() 2017-04-14 23:51:38 -07:00
credential.c
credential.h
csum-file.c
csum-file.h
ctype.c
daemon.c Merge branch 'dt/xgethostname-nul-termination' 2017-04-23 22:07:57 -07:00
date.c
decorate.c
decorate.h
delta.h
diff-delta.c
diff-lib.c
diff-no-index.c prefix_filename: return newly allocated string 2017-03-21 11:18:41 -07:00
diff.c fix minor typos 2017-05-01 11:01:52 +09:00
diff.h Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
diffcore-break.c
diffcore-delta.c
diffcore-order.c
diffcore-pickaxe.c Merge branch 'js/regexec-buf' into maint 2017-03-28 13:52:24 -07:00
diffcore-rename.c
diffcore.h
dir-iterator.c
dir-iterator.h
dir.c Merge branch 'sb/checkout-recurse-submodules' 2017-03-28 14:05:58 -07:00
dir.h
editor.c
entry.c entry.c: create submodules when interesting 2017-03-16 14:07:16 -07:00
environment.c Merge branch 'jk/snprintf-cleanups' 2017-04-16 23:29:26 -07:00
exec_cmd.c
exec_cmd.h
fast-import.c Merge branch 'jk/war-on-git-path' 2017-04-26 15:39:08 +09:00
fetch-pack.c Merge branch 'dt/xgethostname-nul-termination' 2017-04-23 22:07:57 -07:00
fetch-pack.h Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
fmt-merge-msg.h
fsck.c Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
fsck.h Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
generate-cmdlist.sh
gettext.c
gettext.h
git-add--interactive.perl Merge branch 'va/i18n-perl-scripts' 2017-04-19 21:37:17 -07:00
git-archimport.perl
git-bisect.sh
git-compat-util.h Merge branch 'dt/xgethostname-nul-termination' 2017-04-23 22:07:57 -07:00
git-cvsexportcommit.perl
git-cvsimport.perl
git-cvsserver.perl
git-difftool--helper.sh
git-filter-branch.sh
git-instaweb.sh
git-merge-octopus.sh
git-merge-one-file.sh
git-merge-resolve.sh
git-mergetool--lib.sh
git-mergetool.sh
git-p4.py git-p4: don't use name-rev to get current branch 2017-04-16 21:13:26 -07:00
git-parse-remote.sh
git-quiltimport.sh
git-rebase--am.sh
git-rebase--interactive.sh
git-rebase--merge.sh
git-rebase.sh Merge branch 'gb/rebase-signoff' 2017-04-26 15:39:02 +09:00
git-remote-testgit.sh
git-request-pull.sh
git-send-email.perl Merge branch 'jh/send-email-one-cc' into maint 2017-03-21 15:03:30 -07:00
git-sh-i18n.sh
git-sh-setup.sh
git-stash.sh stash: keep untracked files intact in stash -k 2017-03-22 14:55:56 -07:00
git-submodule.sh submodule: prevent backslash expantion in submodule names 2017-04-16 20:09:36 -07:00
git-svn.perl
GIT-VERSION-GEN Git 2.13 2017-05-09 23:26:02 +09:00
git-web--browse.sh
git.c Merge branch 'bw/recurse-submodules-relative-fix' 2017-03-30 14:07:15 -07:00
git.rc
gpg-interface.c
gpg-interface.h
graph.c
graph.h
grep.c grep: add support for PCRE v2 2017-06-02 08:29:05 +09:00
grep.h grep: add support for PCRE v2 2017-06-02 08:29:05 +09:00
hash.h Makefile: add DC_SHA1 knob 2017-03-17 10:40:25 -07:00
hashmap.c hashmap: add disallow_rehash setting 2017-03-22 13:41:41 -07:00
hashmap.h hashmap: add disallow_rehash setting 2017-03-22 13:41:41 -07:00
help.c
help.h
hex.c Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ 2017-03-26 22:08:21 -07:00
http-backend.c
http-fetch.c
http-push.c http-push: don't check return value of lookup_unknown_object() 2017-03-18 10:14:07 -07:00
http-walker.c Merge branch 'ew/http-alternates-as-redirects-warning' into maint 2017-03-28 13:52:23 -07:00
http.c Merge branch 'dt/http-postbuffer-can-be-large' 2017-04-23 22:07:45 -07:00
http.h http.postbuffer: allow full range of ssize_t values 2017-04-13 18:24:32 -07:00
ident.c Merge branch 'dt/xgethostname-nul-termination' 2017-04-23 22:07:57 -07:00
imap-send.c convert unchecked snprintf into xsnprintf 2017-03-30 14:59:50 -07:00
INSTALL
iterator.h
khash.h
kwset.c
kwset.h
levenshtein.c
levenshtein.h
LGPL-2.1
line-log.c Merge branch 'vn/line-log-memcpy-size-fix' into maint 2017-03-16 13:56:42 -07:00
line-log.h
line-range.c
line-range.h
list-objects.c
list-objects.h
list.h
ll-merge.c
ll-merge.h
lockfile.c
lockfile.h
log-tree.c
log-tree.h
mailinfo.c Merge branch 'lt/mailinfo-in-body-header-continuation' 2017-04-19 21:37:15 -07:00
mailinfo.h
mailmap.c
mailmap.h
Makefile grep: add support for PCRE v2 2017-06-02 08:29:05 +09:00
match-trees.c
merge-blobs.c
merge-blobs.h
merge-recursive.c
merge-recursive.h
merge.c
mergesort.c
mergesort.h
mru.c
mru.h
name-hash.c name-hash: fix buffer overrun 2017-03-31 20:57:18 -07:00
notes-cache.c
notes-cache.h
notes-merge.c replace strbuf_addstr(git_path()) with git_path_buf() 2017-04-20 21:04:20 -07:00
notes-merge.h
notes-utils.c
notes-utils.h
notes.c notes: do not break note_tree structure in note_tree_consolidate() 2017-03-27 21:21:25 -07:00
notes.h
object.c
object.h
oidset.c
oidset.h
pack-bitmap-write.c odb_mkstemp: write filename into strbuf 2017-03-28 15:28:04 -07:00
pack-bitmap.c
pack-bitmap.h
pack-check.c
pack-objects.c
pack-objects.h
pack-revindex.c
pack-revindex.h
pack-write.c odb_mkstemp: write filename into strbuf 2017-03-28 15:28:04 -07:00
pack.h pack.h: define largest possible encoded object size 2017-03-24 12:34:07 -07:00
pager.c Merge branch 'jk/pager-in-use' 2017-03-28 14:05:59 -07:00
parse-options-cb.c Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
parse-options.c prefix_filename: return newly allocated string 2017-03-21 11:18:41 -07:00
parse-options.h ref-filter: add --no-contains option to tag/branch/for-each-ref 2017-03-24 12:15:26 -07:00
patch-delta.c
patch-ids.c Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ 2017-03-26 22:08:21 -07:00
patch-ids.h Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ 2017-03-26 22:08:21 -07:00
path.c Merge branch 'nd/conditional-config-include' 2017-04-23 22:07:46 -07:00
pathspec.c Merge branch 'ps/pathspec-empty-prefix-origin' 2017-04-26 15:39:03 +09:00
pathspec.h pathspec: allow querying for attributes 2017-03-13 15:28:54 -07:00
pkt-line.c
pkt-line.h
preload-index.c
pretty.c
prio-queue.c Merge branch 'jk/prio-queue-avoid-swap-with-self' 2017-05-01 14:14:43 +09:00
prio-queue.h
progress.c
progress.h
prompt.c
prompt.h
quote.c
quote.h
reachable.c
reachable.h
read-cache.c i18n: read-cache: typofix 2017-05-01 11:08:02 +09:00
README.md
ref-filter.c Merge branch 'bc/object-id' 2017-04-19 21:37:13 -07:00
ref-filter.h Merge branch 'bc/object-id' 2017-04-19 21:37:13 -07:00
reflog-walk.c
reflog-walk.h
refs.c Merge branch 'mh/separate-ref-cache' 2017-04-26 15:39:13 +09:00
refs.h refs_verify_refname_available(): implement once for all backends 2017-04-16 21:32:45 -07:00
RelNotes Git 2.11.2 2017-05-05 13:29:43 +09:00
remote-curl.c Merge branch 'dt/http-postbuffer-can-be-large' 2017-04-23 22:07:45 -07:00
remote-testsvn.c
remote.c Merge branch 'bw/push-options-recursively-to-submodules' 2017-04-19 21:37:14 -07:00
remote.h Merge branch 'bw/push-options-recursively-to-submodules' 2017-04-19 21:37:14 -07:00
replace_object.c
rerere.c
rerere.h
resolve-undo.c
resolve-undo.h
revision.c log: add -P as a synonym for --perl-regexp 2017-05-26 12:59:05 +09:00
revision.h Merge branch 'rs/path-name-safety-cleanup' into maint 2017-03-28 13:52:27 -07:00
run-command.c Merge branch 'jk/execv-dashed-external' into maint 2017-03-28 13:52:23 -07:00
run-command.h
send-pack.c Merge branch 'bc/object-id' 2017-04-19 21:37:13 -07:00
send-pack.h Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
sequencer.c Merge branch 'sh/rebase-i-reread-todo-after-exec' 2017-05-01 14:14:44 +09:00
sequencer.h
server-info.c server-info: avoid calling fclose(3) twice in update_info_file() 2017-04-17 17:37:28 -07:00
setup.c Merge branch 'bw/recurse-submodules-relative-fix' 2017-03-30 14:07:15 -07:00
sh-i18n--envsubst.c
sha1-array.c Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
sha1-array.h Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
sha1-lookup.c
sha1-lookup.h
sha1_file.c Merge branch 'jk/loose-object-fsck' 2017-04-23 22:07:50 -07:00
sha1_name.c Merge branch 'bc/object-id' 2017-04-19 21:37:13 -07:00
shallow.c Rename sha1_array to oid_array 2017-03-31 08:33:56 -07:00
shell.c Merge branch 'maint-2.8' into maint-2.9 2017-05-05 13:13:48 +09:00
shortlog.h
show-index.c
sideband.c
sideband.h
sigchain.c
sigchain.h
split-index.c
split-index.h
strbuf.c Merge branch 'rs/freebsd-getcwd-workaround' 2017-03-30 14:07:15 -07:00
strbuf.h Merge branch 'jk/interpret-branch-name' into maint 2017-03-28 13:52:22 -07:00
streaming.c
streaming.h
string-list.c Merge branch 'jh/string-list-micro-optim' 2017-04-23 22:07:47 -07:00
string-list.h
submodule-config.c Merge branch 'sb/checkout-recurse-submodules' 2017-03-28 14:05:58 -07:00
submodule-config.h update submodules: add submodule config parsing 2017-03-15 18:15:54 -07:00
submodule.c Merge branch 'sb/checkout-recurse-submodules' 2017-04-23 22:07:54 -07:00
submodule.h Merge branch 'nd/files-backend-git-dir' 2017-04-19 21:37:19 -07:00
symlinks.c
tag.c
tag.h
tar.h
tempfile.c
tempfile.h
thread-utils.c
thread-utils.h
tmp-objdir.c
tmp-objdir.h
trace.c
trace.h
trailer.c
trailer.h
transport-helper.c transport-helper: replace checked snprintf with xsnprintf 2017-03-30 14:59:50 -07:00
transport.c Merge branch 'bw/push-options-recursively-to-submodules' 2017-04-19 21:37:14 -07:00
transport.h
tree-diff.c
tree-walk.c
tree-walk.h
tree.c
tree.h
unicode_width.h
unimplemented.sh
unix-socket.c
unix-socket.h
unpack-trees.c Merge branch 'jh/unpack-trees-micro-optim' 2017-04-23 22:07:48 -07:00
unpack-trees.h unpack-trees: check if we can perform the operation for submodules 2017-03-16 14:07:16 -07:00
upload-pack.c
url.c
url.h
urlmatch.c
urlmatch.h
usage.c
userdiff.c
userdiff.h
utf8.c
utf8.h
varint.c
varint.h
version.c
version.h
versioncmp.c
walker.c
walker.h
wildmatch.c
wildmatch.h
worktree.c Merge branch 'rs/strbuf-add-real-path' into maint 2017-03-28 13:52:19 -07:00
worktree.h
wrap-for-bin.sh
wrapper.c Merge branch 'dt/xgethostname-nul-termination' 2017-04-23 22:07:57 -07:00
write_or_die.c
ws.c
wt-status.c short status: improve reporting for submodule changes 2017-03-29 15:27:54 -07:00
wt-status.h Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ 2017-03-26 22:08:21 -07:00
xdiff-interface.c
xdiff-interface.h
zlib.c

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git-related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://public-inbox.org/git/, http://marc.info/?l=git and other archival sites.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks