= coccinelle

This directory provides Coccinelle (http://coccinelle.lip6.fr/) semantic patches
that might be useful to developers.

== Types of semantic patches

* Using the semantic transformation to check for bad patterns in the code.
  The target 'make coccicheck' is designed to check for these patterns, and
  any resulting patch is expected to indicate a regression.
  The patches resulting from 'make coccicheck' are small and infrequent,
  so once they are found, they can be sent to the mailing list as usual.

  Example for introducing new patterns:
  67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
  b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)

  Example of fixes using this approach:
  248f66ed8e (run-command: use strbuf_addstr() for adding a string to
  a strbuf, 2018-03-25)
  f919ffebed (Use MOVE_ARRAY, 2018-01-22)

  These types of semantic patches are usually part of testing, cf.
  0860a7641b (travis-ci: fail if Coccinelle static analysis found something
  to transform, 2018-07-23)

* Using semantic transformations in large scale refactorings throughout
  the code base.

  When a semantic patch is applied to produce a real patch that is sent to
  the mailing list in the usual way, that patch is expected to have a
  lot of textual and semantic conflicts, as such large scale refactorings
  change function signatures that are used widely in the code base.
  A textual conflict arises if code surrounding any call of such a
  function changes. A semantic conflict arises when other patch series in
  flight introduce calls to such functions.

  So to aid these large scale refactorings, semantic patches can be used.
  However, we do not want to store them in the same place as the checks for
  bad patterns, as then automated builds would fail.
  That is why semantic patches 'contrib/coccinelle/*.pending.cocci'
  are ignored for checks, and can be applied using 'make coccicheck-pending'.

  This allows us to expose plans of pending large scale refactorings without
  impacting the bad pattern checks.

== Git-specific tips & things to know about how we run "spatch":

* The "make coccicheck" target will piggy-back on
  "COMPUTE_HEADER_DEPENDENCIES". If you've built a given object file,
  the "coccicheck" target will consider its dependencies to decide if
  it needs to re-run on the corresponding source file.

  This means that a "make coccicheck" will re-compile object files
  before running. This might be unexpected, but speeds up the run in
  the common case, as e.g. a change to "column.h" won't require all
  coccinelle rules to be re-run against "grep.c" (or another file
  that happens not to use "column.h").

  To disable this behavior use the "SPATCH_USE_O_DEPENDENCIES=NoThanks"
  flag.

* To speed up our rules the "make coccicheck" target will by default
  concatenate all of the *.cocci files here into an "ALL.cocci", and
  apply it to each source file.

  This makes the run faster, as we don't need to run each rule
  against each source file. See the Makefile for further discussion.
  This behavior can be disabled with "SPATCH_CONCAT_COCCI=".

  But since they're concatenated, any <id> in the <rulename> (e.g.
  "@ my_name @", vs. anonymous "@@") needs to be unique across all our
  *.cocci files. You should only need to name rules if other rules
  depend on them (currently only one rule is named).

* To speed up incremental runs even more, use the "spatchcache" tool
  in this directory as your "SPATCH". It aims to be a "ccache" for
  coccinelle, and piggy-backs on "COMPUTE_HEADER_DEPENDENCIES".

  It caches in Redis by default; see its source for a how-to.

  In one setup with a primed cache, "make coccicheck" followed by a
  "make clean && make" takes around 10s to run, but 2m30s without the
  cache (both with the default of "SPATCH_CONCAT_COCCI=Y").

  With "SPATCH_CONCAT_COCCI=" the total runtime is around 6m, sped
  up to around 1m with "spatchcache".

  Most of the 10s (or ~1m) is spent on re-running "spatch" on
  files we couldn't cache, as we didn't compile them (in contrib/*
  and compat/* mostly).

  The absolute times will differ for you, but the relative speedup
  from caching should be on that order.

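To make the caching idea behind "spatchcache" more concrete, here is a
minimal Python sketch. It is not the tool's implementation (see its source
for that). It assumes a policy of remembering only runs that exited cleanly
with no suggested changes and no output on stderr. The names cache_key() and
cached_spatch(), the choice of hashed inputs, and the in-memory set standing
in for Redis are all hypothetical.

```python
import hashlib


def cache_key(*parts):
    """Hash every input that could change the result (hypothetical inputs)."""
    digest = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        digest.update(str(len(data)).encode("ascii") + b":" + data)
    return digest.hexdigest()


def cached_spatch(key, run_spatch, cache):
    """Return the suggested patch text, consulting `cache` first.

    `run_spatch` is a callable returning (exit_code, stdout, stderr).
    `cache` is any set-like store; a plain set stands in for Redis here.
    """
    if key in cache:
        return ""  # a hit always means "no suggested changes"
    exit_code, stdout, stderr = run_spatch()
    if exit_code == 0 and not stdout and not stderr:
        cache.add(key)  # only clean, silent, no-op runs are remembered
    return stdout


calls = []


def fake_spatch():
    # Stand-in for a real "spatch" run that finds nothing to change.
    calls.append(1)
    return (0, "", "")


cache = set()
key = cache_key("rule text", "source text", "header text")
cached_spatch(key, fake_spatch, cache)
cached_spatch(key, fake_spatch, cache)  # served from the cache
print(len(calls))
```

Because a hit can only ever mean "nothing to change", a store like this does
not need to keep stdout, stderr, or the exit code.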
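The dependency-based re-run decision described in the first tip of this
section can be sketched as follows. This Python illustration is not the
Makefile's logic; sources_to_recheck() and the header sets are made up for
the example. It assumes that a source file without dependency information
(because its object file was never built) has to be re-checked on any header
change.

```python
def sources_to_recheck(changed_header, all_sources, known_deps):
    """Return the sources whose "spatch" results are stale after a header change.

    `known_deps` maps a source file to the set of headers it uses, and only
    has entries for sources whose object file was built (hypothetical data).
    """
    stale = []
    for source in all_sources:
        headers = known_deps.get(source)
        # Without dependency info we must assume any header may matter.
        if headers is None or changed_header in headers:
            stale.append(source)
    return stale


all_sources = ["column.c", "grep.c", "compat/mmap.c"]
known_deps = {
    "column.c": {"column.h", "strbuf.h"},
    "grep.c": {"grep.h", "strbuf.h"},
}
print(sources_to_recheck("column.h", all_sources, known_deps))
```

In this made-up data set "grep.c" is skipped because it does not use
"column.h", while "compat/mmap.c" is re-checked only because nothing is known
about its dependencies.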
== Authoring and reviewing coccinelle changes

* When a .cocci file is added or changed, both the Git changes and the .cocci
  file should be reviewed. When reviewing such a change, do your best to
  understand the .cocci changes (e.g. by asking the author to explain the
  change) and be explicit about your understanding of the changes. This helps
  us decide whether input from coccinelle experts is needed or not. If you
  aren't sure of the cocci changes, indicate what changes you actively endorse
  and leave an Acked-by (instead of Reviewed-by).

* Authors should consider that reviewers may not be coccinelle experts, thus
  the .cocci changes may not be self-evident. A plain text description of the
  changes is strongly encouraged, especially when using more esoteric features
  of the language.

* .cocci rules should target only the problem they are trying to solve;
  "collateral damage" is not allowed. Reviewers should look out for and flag
  overly-broad rules.

* Consider the cost-benefit ratio of .cocci changes. In particular, consider the
  effect on the runtime of "make coccicheck", and how often your .cocci check
  will catch something valuable. As a rule of thumb, rules that can bail early
  if a file doesn't have a particular token will have a small impact on runtime,
  and vice-versa.

* .cocci files used for refactoring should be temporarily kept in-tree to aid
  the refactoring of out-of-tree code (e.g. in-flight topics). Periodically
  evaluate the cost-benefit ratio to determine when the file should be removed.
  For example, consider how many out-of-tree users are left and how much this
  slows down "make coccicheck".
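
Since named rules need to be unique across all *.cocci files once they are
concatenated, authors adding a named rule may want a quick check before
sending a patch. A minimal Python sketch, assuming a simplified view of rule
headers (a single line of the form "@ name ... @"). The function names and
the sample rules are hypothetical, and the regular expression does not cover
the full SmPL grammar.

```python
import re
from collections import defaultdict

# Matches a rule header that starts with an identifier, e.g. "@ my_name @".
# Anonymous "@@" headers do not match.
HEADER = re.compile(r"^@\s*([A-Za-z_]\w*)[^@]*@\s*$")

# Leading words that begin an unnamed header instead of naming a rule.
NOT_A_NAME = {
    "depends", "extends", "disable", "exists", "forall", "expression",
    "script", "initialize", "finalize",
}


def named_rules(text):
    """Return the rule names declared in one *.cocci file's contents."""
    names = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match and match.group(1) not in NOT_A_NAME:
            names.append(match.group(1))
    return names


def duplicate_rule_names(sources):
    """Map each rule name used more than once to the files declaring it.

    `sources` maps a file name to that file's contents.
    """
    seen = defaultdict(list)
    for filename, text in sorted(sources.items()):
        for name in named_rules(text):
            seen[name].append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}


sources = {
    "a.cocci": "@ my_name @\nexpression E;\n@@\n- foo(E)\n+ bar(E)\n",
    "b.cocci": "@ depends on my_name @\nexpression E;\n@@\n- baz(E)\n+ qux(E)\n",
    "c.cocci": "@ my_name @\nexpression E;\n@@\n- old(E)\n+ new(E)\n",
}
print(duplicate_rule_names(sources))
```

An empty result means no two files declare the same rule name, at least as
far as this simplified header matching can tell.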