Commit graph

116 commits

Author SHA1 Message Date
Junio C Hamano
c9f1f88bb0 Merge branch 'la/format-trailer-info'
The code to format trailers have been cleaned up.

* la/format-trailer-info:
  trailer: finish formatting unification
  trailer: begin formatting unification
  format_trailer_info(): append newline for non-trailer lines
  format_trailer_info(): drop redundant unfold_value()
  format_trailer_info(): use trailer_item objects
2024-04-23 11:52:39 -07:00
Junio C Hamano
dce1e0b6da Merge branch 'jk/core-comment-string'
core.commentChar used to be limited to a single byte, but has been
updated to allow an arbitrary multi-byte sequence.

* jk/core-comment-string:
  config: add core.commentString
  config: allow multi-byte core.commentChar
  environment: drop comment_line_char compatibility macro
  wt-status: drop custom comment-char stringification
  sequencer: handle multi-byte comment characters when writing todo list
  find multi-byte comment chars in unterminated buffers
  find multi-byte comment chars in NUL-terminated strings
  prefer comment_line_str to comment_line_char for printing
  strbuf: accept a comment string for strbuf_add_commented_lines()
  strbuf: accept a comment string for strbuf_commented_addf()
  strbuf: accept a comment string for strbuf_stripspace()
  environment: store comment_line_char as a string
  strbuf: avoid shadowing global comment_line_char name
  commit: refactor base-case of adjust_comment_line_char()
  strbuf: avoid static variables in strbuf_add_commented_lines()
  strbuf: simplify comment-handling in add_lines() helper
  config: forbid newline as core.commentChar
2024-04-05 10:49:49 -07:00
Linus Arver
3452d17324 trailer: finish formatting unification
Rename format_trailer_info() to format_trailers(). Finally, both
interpret-trailers and format_trailers_from_commit() can call
"format_trailers()"!

Update the comment in <trailer.h> to remove the (now obsolete) caveats
about format_trailers_from_commit(). Those caveats come from
a388b10fc1 (pretty: move trailer formatting to trailer.c, 2017-08-15)
where it says:

    pretty: move trailer formatting to trailer.c

    The next commit will add many features to the %(trailer)
    placeholder in pretty.c. We'll need to access some internal
    functions of trailer.c for that, so our options are either:

      1. expose those functions publicly

    or

      2. make an entry point into trailer.c to do the formatting

    Doing (2) ends up exposing less surface area, though do note
    that caveats in the docstring of the new function.

which suggests format_trailers_from_commit() started out from pretty.c
and did not have access to all of the trailer implementation internals,
and was never intended to replace (unify) the formatting machinery in
trailer.c. The refactors leading up to this commit (as well as
additional refactors that will follow) expose additional functions
publicly, and is therefore choosing option (1) as described in
a388b10fc1.

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-15 10:10:25 -07:00
Linus Arver
676c1db76e trailer: begin formatting unification
Now that the preparatory refactors are over, we can replace the call to
format_trailers() in interpret-trailers with format_trailer_info(). This
unifies the trailer formatting machinery

In order to avoid breakages in t7502 and t7513, we have to steal the
features present in format_trailers(). Namely, we have to teach
format_trailer_info() as follows:

  (1) make it aware of opts->trim_empty, and

  (2) make it avoid hardcoding ": " as the separator and space (which
  can result in double-printing these characters).

For (2), make it only print the separator and space if we cannot find
any recognized separator somewhere in the key (yes, keys may have a
trailing separator in it --- we will eventually fix this design but not
now). Do so by copying the code out of print_tok_val(), and deleting the
same function.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-15 10:10:25 -07:00
Linus Arver
9f0c9702de format_trailer_info(): append newline for non-trailer lines
This wraps up the preparatory refactors to unify the trailer formatters.

Two patches ago we made format_trailer_info() use trailer_item objects
instead of the "trailers" string array. The strings in the array
include trailing newlines, because the string array is split up with

    trailer_lines = strbuf_split_buf(str + trailer_block_start,
                                     end_of_log_message - trailer_block_start,
                                     '\n',
                                     0);

in trailer_info_get() and strbuf_split_buf() includes the terminator (in
this case the newline character '\n') for each split-up substring.

And before we made the transition to use trailer_item objects for it,
format_trailer_info() called parse_trailer() (which trims newlines) for
trailer lines but did _not_ call parse_trailer() for non-trailer lines.
So for trailer lines it had to add back the trimmed newline like this

    if (!opts->separator)
        strbuf_addch(out, '\n');

But for non-trailer lines it didn't have to add back the newline because
it could just reuse same string in the "trailers" string array (which
again, already included the trailing newline).

Now that format_trailer_info() uses trailer_item objects for all cases,
it can't rely on "trailers" string array anymore.  And so it must be
taught to add a newline back when printing non-trailer lines, just like
it already does for trailer lines. Do so now.

The test suite can pass again without the need to hide failures
with *_failure, so flip the affected test cases back to *_success. Now,
format_trailer_info() is in better shape to supersede format_trailers(),
which we'll do in the next commit.

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-15 10:10:25 -07:00
Linus Arver
41ea0a9002 format_trailer_info(): drop redundant unfold_value()
This is another preparatory refactor to unify the trailer formatters.

In the last patch we made format_trailer_info() use trailer_item objects
instead of the "trailers" string array. This means that the call to
unfold_value() here is redundant because the trailer_item objects are
already unfolded in parse_trailers() which is a dependency of our
caller, format_trailers_from_commit().

Remove the redundant call.

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-15 10:10:24 -07:00
Linus Arver
65b4ad82b8 format_trailer_info(): use trailer_item objects
This is another preparatory refactor to unify the trailer formatters.

Make format_trailer_info() operate on trailer_item objects, not the raw
string array.

We will continue to make improvements, culminating in the renaming of
format_trailer_info() to format_trailers(), at which point the
unification of these formatters will be complete.

In order to avoid breaking t4205 and t6300, flip *_success to *_failure
in the affected test cases. Add a temporary
"test_trailer_option_expect_failure" wrapper which we will use along
with "test_expect_failure" in the next commit to avoid breaking tests.
When the dust settles with the refactors a few more commits later, we
will drop the use of *_failure to make the tests truly pass again.

When the preparatory refactors are complete,
we'll be able to drop the use of *_failure that we introduce here.

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-15 10:10:24 -07:00
Junio C Hamano
4fecb94887 Merge branch 'la/trailer-api'
Trailer API updates.

Acked-by: Christian Couder <christian.couder@gmail.com>
cf. <CAP8UFD1Zd+9q0z1JmfOf60S2vn5-sD3SafDvAJUzRFwHJKcb8A@mail.gmail.com>

* la/trailer-api:
  format_trailers_from_commit(): indirectly call trailer_info_get()
  format_trailer_info(): move "fast path" to caller
  format_trailers(): use strbuf instead of FILE
  trailer_info_get(): reorder parameters
  trailer: move interpret_trailers() to interpret-trailers.c
  trailer: reorder format_trailers_from_commit() parameters
  trailer: rename functions to use 'trailer'
  shortlog: add test for de-duplicating folded trailers
  trailer: free trailer_info _after_ all related usage
2024-03-14 14:05:24 -07:00
Jeff King
2ec225d397 find multi-byte comment chars in unterminated buffers
As with the previous patch, we need to swap out single-byte matching for
something like starts_with() to match all bytes of a multi-byte comment
character. But for cases where the buffer is not NUL-terminated (and we
instead have an explicit size or end pointer), it's not safe to use
starts_with(), as it might walk off the end of the buffer.

Let's introduce a new starts_with_mem() that does the same thing but
also accepts the length of the "haystack" str and makes sure not to walk
past it.

Note that in most cases the existing code did not need a length check at
all, since it was written in a way that knew we had at least one byte
available (and that was all we checked). So I had to read each one to
find the appropriate bounds. The one exception is sequencer.c's
add_commented_lines(), where we can actually get rid of the length
check. Just like starts_with(), our starts_with_mem() handles an empty
haystack variable by not matching (assuming a non-empty prefix).

A few notes on the implementation of starts_with_mem():

  - it would be equally correct to take an "end" pointer (and indeed,
    many of the callers have this and have to subtract to come up with
    the length). I think taking a ptr/size combo is a more usual
    interface for our codebase, though, and has the added benefit that
    the function signature makes it harder to mix up the three
    parameters.

  - we could obviously build starts_with() on top of this by passing
    strlen(str) as the length. But it's possible that starts_with() is a
    relatively hot code path, and it should not pay that penalty (it can
    generally return an answer proportional to the size of the prefix,
    not the whole string).

  - it naively feels like xstrncmpz() should be able to do the same
    thing, but that's not quite true. If you pass the length of the
    haystack buffer, then strncmp() finds that a shorter prefix string
    is "less than" than the haystack, even if the haystack starts with
    the prefix. If you pass the length of the prefix, then you risk
    reading past the end of the haystack if it is shorter than the
    prefix. So I think we really do need a new function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-12 13:28:10 -07:00
Jeff King
600559b716 find multi-byte comment chars in NUL-terminated strings
Several parts of the code need to identify lines that begin with the
comment character, and do so with a simple byte equality check. As part
of the transition to handling multi-byte characters, we need to match
all of the bytes. For cases where we are looking in a NUL-terminated
string, we can just use starts_with(), which checks all of the
characters in comment_line_str.

Note that we can drop the "line.len" check in wt-status.c's
read_rebase_todolist(). The starts_with() function handles the case of
an empty haystack buffer (it will always return false for a non-empty
prefix).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-12 13:28:10 -07:00
Linus Arver
35ca4411a0 format_trailers_from_commit(): indirectly call trailer_info_get()
This is another preparatory refactor to unify the trailer formatters.

For background, note that the "trailers" string array is the
`char **trailers` member in `struct trailer_info` and that the
trailer_item objects are the elements of the `struct list_head *head`
linked list.

Currently trailer_info_get() only populates `char **trailers`. And
parse_trailers() first calls trailer_info_get() so that it can use the
`char **trailers` to populate a list of `struct trailer_item` objects

Instead of calling trailer_info_get() directly from
format_trailers_from_commit(), make it call parse_trailers() instead
because parse_trailers() already calls trailer_info_get().

This change is a NOP because format_trailer_info() (which
format_trailers_from_commit() wraps around) only looks at the "trailers"
string array, not the trailer_item objects which parse_trailers()
populates. For now we do need to create a dummy

    LIST_HEAD(trailer_objects);

because parse_trailers() expects it in its signature.

In a future patch, we'll change format_trailer_info() to use the parsed
trailer_item objects (trailer_objects) instead of the `char **trailers`
array.

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 10:35:42 -08:00
Linus Arver
2c948a78fd format_trailer_info(): move "fast path" to caller
This is another preparatory refactor to unify the trailer formatters.

This allows us to drop the "msg" parameter from format_trailer_info(),
so that it take 3 parameters, similar to format_trailers() which also
takes 3 parameters:

    void format_trailers(const struct process_trailer_options *opts,
                         struct list_head *trailers,
                         struct strbuf *out)

The short-term goal is to make format_trailer_info() be smart enough to
deprecate format_trailers(). And then ultimately we will rename
format_trailer_info() to format_trailers().

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 10:35:42 -08:00
Linus Arver
bf35e0a018 format_trailers(): use strbuf instead of FILE
This is another preparatory refactor to unify the trailer formatters.

Make format_trailers() also write to a strbuf, to align with
format_trailers_from_commit() which also does the same. Doing this makes
format_trailers() behave similar to format_trailer_info() (which will
soon help us replace one with the other).

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 10:35:42 -08:00
Linus Arver
9aa1b2bc89 trailer_info_get(): reorder parameters
This is another preparatory refactor to unify the trailer formatters.

Take

    const struct process_trailer_options *opts

as the first parameter, because these options are required for
parsing trailers (e.g., whether to treat "---" as the end of the log
message). And take

    struct trailer_info *info

last, because it's an "out parameter" (something that the caller wants
to use as the output of this function).

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 10:35:42 -08:00
Linus Arver
ae0ec2e0e0 trailer: move interpret_trailers() to interpret-trailers.c
The interpret-trailers.c builtin is the only place we need to call
interpret_trailers(), so move its definition there (together with a few
helper functions called only by it) and remove its external declaration
from <trailer.h>.

Several helper functions that are called by interpret_trailers() remain
in trailer.c because other callers in the same file still call them.
Declare them in <trailer.h> so that interpret_trailers() (now in
builtin/interpret-trailers.c) can continue calling them as a trailer API
user.

This enriches <trailer.h> with a more granular API, which can then be
unit-tested in the future (because interpret_trailers() by itself does
too many things to be able to be easily unit-tested).

Take this opportunity to demote some file-handling functions out of the
trailer API implementation, as these have nothing to do with trailers.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 10:35:42 -08:00
Linus Arver
0383dc5629 trailer: reorder format_trailers_from_commit() parameters
Currently there are two functions for formatting trailers in
<trailer.h>:

    void format_trailers(const struct process_trailer_options *,
                         struct list_head *trailers, FILE *outfile);

    void format_trailers_from_commit(struct strbuf *out, const char *msg,
                                     const struct process_trailer_options *opts);

and although they are similar enough (even taking the same
process_trailer_options struct pointer) they are used quite differently.
One might intuitively think that format_trailers_from_commit() builds on
top of format_trailers(), but this is not the case. Instead
format_trailers_from_commit() calls format_trailer_info() and
format_trailers() is never called in that codepath.

This is a preparatory refactor to help us deprecate format_trailers() in
favor of format_trailer_info() (at which point we can rename the latter
to the former). When the deprecation is complete, both
format_trailers_from_commit(), and the interpret-trailers builtin will
be able to call into the same helper function (instead of
format_trailers() and format_trailer_info(), respectively). Unifying the
formatters is desirable because it simplifies the API.

Reorder parameters for format_trailers_from_commit() to prefer

    const struct process_trailer_options *opts

as the first parameter, because these options are intimately tied to
formatting trailers. And take

    struct strbuf *out

last, because it's an "out parameter" (something that the caller wants
to use as the output of this function).

Similarly, reorder parameters for format_trailer_info(), because later
on we will unify the two together.

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 10:35:42 -08:00
Linus Arver
7b1c6aa541 trailer: rename functions to use 'trailer'
Rename process_trailers() to interpret_trailers(), because it matches
the name for the builtin command of the same name
(git-interpret-trailers), which is the sole user of process_trailers().

In a following commit, we will move "interpret_trailers" from trailer.c
to builtin/interpret-trailers.c. That move will necessitate the growth
of the trailer.h API, forcing us to expose some additional functions in
trailer.h.

Rename relevant functions so that they include the term "trailer" in
their name, so that clients of the API will be able to easily identify
them by their "trailer" moniker, just like all the other functions
already exposed by trailer.h.

Rename `struct list_head *head` to `struct list_head *trailers` because
"head" conveys no additional information beyond the "list_head" type.

Reorder parameters for format_trailers_from_commit() to prefer

    const struct process_trailer_options *opts

as the first parameter, because these options are intimately tied to
formatting trailers. Parameters like `FILE *outfile` should be last
because they are a kind of 'out' parameter, so put such parameters at
the end. This will be the pattern going forward in this series.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 10:35:42 -08:00
Linus Arver
a082e28938 shortlog: add test for de-duplicating folded trailers
The shortlog builtin was taught to use the trailer iterator interface in
47beb37bc6 (shortlog: match commit trailers with --group, 2020-09-27).
The iterator always unfolds values and this has always been the case
since the time the iterator was first introduced in f0939a0eb1 (trailer:
add interface for iterating over commit trailers, 2020-09-27). Add a
comment line to remind readers of this behavior.

The fact that the iterator always unfolds values is important
(at least for shortlog) because unfolding allows it to recognize both
folded and unfolded versions of the same trailer for de-duplication.

Capture the existing behavior in a new test case to guard against
regressions in this area. This test case is based off of the existing
"shortlog de-duplicates trailers in a single commit" just above it. Now
if we were to remove the call to

    unfold_value(&iter->val);

inside the iterator, this new test case will break.

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 10:35:42 -08:00
Linus Arver
0f3a461d4e trailer: free trailer_info _after_ all related usage
In de7c27a186 (trailer: use offsets for trailer_start/trailer_end,
2023-10-20), we started using trailer block offsets in trailer_info. In
particular, we dropped the use of a separate stack variable "size_t
trailer_end", in favor of accessing the new "trailer_block_end" member
of trailer_info (as "info.trailer_block_end").

At that time, we forgot to also move the

   trailer_info_release(&info);

line to be _after_ this new use of the trailer_info struct. Move it now.

Note that even without this patch, we didn't have leaks or any other
problems because trailer_info_release() only frees memory allocated on
the heap. The "trailer_block_end" member was allocated on the stack back
then (as it is now) so it was still safe to use for all this time.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 10:35:42 -08:00
Junio C Hamano
58aa645fc0 Merge branch 'la/trailer-cleanups'
Fix to an already-graduated topic.

* la/trailer-cleanups:
  trailer: fix comment/cut-line regression with opts->no_divider
2024-02-19 20:58:06 -08:00
Jeff King
bc47139f4f trailer: fix comment/cut-line regression with opts->no_divider
Commit 97e9d0b78a (trailer: find the end of the log message, 2023-10-20)
combined two code paths for finding the end of the log message. For the
"no_divider" case, we used to use find_trailer_end(), and that has now
been rolled into find_end_of_log_message(). But there's a regression;
that function returns early when no_divider is set, returning the whole
string.

That's not how find_trailer_end() behaved. Although it did skip the
"---" processing (which is what "no_divider" is meant to do), we should
still respect ignored_log_message_bytes(), which covers things like
comments, "commit -v" cut lines, and so on.

The bug is actually in the interpret-trailers command, but the obvious
way to experience it is by running "commit -v" with a "--trailer"
option. The new trailer will be added at the end of the verbose diff,
rather than before it (and consequently will be ignored entirely, since
everything after the diff's intro scissors line is thrown away).

I've added two tests here: one for interpret-trailers directly, which
shows the bug via the parsing routines, and one for "commit -v".

The fix itself is pretty simple: instead of returning early, no_divider
just skips the "---" handling but still calls ignored_log_message_bytes().

Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-19 19:06:18 -08:00
Junio C Hamano
59a29e1274 Merge branch 'la/trailer-cleanups'
Code clean-up.

* la/trailer-cleanups:
  trailer: use offsets for trailer_start/trailer_end
  trailer: find the end of the log message
  commit: ignore_non_trailer computes number of bytes to ignore
2024-01-02 13:51:29 -08:00
Linus Arver
de7c27a186 trailer: use offsets for trailer_start/trailer_end
Previously these fields in the trailer_info struct were of type "const
char *" and pointed to positions in the input string directly (to the
start and end positions of the trailer block).

Use offsets to make the intended usage less ambiguous. We only need to
reference the input string in format_trailer_info(), so update that
function to take a pointer to the input.

While we're at it, rename trailer_start to trailer_block_start to be
more explicit about these offsets (that they are for the entire trailer
block including other trailers). Ditto for trailer_end.

Reported-by: Glen Choo <glencbz@gmail.com>
Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-20 11:55:04 -08:00
Linus Arver
97e9d0b78a trailer: find the end of the log message
Previously, trailer_info_get() computed the trailer block end position
by

(1) checking for the opts->no_divider flag and optionally calling
    find_patch_start() to find the "patch start" location (patch_start), and
(2) calling find_trailer_end() to find the end of the trailer block
    using patch_start as a guide, saving the return value into
    "trailer_end".

The logic in (1) was awkward because the variable "patch_start" is
misleading if there is no patch in the input. The logic in (2) was
misleading because it could be the case that no trailers are in the
input (yet we are setting a "trailer_end" variable before even searching
for trailers, which happens later in find_trailer_start()). The name
"find_trailer_end" was misleading because that function did not look for
any trailer block itself --- instead it just computed the end position
of the log message in the input where the end of the trailer block (if
it exists) would be (because trailer blocks must always come after the
end of the log message).

Combine the logic in (1) and (2) together into find_patch_start() by
renaming it to find_end_of_log_message(). The end of the log message is
the starting point which find_trailer_start() needs to start searching
backward to parse individual trailers (if any).

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-20 11:55:04 -08:00
Jeff King
1b274c9834 trailer: handle NULL value when parsing trailer-specific config
When parsing the "key", "command", and "cmd" trailer config, we just
make a copy of the value string.  If we see an implicit bool like:

  [trailer "foo"]
  key

we'll segfault trying to copy a NULL pointer. We can fix this with the
usual config_error_nonbool() check.

I split this out from the other vanilla cases, because at first glance
it looks like a better fix here would be to move the NULL check out of
the switch statement. But it would change the behavior of other keys
like trailer.*.ifExists, where an implicit bool is interpreted as
EXISTS_DEFAULT.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09 08:24:47 +09:00
Jeff King
ba176db511 config: handle NULL value when parsing non-bools
When the config parser sees an "implicit" bool like:

  [core]
  someVariable

it passes NULL to the config callback. Any callback code which expects a
string must check for NULL. This usually happens via helpers like
git_config_string(), etc, but some custom code forgets to do so and will
segfault.

These are all fairly vanilla cases where the solution is just the usual
pattern of:

  if (!value)
        return config_error_nonbool(var);

though note that in a few cases we have to split initializers like:

  int some_var = initializer();

into:

  int some_var;
  if (!value)
        return config_error_nonbool(var);
  some_var = initializer();

There are still some broken instances after this patch, which I'll
address on their own in individual patches after this one.

Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09 08:24:39 +09:00
Linus Arver
7cb26a1722 commit: ignore_non_trailer computes number of bytes to ignore
ignore_non_trailer() returns the _number of bytes_ that should be
ignored from the end of the log message. It does not by itself "ignore"
anything.

Rename this function to remove the leading "ignore" verb, to sound more
like a quantity than an action.

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-20 14:25:12 -07:00
Linus Arver
94430d03df trailer: split process_command_line_args into separate functions
Previously, process_command_line_args did two things:

    (1) parse trailers from the configuration, and
    (2) parse trailers defined on the command line.

Separate (1) outside to a new function, parse_trailers_from_config.
Rename the remaining logic to parse_trailers_from_command_line_args.

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-11 10:01:19 -07:00
Linus Arver
c2a8edf997 trailer: split process_input_file into separate pieces
Currently, process_input_file does three things:

    (1) parse the input string for trailers,
    (2) print text before the trailers, and
    (3) calculate the position of the input where the trailers end.

Rename this function to parse_trailers(), and make it only do
(1). The caller of this function, process_trailers, becomes responsible
for (2) and (3). These items belong inside process_trailers because they
are both concerned with printing the surrounding text around
trailers (which is already one of the immediate concerns of
process_trailers).

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-11 10:01:19 -07:00
Linus Arver
13211ae23f trailer: separate public from internal portion of trailer_iterator
The fields here are not meant to be used by downstream callers, so put
them behind an anonymous struct named as "internal" to warn against
their use. This follows the pattern in 576de3d956 (unpack_trees: start
splitting internal fields from public API, 2023-02-27).

Signed-off-by: Linus Arver <linusa@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-11 10:01:18 -07:00
Junio C Hamano
ce481ac8b3 Merge branch 'cw/compat-util-header-cleanup'
Further shuffling of declarations across header files to streamline
file dependencies.

* cw/compat-util-header-cleanup:
  git-compat-util: move alloc macros to git-compat-util.h
  treewide: remove unnecessary includes for wrapper.h
  kwset: move translation table from ctype
  sane-ctype.h: create header for sane-ctype macros
  git-compat-util: move wrapper.c funcs to its header
  git-compat-util: move strbuf.c funcs to its header
2023-07-17 11:30:42 -07:00
Calvin Wan
91c080dff5 git-compat-util: move alloc macros to git-compat-util.h
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.

Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05 11:42:31 -07:00
Glen Choo
a4e7e317f8 config: add ctx arg to config_fn_t
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).

In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.

Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:

- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed

Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.

The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:

- trace2/tr2_cfg.c:tr2_cfg_set_fl()

  This is indirectly called by git_config_set() so that the trace2
  machinery can notice the new config values and update its settings
  using the tr2 config parsing function, i.e. tr2_cfg_cb().

- builtin/checkout.c:checkout_main()

  This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
  This might be worth refactoring away in the future, since
  git_xmerge_config() can call git_default_config(), which can do much
  more than just parsing.

Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 14:06:39 -07:00
Elijah Newren
9875058870 treewide: remove cache.h inclusion due to environment.h changes
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:54 -07:00
Elijah Newren
7ee24e18e5 environment: move comment_line_char from cache.h
This is one step towards making strbuf.c not depend upon cache.h.
Additional steps will follow in subsequent commits.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:52 -07:00
Elijah Newren
f394e093df treewide: be explicit about dependence on gettext.h
Dozens of files made use of gettext functions, without explicitly
including gettext.h.  This made it more difficult to find which files
could remove a dependence on cache.h.  Make C files explicitly include
gettext.h if they are using it.

However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:51 -07:00
Elijah Newren
36bf195890 alloc.h: move ALLOC_GROW() functions from cache.h
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places.  It does mean that we also need to add
includes of alloc.h in a number of C files.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:28 -08:00
Ævar Arnfjörð Bjarmason
5cf88fd8b0 git-compat-util.h: use "UNUSED", not "UNUSED(var)"
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75d (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.

Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.

This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:49:48 -07:00
Jeff King
783a86c142 config: mark unused callback parameters
The callback passed to git_config() must conform to a particular
interface. But most callbacks don't actually look at the extra "void
*data" parameter. Let's mark the unused parameters to make
-Wunused-parameter happy.

Note there's one unusual case here in get_remote_default() where we
actually ignore the "value" parameter. That's because it's only checking
whether the option is found at all, and not parsing its value.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:55 -07:00
Junio C Hamano
c21fa3bb54 Merge branch 'ab/env-array'
Rename .env_array member to .env in the child_process structure.

* ab/env-array:
  run-command API users: use "env" not "env_array" in comments & names
  run-command API: rename "env_array" to "env"
2022-06-10 15:04:13 -07:00
Ævar Arnfjörð Bjarmason
29fda24dd1 run-command API: rename "env_array" to "env"
Start following-up on the rename mentioned in c7c4bdeccf (run-command
API: remove "env" member, always use "env_array", 2021-11-25) of
"env_array" to "env".

The "env_array" name was picked in 19a583dc39 (run-command: add
env_array, an optional argv_array for env, 2014-10-19) because "env"
was taken. Let's not forever keep the oddity of "*_array" for this
"struct strvec", but not for its "args" sibling.

This commit is almost entirely made with a coccinelle rule[1]. The
only manual change here is in run-command.h to rename the struct
member itself and to change "env_array" to "env" in the
CHILD_PROCESS_INIT initializer.

The rest of this is all a result of applying [1]:

 * make contrib/coccinelle/run_command.cocci.patch
 * patch -p1 <contrib/coccinelle/run_command.cocci.patch
 * git add -u

1. cat contrib/coccinelle/run_command.pending.cocci
   @@
   struct child_process E;
   @@
   - E.env_array
   + E.env

   @@
   struct child_process *E;
   @@
   - E->env_array
   + E->env

I've avoided changing any comments and derived variable names here,
that will all be done in the next commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:16 -07:00
Junio C Hamano
2b0a58d164 Merge branch 'ep/maint-equals-null-cocci' for maint-2.35
* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-02 10:06:04 -07:00
Junio C Hamano
afe8a9070b tree-wide: apply equals-null.cocci
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-02 09:50:37 -07:00
Ævar Arnfjörð Bjarmason
c7c4bdeccf run-command API: remove "env" member, always use "env_array"
Remove the "env" member from "struct child_process" in favor of always
using the "env_array". As with the preceding removal of "argv" in
favor of "args" this gets rid of current and future oddities around
memory management at the API boundary (see the amended API docs).

For some of the conversions we can replace patterns like:

    child.env = env->v;

With:

    strvec_pushv(&child.env_array, env->v);

But for others we need to guard the strvec_pushv() with a NULL check,
since we're not passing in the "v" member of a "struct strvec",
e.g. in the case of tmp_objdir_env()'s return value.

Ideally we'd rename the "env_array" member to simply "env" as a
follow-up, since it and "args" are now inconsistent in not having an
"_array" suffix, and seemingly without any good reason, unless we look
at the history of how they came to be.

But as we've currently got 122 in-tree hits for a "git grep env_array"
let's leave that for now (and possibly forever). Doing that rename
would be too disruptive.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-25 22:15:08 -08:00
ZheNing Hu
c364b7ef51 trailer: add new .cmd config option
The `trailer.<token>.command` configuration variable
specifies a command (run via the shell, so it does not have
to be a single name or path to the command, but can be a
shell script), and the first occurrence of substring $ARG is
replaced with the value given to the `interpret-trailer`
command for the token in a '--trailer <token>=<value>' argument.

This has three downsides:

* The use of $ARG in the mechanism misleads the users that
the value is passed in the shell variable, and tempt them
to use $ARG more than once, but that would not work, as
the second and subsequent $ARG are not replaced.

* Because $ARG is textually replaced without regard to the
shell language syntax, even '$ARG' (inside a single-quote
pair), which a user would expect to stay intact, would be
replaced, and worse, if the value had an unmatched single
quote (imagine a name like "O'Connor", substituted into
NAME='$ARG' to make it NAME='O'Connor'), it would result in
a broken command that is not syntactically correct (or
worse).

* The first occurrence of substring `$ARG` will be replaced
with the empty string, in the command when the command is
first called to add a trailer with the specified <token>.
This is a bad design, the nature of automatic execution
causes it to add a trailer that we don't expect.

Introduce a new `trailer.<token>.cmd` configuration that
takes higher precedence to deprecate and eventually remove
`trailer.<token>.command`, which passes the value as an
argument to the command.  Instead of "$ARG", users can
refer to the value as positional argument, $1, in their
scripts. At the same time, in order to allow
`git interpret-trailers` to better simulate the behavior
of `git command -s`, 'trailer.<token>.cmd' will not
automatically execute.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04 12:09:43 +09:00
Junio C Hamano
bfcc6e2a68 Merge branch 'rs/xcalloc-takes-nelem-first'
Code cleanup.

* rs/xcalloc-takes-nelem-first:
  fix xcalloc() argument order
2021-03-19 15:25:39 -07:00
René Scharfe
ca56dadb4b use CALLOC_ARRAY
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead.  It shortens the code and infers the
element size automatically.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13 16:00:09 -08:00
René Scharfe
241b5d3ebe fix xcalloc() argument order
Pass the number of elements first and ther size second, as expected
by xcalloc().  Provide a semantic patch, which was actually used to
generate the rest of this patch.

The semantic patch would generate flip-flop diffs if both arguments
are sizeofs.  We don't have such a case, and it's hard to imagine
the usefulness of such an allocation.  If it ever occurs then we
could deal with it by duplicating the rule in the semantic patch to
make it cancel itself out, or we could change the code to use
CALLOC_ARRAY.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-08 09:45:04 -08:00
Ævar Arnfjörð Bjarmason
058761f1c1 pretty format %(trailers): add a "key_value_separator"
Add a "key_value_separator" option to the "%(trailers)" pretty format,
to go along with the existing "separator" argument. In combination
these two options make it trivial to produce machine-readable (e.g. \0
and \0\0-delimited) format output.

As elaborated on in a previous commit which added "keyonly" it was
needlessly tedious to extract structured data from "%(trailers)"
before the addition of this "key_value_separator" option. As seen by
the test being added here extracting this data now becomes trivial.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-09 14:16:42 -08:00
Ævar Arnfjörð Bjarmason
9d87d5ae02 pretty format %(trailers): add a "keyonly"
Add support for a "keyonly". This allows for easier parsing out of the
key and value. Before if you didn't want to make assumptions about how
the key was formatted. You'd need to parse it out as e.g.:

    --pretty=format:'%H%x00%(trailers:separator=%x00%x00)' \
                       '%x00%(trailers:separator=%x00%x00,valueonly)'

And then proceed to deduce keys by looking at those two and
subtracting the value plus the hardcoded ": " separator from the
non-valueonly %(trailers) line. Now it's possible to simply do:

    --pretty=format:'%H%x00%(trailers:separator=%x00%x00,keyonly)' \
                    '%x00%(trailers:separator=%x00%x00,valueonly)'

Which at least reduces it to a state machine where you get N keys and
correlate them with N values. Even better would be to have a way to
change the ": " delimiter to something easily machine-readable (a key
might contain ": " too). A follow-up change will add support for that.

I don't really have a use-case for just "keyonly" myself. I suppose it
would be useful in some cases as "key=*" matches case-insensitively,
so a plain "keyonly" will give you the variants of the keys you
matched. I'm mainly adding it to fix the inconsistency with
"valueonly".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-09 14:16:42 -08:00