2013-04-17 18:33:35 +00:00
|
|
|
#!/bin/sh
|
|
|
|
|
|
|
|
test_description='git log with invalid commit headers'
|
|
|
|
|
revisions API: have release_revisions() release "cmdline"
Extend the the release_revisions() function so that it frees the
"cmdline" in the "struct rev_info". This in combination with a
preceding change to free "commits" and "mailmap" means that we can
whitelist another test under "TEST_PASSES_SANITIZE_LEAK=true".
There was a proposal in [1] to do away with xstrdup()-ing this
add_rev_cmdline(), perhaps that would be worthwhile, but for now let's
just free() it.
We could also make that a "char *" in "struct rev_cmdline_entry"
itself, but since we own it let's expose it as a constant to outside
callers. I proposed that in [2] but have since changed my mind. See
14d30cdfc04 (ref-filter: fix memory leak in `free_array_item()`,
2019-07-10), c514c62a4fd (checkout: fix leak of non-existent branch
names, 2020-08-14) and other log history hits for "free((char *)" for
prior art.
This includes the tests we had false-positive passes on before my
6798b08e848 (perl Git.pm: don't ignore signalled failure in
_cmd_close(), 2022-02-01), now they pass for real.
Since there are 66 tests matching t/t[0-9]*git-svn*.sh it's easier to
list those that don't pass than to touch most of those 66. So let's
introduce a "TEST_FAILS_SANITIZE_LEAK=true", which if set in the tests
won't cause lib-git-svn.sh to set "TEST_PASSES_SANITIZE_LEAK=true.
This change also marks all the tests that we removed
"TEST_FAILS_SANITIZE_LEAK=true" from in an earlier commit due to
removing the UNLEAK() from cmd_format_patch(), we can now assert that
its API use doesn't leak any "struct rev_info" memory.
This change also made commit "t5503-tagfollow.sh" pass on current
master, but that would regress when combined with
ps/fetch-atomic-fixup's de004e848a9 (t5503: simplify setup of test
which exercises failure of backfill, 2022-03-03) (through no fault of
that topic, that change started using "git clone" in the test, which
has an outstanding leak). Let's leave that test out for now to avoid
in-flight semantic conflicts.
1. https://lore.kernel.org/git/YUj%2FgFRh6pwrZalY@carlos-mbp.lan/
2. https://lore.kernel.org/git/87o88obkb1.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 20:01:47 +00:00
|
|
|
TEST_PASSES_SANITIZE_LEAK=true
|
2013-04-17 18:33:35 +00:00
|
|
|
. ./test-lib.sh
|
|
|
|
|
|
|
|
test_expect_success 'setup' '
|
|
|
|
test_commit foo &&
|
|
|
|
|
2023-04-27 08:14:06 +00:00
|
|
|
git cat-file commit HEAD >ok.commit &&
|
parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. But we've seen some broken cases in the wild where this fails, but
we _could_ find the timestamp with a little extra work. E.g.:
Name <Name<email>> 123456789 -0500
This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).
This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).
So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.
We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).
So for now we'll just reimplement the same trick in the commit parser.
The test is in t4212, which already covers similar cases, courtesy of
03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 08:14:09 +00:00
|
|
|
sed "s/>/>-<>/" <ok.commit >broken_email.commit &&
|
2023-04-27 08:14:06 +00:00
|
|
|
|
2023-01-18 20:41:56 +00:00
|
|
|
git hash-object --literally -w -t commit broken_email.commit >broken_email.hash &&
|
2013-04-17 18:33:35 +00:00
|
|
|
git update-ref refs/heads/broken_email $(cat broken_email.hash)
|
|
|
|
'
|
|
|
|
|
split_ident: parse timestamp from end of line
Split_ident currently parses left to right. Given this
input:
Your Name <email@example.com> 123456789 -0500\n
We assume the name starts the line and runs until the first
"<". That starts the email address, which runs until the
first ">". Everything after that is assumed to be the
timestamp.
This works fine in the normal case, but is easily broken by
corrupted ident lines that contain an extra ">". Some
examples seen in the wild are:
1. Name <email>-<> 123456789 -0500\n
2. Name <email> <Name<email>> 123456789 -0500\n
3. Name1 <email1>, Name2 <email2> 123456789 -0500\n
Currently each of these produces some email address (which
is not necessarily the one the user intended) and end up
with a NULL date (which is generally interpreted as the
epoch by "git log" and friends).
But in each case we could get the correct timestamp simply
by parsing from the right-hand side, looking backwards for
the final ">", and then reading the timestamp from there.
In general, it's a losing battle to try to automatically
guess what the user meant with their broken crud. But this
particular workaround is probably worth doing. One, it's
dirt simple, and can't impact non-broken cases. Two, it
doesn't catch a single breakage we've seen, but rather a
large class of errors (i.e., any breakage inside the email
angle brackets may affect the email, but won't spill over
into the timestamp parsing). And three, the timestamp is
arguably more valuable to get right, because it can affect
correctness (e.g., in --until cutoffs).
This patch implements the right-to-left scheme described
above. We adjust the tests in t4212, which generate a commit
with such a broken ident, and now gets the timestamp right.
We also add a test that fsck continues to detect the
breakage.
For reference, here are pointers to the breakages seen (as
numbered above):
[1] http://article.gmane.org/gmane.comp.version-control.git/221441
[2] http://article.gmane.org/gmane.comp.version-control.git/222362
[3] http://perl5.git.perl.org/perl.git/commit/13b79730adea97e660de84bbe67f9d7cbe344302
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-14 22:45:00 +00:00
|
|
|
test_expect_success 'fsck notices broken commit' '
|
2014-08-29 20:31:46 +00:00
|
|
|
test_must_fail git fsck 2>actual &&
|
split_ident: parse timestamp from end of line
Split_ident currently parses left to right. Given this
input:
Your Name <email@example.com> 123456789 -0500\n
We assume the name starts the line and runs until the first
"<". That starts the email address, which runs until the
first ">". Everything after that is assumed to be the
timestamp.
This works fine in the normal case, but is easily broken by
corrupted ident lines that contain an extra ">". Some
examples seen in the wild are:
1. Name <email>-<> 123456789 -0500\n
2. Name <email> <Name<email>> 123456789 -0500\n
3. Name1 <email1>, Name2 <email2> 123456789 -0500\n
Currently each of these produces some email address (which
is not necessarily the one the user intended) and end up
with a NULL date (which is generally interpreted as the
epoch by "git log" and friends).
But in each case we could get the correct timestamp simply
by parsing from the right-hand side, looking backwards for
the final ">", and then reading the timestamp from there.
In general, it's a losing battle to try to automatically
guess what the user meant with their broken crud. But this
particular workaround is probably worth doing. One, it's
dirt simple, and can't impact non-broken cases. Two, it
doesn't catch a single breakage we've seen, but rather a
large class of errors (i.e., any breakage inside the email
angle brackets may affect the email, but won't spill over
into the timestamp parsing). And three, the timestamp is
arguably more valuable to get right, because it can affect
correctness (e.g., in --until cutoffs).
This patch implements the right-to-left scheme described
above. We adjust the tests in t4212, which generate a commit
with such a broken ident, and now gets the timestamp right.
We also add a test that fsck continues to detect the
breakage.
For reference, here are pointers to the breakages seen (as
numbered above):
[1] http://article.gmane.org/gmane.comp.version-control.git/221441
[2] http://article.gmane.org/gmane.comp.version-control.git/222362
[3] http://perl5.git.perl.org/perl.git/commit/13b79730adea97e660de84bbe67f9d7cbe344302
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-14 22:45:00 +00:00
|
|
|
test_i18ngrep invalid.author actual
|
|
|
|
'
|
|
|
|
|
2013-04-17 18:33:35 +00:00
|
|
|
test_expect_success 'git log with broken author email' '
|
|
|
|
{
|
tests: fix broken &&-chains in `{...}` groups
The top-level &&-chain checker built into t/test-lib.sh causes tests to
magically exit with code 117 if the &&-chain is broken. However, it has
the shortcoming that the magic does not work within `{...}` groups,
`(...)` subshells, `$(...)` substitutions, or within bodies of compound
statements, such as `if`, `for`, `while`, `case`, etc. `chainlint.sed`
partly fills in the gap by catching broken &&-chains in `(...)`
subshells, but bugs can still lurk behind broken &&-chains in the other
cases.
Fix broken &&-chains in `{...}` groups in order to reduce the number of
possible lurking bugs.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 05:11:08 +00:00
|
|
|
echo commit $(cat broken_email.hash) &&
|
|
|
|
echo "Author: A U Thor <author@example.com>" &&
|
|
|
|
echo "Date: Thu Apr 7 15:13:13 2005 -0700" &&
|
|
|
|
echo &&
|
2013-04-17 18:33:35 +00:00
|
|
|
echo " foo"
|
|
|
|
} >expect.out &&
|
|
|
|
|
|
|
|
git log broken_email >actual.out 2>actual.err &&
|
|
|
|
|
|
|
|
test_cmp expect.out actual.out &&
|
tests: use 'test_must_be_empty' instead of 'test_cmp <empty> <out>'
Using 'test_must_be_empty' is shorter and more idiomatic than
>empty &&
test_cmp empty out
as it saves the creation of an empty file. Furthermore, sometimes the
expected empty file doesn't have such a descriptive name like 'empty',
and its creation is far away from the place where it's finally used
for comparison (e.g. in 't7600-merge.sh', where two expected empty
files are created in the 'setup' test, but are used only about 500
lines later).
These cases were found by instrumenting 'test_cmp' to error out the
test script when it's used to compare empty files, and then converted
manually.
Note that even after this patch there still remain a lot of cases
where we use 'test_cmp' to check empty files:
- Sometimes the expected output is not hard-coded in the test, but
'test_cmp' is used to ensure that two similar git commands produce
the same output, and that output happens to be empty, e.g. the
test 'submodule update --merge - ignores --merge for new
submodules' in 't7406-submodule-update.sh'.
- Repetitive common tasks, including preparing the expected results
and running 'test_cmp', are often extracted into a helper
function, and some of this helper's callsites expect no output.
- For the same reason as above, the whole 'test_expect_success'
block is within a helper function, e.g. in 't3070-wildmatch.sh'.
- Or 'test_cmp' is invoked in a loop, e.g. the test 'cvs update
(-p)' in 't9400-git-cvsserver-server.sh'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-19 21:57:25 +00:00
|
|
|
test_must_be_empty actual.err
|
2013-04-17 18:33:35 +00:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'git log --format with broken author email' '
|
split_ident: parse timestamp from end of line
Split_ident currently parses left to right. Given this
input:
Your Name <email@example.com> 123456789 -0500\n
We assume the name starts the line and runs until the first
"<". That starts the email address, which runs until the
first ">". Everything after that is assumed to be the
timestamp.
This works fine in the normal case, but is easily broken by
corrupted ident lines that contain an extra ">". Some
examples seen in the wild are:
1. Name <email>-<> 123456789 -0500\n
2. Name <email> <Name<email>> 123456789 -0500\n
3. Name1 <email1>, Name2 <email2> 123456789 -0500\n
Currently each of these produces some email address (which
is not necessarily the one the user intended) and end up
with a NULL date (which is generally interpreted as the
epoch by "git log" and friends).
But in each case we could get the correct timestamp simply
by parsing from the right-hand side, looking backwards for
the final ">", and then reading the timestamp from there.
In general, it's a losing battle to try to automatically
guess what the user meant with their broken crud. But this
particular workaround is probably worth doing. One, it's
dirt simple, and can't impact non-broken cases. Two, it
doesn't catch a single breakage we've seen, but rather a
large class of errors (i.e., any breakage inside the email
angle brackets may affect the email, but won't spill over
into the timestamp parsing). And three, the timestamp is
arguably more valuable to get right, because it can affect
correctness (e.g., in --until cutoffs).
This patch implements the right-to-left scheme described
above. We adjust the tests in t4212, which generate a commit
with such a broken ident, and now gets the timestamp right.
We also add a test that fsck continues to detect the
breakage.
For reference, here are pointers to the breakages seen (as
numbered above):
[1] http://article.gmane.org/gmane.comp.version-control.git/221441
[2] http://article.gmane.org/gmane.comp.version-control.git/222362
[3] http://perl5.git.perl.org/perl.git/commit/13b79730adea97e660de84bbe67f9d7cbe344302
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-14 22:45:00 +00:00
|
|
|
echo "A U Thor+author@example.com+Thu Apr 7 15:13:13 2005 -0700" >expect.out &&
|
2013-04-17 18:33:35 +00:00
|
|
|
|
|
|
|
git log --format="%an+%ae+%ad" broken_email >actual.out 2>actual.err &&
|
|
|
|
|
|
|
|
test_cmp expect.out actual.out &&
|
tests: use 'test_must_be_empty' instead of 'test_cmp <empty> <out>'
Using 'test_must_be_empty' is shorter and more idiomatic than
>empty &&
test_cmp empty out
as it saves the creation of an empty file. Furthermore, sometimes the
expected empty file doesn't have such a descriptive name like 'empty',
and its creation is far away from the place where it's finally used
for comparison (e.g. in 't7600-merge.sh', where two expected empty
files are created in the 'setup' test, but are used only about 500
lines later).
These cases were found by instrumenting 'test_cmp' to error out the
test script when it's used to compare empty files, and then converted
manually.
Note that even after this patch there still remain a lot of cases
where we use 'test_cmp' to check empty files:
- Sometimes the expected output is not hard-coded in the test, but
'test_cmp' is used to ensure that two similar git commands produce
the same output, and that output happens to be empty, e.g. the
test 'submodule update --merge - ignores --merge for new
submodules' in 't7406-submodule-update.sh'.
- Repetitive common tasks, including preparing the expected results
and running 'test_cmp', are often extracted into a helper
function, and some of this helper's callsites expect no output.
- For the same reason as above, the whole 'test_expect_success'
block is within a helper function, e.g. in 't3070-wildmatch.sh'.
- Or 'test_cmp' is invoked in a loop, e.g. the test 'cvs update
(-p)' in 't9400-git-cvsserver-server.sh'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-19 21:57:25 +00:00
|
|
|
test_must_be_empty actual.err
|
2013-04-17 18:33:35 +00:00
|
|
|
'
|
|
|
|
|
parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. But we've seen some broken cases in the wild where this fails, but
we _could_ find the timestamp with a little extra work. E.g.:
Name <Name<email>> 123456789 -0500
This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).
This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).
So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.
We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).
So for now we'll just reimplement the same trick in the commit parser.
The test is in t4212, which already covers similar cases, courtesy of
03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 08:14:09 +00:00
|
|
|
test_expect_success '--until handles broken email' '
|
|
|
|
git rev-list --until=1980-01-01 broken_email >actual &&
|
|
|
|
test_must_be_empty actual
|
|
|
|
'
|
|
|
|
|
t4212: test bogus timestamps with git-log
When t4212 was originally added by 9dbe7c3d (pretty: handle
broken commit headers gracefully, 2013-04-17), it tested our
handling of commits with broken ident lines in which the
timestamps could not be parsed. It does so using a bogus line
like "Name <email>-<> 1234 -0000", because that simulates an
error that was seen in the wild.
Later, 03818a4 (split_ident: parse timestamp from end of
line, 2013-10-14) made our parser smart enough to actually
find the timestamp on such a line, and t4212 was adjusted to
match. While it's nice that we handle this real-world case,
this meant that we were not actually testing the
bogus-timestamp case anymore.
This patch adds a test with a totally incomprehensible
timestamp to make sure we are testing the code path.
Note that the behavior is slightly different between regular log
output and "--format=%ad". In the former case, we produce a
sentinel value and in the latter, we produce an empty
string. While at first this seems unnecessarily
inconsistent, it matches the original behavior given by
9dbe7c3d.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 07:36:22 +00:00
|
|
|
munge_author_date () {
|
|
|
|
git cat-file commit "$1" >commit.orig &&
|
|
|
|
sed "s/^\(author .*>\) [0-9]*/\1 $2/" <commit.orig >commit.munge &&
|
2023-01-18 20:41:56 +00:00
|
|
|
git hash-object --literally -w -t commit commit.munge
|
t4212: test bogus timestamps with git-log
When t4212 was originally added by 9dbe7c3d (pretty: handle
broken commit headers gracefully, 2013-04-17), it tested our
handling of commits with broken ident lines in which the
timestamps could not be parsed. It does so using a bogus line
like "Name <email>-<> 1234 -0000", because that simulates an
error that was seen in the wild.
Later, 03818a4 (split_ident: parse timestamp from end of
line, 2013-10-14) made our parser smart enough to actually
find the timestamp on such a line, and t4212 was adjusted to
match. While it's nice that we handle this real-world case,
this meant that we were not actually testing the
bogus-timestamp case anymore.
This patch adds a test with a totally incomprehensible
timestamp to make sure we are testing the code path.
Note that the behavior is slightly different between regular log
output and "--format=%ad". In the former case, we produce a
sentinel value and in the latter, we produce an empty
string. While at first this seems unnecessarily
inconsistent, it matches the original behavior given by
9dbe7c3d.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 07:36:22 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
test_expect_success 'unparsable dates produce sentinel value' '
|
|
|
|
commit=$(munge_author_date HEAD totally_bogus) &&
|
|
|
|
echo "Date: Thu Jan 1 00:00:00 1970 +0000" >expect &&
|
|
|
|
git log -1 $commit >actual.full &&
|
|
|
|
grep Date <actual.full >actual &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'unparsable dates produce sentinel value (%ad)' '
|
|
|
|
commit=$(munge_author_date HEAD totally_bogus) &&
|
|
|
|
echo >expect &&
|
2015-03-20 10:06:44 +00:00
|
|
|
git log -1 --format=%ad $commit >actual &&
|
t4212: test bogus timestamps with git-log
When t4212 was originally added by 9dbe7c3d (pretty: handle
broken commit headers gracefully, 2013-04-17), it tested our
handling of commits with broken ident lines in which the
timestamps could not be parsed. It does so using a bogus line
like "Name <email>-<> 1234 -0000", because that simulates an
error that was seen in the wild.
Later, 03818a4 (split_ident: parse timestamp from end of
line, 2013-10-14) made our parser smart enough to actually
find the timestamp on such a line, and t4212 was adjusted to
match. While it's nice that we handle this real-world case,
this meant that we were not actually testing the
bogus-timestamp case anymore.
This patch adds a test with a totally incomprehensible
timestamp to make sure we are testing the code path.
Note that the behavior is slightly different between regular log
output and "--format=%ad". In the former case, we produce a
sentinel value and in the latter, we produce an empty
string. While at first this seems unnecessarily
inconsistent, it matches the original behavior given by
9dbe7c3d.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 07:36:22 +00:00
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
log: handle integer overflow in timestamps
If an ident line has a ridiculous date value like (2^64)+1,
we currently just pass ULONG_MAX along to the date code,
which can produce nonsensical dates.
On systems with a signed long time_t (e.g., 64-bit glibc
systems), this actually doesn't end up too bad. The
ULONG_MAX is converted to -1, we apply the timezone field to
that, and the result ends up somewhere between Dec 31, 1969
and Jan 1, 1970.
However, there is still a few good reasons to detect the
overflow explicitly:
1. On systems where "unsigned long" is smaller than
time_t, we get a nonsensical date in the future.
2. Even where it would produce "Dec 31, 1969", it's easier
to recognize "midnight Jan 1" as a consistent sentinel
value for "we could not parse this".
3. Values which do not overflow strtoul but do overflow a
signed time_t produce nonsensical values in the past.
For example, on a 64-bit system with a signed long
time_t, a timestamp of 18446744073000000000 produces a
date in 1947.
We also recognize overflow in the timezone field, which
could produce nonsensical results. In this case we show the
parsed date, but in UTC.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 07:46:37 +00:00
|
|
|
# date is 2^64 + 1
|
|
|
|
test_expect_success 'date parser recognizes integer overflow' '
|
|
|
|
commit=$(munge_author_date HEAD 18446744073709551617) &&
|
|
|
|
echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
|
|
|
|
git log -1 --format=%ad $commit >actual &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
|
|
|
# date is 2^64 - 2
|
|
|
|
test_expect_success 'date parser recognizes time_t overflow' '
|
|
|
|
commit=$(munge_author_date HEAD 18446744073709551614) &&
|
|
|
|
echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
|
|
|
|
git log -1 --format=%ad $commit >actual &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
2014-02-24 07:49:05 +00:00
|
|
|
# date is within 2^63-1, but enough to choke glibc's gmtime
|
2014-04-01 07:43:06 +00:00
|
|
|
test_expect_success 'absurdly far-in-future date' '
|
2014-02-24 07:49:05 +00:00
|
|
|
commit=$(munge_author_date HEAD 999999999999999999) &&
|
2014-04-01 07:43:06 +00:00
|
|
|
git log -1 --format=%ad $commit
|
2014-02-24 07:49:05 +00:00
|
|
|
'
|
|
|
|
|
parse_commit(): handle broken whitespace-only timestamp
The comment in parse_commit_date() claims that parse_timestamp() will
not walk past the end of the buffer we've been given, since it will hit
the newline at "eol" and stop. This is usually true, when dateptr
contains actual numbers to parse. But with a line like:
committer name <email> \n
with just whitespace, and no numbers, parse_timestamp() will consume
that newline as part of the leading whitespace, and we may walk past our
"tail" pointer (which itself is set from the "size" parameter passed in
to parse_commit_buffer()).
In practice this can't cause us to walk off the end of an array, because
we always add an extra NUL byte to the end of objects we load from disk
(as a defense against exactly this kind of bug). However, you can see
the behavior in action when "committer" is the final header (which it
usually is, unless there's an encoding) and the subject line can be
parsed as an integer. We walk right past the newline on the committer
line, as well as the "\n\n" separator, and mistake the subject for the
timestamp.
We can solve this by trimming the whitespace ourselves, making sure that
it has some non-whitespace to parse. Note that we need to be a bit
careful about the definition of "whitespace" here, as our isspace()
doesn't match exotic characters like vertical tab or formfeed. We can
work around that by checking for an actual number (see the in-code
comment). This is slightly more restrictive than the current code, but
in practice the results are either the same (we reject "foo" as "0", but
so would parse_timestamp()) or extremely unlikely even for broken
commits (parse_timestamp() would allow "\v123" as "123", but we'll now
make it "0").
I did also allow "-" here, which may be controversial, as we don't
currently support negative timestamps. My reasoning was two-fold. One,
the design of parse_timestamp() is such that we should be able to easily
switch it to handling signed values, and this otherwise creates a
hard-to-find gotcha that anybody doing that work would get tripped up
on. And two, the status quo is that we currently parse them, though the
result of course ends up as a very large unsigned value (which is likely
to just get clamped to "0" for display anyway, since our date routines
can't handle it).
The new test checks the commit parser (via "--until") for both vanilla
spaces and the vertical-tab case. I also added a test to check these
against the pretty-print formatter, which uses split_ident_line(). It's
not subject to the same bug, because it already insists that there be
one or more digits in the timestamp.
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 08:17:15 +00:00
|
|
|
test_expect_success 'create commits with whitespace committer dates' '
|
|
|
|
# It is important that this subject line is numeric, since we want to
|
|
|
|
# be sure we are not confused by skipping whitespace and accidentally
|
|
|
|
# parsing the subject as a timestamp.
|
|
|
|
#
|
|
|
|
# Do not use munge_author_date here. Besides not hitting the committer
|
|
|
|
# line, it leaves the timezone intact, and we want nothing but
|
|
|
|
# whitespace.
|
|
|
|
#
|
|
|
|
# We will make two munged commits here. The first, ws_commit, will
|
|
|
|
# be purely spaces. The second contains a vertical tab, which is
|
|
|
|
# considered a space by strtoumax(), but not by our isspace().
|
|
|
|
test_commit 1234567890 &&
|
|
|
|
git cat-file commit HEAD >commit.orig &&
|
|
|
|
sed "s/>.*/> /" <commit.orig >commit.munge &&
|
|
|
|
ws_commit=$(git hash-object --literally -w -t commit commit.munge) &&
|
|
|
|
sed "s/>.*/> $(printf "\013")/" <commit.orig >commit.munge &&
|
|
|
|
vt_commit=$(git hash-object --literally -w -t commit commit.munge)
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success '--until treats whitespace date as sentinel' '
|
|
|
|
echo $ws_commit >expect &&
|
|
|
|
git rev-list --until=1980-01-01 $ws_commit >actual &&
|
|
|
|
test_cmp expect actual &&
|
|
|
|
|
|
|
|
echo $vt_commit >expect &&
|
|
|
|
git rev-list --until=1980-01-01 $vt_commit >actual &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'pretty-printer handles whitespace date' '
|
|
|
|
# as with the %ad test above, we will show these as the empty string,
|
|
|
|
# not the 1970 epoch date. This is intentional; see 7d9a281941 (t4212:
|
|
|
|
# test bogus timestamps with git-log, 2014-02-24) for more discussion.
|
|
|
|
echo : >expect &&
|
|
|
|
git log -1 --format="%at:%ct" $ws_commit >actual &&
|
|
|
|
test_cmp expect actual &&
|
|
|
|
git log -1 --format="%at:%ct" $vt_commit >actual &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
2013-04-17 18:33:35 +00:00
|
|
|
test_done
|