From d1bd3a8c3424e818f4117a39fe418909e24cea5f Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 12 Dec 2023 17:12:43 -0500 Subject: [PATCH 1/3] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When processing a header like a "From" line, mailinfo uses unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized comments. It takes a NUL-terminated string on input, and loops over the "in" pointer until it sees the NUL. When it finds the start of an interesting block, it delegates to helper functions which also increment "in", and return the updated pointer. But there's a bug here: the helpers find the NUL with a post-increment in the loop condition, like: while ((c = *in++) != 0) So when they do see a NUL (rather than the correct termination of the quote or comment section), they return "in" as one _past_ the NUL terminator. And thus the outer loop in unquote_quoted_pair() does not realize we hit the NUL, and keeps reading past the end of the buffer. We should instead make sure to return "in" positioned at the NUL, so that the caller knows to stop their loop, too. A hacky way to do this is to return "in - 1" after leaving the inner loop. But a slightly cleaner solution is to avoid incrementing "in" until we are sure it contained a non-NUL byte (i.e., doing it inside the loop body). The two tests here show off the problem. Since we check the output, they'll _usually_ report a failure in a normal build, but it depends on what garbage bytes are found after the heap buffer. Building with SANITIZE=address reliably notices the problem. The outcome (both the exit code and the exact bytes) are just what Git happens to produce for these cases today, and shouldn't be taken as an endorsement. It might be reasonable to abort on an unterminated string, for example. The priority for this patch is fixing the out-of-bounds memory access. Reported-by: Carlos Andrés Ramírez Cataño Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- mailinfo.c | 8 ++++---- t/t5100-mailinfo.sh | 22 ++++++++++++++++++++++ 2 files changed, 26 insertions(+), 4 deletions(-) diff --git a/mailinfo.c b/mailinfo.c index 833d28612f..542d4458f6 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -56,12 +56,12 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line) static const char *unquote_comment(struct strbuf *outbuf, const char *in) { - int c; int take_next_literally = 0; strbuf_addch(outbuf, '('); - while ((c = *in++) != 0) { + while (*in) { + int c = *in++; if (take_next_literally == 1) { take_next_literally = 0; } else { @@ -86,10 +86,10 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in) static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in) { - int c; int take_next_literally = 0; - while ((c = *in++) != 0) { + while (*in) { + int c = *in++; if (take_next_literally == 1) { take_next_literally = 0; } else { diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh index db11cababd..654d8cf3ee 100755 --- a/t/t5100-mailinfo.sh +++ b/t/t5100-mailinfo.sh @@ -268,4 +268,26 @@ test_expect_success 'mailinfo warn CR in base64 encoded email' ' test_must_be_empty quoted-cr/0002.err ' +test_expect_success 'from line with unterminated quoted string' ' + echo "From: bob \"unterminated string smith " >in && + git mailinfo /dev/null /dev/null actual && + cat >expect <<-\EOF && + Author: bob unterminated string smith + Email: bob@example.com + + EOF + test_cmp expect actual +' + +test_expect_success 'from line with unterminated comment' ' + echo "From: bob (unterminated comment smith " >in && + git mailinfo /dev/null /dev/null actual && + cat >expect <<-\EOF && + Author: bob (unterminated comment smith + Email: bob@example.com + + EOF + test_cmp expect actual +' + test_done From 2d9396c2feac8b707b367c64cd59604d7bc86a33 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 14 Dec 2023 16:47:46 -0500 Subject: [PATCH 2/3] t5100: make rfc822 comment test more careful When processing "From" headers in an email, mailinfo "unquotes" quoted strings and rfc822 parenthesized comments. For quoted strings, we actually remove the double-quotes, so: From: "A U Thor" become: Author: A U Thor Email: someone@example.com But for comments, we leave the outer parentheses in place, so: From: A U (this is a comment) Thor becomes: Author: A U (this is a comment) Thor Email: someone@example.com So what is the comment "unquoting" actually doing? In our code, being in a comment section has exactly two effects: 1. We'll unquote backslash-escaped characters inside a comment section. 2. We _won't_ unquote double-quoted strings inside a comment section. Our test for comments in t5100 checks this: From: "A U Thor" (this is \(really\) a comment (honestly)) So it is covering (1), but not (2). Let's add in a quoted string to cover this. Moreover, because the comment appears at the end of the From header, there's nothing to confirm that we correctly found the end of the comment section (and not just the end-of-string). Let's instead move it to the beginning of the header, which means we can confirm that the existing quoted string is detected (which will only happen if we know we've left the comment block). As expected, the test continues to pass, but this will give us more confidence as we refactor the code in the next patch. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t5100/comment.expect | 2 +- t/t5100/comment.in | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t5100/comment.expect b/t/t5100/comment.expect index 7228177984..bd71956a47 100644 --- a/t/t5100/comment.expect +++ b/t/t5100/comment.expect @@ -1,4 +1,4 @@ -Author: A U Thor (this is (really) a comment (honestly)) +Author: (this is (really) a "comment" (honestly)) A U Thor Email: somebody@example.com Subject: testing comments Date: Sun, 25 May 2008 00:38:18 -0700 diff --git a/t/t5100/comment.in b/t/t5100/comment.in index c53a192dfe..0b7e903b06 100644 --- a/t/t5100/comment.in +++ b/t/t5100/comment.in @@ -1,5 +1,5 @@ From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001 -From: "A U Thor" (this is \(really\) a comment (honestly)) +From: (this is \(really\) a "comment" (honestly)) "A U Thor" Date: Sun, 25 May 2008 00:38:18 -0700 Subject: [PATCH] testing comments From dee182941fb685f5d85e61a0e9d97e8e91512f6c Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 14 Dec 2023 16:48:59 -0500 Subject: [PATCH 3/3] mailinfo: avoid recursion when unquoting From headers Our unquote_comment() function is recursive; when it sees a comment within a comment, like: (this is an (embedded) comment) it recurses to handle the inner comment. This is fine for practical use, but it does mean that you can easily run out of stack space with a malicious header. For example: perl -e 'print "From: ", "(" x 2**18;' | git mailinfo /dev/null /dev/null segfaults on my system. And since mailinfo is likely to be fed untrusted input from the Internet (if not by human users, who might recognize a garbage header, but certainly there are automated systems that apply patches from a list) it may be possible for an attacker to trigger the problem. That said, I don't think there's an interesting security vulnerability here. All an attacker can do is make it impossible to parse their email and apply their patch, and there are lots of ways to generate bogus emails. So it's more of an annoyance than anything. But it's pretty easy to fix it. The recursion is not helping us preserve any particular state from each level. The only flag in our parsing is take_next_literally, and we can never recurse when it is set (since the start of a new comment implies it was not backslash-escaped). So it is really only useful for finding the end of the matched pair of parentheses. We can do that easily with a simple depth counter. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- mailinfo.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mailinfo.c b/mailinfo.c index 542d4458f6..4acf7cb601 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -57,6 +57,7 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line) static const char *unquote_comment(struct strbuf *outbuf, const char *in) { int take_next_literally = 0; + int depth = 1; strbuf_addch(outbuf, '('); @@ -70,11 +71,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in) take_next_literally = 1; continue; case '(': - in = unquote_comment(outbuf, in); + strbuf_addch(outbuf, '('); + depth++; continue; case ')': strbuf_addch(outbuf, ')'); - return in; + if (!--depth) + return in; + continue; } }