From 48860819e80260abae480407d0bd75dd58e70bb7 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 30 Jun 2016 05:07:54 -0400 Subject: [PATCH 1/5] t9300: factor out portable "head -c" replacement It is sometimes useful to be able to read exactly N bytes from a pipe. Doing this portably turns out to be surprisingly difficult in shell scripts. We want a solution that: - is portable - never reads more than N bytes due to buffering (which would mean those bytes are not available to the next program to read from the same pipe) - handles partial reads by looping until N bytes are read (or we see EOF) - is resilient to stray signals giving us EINTR while trying to read (even though we don't send them, things like SIGWINCH could cause apparently-random failures) Some possible solutions are: - "head -c" is not portable, and implementations may buffer (though GNU head does not) - "read -N" is a bash-ism, and thus not portable - "dd bs=$n count=1" does not handle partial reads. GNU dd has iflags=fullblock, but that is not portable - "dd bs=1 count=$n" fixes the partial read problem (all reads are 1-byte, so there can be no partial response). It does make a lot of write() calls, but for our tests that's unlikely to matter. It's fairly portable. We already use it in our tests, and it's unlikely that implementations would screw up any of our criteria. The most unknown one would be signal handling. - perl can do a sysread() loop pretty easily. On my Linux system, at least, it seems to restart the read() call automatically. If that turns out not to be portable, though, it would be easy for us to handle it. That makes the perl solution the least bad (because we conveniently omitted "length of code" as a criterion). It's also what t9300 is currently using, so we can just pull the implementation from there. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t9300-fast-import.sh | 23 +++-------------------- t/test-lib-functions.sh | 14 ++++++++++++++ 2 files changed, 17 insertions(+), 20 deletions(-) diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index 4bca35c259..38a06aaafc 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -7,23 +7,6 @@ test_description='test git fast-import utility' . ./test-lib.sh . "$TEST_DIRECTORY"/diff-lib.sh ;# test-lib chdir's into trash -# Print $1 bytes from stdin to stdout. -# -# This could be written as "head -c $1", but IRIX "head" does not -# support the -c option. -head_c () { - perl -e ' - my $len = $ARGV[1]; - while ($len > 0) { - my $s; - my $nread = sysread(STDIN, $s, $len); - die "cannot read: $!" unless defined($nread); - print $s; - $len -= $nread; - } - ' - "$1" -} - verify_packs () { for p in .git/objects/pack/*.pack do @@ -2480,7 +2463,7 @@ test_expect_success PIPE 'R: copy using cat-file' ' read blob_id type size <&3 && echo "$blob_id $type $size" >response && - head_c $size >blob <&3 && + test_copy_bytes $size >blob <&3 && read newline <&3 && cat <<-EOF && @@ -2523,7 +2506,7 @@ test_expect_success PIPE 'R: print blob mid-commit' ' EOF read blob_id type size <&3 && - head_c $size >actual <&3 && + test_copy_bytes $size >actual <&3 && read newline <&3 && echo @@ -2558,7 +2541,7 @@ test_expect_success PIPE 'R: print staged blob within commit' ' echo "cat-blob $to_get" && read blob_id type size <&3 && - head_c $size >actual <&3 && + test_copy_bytes $size >actual <&3 && read newline <&3 && echo deleteall diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 48884d5208..90856d67e5 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -961,3 +961,17 @@ test_env () { done ) } + +# Read up to "$1" bytes (or to EOF) from stdin and write them to stdout. +test_copy_bytes () { + perl -e ' + my $len = $ARGV[1]; + while ($len > 0) { + my $s; + my $nread = sysread(STDIN, $s, $len); + die "cannot read: $!" unless defined($nread); + print $s; + $len -= $nread; + } + ' - "$1" +} From e51217e15c3eb89d18b313dad32db1787d4f2a44 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 30 Jun 2016 05:08:57 -0400 Subject: [PATCH 2/5] t5000: test tar files that overflow ustar headers The ustar format only has room for 11 (or 12, depending on some implementations) octal digits for the size and mtime of each file. For values larger than this, we have to add pax extended headers to specify the real data, and git does not yet know how to do so. Before fixing that, let's start off with some test infrastructure, as designing portable and efficient tests for this is non-trivial. We want to use the system tar to check our output (because what we really care about is interoperability), but we can't rely on it: 1. being able to read pax headers 2. being able to handle huge sizes or mtimes 3. supporting a "t" format we can parse So as a prerequisite, we can feed the system tar a reference tarball to make sure it can handle these features. The reference tar here was created with: dd if=/dev/zero seek=64G bs=1 count=1 of=huge touch -d @68719476737 huge tar cf - --format=pax | head -c 2048 using GNU tar. Note that this is not a complete tarfile, but it's enough to contain the headers we want to examine. Likewise, we need to convince git that it has a 64GB blob to output. Running "git add" on that 64GB file takes many minutes of CPU, and even compressed, the result is 64MB. So again, I pre-generated that loose object, and then took only the first 2k of it. That should be enough to generate 2MB of data before hitting an inflate error, which is plenty for us to generate the tar header (and then die of SIGPIPE while streaming the rest out). The tests are split so that we test as much as we can even with an uncooperative system tar. This actually catches the current breakage (which is that we die("BUG") trying to write the ustar header) on every system, and then on systems where we can, we go farther and actually verify the result. Helped-by: Robin H. Johnson Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t5000-tar-tree.sh | 74 ++++++++++++++++++ .../19f9c8273ec45a8938e6999cb59b3ff66739902a | Bin 0 -> 2048 bytes t/t5000/huge-and-future.tar | Bin 0 -> 2048 bytes 3 files changed, 74 insertions(+) create mode 100644 t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a create mode 100644 t/t5000/huge-and-future.tar diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 4b68bbafbe..950bdd31e7 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -319,4 +319,78 @@ test_expect_success 'catch non-matching pathspec' ' test_must_fail git archive -v HEAD -- "*.abc" >/dev/null ' +# Pull the size and date of each entry in a tarfile using the system tar. +# +# We'll pull out only the year from the date; that avoids any question of +# timezones impacting the result (as long as we keep our test times away from a +# year boundary; our reference times are all in August). +# +# The output of tar_info is expected to be " ", both in decimal. It +# ignores the return value of tar. We have to do this, because some of our test +# input is only partial (the real data is 64GB in some cases). +tar_info () { + "$TAR" tvf "$1" | + awk '{ + split($4, date, "-") + print $3 " " date[1] + }' +} + +# See if our system tar can handle a tar file with huge sizes and dates far in +# the future, and that we can actually parse its output. +# +# The reference file was generated by GNU tar, and the magic time and size are +# both octal 01000000000001, which overflows normal ustar fields. +test_lazy_prereq TAR_HUGE ' + echo "68719476737 4147" >expect && + tar_info "$TEST_DIRECTORY"/t5000/huge-and-future.tar >actual && + test_cmp expect actual +' + +test_expect_success 'set up repository with huge blob' ' + obj_d=19 && + obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a && + obj=${obj_d}${obj_f} && + mkdir -p .git/objects/$obj_d && + cp "$TEST_DIRECTORY"/t5000/$obj .git/objects/$obj_d/$obj_f && + rm -f .git/index && + git update-index --add --cacheinfo 100644,$obj,huge && + git commit -m huge +' + +# We expect git to die with SIGPIPE here (otherwise we +# would generate the whole 64GB). +test_expect_failure 'generate tar with huge size' ' + { + git archive HEAD + echo $? >exit-code + } | test_copy_bytes 4096 >huge.tar && + echo 141 >expect && + test_cmp expect exit-code +' + +test_expect_failure TAR_HUGE 'system tar can read our huge size' ' + echo 68719476737 >expect && + tar_info huge.tar | cut -d" " -f1 >actual && + test_cmp expect actual +' + +test_expect_success 'set up repository with far-future commit' ' + rm -f .git/index && + echo content >file && + git add file && + GIT_COMMITTER_DATE="@68719476737 +0000" \ + git commit -m "tempori parendum" +' + +test_expect_failure 'generate tar with future mtime' ' + git archive HEAD >future.tar +' + +test_expect_failure TAR_HUGE 'system tar can read our future mtime' ' + echo 4147 >expect && + tar_info future.tar | cut -d" " -f2 >actual && + test_cmp expect actual +' + test_done diff --git a/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a b/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a new file mode 100644 index 0000000000000000000000000000000000000000..5cbe9ec312bfd7b7e0398ca281e9d42848743704 GIT binary patch literal 2048 zcmb=p_2!@gk^{~&l;zIw((@uRaR^ literal 0 HcmV?d00001 From d1657b570a44108e49032962da201aad48689605 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 30 Jun 2016 05:09:16 -0400 Subject: [PATCH 3/5] archive-tar: write extended headers for file sizes >= 8GB MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The ustar format has a fixed-length field for the size of each file entry which is supposed to contain up to 11 bytes of octal-formatted data plus a NUL or space terminator. These means that the largest size we can represent is 077777777777, or 1 byte short of 8GB. The correct solution for a larger file, according to POSIX.1-2001, is to add an extended pax header, similar to how we handle long filenames. This patch does that, and writes zero for the size field in the ustar header (the last bit is not mentioned by POSIX, but it matches how GNU tar behaves with --format=pax). This should be a strict improvement over the current behavior, which is to die in xsnprintf with a "BUG". However, there's some interesting history here. Prior to f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), we silently overflowed the "size" field. The extra bytes ended up in the "mtime" field of the header, which was then immediately written itself, overwriting our extra bytes. What that means depends on how many bytes we wrote. If the size was 64GB or greater, then we actually overflowed digits into the mtime field, meaning our value was effectively right-shifted by those lost octal digits. And this patch is again a strict improvement over that. But if the size was between 8GB and 64GB, then our 12-byte field held all of the actual digits, and only our NUL terminator overflowed. According to POSIX, there should be a NUL or space at the end of the field. However, GNU tar seems to be lenient here, and will correctly parse a size up 64GB (minus one) from the field. So sizes in this range might have just worked, depending on the implementation reading the tarfile. This patch is mostly still an improvement there, as the 8GB limit is specifically mentioned in POSIX as the correct limit. But it's possible that it could be a regression (versus the pre-f2f0267 state) if all of the following are true: 1. You have a file between 8GB and 64GB. 2. Your tar implementation _doesn't_ know about pax extended headers. 3. Your tar implementation _does_ parse 12-byte sizes from the ustar header without a delimiter. It's probably not worth worrying about such an obscure set of conditions, but I'm documenting it here just in case. Helped-by: René Scharfe Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- archive-tar.c | 31 +++++++++++++++++++++++++++++-- t/t5000-tar-tree.sh | 4 ++-- 2 files changed, 31 insertions(+), 4 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index cb99df2814..57a15406d9 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -18,6 +18,13 @@ static int tar_umask = 002; static int write_tar_filter_archive(const struct archiver *ar, struct archiver_args *args); +/* + * This is the max value that a ustar size header can specify, as it is fixed + * at 11 octal digits. POSIX specifies that we switch to extended headers at + * this size. + */ +#define USTAR_MAX_SIZE 077777777777UL + /* writes out the whole block, but only if it is full */ static void write_if_needed(void) { @@ -137,6 +144,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, strbuf_addch(sb, '\n'); } +/* + * Like strbuf_append_ext_header, but for numeric values. + */ +static void strbuf_append_ext_header_uint(struct strbuf *sb, + const char *keyword, + uintmax_t value) +{ + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ + int len; + + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); + strbuf_append_ext_header(sb, keyword, buf, len); +} + static unsigned int ustar_header_chksum(const struct ustar_header *header) { const unsigned char *p = (const unsigned char *)header; @@ -208,7 +229,7 @@ static int write_tar_entry(struct archiver_args *args, struct ustar_header header; struct strbuf ext_header = STRBUF_INIT; unsigned int old_mode = mode; - unsigned long size; + unsigned long size, size_in_header; void *buffer; int err = 0; @@ -267,7 +288,13 @@ static int write_tar_entry(struct archiver_args *args, memcpy(header.linkname, buffer, size); } - prepare_header(args, &header, mode, size); + size_in_header = size; + if (S_ISREG(mode) && size > USTAR_MAX_SIZE) { + size_in_header = 0; + strbuf_append_ext_header_uint(&ext_header, "size", size); + } + + prepare_header(args, &header, mode, size_in_header); if (ext_header.len > 0) { err = write_extended_header(args, sha1, ext_header.buf, diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 950bdd31e7..93c2d3405d 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -360,7 +360,7 @@ test_expect_success 'set up repository with huge blob' ' # We expect git to die with SIGPIPE here (otherwise we # would generate the whole 64GB). -test_expect_failure 'generate tar with huge size' ' +test_expect_success 'generate tar with huge size' ' { git archive HEAD echo $? >exit-code @@ -369,7 +369,7 @@ test_expect_failure 'generate tar with huge size' ' test_cmp expect exit-code ' -test_expect_failure TAR_HUGE 'system tar can read our huge size' ' +test_expect_success TAR_HUGE 'system tar can read our huge size' ' echo 68719476737 >expect && tar_info huge.tar | cut -d" " -f1 >actual && test_cmp expect actual From 6e8e0991e5219954f049731d18e5f53c5f5f526b Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 30 Jun 2016 05:09:20 -0400 Subject: [PATCH 4/5] archive-tar: write extended headers for far-future mtime MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The ustar format represents timestamps as seconds since the epoch, but only has room to store 11 octal digits. To express anything larger, we need to use an extended header. This is exactly the same case we fixed for the size field in the previous commit, and the solution here follows the same pattern. This is even mentioned as an issue in f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), but since it only affected things far in the future, it wasn't deemed worth dealing with. But note that my calculations claiming thousands of years were off there; because our xsnprintf produces a NUL byte, we only have until the year 2242 to fix this. Given that this is just around the corner (geologically speaking, anyway), and because it's easy to fix, let's just make it work. Unlike the previous fix for "size", where we had to write an individual extended header for each file, we can write one global header (since we have only one mtime for the whole archive). There's a slight bit of trickiness there. We may already be writing a global header with a "comment" field for the commit sha1. So we need to write our new field into the same header. To do this, we push the decision of whether to write such a header down into write_global_extended_header(), which will now assemble the header as it sees fit, and will return early if we have nothing to write (in practice, we'll only have a large mtime if it comes from a commit, but this makes it also work if you set your system clock ahead such that time() returns a huge value). Note that we don't (and never did) handle negative timestamps (i.e., before 1970). This would probably not be too hard to support in the same way, but since git does not support negative timestamps at all, I didn't bother here. After writing the extended header, we munge the timestamp in the ustar headers to the maximum-allowable size. This is wrong, but it's the least-wrong thing we can provide to a tar implementation that doesn't understand pax headers (it's also what GNU tar does). Helped-by: René Scharfe Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- archive-tar.c | 19 ++++++++++++++++--- t/t5000-tar-tree.sh | 4 ++-- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index 57a15406d9..d671bc34f2 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -22,8 +22,11 @@ static int write_tar_filter_archive(const struct archiver *ar, * This is the max value that a ustar size header can specify, as it is fixed * at 11 octal digits. POSIX specifies that we switch to extended headers at * this size. + * + * Likewise for the mtime (which happens to use a buffer of the same size). */ #define USTAR_MAX_SIZE 077777777777UL +#define USTAR_MAX_MTIME 077777777777UL /* writes out the whole block, but only if it is full */ static void write_if_needed(void) @@ -324,7 +327,18 @@ static int write_global_extended_header(struct archiver_args *args) unsigned int mode; int err = 0; - strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40); + if (sha1) + strbuf_append_ext_header(&ext_header, "comment", + sha1_to_hex(sha1), 40); + if (args->time > USTAR_MAX_MTIME) { + strbuf_append_ext_header_uint(&ext_header, "mtime", + args->time); + args->time = USTAR_MAX_MTIME; + } + + if (!ext_header.len) + return 0; + memset(&header, 0, sizeof(header)); *header.typeflag = TYPEFLAG_GLOBAL_HEADER; mode = 0100666; @@ -409,8 +423,7 @@ static int write_tar_archive(const struct archiver *ar, { int err = 0; - if (args->commit_sha1) - err = write_global_extended_header(args); + err = write_global_extended_header(args); if (!err) err = write_archive_entries(args, write_tar_entry); if (!err) diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 93c2d3405d..96d208da25 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -383,11 +383,11 @@ test_expect_success 'set up repository with far-future commit' ' git commit -m "tempori parendum" ' -test_expect_failure 'generate tar with future mtime' ' +test_expect_success 'generate tar with future mtime' ' git archive HEAD >future.tar ' -test_expect_failure TAR_HUGE 'system tar can read our future mtime' ' +test_expect_success TAR_HUGE 'system tar can read our future mtime' ' echo 4147 >expect && tar_info future.tar | cut -d" " -f2 >actual && test_cmp expect actual From 5caeeb83bcb88622db739fe27ee8bfc64e9cbf21 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 30 Jun 2016 05:09:26 -0400 Subject: [PATCH 5/5] archive-tar: drop return value We never do any error checks, and so never return anything but "0". Let's just drop this to simplify the code. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- archive-tar.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index d671bc34f2..7ea4e90814 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -319,13 +319,12 @@ static int write_tar_entry(struct archiver_args *args, return err; } -static int write_global_extended_header(struct archiver_args *args) +static void write_global_extended_header(struct archiver_args *args) { const unsigned char *sha1 = args->commit_sha1; struct strbuf ext_header = STRBUF_INIT; struct ustar_header header; unsigned int mode; - int err = 0; if (sha1) strbuf_append_ext_header(&ext_header, "comment", @@ -337,7 +336,7 @@ static int write_global_extended_header(struct archiver_args *args) } if (!ext_header.len) - return 0; + return; memset(&header, 0, sizeof(header)); *header.typeflag = TYPEFLAG_GLOBAL_HEADER; @@ -347,7 +346,6 @@ static int write_global_extended_header(struct archiver_args *args) write_blocked(&header, sizeof(header)); write_blocked(ext_header.buf, ext_header.len); strbuf_release(&ext_header); - return err; } static struct archiver **tar_filters; @@ -423,9 +421,8 @@ static int write_tar_archive(const struct archiver *ar, { int err = 0; - err = write_global_extended_header(args); - if (!err) - err = write_archive_entries(args, write_tar_entry); + write_global_extended_header(args); + err = write_archive_entries(args, write_tar_entry); if (!err) write_trailer(); return err;