Commit graph

19 commits

Author SHA1 Message Date
René Scharfe 4318094047 archive: don't add empty directories to archives
While git doesn't track empty directories, git archive can be tricked
into putting some into archives.  One way is to construct an empty tree
object, as t5004 does.  While that is supported by the object database,
it can't be represented in the index and thus it's unlikely to occur in
the wild.

Another way is using the literal name of a directory in an exclude
pathspec -- its contents are are excluded, but the directory stub is
included.  That's inconsistent: exclude pathspecs containing wildcards
don't leave empty directories in the archive.

Yet another way is have a few levels of nested subdirectories (e.g.
d1/d2/d3/file1) and ignoring the entries at the leaves (e.g. file1).
The directories with the ignored content are ignored as well (e.g. d3),
but their empty parents are included (e.g. d2).

As empty directories are not supported by git, they should also not be
written into archives.  If an empty directory is really needed then it
can be tracked and archived by placing an empty .gitignore file in it.

There already is a mechanism in place for suppressing empty directories.
When read_tree_recursive() encounters a directory excluded by a pathspec
then it enters it anyway because it might contain included entries.  It
calls the callback function before it is able to decide if the directory
is actually needed.  For that reason git archive adds directories to a
queue and writes entries for them only when it encounters the first
child item -- but currently only if pathspecs with wildcards are used.

Queue *all* directories, no matter if there even are pathspecs present.
This prevents git archive from writing entries for empty directories in
all cases.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14 15:08:22 +09:00
René Scharfe 867e40ff3a t5004: require 64-bit support for big ZIP tests
Check if unzip supports the ZIP64 format and skip the tests that create
big archives otherwise.  Also skip the test that archives a big file on
32-bit platforms because the git object systems can't unpack files
bigger than 4GB there.

Reported-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-01 08:46:50 +09:00
René Scharfe 4cdf3f9d84 archive-zip: support files bigger than 4GB
Write a zip64 extended information extra field for big files as part of
their local headers and as part of their central directory headers.
Also write a zip64 version of the data descriptor in that case.

If we're streaming then we don't know the compressed size at the time we
write the header.  Deflate can end up making a file bigger instead of
smaller if we're unlucky.  Write a local zip64 header already for files
with a size of 2GB or more in this case to be on the safe side.

Both sizes need to be included in the local zip64 header, but the extra
field for the directory must only contain 64-bit equivalents for 32-bit
values of 0xffffffff.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-24 22:10:51 -07:00
René Scharfe af95749f9b archive-zip: support archives bigger than 4GB
Add a zip64 extended information extra field to the central directory
and emit the zip64 end of central directory records as well as locator
if the offset of an entry within the archive exceeds 4GB.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-24 22:10:51 -07:00
René Scharfe 758c1f9d1b archive-zip: add tests for big ZIP archives
Test the creation of ZIP archives bigger than 4GB and containing files
bigger than 4GB.  They are marked as EXPENSIVE because they take quite a
while and because the first one needs a bit more than 4GB of disk space
to store the resulting archive.

The big archive in the first test is made up of a tree containing
thousands of copies of a small file.  Yet the test has to write out the
full archive because unzip doesn't offer a way to read from stdin.

The big file in the second test is provided as a zipped pack file to
avoid writing another 4GB file to disk and then adding it.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-24 21:43:21 -07:00
René Scharfe 88329ca809 archive-zip: support more than 65535 entries
Support more than 65535 entries cleanly by writing a "zip64 end of
central directory record" (with a 64-bit field for the number of
entries) before the usual "end of central directory record" (which
contains only a 16-bit field).  InfoZIP's zip does the same.
Archives with 65535 or less entries are not affected.

Programs that extract all files like InfoZIP's zip and 7-Zip
ignored the field and could extract all files already.  Software
that relies on the ZIP file directory to show a list of contained
files quickly to simulate to normal directory like Windows'
built-in ZIP functionality only saw a subset of the included files.

Windows supports ZIP64 since Vista according to
https://en.wikipedia.org/wiki/Zip_%28file_format%29#ZIP64.

Suggested-by: Johannes Schauer <josch@debian.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-28 08:54:57 -07:00
René Scharfe 19ee29401d t5004: test ZIP archives with many entries
A ZIP file directory has a 16-bit field for the number of entries it
contains.  There are 64-bit extensions to deal with that.  Demonstrate
that git archive --format=zip currently doesn't use them and instead
overflows the field.

InfoZIP's unzip doesn't care about this field and extracts all files
anyway.  Software that uses the directory for presenting a filesystem
like view quickly -- notably Windows -- depends on it, but doesn't
lend itself to an automatic test case easily.  Use InfoZIP's zipinfo,
which probably isn't available everywhere but at least can provides
*some* way to check this field.

To speed things up a bit create and commit only a subset of the files
and build a fake tree out of duplicates and pass that to git archive.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-28 08:52:10 -07:00
Jeff King 9ddc5ac97e t: wrap complicated expect_code users in a block
If we are expecting a command to produce a particular exit
code, we can use test_expect_code. However, some cases are
more complicated, and want to accept one of a range of exit
codes. For these, we end up with something like:

  cmd;
  case "$?" in
  ...

That unfortunately breaks the &&-chain and fools
--chain-lint. Since these special cases are so few, we can
wrap them in a block, like this:

  { cmd; ret=$?; } &&
  case "$ret" in
  ...

This accomplishes the same thing, and retains the &&-chain
(the exit status fed to the && is that of the assignment,
which should always be true). It's technically longer, but
it is probably a good thing for unusual code like this to
stand out.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20 10:20:16 -07:00
Junio C Hamano 15c6ef7b06 Revert "archive: honor tar.umask even for pax headers"
This reverts commit 10f343ea81, whose
output is no longer bit-for-bit equivalent from the older versions
of Git, which the infrastructure to (pretend to) upload tarballs
kernel.org uses depends on.
2014-10-20 12:04:46 -07:00
Junio C Hamano 4740891e47 Merge branch 'bc/archive-pax-header-mode'
Implementations of "tar" that do not understand an extended pax
header would extract the contents of it in a regular file; make
sure the permission bits of this file follows the same tar.umask
configuration setting.

* bc/archive-pax-header-mode:
  archive: honor tar.umask even for pax headers
2014-09-02 13:27:13 -07:00
brian m. carlson 10f343ea81 archive: honor tar.umask even for pax headers
git archive's tar format uses extended pax headers to encode metadata
into the archive.  Most tar implementations correctly treat these as
metadata, but some that do not understand the pax format extract these
as files instead.  Apply the tar.umask setting to these entries to
prevent tampering by other users.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-04 11:39:11 -07:00
Stepan Kasal b93e6e3663 t5000, t5003: do not use test_cmp to compare binary files
test_cmp() is primarily meant to compare text files (and display the
difference for debug purposes).

Raw "cmp" is better suited to compare binary files (tar, zip, etc.).

On MinGW, test_cmp is a shell function mingw_test_cmp that tries to
read both files into environment, stripping CR characters (introduced
in commit 4d715ac0).

This function usually speeds things up, as fork is extremly slow on
Windows.  But no wonder that this function is extremely slow and
sometimes even crashes when comparing large tar or zip files.

Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-04 11:14:25 -07:00
Junio C Hamano 843fb919fd Merge branch 'rs/empty-archive'
Fixes tests added in 1.8.2 era that are broken on BSDs.

* rs/empty-archive:
  t5004: resurrect original empty tar archive test
  t5004: avoid using tar for checking emptiness of archive
2013-06-02 15:48:25 -07:00
René Scharfe ea2d20d4c2 t5004: avoid using tar for checking emptiness of archive
Test 2 of t5004 checks if a supposedly empty tar archive really
contains no files.  24676f02 (t5004: fix issue with empty archive test
and bsdtar) removed our commit hash to make it work with bsdtar, but
the test still fails on NetBSD and OpenBSD, which use their own tar
that considers a tar file containing only NULs as broken.

Here's what the different archivers do when asked to create a tar
file without entries:

	$ uname -v
	NetBSD 6.0.1 (GENERIC)
	$ gtar --version | head -1
	tar (GNU tar) 1.26
	$ bsdtar --version
	bsdtar 2.8.4 - libarchive 2.8.4

	$ : >zero.tar
	$ perl -e 'print "\0" x 10240' >tenk.tar
	$ sha1 zero.tar tenk.tar
	SHA1 (zero.tar) = da39a3ee5e6b4b0d3255bfef95601890afd80709
	SHA1 (tenk.tar) = 34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c

	$ : | tar cf - -T - | sha1
	da39a3ee5e6b4b0d3255bfef95601890afd80709
	$ : | gtar cf - -T - | sha1
	34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c
	$ : | bsdtar cf - -T - | sha1
	34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c

So NetBSD's native tar creates an empty file, while GNU tar and bsdtar
both give us 10KB of NULs -- just like git archive with an empty tree.
Now let's see how the archivers handle these two kinds of empty tar
files:

	$ tar tf zero.tar; echo $?
	tar: Unexpected EOF on archive file
	1
	$ gtar tf zero.tar; echo $?
	gtar: This does not look like a tar archive
	gtar: Exiting with failure status due to previous errors
	2
	$ bsdtar tf zero.tar; echo $?
	0

	$ tar tf tenk.tar; echo $?
	tar: Cannot identify format. Searching...
	tar: End of archive volume 1 reached
	tar: Sorry, unable to determine archive format.
	1
	$ gtar tf tenk.tar; echo $?
	0
	$ bsdtar tf tenk.tar; echo $?
	0

NetBSD's tar complains about both, bsdtar happily accepts any of them
and GNU tar doesn't like zero-length archive files.  So the safest
course of action is to stay with our block-of-NULs format which is
compatible with GNU tar and bsdtar, as we can't make NetBSD's native
tar happy anyway.

We can simplify our test, however, by taking tar out of the picture.
Instead of extracting the archive and checking for the non-presence of
files, check if the file has a size of 10KB and contains only NULs.
This makes t5004 pass on NetBSD and OpenBSD.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-09 12:41:31 -07:00
René Scharfe 56ee96572a t5004: resurrect original empty tar archive test
Add a test to verify the emptiness of an archive by extracting its
contents.  Don't run this test if the version of tar doesn't support
archives containing only a comment header, though.

The existing check 'tar archive of empty tree is empty' used to work
like that (minus the tar capability check) but was changed to depend
on the exact representation of empty tar files created by git archive
instead of on the behaviour of tar in order to avoid issues with
different tar versions.

The different approaches test different things: The existing one is
for empty trees, for which we know the exact expected output and thus
we can simply check it without extracting; the new one is for commits
with empty trees, whose archives include stamps and so the more
"natural" check by extraction is a better fit because it focuses on
the interesting aspect, namely the absence of any archive entries.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-09 12:22:31 -07:00
René Scharfe 71a19a3744 t5004: avoid using tar for checking emptiness of archive
Test 2 of t5004 checks if a supposedly empty tar archive really
contains no files.  24676f02 (t5004: fix issue with empty archive test
and bsdtar) removed our commit hash to make it work with bsdtar, but
the test still fails on NetBSD and OpenBSD, which use their own tar
that considers a tar file containing only NULs as broken.

Here's what the different archivers do when asked to create a tar
file without entries:

	$ uname -v
	NetBSD 6.0.1 (GENERIC)
	$ gtar --version | head -1
	tar (GNU tar) 1.26
	$ bsdtar --version
	bsdtar 2.8.4 - libarchive 2.8.4

	$ : >zero.tar
	$ perl -e 'print "\0" x 10240' >tenk.tar
	$ sha1 zero.tar tenk.tar
	SHA1 (zero.tar) = da39a3ee5e6b4b0d3255bfef95601890afd80709
	SHA1 (tenk.tar) = 34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c

	$ : | tar cf - -T - | sha1
	da39a3ee5e6b4b0d3255bfef95601890afd80709
	$ : | gtar cf - -T - | sha1
	34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c
	$ : | bsdtar cf - -T - | sha1
	34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c

So NetBSD's native tar creates an empty file, while GNU tar and bsdtar
both give us 10KB of NULs -- just like git archive with an empty tree.
Now let's see how the archivers handle these two kinds of empty tar
files:

	$ tar tf zero.tar; echo $?
	tar: Unexpected EOF on archive file
	1
	$ gtar tf zero.tar; echo $?
	gtar: This does not look like a tar archive
	gtar: Exiting with failure status due to previous errors
	2
	$ bsdtar tf zero.tar; echo $?
	0

	$ tar tf tenk.tar; echo $?
	tar: Cannot identify format. Searching...
	tar: End of archive volume 1 reached
	tar: Sorry, unable to determine archive format.
	$ gtar tf tenk.tar; echo $?
	0
	$ bsdtar tf tenk.tar; echo $?
	0

NetBSD's tar complains about both, bsdtar happily accepts any of them
and GNU tar doesn't like zero-length archive files.  So the safest
course of action is to stay with our block-of-NULs format which is
compatible with GNU tar and bsdtar, as we can't make NetBSD's native
tar happy anyway.

We can simplify our test, however, by taking tar out of the picture.
Instead of extracting the archive and checking for the non-presence of
files, check if the file has a size of 10KB and contains only NULs.
This makes t5004 pass on NetBSD and OpenBSD.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-09 12:20:40 -07:00
René Scharfe abdb9b2e4f t5004: ignore pax global header file
Versions of tar that don't know pax headers -- like the ones in NetBSD 6
and OpenBSD 5.2 -- extract them as regular files.  Explicitly ignore the
file created for our global header when checking the list of extracted
files, as this is normal and harmless fall-back behaviour.  This fixes
test 3 of t5004 on these platforms.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-09 12:18:57 -07:00
René Scharfe 24676f02ba t5004: fix issue with empty archive test and bsdtar
bsdtar, which is the default tar on Mac OS X, handles empty archives
just fine but reports archives containing only a pax extended header
comment as damaged.  Work around the issue by explicitly generating
the archive for the tree and not the commit, which causes git archive
to omit the commit hash comment record from the tar file.

Reported-by: BJ Hargrave <bj@bjhargrave.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-10 12:26:14 -07:00
Jeff King bd54cf17a4 archive: handle commits with an empty tree
git-archive relies on get_pathspec to convert its argv into
a list of pathspecs. When get_pathspec is given an empty
argv list, it returns a single pathspec, the empty string,
to indicate that everything matches. When we feed this to
our path_exists function, we typically see that the pathspec
turns up at least one item in the tree, and we are happy.

But when our tree is empty, we erroneously think it is
because the pathspec is too limited, when in fact it is
simply that there is nothing to be found in the tree. This
is a weird corner case, but the correct behavior is almost
certainly to produce an empty archive, not to exit with an
error.

This patch teaches git-archive to create empty archives when
there is no pathspec given (we continue to complain if a
pathspec is given, since it obviously is not matched). It
also confirms that the tar and zip writers produce sane
output in this instance.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-10 22:25:22 -07:00