Commit graph

83 commits

Author SHA1 Message Date
René Scharfe
cde8ea9c66 archive: support compression levels beyond 9
Compression programs like zip, gzip, bzip2 and xz allow to adjust the
trade-off between CPU cost and size gain with numerical options from -1
for fast compression and -9 for high compression ratio.  zip also
accepts -0 for storing files verbatim.  git archive directly support
these single-digit compression levels for ZIP output and passes them to
filters like gzip.

Zstandard additionally supports compression level options -10 to -19, or
up to -22 with --ultra.  This *seems* to work with git archive in most
cases, e.g. it will produce an archive with -19 without complaining, but
since it only supports single-digit compression level options this is
the same as -1 -9 and thus -9.

Allow git archive to accept multi-digit compression levels to support
the full range supported by zstd.  Explicitly reject them for the ZIP
format, as otherwise deflateInit2() would just fail with a somewhat
cryptic "stream consistency error".

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 11:25:45 -08:00
René Scharfe
200589abcb archive: read short blobs in archive.c::write_archive_entry()
Centralize reading of symlink destinations and the contents of regular
files that are too small to be streamed.  This reduces code duplication
and allows future patches to add support for adding non-tracked files to
archives.  The backends are expected to stream blobs if buffer is NULL.

object_file_to_archive() is only called from archive.c and thus no
longer exported.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-19 15:56:05 -07:00
Jeff King
f5914f4b6b parse_config_key(): return subsection len as size_t
We return the length to a subset of a string using an "int *"
out-parameter. This is fine most of the time, as we'd expect config keys
to be relatively short, but it could behave oddly if we had a gigantic
config key. A more appropriate type is size_t.

Let's switch over, which lets our callers use size_t as appropriate
(they are bound by our type because they must pass the out-parameter as
a pointer). This is mostly just a cleanup to make it clear this code
handles long strings correctly. In practice, our config parser already
chokes on long key names (because of a similar int/size_t mixup!).

When doing an int/size_t conversion, we have to be careful that nobody
was trying to assign a negative value to the variable. I manually
confirmed that for each case here. They tend to just feed the result to
xmemdupz() or similar; in a few cases I adjusted the parameter types for
helper functions to make sure the size_t is preserved.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 14:44:29 -07:00
Matheus Tavares
c8123e72f6 streaming: allow open_istream() to handle any repo
Some callers of open_istream() at archive-tar.c and archive-zip.c are
capable of working on arbitrary repositories but the repo struct is not
passed down to open_istream(), which uses the_repository internally. For
now, that's not a problem since the said callers are only being called
with the_repository. But to be consistent and avoid future problems,
let's allow open_istream() to receive a struct repository and use that
instead of the_repository. This parameter addition will also be used in
a future patch to make sha1-file.c:check_object_signature() be able to
work on arbitrary repos.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Junio C Hamano
af2b8faf49 Merge branch 'rs/pax-extended-header-length-fix'
"git archive" recorded incorrect length in extended pax header in
some corner cases, which has been corrected.

* rs/pax-extended-header-length-fix:
  archive-tar: turn length miscalculation warning into BUG
  archive-tar: use size_t in strbuf_append_ext_header()
  archive-tar: fix pax extended header length calculation
  archive-tar: report wrong pax extended header length
2019-09-09 12:26:37 -07:00
René Scharfe
71d41ff651 archive-tar: turn length miscalculation warning into BUG
Now that we're confident our pax extended header calculation is correct,
turn the criticality of the assertion up to the maximum, from warning
right up to BUG.  Simplify the test, as the stderr comparison step would
not be reached in case the BUG message is triggered.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 10:49:00 -07:00
René Scharfe
17e9ef00d2 archive-tar: use size_t in strbuf_append_ext_header()
One of its callers already passes in a size_t value.  Use it
consistently in this function.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 10:49:00 -07:00
René Scharfe
82a46af13e archive-tar: fix pax extended header length calculation
A pax extended header record starts with a decimal number.  Its value
is the length of the whole record, including its own length.

The calculation of that number in strbuf_append_ext_header() is off by
one in case the length of the rest is close to a higher order of
magnitude.  This affects paths and link targets a bit shorter than 1000,
10000, 100000 etc. characters -- paths with a length of up to 100 fit
into the tar header and don't need a pax extended header.

The mistake has been present since the function was added by ae64bbc18c
("tar-tree: Introduce write_entry()", 2006-03-25).

Account for digits added to len during the loop and keep incrementing
until we have enough space for len and the rest.  The crucial change is
to check against the current value of len before each iteration, instead
of against its value before the loop.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 10:48:02 -07:00
René Scharfe
4060c1990a archive-tar: report wrong pax extended header length
Extended header entries contain a length value that is a bit tricky to
calculate because it includes its own length (number of decimal digits)
as well.  We get it wrong in corner cases.  Add a check, report wrong
results as a warning and add a test for exercising it.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 10:47:28 -07:00
brian m. carlson
bbf05cf70e archive: convert struct archiver_args to object_id
Change the commit_sha1 member to be called "commit_oid" and change it to
be a pointer to struct object_id.  Additionally, update some uses of
GIT_SHA1_HEXSZ and hard-coded values to use the_hash_algo instead.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 11:57:39 +09:00
Nguyễn Thái Ngọc Duy
ec36c42a63 Indent code with TABs
We indent with TABs and sometimes for fine alignment, TABs followed by
spaces, but never all spaces (unless the indentation is less than 8
columns). Indenting with spaces slips through in some places. Fix
them.

Imported code and compat/ are left alone on purpose. The former should
remain as close as upstream as possible. The latter pretty much has
separate maintainers, it's up to them to decide.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-09 12:37:32 +09:00
Torsten Bögershausen
ca473cef91 Upcast size_t variables to uintmax_t when printing
When printing variables which contain a size, today "unsigned long"
is used at many places.
In order to be able to change the type from "unsigned long" into size_t
some day in the future, we need to have a way to print 64 bit variables
on a system that has "unsigned long" defined to be 32 bit, like Win64.

Upcast all those variables into uintmax_t before they are printed.
This is to prepare for a bigger change, when "unsigned long"
will be converted into size_t for variables which may be > 4Gib.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 16:43:52 +09:00
Junio C Hamano
dc0f6f9e1d Merge branch 'nd/no-the-index'
The more library-ish parts of the codebase learned to work on the
in-core index-state instance that is passed in by their callers,
instead of always working on the singleton "the_index" instance.

* nd/no-the-index: (24 commits)
  blame.c: remove implicit dependency on the_index
  apply.c: remove implicit dependency on the_index
  apply.c: make init_apply_state() take a struct repository
  apply.c: pass struct apply_state to more functions
  resolve-undo.c: use the right index instead of the_index
  archive-*.c: use the right repository
  archive.c: avoid access to the_index
  grep: use the right index instead of the_index
  attr: remove index from git_attr_set_direction()
  entry.c: use the right index instead of the_index
  submodule.c: use the right index instead of the_index
  pathspec.c: use the right index instead of the_index
  unpack-trees: avoid the_index in verify_absent()
  unpack-trees: convert clear_ce_flags* to avoid the_index
  unpack-trees: don't shadow global var the_index
  unpack-trees: add a note about path invalidation
  unpack-trees: remove 'extern' on function declaration
  ls-files: correct index argument to get_convert_attr_ascii()
  preload-index.c: use the right index instead of the_index
  dir.c: remove an implicit dependency on the_index in pathspec code
  ...
2018-08-20 11:33:53 -07:00
Junio C Hamano
4bea8485e3 Merge branch 'nd/i18n'
Many more strings are prepared for l10n.

* nd/i18n: (23 commits)
  transport-helper.c: mark more strings for translation
  transport.c: mark more strings for translation
  sha1-file.c: mark more strings for translation
  sequencer.c: mark more strings for translation
  replace-object.c: mark more strings for translation
  refspec.c: mark more strings for translation
  refs.c: mark more strings for translation
  pkt-line.c: mark more strings for translation
  object.c: mark more strings for translation
  exec-cmd.c: mark more strings for translation
  environment.c: mark more strings for translation
  dir.c: mark more strings for translation
  convert.c: mark more strings for translation
  connect.c: mark more strings for translation
  config.c: mark more strings for translation
  commit-graph.c: mark more strings for translation
  builtin/replace.c: mark more strings for translation
  builtin/pack-objects.c: mark more strings for translation
  builtin/grep.c: mark strings for translation
  builtin/config.c: mark more strings for translation
  ...
2018-08-15 15:08:23 -07:00
Nguyễn Thái Ngọc Duy
b67b55127c archive-*.c: use the right repository
With 'struct archive_args' gaining new repository pointer, we don't
have to assume the_repository in the archive backends anymore.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:44 -07:00
Nguyễn Thái Ngọc Duy
d0482e697c archive-tar.c: mark more strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 11:19:09 -07:00
Junio C Hamano
00624d608c Merge branch 'sb/object-store-grafts'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-07-18 12:20:28 -07:00
Junio C Hamano
50f08db594 Merge branch 'js/use-bug-macro'
Developer support update, by using BUG() macro instead of die() to
mark codepaths that should not happen more clearly.

* js/use-bug-macro:
  BUG_exit_code: fix sparse "symbol not declared" warning
  Convert remaining die*(BUG) messages
  Replace all die("BUG: ...") calls by BUG() ones
  run-command: use BUG() to report bugs, not die()
  test-tool: help verifying BUG() code paths
2018-05-30 14:04:07 +09:00
Stefan Beller
cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Johannes Schindelin
033abf97fc Replace all die("BUG: ...") calls by BUG() ones
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).

The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.

Let's just convert all remaining ones in one fell swoop.

This trick was performed by this invocation:

	sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-06 19:06:13 +09:00
Stefan Beller
0df8e96566 cache.h: add repository argument to oid_object_info
Add a repository argument to allow the callers of oid_object_info
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26 10:54:27 +09:00
brian m. carlson
abef9020e3 sha1_file: convert sha1_object_info* to object_id
Convert sha1_object_info and sha1_object_info_extended to take pointers
to struct object_id and rename them to use "oid" instead of "sha1" in
their names.  Update the declaration and definition and apply the
following semantic patch, plus the standard object_id transforms:

@@
expression E1, E2;
@@
- sha1_object_info(E1.hash, E2)
+ oid_object_info(&E1, E2)

@@
expression E1, E2;
@@
- sha1_object_info(E1->hash, E2)
+ oid_object_info(E1, E2)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1.hash, E2, E3)
+ oid_object_info_extended(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1->hash, E2, E3)
+ oid_object_info_extended(E1, E2, E3)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson
ef7b5195f1 streaming: convert open_istream to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson
e5ec981a4b archive: convert sha1_file_to_archive to struct object_id
Convert this function to take a pointer to struct object_id and rename
it object_file_to_archive.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:48 -07:00
brian m. carlson
015ff4f822 archive: convert write_archive_entry_fn_t to object_id
Convert the write_archive_entry_fn_t type to use a pointer to struct
object_id.  Convert various static functions in the tar and zip
archivers also.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:48 -07:00
Junio C Hamano
f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Brandon Williams
b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Ramsay Jones
3f789719a6 archive-tar: fix a sparse 'constant too large' warning
Commit dddbad728c ("timestamp_t: a new data type for timestamps",
26-04-2017) introduced a new typedef 'timestamp_t', as a synonym for an
unsigned long, which was used at the time to represent timestamps in
git. A later commit 28f4aee3fb ("use uintmax_t for timestamps",
26-04-2017) changed the typedef to use an 'uintmax_t' for the timestamp
representation type.

When building on a 32-bit Linux system, sparse complains that a constant
(USTAR_MAX_MTIME) used to detect a 'far-future mtime' timestamp, is too
large; 'warning: constant 077777777777UL is so big it is unsigned long
long' on lines 335 and 338 of archive-tar.c. Note that both gcc and
clang only issue a warning if this constant is used in a context that
requires an 'unsigned long' (rather than an uintmax_t). (Since TIME_MAX
is no longer equal to 0xFFFFFFFF, even on a 32-bit system, the macro
USTAR_MAX_MTIME is set to 077777777777UL, which cannot be represented as
an 'unsigned long' constant).

In order to suppress the warning, change the definition of the macro
constant USTAR_MAX_MTIME to use an 'ULL' type suffix.

In a similar vein, on systems which use a 64-bit representation of the
'unsigned long' type, the USTAR_MAX_SIZE constant macro is defined with
the value 077777777777ULL. Although this does not cause any warning
messages to be issued, it would be more appropriate for this constant
to use an 'UL' type suffix rather than 'ULL'.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-09 11:23:14 +09:00
Johannes Schindelin
dddbad728c timestamp_t: a new data type for timestamps
Git's source code assumes that unsigned long is at least as precise as
time_t. Which is incorrect, and causes a lot of problems, in particular
where unsigned long is only 32-bit (notably on Windows, even in 64-bit
versions).

So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.

By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.

As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-27 13:07:39 +09:00
Junio C Hamano
57734b4e88 Merge branch 'jk/big-and-future-archive-tar'
A small code clean-up.

* jk/big-and-future-archive-tar:
  archive-tar: make write_extended_header() void
2016-08-12 09:47:37 -07:00
René Scharfe
560b0e8f52 archive-tar: make write_extended_header() void
The function write_extended_header() only ever returns 0.  Simplify
it and its caller by dropping its return value, like we did with
write_global_extended_header() earlier.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-06 10:31:51 -07:00
Junio C Hamano
29493589e9 archive-tar: huge offset and future timestamps would not work on 32-bit
As we are not yet moving everything to size_t but still using ulong
internally when talking about the size of object, platforms with
32-bit long will not be able to produce tar archive with 4GB+ file,
and cannot grok 077777777777UL as a constant.  Disable the extended
header feature and do not test it on them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-15 10:51:55 -07:00
Jeff King
5caeeb83bc archive-tar: drop return value
We never do any error checks, and so never return anything
but "0". Let's just drop this to simplify the code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 10:26:28 -07:00
Jeff King
6e8e0991e5 archive-tar: write extended headers for far-future mtime
The ustar format represents timestamps as seconds since the
epoch, but only has room to store 11 octal digits.  To
express anything larger, we need to use an extended header.
This is exactly the same case we fixed for the size field in
the previous commit, and the solution here follows the same
pattern.

This is even mentioned as an issue in f2f0267 (archive-tar:
use xsnprintf for trivial formatting, 2015-09-24), but since
it only affected things far in the future, it wasn't deemed
worth dealing with. But note that my calculations claiming
thousands of years were off there; because our xsnprintf
produces a NUL byte, we only have until the year 2242 to fix
this.

Given that this is just around the corner (geologically
speaking, anyway), and because it's easy to fix, let's just
make it work. Unlike the previous fix for "size", where we
had to write an individual extended header for each file, we
can write one global header (since we have only one mtime
for the whole archive).

There's a slight bit of trickiness there. We may already be
writing a global header with a "comment" field for the
commit sha1. So we need to write our new field into the same
header. To do this, we push the decision of whether to write
such a header down into write_global_extended_header(),
which will now assemble the header as it sees fit, and will
return early if we have nothing to write (in practice, we'll
only have a large mtime if it comes from a commit, but this
makes it also work if you set your system clock ahead such
that time() returns a huge value).

Note that we don't (and never did) handle negative
timestamps (i.e., before 1970). This would probably not be
too hard to support in the same way, but since git does not
support negative timestamps at all, I didn't bother here.

After writing the extended header, we munge the timestamp in
the ustar headers to the maximum-allowable size. This is
wrong, but it's the least-wrong thing we can provide to a
tar implementation that doesn't understand pax headers (it's
also what GNU tar does).

Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 10:26:01 -07:00
Jeff King
d1657b570a archive-tar: write extended headers for file sizes >= 8GB
The ustar format has a fixed-length field for the size of
each file entry which is supposed to contain up to 11 bytes
of octal-formatted data plus a NUL or space terminator.

These means that the largest size we can represent is
077777777777, or 1 byte short of 8GB. The correct solution
for a larger file, according to POSIX.1-2001, is to add an
extended pax header, similar to how we handle long
filenames. This patch does that, and writes zero for the
size field in the ustar header (the last bit is not
mentioned by POSIX, but it matches how GNU tar behaves with
--format=pax).

This should be a strict improvement over the current
behavior, which is to die in xsnprintf with a "BUG".
However, there's some interesting history here.

Prior to f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24), we silently overflowed the "size"
field. The extra bytes ended up in the "mtime" field of the
header, which was then immediately written itself,
overwriting our extra bytes. What that means depends on how
many bytes we wrote.

If the size was 64GB or greater, then we actually overflowed
digits into the mtime field, meaning our value was
effectively right-shifted by those lost octal digits. And
this patch is again a strict improvement over that.

But if the size was between 8GB and 64GB, then our 12-byte
field held all of the actual digits, and only our NUL
terminator overflowed. According to POSIX, there should be a
NUL or space at the end of the field. However, GNU tar seems
to be lenient here, and will correctly parse a size up 64GB
(minus one) from the field. So sizes in this range might
have just worked, depending on the implementation reading
the tarfile.

This patch is mostly still an improvement there, as the 8GB
limit is specifically mentioned in POSIX as the correct
limit. But it's possible that it could be a regression
(versus the pre-f2f0267 state) if all of the following are
true:

  1. You have a file between 8GB and 64GB.

  2. Your tar implementation _doesn't_ know about pax
     extended headers.

  3. Your tar implementation _does_ parse 12-byte sizes from
     the ustar header without a delimiter.

It's probably not worth worrying about such an obscure set
of conditions, but I'm documenting it here just in case.

Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 10:25:46 -07:00
Jeff King
9e6c1e91a3 archive-tar: convert snprintf to xsnprintf
Commit f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24) converted cases of "sprintf" to
"xsnprintf", but accidentally left one as just "snprintf".
This meant that we could silently truncate the resulting
buffer instead of flagging an error.

In practice, this is impossible to achieve, as we are
formatting a ustar checksum, which can be at most 7
characters. But the point of xsnprintf is to document and
check for "should be impossible" conditions; this site was
just accidentally mis-converted during f2f0267.

Noticed-by: Paul Green <Paul.Green@stratus.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-26 10:44:26 -07:00
Jeff King
f2f0267529 archive-tar: use xsnprintf for trivial formatting
When we generate tar headers, we sprintf() values directly
into a struct with the fixed-size header values. For the
most part this is fine, as we are formatting small values
(e.g., the octal format of "mode & 0x7777" is of fixed
length). But it's still a good idea to use xsnprintf here.
It communicates to readers what our expectation is, and it
provides a run-time check that we are not overflowing the
buffers.

The one exception here is the mtime, which comes from the
epoch time of the commit we are archiving. For sane values,
this fits into the 12-byte value allocated in the header.
But since git can handle 64-bit times, if I claim to be a
visitor from the year 10,000 AD, I can overflow the buffer.
This turns out to be harmless, as we simply overflow into
the chksum field, which is then overwritten.

This case is also best as an xsnprintf. It should never come
up, short of extremely malformed dates, and in that case we
are probably better off dying than silently truncating the
date value (and we cannot expand the size of the buffer,
since it is dictated by the ustar format). Our friends in
the year 5138 (when we legitimately flip to a 12-digit
epoch) can deal with that problem then.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Jeff King
5096d4909f convert trivial sprintf / strcpy calls to xsnprintf
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.

However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Jeff King
108332c7a0 archive-tar: fix minor indentation violation
This looks like a simple omission from 8539070 (archive-tar:
unindent write_tar_entry by one level, 2012-05-03).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Junio C Hamano
15c6ef7b06 Revert "archive: honor tar.umask even for pax headers"
This reverts commit 10f343ea81, whose
output is no longer bit-for-bit equivalent from the older versions
of Git, which the infrastructure to (pretend to) upload tarballs
kernel.org uses depends on.
2014-10-20 12:04:46 -07:00
Junio C Hamano
825fd93767 Merge branch 'rs/child-process-init'
Code clean-up.

* rs/child-process-init:
  run-command: inline prepare_run_command_v_opt()
  run-command: call run_command_v_opt_cd_env() instead of duplicating it
  run-command: introduce child_process_init()
  run-command: introduce CHILD_PROCESS_INIT
2014-09-11 10:33:27 -07:00
René Scharfe
d318027932 run-command: introduce CHILD_PROCESS_INIT
Most struct child_process variables are cleared using memset first after
declaration.  Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead.  That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-20 09:53:37 -07:00
brian m. carlson
10f343ea81 archive: honor tar.umask even for pax headers
git archive's tar format uses extended pax headers to encode metadata
into the archive.  Most tar implementations correctly treat these as
metadata, but some that do not understand the pax format extract these
as files instead.  Apply the tar.umask setting to these entries to
prevent tampering by other users.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-04 11:39:11 -07:00
Jeff King
785a042981 archive-tar: use parse_config_key when parsing config
This is fewer lines of code, but more importantly, fixes a
bogus pointer offset. We are looking for "tar." in the
section, but later assume that the dot we found is at offset
9, not 3. This is a holdover from an earlier iteration of
767cf45 which called the section "tarfilter".

As a result, we could erroneously reject some filters with
dots in their name, as well as read uninitialized memory.

Reported by (and test by) René Scharfe.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-23 08:41:50 -08:00
Junio C Hamano
f12f3af726 Merge branch 'rs/leave-base-name-in-name-field-of-tar'
Improve compatibility with implementations of "tar" that do not
like empty name field in header (with the additional prefix field
holding everything).

* rs/leave-base-name-in-name-field-of-tar:
  archive-tar: split long paths more carefully
2013-01-10 13:47:35 -08:00
René Scharfe
22f0dcd963 archive-tar: split long paths more carefully
The name field of a tar header has a size of 100 characters.  This limit
was extended long ago in a backward compatible way by providing the
additional prefix field, which can hold 155 additional characters.  The
actual path is constructed at extraction time by concatenating the prefix
field, a slash and the name field.

get_path_prefix() is used to determine which slash in the path is used as
the cutting point and thus which part of it is placed into the field
prefix and which into the field name.  It tries to cram as much into the
prefix field as possible.  (And only if we can't fit a path into the
provided 255 characters we use a pax extended header to store it.)

If a path is longer than 100 but shorter than 156 characters and ends
with a slash (i.e. is for a directory) then get_path_prefix() puts the
whole path in the prefix field and leaves the name field empty.  GNU tar
reconstructs the path without complaint, but the tar included with
NetBSD 6 does not: It reports the header to be invalid.

For compatibility with this version of tar, make sure to never leave the
name field empty.  In order to do that, trim the trailing slash from the
part considered as possible prefix, if it exists -- that way the last
path component (or more, but not less) will end up in the name field.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-05 22:56:36 -08:00
Junio C Hamano
a5a46eb90f archive: ustar header checksum is computed unsigned
POSIX.1 (pax) is pretty clear on this:

  The chksum field shall be the ISO/IEC 646:1991 standard IRV
  representation of the octal value of the simple sum of all octets
  in the header logical record. Each octet in the header shall be
  treated as an unsigned value. These values shall be added to an
  unsigned integer, initialized to zero, the precision of which is
  not less than 17 bits. When calculating the checksum, the chksum
  field is treated as if it were all <space> characters.

so is GNU:

  http://www.gnu.org/software/tar/manual/html_node/Checksumming.html

Found by 7zip folks and reported by Rafał Mużyło.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-06-13 10:47:21 -07:00
Junio C Hamano
b83cfa5949 Merge branch 'rs/archive-tree-in-tip-simplify'
By René Scharfe
* rs/archive-tree-in-tip-simplify:
  archive-tar: keep const in checksum calculation
  archive: simplify refname handling
2012-05-23 13:35:22 -07:00
René Scharfe
bf38245be8 archive-tar: keep const in checksum calculation
For correctness, don't needlessly drop the const qualifier when casting.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-18 11:26:18 -07:00
Nguyễn Thái Ngọc Duy
5544049def archive-tar: stream large blobs to tar file
t5000 verifies output while t1050 makes sure the command always
respects core.bigfilethreshold

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-03 10:22:56 -07:00