mirror of
https://github.com/freebsd/freebsd-src
synced 2024-10-15 04:43:53 +00:00
Vendor import of xz 5.4.0 (trimmed)
parent
46780ea2dc
commit
f6a891c2b4
12
AUTHORS
|
@ -19,6 +19,18 @@ Authors of XZ Utils
|
|||
Andrew Dudman helped adapting the scripts and their man pages for
|
||||
XZ Utils.
|
||||
|
||||
The initial version of the threaded .xz decompressor was written
|
||||
by Sebastian Andrzej Siewior.
|
||||
|
||||
The initial version of the .lz (lzip) decoder was written
|
||||
by Michał Górny.
|
||||
|
||||
CLMUL-accelerated CRC code was contributed by Ilya Kurdyukov.
|
||||
|
||||
Other authors:
|
||||
- Jonathan Nieder
|
||||
- Joachim Henke
|
||||
|
||||
The GNU Autotools-based build system contains files from many authors,
|
||||
which I'm not trying to list here.
|
||||
|
||||
|
|
74
README
|
@ -202,9 +202,77 @@ XZ Utils
|
|||
|
||||
https://translationproject.org/html/translators.html
|
||||
|
||||
Several strings will change in a future version of xz so if you
|
||||
wish to start a new translation, look at the code in the xz git
|
||||
repository instead of a 5.2.x release.
|
||||
Below are notes and testing instructions specific to xz
|
||||
translations.
|
||||
|
||||
Testing can be done by installing xz into a temporary directory:
|
||||
|
||||
./configure --disable-shared --prefix=/tmp/xz-test
|
||||
# <Edit the .po file in the po directory.>
|
||||
make -C po update-po
|
||||
make install
|
||||
bash debug/translation.bash | less
|
||||
bash debug/translation.bash | less -S # For --list outputs
|
||||
|
||||
Repeat the above as needed (no need to re-run configure though).
|
||||
|
||||
Note especially the following:
|
||||
|
||||
- The output of --help and --long-help must look nice on
|
||||
an 80-column terminal. It's OK to add extra lines if needed.
|
||||
|
||||
- In contrast, don't add extra lines to error messages and such.
|
||||
They are often preceded with e.g. a filename on the same line,
|
||||
so you have no way to predict where to put a \n. Let the terminal
|
||||
do the wrapping even if it looks ugly. Adding new lines will be
|
||||
even uglier in the generic case even if it looks nice in a few
|
||||
limited examples.
|
||||
|
||||
- Be careful with column alignment in tables and table-like output
|
||||
(--list, --list --verbose --verbose, --info-memory, --help, and
|
||||
--long-help):
|
||||
|
||||
* All descriptions of options in --help should start in the
|
||||
same column (but it doesn't need to be the same column as
|
||||
in the English messages; just be consistent if you change it).
|
||||
Check that both --help and --long-help look OK, since they
|
||||
share several strings.
|
||||
|
||||
* --list --verbose and --info-memory print lines that have
|
||||
the format "Description: %s". If you need a longer
|
||||
description, you can put extra space between the colon
|
||||
and %s. Then you may need to add extra space to other
|
||||
strings too so that the result as a whole looks good (all
|
||||
values start at the same column).
|
||||
|
||||
* The columns of the actual tables in --list --verbose --verbose
|
||||
should be aligned properly. Abbreviate if necessary. It might
|
||||
be good to keep at least 2 or 3 spaces between column headings
|
||||
and avoid spaces in the headings so that the columns stand out
|
||||
better, but this is a matter of opinion. Do what you think
|
||||
looks best.
|
||||
|
||||
- Be careful to put a period at the end of a sentence when the
|
||||
original version has it, and don't put it when the original
|
||||
doesn't have it. Similarly, be careful with \n characters
|
||||
at the beginning and end of the strings.
|
||||
|
||||
- Read the TRANSLATORS comments that have been extracted from the
|
||||
source code and included in xz.pot. Some comments suggest
|
||||
testing with a specific command which needs an .xz file. You
|
||||
may use e.g. any tests/files/good-*.xz. However, these test
|
||||
commands are included in translation.bash output, so reading
|
||||
translation.bash output carefully can be enough.
|
||||
|
||||
- If you find language problems in the original English strings,
|
||||
feel free to suggest improvements. Ask if something is unclear.
|
||||
|
||||
- The translated messages should be understandable (sometimes this
|
||||
may be a problem with the original English messages too). Don't
|
||||
make a direct word-by-word translation from English especially if
|
||||
the result doesn't sound good in your language.
|
||||
|
||||
Thanks for your help!
|
||||
|
||||
|
||||
5. Other implementations of the .xz format
|
||||
|
|
9
THANKS
|
@ -42,8 +42,11 @@ has been important. :-) In alphabetical order:
|
|||
- Michael Fox
|
||||
- Mike Frysinger
|
||||
- Daniel Richard G.
|
||||
- Tomasz Gajc
|
||||
- Bjarni Ingi Gislason
|
||||
- John Paul Adrian Glaubitz
|
||||
- Bill Glessner
|
||||
- Michał Górny
|
||||
- Jason Gorski
|
||||
- Juan Manuel Guerrero
|
||||
- Diederik de Haas
|
||||
|
@ -51,6 +54,8 @@ has been important. :-) In alphabetical order:
|
|||
- Christian Hesse
|
||||
- Vincenzo Innocente
|
||||
- Peter Ivanov
|
||||
- Nicholas Jackson
|
||||
- Sam James
|
||||
- Jouk Jansen
|
||||
- Jun I Jin
|
||||
- Kiyoshi Kanazawa
|
||||
|
@ -61,8 +66,10 @@ has been important. :-) In alphabetical order:
|
|||
- Jan Kratochvil
|
||||
- Christian Kujau
|
||||
- Stephan Kulow
|
||||
- Ilya Kurdyukov
|
||||
- Peter Lawler
|
||||
- James M Leddy
|
||||
- Vincent Lefevre
|
||||
- Hin-Tak Leung
|
||||
- Andraž 'ruskie' Levstik
|
||||
- Cary Lewis
|
||||
|
@ -79,6 +86,8 @@ has been important. :-) In alphabetical order:
|
|||
- Ivan A. Melnikov
|
||||
- Jim Meyering
|
||||
- Arkadiusz Miskiewicz
|
||||
- Nathan Moinvaziri
|
||||
- Étienne Mollier
|
||||
- Conley Moorhous
|
||||
- Rafał Mużyło
|
||||
- Adrien Nader
|
||||
|
|
2
TODO
|
@ -59,8 +59,6 @@ Missing features
|
|||
- Implement threaded match finders.
|
||||
- Implement pigz-style threading in LZMA2.
|
||||
|
||||
Multithreaded decompression
|
||||
|
||||
Buffer-to-buffer coding could use less RAM (especially when
|
||||
decompressing LZMA1 or LZMA2).
|
||||
|
||||
|
|
|
@ -14,7 +14,7 @@
|
|||
#define TUKLIB_COMMON_H
|
||||
|
||||
// The config file may be replaced by a package-specific file.
|
||||
// It should include at least stddef.h, inttypes.h, and limits.h.
|
||||
// It should include at least stddef.h, stdbool.h, inttypes.h, and limits.h.
|
||||
#include "tuklib_config.h"
|
||||
|
||||
// TUKLIB_SYMBOL_PREFIX is prefixed to all symbols exported by
|
||||
|
|
|
@ -1,7 +1,10 @@
|
|||
// If config.h isn't available, assume that the headers required by
|
||||
// tuklib_common.h are available. This is required by crc32_tablegen.c.
|
||||
#ifdef HAVE_CONFIG_H
|
||||
# include "sysdefs.h"
|
||||
#else
|
||||
# include <stddef.h>
|
||||
# include <stdbool.h>
|
||||
# include <inttypes.h>
|
||||
# include <limits.h>
|
||||
#endif
|
||||
|
|
|
@ -17,8 +17,8 @@
|
|||
/// - Byte swapping: bswapXX(num)
|
||||
/// - Byte order conversions to/from native (byteswaps if Y isn't
|
||||
/// the native endianness): convXXYe(num)
|
||||
/// - Unaligned reads (16/32-bit only): readXXYe(ptr)
|
||||
/// - Unaligned writes (16/32-bit only): writeXXYe(ptr, num)
|
||||
/// - Unaligned reads: readXXYe(ptr)
|
||||
/// - Unaligned writes: writeXXYe(ptr, num)
|
||||
/// - Aligned reads: aligned_readXXYe(ptr)
|
||||
/// - Aligned writes: aligned_writeXXYe(ptr, num)
|
||||
///
|
||||
|
@ -343,6 +343,46 @@ read32le(const uint8_t *buf)
|
|||
}
|
||||
|
||||
|
||||
static inline uint64_t
|
||||
read64be(const uint8_t *buf)
|
||||
{
|
||||
#if defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
|
||||
uint64_t num = read64ne(buf);
|
||||
return conv64be(num);
|
||||
#else
|
||||
uint64_t num = (uint64_t)buf[0] << 56;
|
||||
num |= (uint64_t)buf[1] << 48;
|
||||
num |= (uint64_t)buf[2] << 40;
|
||||
num |= (uint64_t)buf[3] << 32;
|
||||
num |= (uint64_t)buf[4] << 24;
|
||||
num |= (uint64_t)buf[5] << 16;
|
||||
num |= (uint64_t)buf[6] << 8;
|
||||
num |= (uint64_t)buf[7];
|
||||
return num;
|
||||
#endif
|
||||
}
|
||||
|
||||
|
||||
static inline uint64_t
|
||||
read64le(const uint8_t *buf)
|
||||
{
|
||||
#if !defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
|
||||
uint64_t num = read64ne(buf);
|
||||
return conv64le(num);
|
||||
#else
|
||||
uint64_t num = (uint64_t)buf[0];
|
||||
num |= (uint64_t)buf[1] << 8;
|
||||
num |= (uint64_t)buf[2] << 16;
|
||||
num |= (uint64_t)buf[3] << 24;
|
||||
num |= (uint64_t)buf[4] << 32;
|
||||
num |= (uint64_t)buf[5] << 40;
|
||||
num |= (uint64_t)buf[6] << 48;
|
||||
num |= (uint64_t)buf[7] << 56;
|
||||
return num;
|
||||
#endif
|
||||
}
|
||||
|
||||
|
||||
// NOTE: Possible byte swapping must be done in a macro to allow the compiler
|
||||
// to optimize byte swapping of constants when using glibc's or *BSD's
|
||||
// byte swapping macros. The actual write is done in an inline function
|
||||
|
@ -350,11 +390,13 @@ read32le(const uint8_t *buf)
|
|||
#if defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
|
||||
# define write16be(buf, num) write16ne(buf, conv16be(num))
|
||||
# define write32be(buf, num) write32ne(buf, conv32be(num))
|
||||
# define write64be(buf, num) write64ne(buf, conv64be(num))
|
||||
#endif
|
||||
|
||||
#if !defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
|
||||
# define write16le(buf, num) write16ne(buf, conv16le(num))
|
||||
# define write32le(buf, num) write32ne(buf, conv32le(num))
|
||||
# define write64le(buf, num) write64ne(buf, conv64le(num))
|
||||
#endif
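The portable branches of read64be() and read64le() in this hunk assemble the value one byte at a time, which works at any alignment and on any host byte order. Below is a minimal standalone sketch of that technique. The names load64be, load64le, and store64be are hypothetical, chosen so they do not clash with the tuklib functions; this is an illustration, not the tuklib implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: read a big-endian 64-bit value from a buffer of
// any alignment. The first byte in memory is the most significant one.
uint64_t
load64be(const uint8_t *buf)
{
	assert(buf != NULL);

	uint64_t num = 0;
	for (size_t i = 0; i < 8; ++i)
		num = (num << 8) | buf[i];

	return num;
}

// Hypothetical helper: the little-endian counterpart. The first byte in
// memory is the least significant one.
uint64_t
load64le(const uint8_t *buf)
{
	assert(buf != NULL);

	uint64_t num = 0;
	for (size_t i = 0; i < 8; ++i)
		num |= (uint64_t)buf[i] << (8 * i);

	return num;
}

// Hypothetical helper: write a 64-bit value in big-endian order.
void
store64be(uint8_t *buf, uint64_t num)
{
	assert(buf != NULL);

	for (size_t i = 0; i < 8; ++i)
		buf[i] = (uint8_t)(num >> (56 - 8 * i));
}
```

Because only shifts and byte accesses are used, the result does not depend on WORDS_BIGENDIAN or on whether unaligned access is fast; the preprocessor branches in the header exist purely as an optimization.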
|
||||
|
||||
|
||||
|
|
|
@ -240,6 +240,36 @@ typedef enum {
|
|||
* can be a sign of a bug in liblzma. See the documentation
|
||||
* for how to report bugs.
|
||||
*/
|
||||
|
||||
LZMA_SEEK_NEEDED = 12,
|
||||
/**<
|
||||
* \brief Request to change the input file position
|
||||
*
|
||||
* Some coders can do random access in the input file. The
|
||||
* initialization functions of these coders take the file size
|
||||
* as an argument. No other coders can return LZMA_SEEK_NEEDED.
|
||||
*
|
||||
* When this value is returned, the application must seek to
|
||||
* the file position given in lzma_stream.seek_pos. This value
|
||||
* is guaranteed to never exceed the file size that was
|
||||
* specified at the coder initialization.
|
||||
*
|
||||
* After seeking the application should read new input and
|
||||
* pass it normally via lzma_stream.next_in and .avail_in.
|
||||
*/
|
||||
|
||||
/*
|
||||
* These enumerations may be used internally by liblzma
|
||||
* but they will never be returned to applications.
|
||||
*/
|
||||
LZMA_RET_INTERNAL1 = 101,
|
||||
LZMA_RET_INTERNAL2 = 102,
|
||||
LZMA_RET_INTERNAL3 = 103,
|
||||
LZMA_RET_INTERNAL4 = 104,
|
||||
LZMA_RET_INTERNAL5 = 105,
|
||||
LZMA_RET_INTERNAL6 = 106,
|
||||
LZMA_RET_INTERNAL7 = 107,
|
||||
LZMA_RET_INTERNAL8 = 108
|
||||
} lzma_ret;
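The LZMA_SEEK_NEEDED description leaves the repositioning itself to the application. The following sketch covers only that application-side step and does not call liblzma. The helper name reposition_input and its 0 / -1 return convention are assumptions made for illustration; the position argument stands for a value such as the one reported in lzma_stream.seek_pos.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: move `file` to `seek_pos`.
// Returns 0 on success and -1 on failure.
int
reposition_input(FILE *file, uint64_t seek_pos, uint64_t file_size)
{
	assert(file != NULL);

	// The header documents that the requested position never exceeds
	// the file size given at initialization. Checking it anyway guards
	// against mistakes on the caller's side.
	if (seek_pos > file_size)
		return -1;

	// fseek() takes a long, so reject positions it cannot represent.
	if (seek_pos > (uint64_t)LONG_MAX)
		return -1;

	return fseek(file, (long)seek_pos, SEEK_SET) == 0 ? 0 : -1;
}
```

After a successful seek the application would read new input from the file and hand it to the coder through lzma_stream.next_in and .avail_in as usual. On platforms where long is 32 bits, a 64-bit seek function such as POSIX fseeko() would be the more suitable choice.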
|
||||
|
||||
|
||||
|
@ -520,7 +550,19 @@ typedef struct {
|
|||
void *reserved_ptr2;
|
||||
void *reserved_ptr3;
|
||||
void *reserved_ptr4;
|
||||
uint64_t reserved_int1;
|
||||
|
||||
/**
|
||||
* \brief New seek input position for LZMA_SEEK_NEEDED
|
||||
*
|
||||
* When lzma_code() returns LZMA_SEEK_NEEDED, the new input position
|
||||
* needed by liblzma will be available in seek_pos. The value is
|
||||
* guaranteed to not exceed the file size that was specified when
|
||||
* this lzma_stream was initialized.
|
||||
*
|
||||
* In all other situations the value of this variable is undefined.
|
||||
*/
|
||||
uint64_t seek_pos;
|
||||
|
||||
uint64_t reserved_int2;
|
||||
size_t reserved_int3;
|
||||
size_t reserved_int4;
|
||||
|
|
|
@ -49,9 +49,13 @@
|
|||
* Filter for SPARC binaries.
|
||||
*/
|
||||
|
||||
#define LZMA_FILTER_ARM64 LZMA_VLI_C(0x0A)
|
||||
/**<
|
||||
* Filter for ARM64 binaries.
|
||||
*/
|
||||
|
||||
/**
|
||||
* \brief Options for BCJ filters
|
||||
* \brief Options for BCJ filters (except ARM64)
|
||||
*
|
||||
* The BCJ filters never change the size of the data. Specifying options
|
||||
* for them is optional: if pointer to options is NULL, default value is
|
||||
|
|
|
@ -69,7 +69,12 @@ typedef struct {
|
|||
*
|
||||
* Set this to zero if no flags are wanted.
|
||||
*
|
||||
* No flags are currently supported.
|
||||
* Encoder: No flags are currently supported.
|
||||
*
|
||||
* Decoder: Bitwise-or of zero or more of the decoder flags:
|
||||
* LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
|
||||
* LZMA_TELL_ANY_CHECK, LZMA_IGNORE_CHECK,
|
||||
* LZMA_CONCATENATED, LZMA_FAIL_FAST
|
||||
*/
|
||||
uint32_t flags;
|
||||
|
||||
|
@ -79,7 +84,7 @@ typedef struct {
|
|||
uint32_t threads;
|
||||
|
||||
/**
|
||||
* \brief Maximum uncompressed size of a Block
|
||||
* \brief Encoder only: Maximum uncompressed size of a Block
|
||||
*
|
||||
* The encoder will start a new .xz Block every block_size bytes.
|
||||
* Using LZMA_FULL_FLUSH or LZMA_FULL_BARRIER with lzma_code()
|
||||
|
@ -135,7 +140,7 @@ typedef struct {
|
|||
uint32_t timeout;
|
||||
|
||||
/**
|
||||
* \brief Compression preset (level and possible flags)
|
||||
* \brief Encoder only: Compression preset
|
||||
*
|
||||
* The preset is set just like with lzma_easy_encoder().
|
||||
* The preset is ignored if filters below is non-NULL.
|
||||
|
@ -143,7 +148,7 @@ typedef struct {
|
|||
uint32_t preset;
|
||||
|
||||
/**
|
||||
* \brief Filter chain (alternative to a preset)
|
||||
* \brief Encoder only: Filter chain (alternative to a preset)
|
||||
*
|
||||
* If this is NULL, the preset above is used. Otherwise the preset
|
||||
* is ignored and the filter chain specified here is used.
|
||||
|
@ -151,7 +156,7 @@ typedef struct {
|
|||
const lzma_filter *filters;
|
||||
|
||||
/**
|
||||
* \brief Integrity check type
|
||||
* \brief Encoder only: Integrity check type
|
||||
*
|
||||
* See check.h for available checks. The xz command line tool
|
||||
* defaults to LZMA_CHECK_CRC64, which is a good choice if you
|
||||
|
@ -173,8 +178,50 @@ typedef struct {
|
|||
uint32_t reserved_int2;
|
||||
uint32_t reserved_int3;
|
||||
uint32_t reserved_int4;
|
||||
uint64_t reserved_int5;
|
||||
uint64_t reserved_int6;
|
||||
|
||||
/**
|
||||
* \brief Memory usage limit to reduce the number of threads
|
||||
*
|
||||
* Encoder: Ignored.
|
||||
*
|
||||
* Decoder:
|
||||
*
|
||||
* If the number of threads has been set so high that more than
|
||||
* memlimit_threading bytes of memory would be needed, the number
|
||||
* of threads will be reduced so that the memory usage will not exceed
|
||||
* memlimit_threading bytes. However, if memlimit_threading cannot
|
||||
* be met even in single-threaded mode, then decoding will continue
|
||||
* in single-threaded mode and memlimit_threading may be exceeded
|
||||
* even by a large amount. That is, memlimit_threading will never make
|
||||
* lzma_code() return LZMA_MEMLIMIT_ERROR. To truly cap the memory
|
||||
* usage, see memlimit_stop below.
|
||||
*
|
||||
* Setting memlimit_threading to UINT64_MAX or a similar huge value
|
||||
* means that liblzma is allowed to keep the whole compressed file
|
||||
* and the whole uncompressed file in memory in addition to the memory
|
||||
* needed by the decompressor data structures used by each thread!
|
||||
* In other words, a reasonable value limit must be set here or it
|
||||
* will cause problems sooner or later. If you have no idea what
|
||||
* a reasonable value could be, try lzma_physmem() / 4 as a starting
|
||||
* point. Setting this limit will never prevent decompression of
|
||||
* a file; this will only reduce the number of threads.
|
||||
*
|
||||
* If memlimit_threading is greater than memlimit_stop, then the value
|
||||
* of memlimit_stop will be used for both.
|
||||
*/
|
||||
uint64_t memlimit_threading;
|
||||
|
||||
/**
|
||||
* \brief Memory usage limit that should never be exceeded
|
||||
*
|
||||
* Encoder: Ignored.
|
||||
*
|
||||
* Decoder: If decompressing will need more than this amount of
|
||||
* memory even in the single-threaded mode, then lzma_code() will
|
||||
* return LZMA_MEMLIMIT_ERROR.
|
||||
*/
|
||||
uint64_t memlimit_stop;
|
||||
|
||||
uint64_t reserved_int7;
|
||||
uint64_t reserved_int8;
|
||||
void *reserved_ptr1;
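The interplay between memlimit_threading (a soft limit that only reduces the thread count) and memlimit_stop (a hard limit that makes decoding fail) can be modelled as a small pure function. This is a simplified sketch and not liblzma's actual memory accounting: it assumes that memory use is simply per_thread multiplied by the number of threads, which is a made-up cost model, and the function name pick_thread_count is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model. Returns the number of threads to use, or 0 when
// even one thread would exceed memlimit_stop (the situation in which
// the header says lzma_code() returns LZMA_MEMLIMIT_ERROR).
uint32_t
pick_thread_count(uint32_t threads, uint64_t per_thread,
		uint64_t memlimit_threading, uint64_t memlimit_stop)
{
	assert(threads > 0 && per_thread > 0);

	// If memlimit_threading is greater than memlimit_stop,
	// memlimit_stop is used for both.
	if (memlimit_threading > memlimit_stop)
		memlimit_threading = memlimit_stop;

	// Hard limit: it is compared against single-threaded usage only.
	if (per_thread > memlimit_stop)
		return 0;

	// Soft limit: reduce the thread count, but never below one.
	// Single-threaded decoding continues even if the soft limit
	// is exceeded.
	const uint64_t fit = memlimit_threading / per_thread;
	if (fit == 0)
		return 1;

	return fit < threads ? (uint32_t)fit : threads;
}
```

With this model, pick_thread_count(8, 100, 350, 1000) yields 3 threads, while pick_thread_count(8, 100, 50, 1000) still yields 1 because the soft limit alone never stops decoding.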
|
||||
|
@ -444,6 +491,60 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode(
|
|||
lzma_nothrow lzma_attr_warn_unused_result;
|
||||
|
||||
|
||||
/**
|
||||
* \brief MicroLZMA encoder
|
||||
*
|
||||
* The MicroLZMA format is a raw LZMA stream whose first byte (always 0x00)
|
||||
* has been replaced with bitwise-negation of the LZMA properties (lc/lp/pb).
|
||||
* This encoding ensures that the first byte of MicroLZMA stream is never
|
||||
* 0x00. There is no end of payload marker and thus the uncompressed size
|
||||
* must be stored separately. For the best error detection the dictionary
|
||||
* size should be stored separately as well but alternatively one may use
|
||||
* the uncompressed size as the dictionary size when decoding.
|
||||
*
|
||||
* With the MicroLZMA encoder, lzma_code() behaves slightly unusually.
|
||||
* The action argument must be LZMA_FINISH and the return value will never be
|
||||
* LZMA_OK. Thus the encoding is always done with a single lzma_code() after
|
||||
* the initialization. The benefit of the combination of initialization
|
||||
* function and lzma_code() is that memory allocations can be re-used for
|
||||
* better performance.
|
||||
*
|
||||
* lzma_code() will try to encode as much input as is possible to fit into
|
||||
* the given output buffer. If not all input can be encoded, the stream will
|
||||
* be finished without encoding all the input. The caller must check both
|
||||
* input and output buffer usage after lzma_code() (total_in and total_out
|
||||
* in lzma_stream can be convenient). Often lzma_code() can fill the output
|
||||
* buffer completely if there is a lot of input, but sometimes a few bytes
|
||||
* may remain unused because the next LZMA symbol would require more space.
|
||||
*
|
||||
* lzma_stream.avail_out must be at least 6. Otherwise LZMA_PROG_ERROR
|
||||
* will be returned.
|
||||
*
|
||||
* The LZMA dictionary size should be reasonably small to speed up the encoder
|
||||
* re-initialization. A good value is bigger than the resulting
|
||||
* uncompressed size of most of the output chunks. For example, if output
|
||||
* size is 4 KiB, dictionary size of 32 KiB or 64 KiB is good. If the
|
||||
* data compresses extremely well, even 128 KiB may be useful.
|
||||
*
|
||||
* The MicroLZMA format and this encoder variant were made with the EROFS
|
||||
* file system in mind. This format may be convenient in other embedded
|
||||
* uses too where many small streams are needed. XZ Embedded includes a
|
||||
* decoder for this format.
|
||||
*
|
||||
* \return - LZMA_STREAM_END: All good. Check the amounts of input used
|
||||
* and output produced. Store the amount of input used
|
||||
* (uncompressed size) as it needs to be known to decompress
|
||||
* the data.
|
||||
* - LZMA_OPTIONS_ERROR
|
||||
* - LZMA_MEM_ERROR
|
||||
* - LZMA_PROG_ERROR: In addition to the generic reasons for this
|
||||
* error code, this may also be returned if there isn't enough
|
||||
* output space (6 bytes) to create a valid MicroLZMA stream.
|
||||
*/
|
||||
extern LZMA_API(lzma_ret) lzma_microlzma_encoder(
|
||||
lzma_stream *strm, const lzma_options_lzma *options);
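The first-byte rule described in the comment above can be shown with plain arithmetic. The sketch below assumes the conventional LZMA properties byte encoding, (pb * 5 + lp) * 9 + lc, and negates it bitwise to obtain the first byte of a MicroLZMA stream. The function names are hypothetical, and the sketch does not check liblzma's additional restriction on lc + lp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: compute the first byte of a MicroLZMA stream
// from the LZMA properties.
uint8_t
microlzma_first_byte(uint32_t lc, uint32_t lp, uint32_t pb)
{
	assert(lc <= 8 && lp <= 4 && pb <= 4);

	// Conventional LZMA properties byte. Its maximum is
	// (4 * 5 + 4) * 9 + 8 = 224, so it is never 0xFF and the
	// negated byte is therefore never 0x00.
	const uint8_t props = (uint8_t)((pb * 5 + lp) * 9 + lc);
	return (uint8_t)~props;
}

// Hypothetical helper: recover lc/lp/pb from the first byte.
// Returns false if the byte cannot encode valid properties.
bool
microlzma_parse_first_byte(uint8_t first,
		uint32_t *lc, uint32_t *lp, uint32_t *pb)
{
	uint32_t props = (uint8_t)~first;
	if (props > (4 * 5 + 4) * 9 + 8)
		return false;

	*lc = props % 9;
	props /= 9;
	*lp = props % 5;
	*pb = props / 5;
	return true;
}
```

For the common lc=3, lp=0, pb=2 the properties byte is 0x5D, so the stream starts with 0xA2. A first byte of 0x00 fails to parse, which is the property the format description relies on.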
|
||||
|
||||
|
||||
/************
|
||||
* Decoding *
|
||||
************/
|
||||
|
@ -501,8 +602,8 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode(
|
|||
/**
|
||||
* This flag enables decoding of concatenated files with file formats that
|
||||
* allow concatenating compressed files as is. From the formats currently
|
||||
* supported by liblzma, only the .xz format allows concatenated files.
|
||||
* Concatenated files are not allowed with the legacy .lzma format.
|
||||
* supported by liblzma, only the .xz and .lz formats allow concatenated
|
||||
* files. Concatenated files are not allowed with the legacy .lzma format.
|
||||
*
|
||||
* This flag also affects the usage of the `action' argument for lzma_code().
|
||||
* When LZMA_CONCATENATED is used, lzma_code() won't return LZMA_STREAM_END
|
||||
|
@ -515,6 +616,35 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode(
|
|||
#define LZMA_CONCATENATED UINT32_C(0x08)
|
||||
|
||||
|
||||
/**
|
||||
* This flag makes the threaded decoder report errors (like LZMA_DATA_ERROR)
|
||||
* as soon as they are detected. This saves time when the application has no
|
||||
* interest in a partially decompressed truncated or corrupt file. Note that
|
||||
* due to timing randomness, if the same truncated or corrupt input is
|
||||
* decompressed multiple times with this flag, a different amount of output
|
||||
* may be produced by different runs, and even the error code might vary.
|
||||
*
|
||||
* When using LZMA_FAIL_FAST, it is recommended to use LZMA_FINISH to tell
|
||||
* the decoder when no more input will be coming because it can help fast
|
||||
* detection and reporting of truncated files. Note that in this situation
|
||||
* truncated files might be diagnosed with LZMA_DATA_ERROR instead of
|
||||
* LZMA_OK or LZMA_BUF_ERROR!
|
||||
*
|
||||
* Without this flag the threaded decoder will provide as much output as
|
||||
* possible at first and then report the pending error. This default behavior
|
||||
* matches the single-threaded decoder and provides repeatable behavior
|
||||
* with truncated or corrupt input. There are a few special cases where the
|
||||
* behavior can still differ like memory allocation failures (LZMA_MEM_ERROR).
|
||||
*
|
||||
* Single-threaded decoders currently ignore this flag.
|
||||
*
|
||||
* Support for this flag was added in liblzma 5.3.3alpha. Note that in older
|
||||
* versions this flag isn't supported (LZMA_OPTIONS_ERROR) even by functions
|
||||
* that ignore this flag in newer liblzma versions.
|
||||
*/
|
||||
#define LZMA_FAIL_FAST UINT32_C(0x20)
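The note about older versions rejecting LZMA_FAIL_FAST with LZMA_OPTIONS_ERROR comes down to a bitmask check at initialization. The sketch below illustrates that kind of check without using liblzma. The DEMO_ constants are local stand-ins: the values for CONCATENATED and FAIL_FAST are the ones shown in this header, the others are meant to mirror liblzma's container.h and should be verified against the installed header. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// Local stand-ins for the decoder flags.
#define DEMO_TELL_NO_CHECK          UINT32_C(0x01)
#define DEMO_TELL_UNSUPPORTED_CHECK UINT32_C(0x02)
#define DEMO_TELL_ANY_CHECK         UINT32_C(0x04)
#define DEMO_CONCATENATED           UINT32_C(0x08)
#define DEMO_IGNORE_CHECK           UINT32_C(0x10)
#define DEMO_FAIL_FAST              UINT32_C(0x20)

// Hypothetical check: returns true if every bit in `flags` is known to
// a decoder that either does or does not know about FAIL_FAST.
bool
demo_flags_supported(uint32_t flags, bool knows_fail_fast)
{
	uint32_t supported = DEMO_TELL_NO_CHECK
			| DEMO_TELL_UNSUPPORTED_CHECK
			| DEMO_TELL_ANY_CHECK
			| DEMO_CONCATENATED
			| DEMO_IGNORE_CHECK;

	if (knows_fail_fast)
		supported |= DEMO_FAIL_FAST;

	// Any bit outside the supported set makes the flags invalid.
	return (flags & ~supported) == 0;
}
```

An application that must run against older liblzma versions could therefore retry initialization without the flag when the first attempt fails with LZMA_OPTIONS_ERROR.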
|
||||
|
||||
|
||||
/**
|
||||
* \brief Initialize .xz Stream decoder
|
||||
*
|
||||
|
@ -527,7 +657,7 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode(
|
|||
* \param flags Bitwise-or of zero or more of the decoder flags:
|
||||
* LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
|
||||
* LZMA_TELL_ANY_CHECK, LZMA_IGNORE_CHECK,
|
||||
* LZMA_CONCATENATED
|
||||
* LZMA_CONCATENATED, LZMA_FAIL_FAST
|
||||
*
|
||||
* \return - LZMA_OK: Initialization was successful.
|
||||
* - LZMA_MEM_ERROR: Cannot allocate memory.
|
||||
|
@ -540,11 +670,43 @@ extern LZMA_API(lzma_ret) lzma_stream_decoder(
|
|||
|
||||
|
||||
/**
|
||||
* \brief Decode .xz Streams and .lzma files with autodetection
|
||||
* \brief Initialize multithreaded .xz Stream decoder
|
||||
*
|
||||
* This decoder autodetects between the .xz and .lzma file formats, and
|
||||
* calls lzma_stream_decoder() or lzma_alone_decoder() once the type
|
||||
* of the input file has been detected.
|
||||
* \param strm Pointer to properly prepared lzma_stream
|
||||
* \param options Pointer to multithreaded compression options
|
||||
*
|
||||
* The decoder can decode multiple Blocks in parallel. This requires that each
|
||||
* Block Header contains the Compressed Size and Uncompressed Size fields
|
||||
* which are added by the multi-threaded encoder, see lzma_stream_encoder_mt().
|
||||
*
|
||||
* A Stream with one Block will only utilize one thread. A Stream with multiple
|
||||
* Blocks but without size information in Block Headers will be processed in
|
||||
* single-threaded mode in the same way as done by lzma_stream_decoder().
|
||||
* Concatenated Streams are processed one Stream at a time; no inter-Stream
|
||||
* parallelization is done.
|
||||
*
|
||||
* This function behaves like lzma_stream_decoder() when options->threads == 1
|
||||
* and options->memlimit_threading <= 1.
|
||||
*
|
||||
* \return - LZMA_OK: Initialization was successful.
|
||||
* - LZMA_MEM_ERROR: Cannot allocate memory.
|
||||
* - LZMA_MEMLIMIT_ERROR: Memory usage limit was reached.
|
||||
* - LZMA_OPTIONS_ERROR: Unsupported flags.
|
||||
* - LZMA_PROG_ERROR
|
||||
*/
|
||||
extern LZMA_API(lzma_ret) lzma_stream_decoder_mt(
|
||||
lzma_stream *strm, const lzma_mt *options)
|
||||
lzma_nothrow lzma_attr_warn_unused_result;
|
||||
|
||||
|
||||
/**
|
||||
* \brief Decode .xz, .lzma, and .lz (lzip) files with autodetection
|
||||
*
|
||||
* This decoder autodetects between the .xz, .lzma, and .lz file formats,
|
||||
* and calls lzma_stream_decoder(), lzma_alone_decoder(), or
|
||||
* lzma_lzip_decoder() once the type of the input file has been detected.
|
||||
*
|
||||
* Support for .lz was added in 5.4.0.
|
||||
*
|
||||
* If the flag LZMA_CONCATENATED is used and the input is a .lzma file:
|
||||
* For historical reasons concatenated .lzma files aren't supported.
|
||||
|
@ -562,7 +724,7 @@ extern LZMA_API(lzma_ret) lzma_stream_decoder(
|
|||
* \param flags Bitwise-or of zero or more of the decoder flags:
|
||||
* LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
|
||||
* LZMA_TELL_ANY_CHECK, LZMA_IGNORE_CHECK,
|
||||
* LZMA_CONCATENATED
|
||||
* LZMA_CONCATENATED, LZMA_FAIL_FAST
|
||||
*
|
||||
* \return - LZMA_OK: Initialization was successful.
|
||||
* - LZMA_MEM_ERROR: Cannot allocate memory.
|
||||
|
@ -597,6 +759,64 @@ extern LZMA_API(lzma_ret) lzma_alone_decoder(
|
|||
lzma_nothrow lzma_attr_warn_unused_result;
|
||||
|
||||
|
||||
/**
|
||||
* \brief Initialize .lz (lzip) decoder (a foreign file format)
|
||||
*
|
||||
* \param strm Pointer to properly prepared lzma_stream
|
||||
* \param memlimit Memory usage limit as bytes. Use UINT64_MAX
|
||||
* to effectively disable the limiter.
|
||||
* \param flags Bitwise-or of flags, or zero for no flags.
|
||||
* All decoder flags listed above are supported
|
||||
* although only LZMA_CONCATENATED and (in very rare
|
||||
* cases) LZMA_IGNORE_CHECK are actually useful.
|
||||
* LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
|
||||
* and LZMA_FAIL_FAST do nothing. LZMA_TELL_ANY_CHECK
|
||||
* is supported for consistency only as CRC32 is
|
||||
* always used in the .lz format.
|
||||
*
|
||||
* This decoder supports the .lz format version 0 and the unextended .lz
|
||||
* format version 1:
|
||||
*
|
||||
* - Files in the format version 0 were produced by lzip 1.3 and older.
|
||||
* Such files aren't common but may be found from file archives
|
||||
* as a few source packages were released in this format. People
|
||||
* might have old personal files in this format too. Decompression
|
||||
* support for the format version 0 was removed in lzip 1.18.
|
||||
*
|
||||
* - lzip 1.3 added decompression support for .lz format version 1 files.
|
||||
* Compression support was added in lzip 1.4. In lzip 1.6 the .lz format
|
||||
* version 1 was extended to support the Sync Flush marker. This extension
|
||||
* is not supported by liblzma. lzma_code() will return LZMA_DATA_ERROR
|
||||
* at the location of the Sync Flush marker. In practice files with
|
||||
* the Sync Flush marker are very rare and thus liblzma can decompress
|
||||
* almost all .lz files.
|
||||
*
|
||||
* Just like with lzma_stream_decoder() for .xz files, LZMA_CONCATENATED
|
||||
* should be used when decompressing normal standalone .lz files.
|
||||
*
|
||||
* The .lz format allows putting non-.lz data at the end of a file after at
|
||||
* least one valid .lz member. That is, one can append custom data at the end
|
||||
* of a .lz file and the decoder is required to ignore it. In liblzma this
|
||||
* is relevant only when LZMA_CONCATENATED is used. In that case lzma_code()
|
||||
* will return LZMA_STREAM_END and leave lzma_stream.next_in pointing to
|
||||
* the first byte of the non-.lz data. An exception to this is if the first
|
||||
* 1-3 bytes of the non-.lz data are identical to the .lz magic bytes
|
||||
* (0x4C, 0x5A, 0x49, 0x50; "LZIP" in US-ASCII). In such a case the 1-3 bytes
|
||||
* will have been ignored by lzma_code(). If one wishes to locate the non-.lz
|
||||
* data reliably, one must ensure that the first byte isn't 0x4C. Actually
|
||||
* one should ensure that none of the first four bytes of trailing data are
|
||||
* equal to the magic bytes because lzip >= 1.20 requires it by default.
|
||||
*
|
||||
* \return - LZMA_OK: Initialization was successful.
|
||||
* - LZMA_MEM_ERROR: Cannot allocate memory.
|
||||
* - LZMA_OPTIONS_ERROR: Unsupported flags
|
||||
* - LZMA_PROG_ERROR
|
||||
*/
|
||||
extern LZMA_API(lzma_ret) lzma_lzip_decoder(
|
||||
lzma_stream *strm, uint64_t memlimit, uint32_t flags)
|
||||
lzma_nothrow lzma_attr_warn_unused_result;
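The trailing-data caveat above can be checked before appending custom data to a .lz file. The sketch below counts how many leading bytes of the candidate data match the start of the .lz magic bytes given in the comment. The function name lz_magic_prefix_len is hypothetical, and the sketch covers only the prefix rule described for liblzma, not the stricter default of lzip 1.20 and later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: number of leading bytes in `data` that match
// the beginning of the .lz magic bytes.
//   0    = the start of the trailing data can be located reliably
//   1-3  = that many bytes could be ignored by the decoder
//   4    = the data begins like a new .lz member header
size_t
lz_magic_prefix_len(const uint8_t *data, size_t size)
{
	// 0x4C, 0x5A, 0x49, 0x50 is "LZIP" in US-ASCII.
	static const uint8_t magic[4] = { 0x4C, 0x5A, 0x49, 0x50 };

	assert(data != NULL || size == 0);

	size_t n = 0;
	while (n < size && n < sizeof(magic) && data[n] == magic[n])
		++n;

	return n;
}
```

Data for which this returns 0 starts with a byte other than 0x4C, which is the minimum condition the comment names for locating the non-.lz data reliably.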
|
||||
|
||||
|
||||
/**
|
||||
* \brief Single-call .xz Stream decoder
|
||||
*
|
||||
|
@ -606,9 +826,9 @@ extern LZMA_API(lzma_ret) lzma_alone_decoder(
|
|||
* returned.
|
||||
* \param flags Bitwise-or of zero or more of the decoder flags:
|
||||
* LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
|
||||
* LZMA_IGNORE_CHECK, LZMA_CONCATENATED. Note that
|
||||
* LZMA_TELL_ANY_CHECK is not allowed and will
|
||||
* return LZMA_PROG_ERROR.
|
||||
* LZMA_IGNORE_CHECK, LZMA_CONCATENATED,
|
||||
* LZMA_FAIL_FAST. Note that LZMA_TELL_ANY_CHECK
|
||||
* is not allowed and will return LZMA_PROG_ERROR.
|
||||
* \param allocator lzma_allocator for custom allocator functions.
|
||||
* Set to NULL to use malloc() and free().
|
||||
* \param in Beginning of the input buffer
|
||||
|
@ -642,3 +862,43 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_decode(
|
|||
const uint8_t *in, size_t *in_pos, size_t in_size,
|
||||
uint8_t *out, size_t *out_pos, size_t out_size)
|
||||
lzma_nothrow lzma_attr_warn_unused_result;
|
||||
|
||||
|
||||
/**
|
||||
* \brief MicroLZMA decoder
|
||||
*
|
||||
* See lzma_microlzma_encoder() for more information.
|
||||
*
|
||||
* The lzma_code() usage with this decoder is completely normal. The
|
||||
* special behavior of lzma_code() applies to lzma_microlzma_encoder() only.
|
||||
*
|
||||
* \param strm Pointer to properly prepared lzma_stream
|
||||
* \param comp_size Compressed size of the MicroLZMA stream.
|
||||
* The caller must somehow know this exactly.
|
||||
* \param uncomp_size Uncompressed size of the MicroLZMA stream.
|
||||
* If the exact uncompressed size isn't known, this
|
||||
* can be set to a value that is at most as big as
|
||||
* the exact uncompressed size would be, but then the
|
||||
* next argument uncomp_size_is_exact must be false.
|
||||
* \param uncomp_size_is_exact
|
||||
* If true, uncomp_size must be exactly correct.
|
||||
* This will improve error detection at the end of
|
||||
* the stream. If the exact uncompressed size isn't
|
||||
* known, this must be false. uncomp_size must still
|
||||
* be at most as big as the exact uncompressed size
|
||||
* is. Setting this to false when the exact size is
|
||||
* known will work but error detection at the end of
|
||||
* the stream will be weaker.
|
||||
* \param dict_size LZMA dictionary size that was used when
|
||||
* compressing the data. It is OK to use a bigger
|
||||
* value too but liblzma will then allocate more
|
||||
* memory than would actually be required and error
|
||||
* detection will be slightly worse. (Note that with
|
||||
* the implementation in XZ Embedded it doesn't
|
||||
* affect the memory usage if one specifies bigger
|
||||
* dictionary than actually required.)
|
||||
*/
|
||||
extern LZMA_API(lzma_ret) lzma_microlzma_decoder(
|
||||
lzma_stream *strm, uint64_t comp_size,
|
||||
uint64_t uncomp_size, lzma_bool uncomp_size_is_exact,
|
||||
uint32_t dict_size);
|
||||
|
|
|
@ -124,6 +124,27 @@ extern LZMA_API(lzma_ret) lzma_filters_copy(
|
|||
lzma_nothrow lzma_attr_warn_unused_result;
|
||||
|
||||
|
||||
/**
|
||||
* \brief Free the options in the array of lzma_filter structures
|
||||
*
|
||||
* This frees the filter chain options. The filters array itself is not freed.
|
||||
*
|
||||
* The filters array must have at most LZMA_FILTERS_MAX + 1 elements
|
||||
* including the terminating element which must have .id = LZMA_VLI_UNKNOWN.
|
||||
* For all elements before the terminating element:
|
||||
* - options will be freed using the given lzma_allocator or,
|
||||
* if allocator is NULL, using free().
|
||||
* - options will be set to NULL.
|
||||
* - id will be set to LZMA_VLI_UNKNOWN.
|
||||
*
|
||||
* If filters is NULL, this does nothing. Again, remember that this never frees
|
||||
* the filters array itself.
|
||||
*/
|
||||
extern LZMA_API(void) lzma_filters_free(
|
||||
lzma_filter *filters, const lzma_allocator *allocator)
|
||||
lzma_nothrow;
|
||||
|
||||
|
||||
/**
|
||||
* \brief Calculate approximate memory requirements for raw encoder
|
||||
*
|
||||
|
@ -205,21 +226,27 @@ extern LZMA_API(lzma_ret) lzma_raw_decoder(
|
|||
/**
|
||||
* \brief Update the filter chain in the encoder
|
||||
*
|
||||
* This function is for advanced users only. This function has two slightly
|
||||
* different purposes:
|
||||
* This function may be called after lzma_code() has returned LZMA_STREAM_END
|
||||
* when LZMA_FULL_BARRIER, LZMA_FULL_FLUSH, or LZMA_SYNC_FLUSH was used:
|
||||
*
|
||||
* - After LZMA_FULL_FLUSH when using Stream encoder: Set a new filter
|
||||
* chain, which will be used starting from the next Block.
|
||||
* - After LZMA_FULL_BARRIER or LZMA_FULL_FLUSH: Single-threaded .xz Stream
|
||||
* encoder (lzma_stream_encoder()) and (since liblzma 5.4.0) multi-threaded
|
||||
* Stream encoder (lzma_stream_encoder_mt()) allow setting a new filter
|
||||
* chain to be used for the next Block(s).
|
||||
*
|
||||
* - After LZMA_SYNC_FLUSH using Raw, Block, or Stream encoder: Change
|
||||
* the filter-specific options in the middle of encoding. The actual
|
||||
* filters in the chain (Filter IDs) cannot be changed. In the future,
|
||||
* it might become possible to change the filter options without
|
||||
* using LZMA_SYNC_FLUSH.
|
||||
* - After LZMA_SYNC_FLUSH: Raw encoder (lzma_raw_encoder()),
|
||||
* Block encoder (lzma_block_encoder()), and single-threaded .xz Stream
|
||||
* encoder (lzma_stream_encoder()) allow changing certain filter-specific
|
||||
* options in the middle of encoding. The actual filters in the chain
|
||||
* (Filter IDs) must not be changed! Currently only the lc, lp, and pb
|
||||
* options of LZMA2 (not LZMA1) can be changed this way.
|
||||
*
|
||||
* While rarely useful, this function may be called also when no data has
|
||||
* been compressed yet. In that case, this function will behave as if
|
||||
* LZMA_FULL_FLUSH (Stream encoder) or LZMA_SYNC_FLUSH (Raw or Block
|
||||
* - In the future some filters might allow changing some of their options
|
||||
* without any barrier or flushing but currently such filters don't exist.
|
||||
*
|
||||
* This function may also be called when no data has been compressed yet
|
||||
* although this is rarely useful. In that case, this function will behave
|
||||
* as if LZMA_FULL_FLUSH (Stream encoders) or LZMA_SYNC_FLUSH (Raw or Block
|
||||
* encoder) had been used right before calling this function.
|
||||
*
|
||||
* \return - LZMA_OK
|
||||
|
@ -427,3 +454,261 @@ extern LZMA_API(lzma_ret) lzma_filter_flags_decode(
|
|||
lzma_filter *filter, const lzma_allocator *allocator,
|
||||
const uint8_t *in, size_t *in_pos, size_t in_size)
|
||||
lzma_nothrow lzma_attr_warn_unused_result;
|
||||
|
||||
|
||||
/***********
|
||||
* Strings *
|
||||
***********/
|
||||
|
||||
/**
|
||||
* \brief Allow or show all filters
|
||||
*
|
||||
* By default only the filters supported in the .xz format are accepted by
|
||||
* lzma_str_to_filters() or shown by lzma_str_list_filters().
|
||||
*/
|
||||
#define LZMA_STR_ALL_FILTERS UINT32_C(0x01)
|
||||
|
||||
|
||||
/**
|
||||
* \brief Do not validate the filter chain in lzma_str_to_filters()
|
||||
*
|
||||
* By default lzma_str_to_filters() can return an error if the filter chain
|
||||
* as a whole isn't usable in the .xz format or in the raw encoder or decoder.
|
||||
* With this flag the validation is skipped (this doesn't affect the handling
|
||||
* of the individual filter options).
|
||||
*/
|
||||
#define LZMA_STR_NO_VALIDATION UINT32_C(0x02)
|
||||
|
||||
|
||||
/**
|
||||
* \brief Stringify encoder options
|
||||
*
|
||||
* Show the filter-specific options that the encoder will use.
|
||||
* This may be useful for verbose diagnostic messages.
|
||||
*
|
||||
* Note that if options were decoded from .xz headers then the encoder options
|
||||
* may be undefined. This flag shouldn't be used in such a situation.
|
||||
*/
|
||||
#define LZMA_STR_ENCODER UINT32_C(0x10)
|
||||
|
||||
|
||||
/**
|
||||
* \brief Stringify decoder options
|
||||
*
|
||||
* Show the filter-specific options that the decoder will use.
|
||||
* This may be useful for showing what filter options were decoded
|
||||
* from file headers.
|
||||
*/
|
||||
#define LZMA_STR_DECODER UINT32_C(0x20)
|
||||
|
||||
|
||||
/**
|
||||
* \brief Produce xz-compatible getopt_long() syntax
|
||||
*
|
||||
* That is, "delta:dist=2 lzma2:dict=4MiB,pb=1,lp=1" becomes
|
||||
* "--delta=dist=2 --lzma2=dict=4MiB,pb=1,lp=1".
|
||||
*
|
||||
* This syntax is compatible with xz 5.0.0 as long as the filters and
|
||||
* their options are supported too.
|
||||
*/
|
||||
#define LZMA_STR_GETOPT_LONG UINT32_C(0x40)
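
The transformation described for LZMA_STR_GETOPT_LONG can be illustrated with a small stand-alone sketch. The helper `chain_to_getopt_long()` below is hypothetical and is not part of liblzma; it only mimics the documented string rewrite (prefix each filter with "--" and replace the first ":" with "=") so the expected output can be checked without linking against the library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT a liblzma function: rewrites a space-separated
 * filter chain string into "--name=opts" form. Returns 0 on success and
 * -1 if the output buffer is too small. */
static int
chain_to_getopt_long(const char *in, char *out, size_t out_size)
{
	size_t used = 0;
	int first = 1;

	if (out_size == 0)
		return -1;
	out[0] = '\0';

	while (*in != '\0') {
		/* Skip the spaces between filters. */
		while (*in == ' ')
			++in;
		if (*in == '\0')
			break;

		/* Find the end of this filter and the name/options separator. */
		size_t len = strcspn(in, " ");
		const char *colon = memchr(in, ':', len);
		size_t name_len = colon != NULL ? (size_t)(colon - in) : len;

		/* Emit "--name", preceded by a space for all but the first. */
		int n = snprintf(out + used, out_size - used, "%s--%.*s",
				first ? "" : " ", (int)name_len, in);
		if (n < 0 || (size_t)n >= out_size - used)
			return -1;
		used += (size_t)n;

		/* Emit "=options" if the filter had any options. */
		if (colon != NULL) {
			n = snprintf(out + used, out_size - used, "=%.*s",
					(int)(len - name_len - 1), colon + 1);
			if (n < 0 || (size_t)n >= out_size - used)
				return -1;
			used += (size_t)n;
		}

		first = 0;
		in += len;
	}

	return 0;
}
```

In real code the string would come from lzma_str_from_filters() with LZMA_STR_GETOPT_LONG set in flags; the sketch only shows what shape of output to expect.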
|
||||
|
||||
|
||||
/**
|
||||
* \brief Use two dashes "--" instead of a space to separate filters
|
||||
*
|
||||
* That is, "delta:dist=2 lzma2:pb=1,lp=1" becomes
|
||||
* "delta:dist=2--lzma2:pb=1,lp=1". This looks slightly odd but this
|
||||
* kind of string should be usable on the command line without quoting.
|
||||
* However, it is possible that future versions with new filter options
|
||||
* might produce strings that require shell quoting anyway as the exact
|
||||
* set of possible characters isn't frozen for now.
|
||||
*
|
||||
* It is guaranteed that the single quote (') will never be used in
|
||||
* filter chain strings (even if LZMA_STR_NO_SPACES isn't used).
|
||||
*/
|
||||
#define LZMA_STR_NO_SPACES UINT32_C(0x80)
|
||||
|
||||
|
||||
/**
|
||||
* \brief Convert a string to a filter chain
|
||||
*
|
||||
* This tries to make it easier to write applications that allow users
|
||||
* to set custom compression options. This only handles the filter
|
||||
* configuration (including presets) but not the number of threads,
|
||||
* block size, check type, or memory limits.
|
||||
*
|
||||
* The input string can be either a preset or a filter chain. Presets
|
||||
* begin with a digit 0-9 and may be followed by zero or more flags
|
||||
* which are lower-case letters. Currently only "e" is supported, matching
|
||||
* LZMA_PRESET_EXTREME. For partial xz command line syntax compatibility,
|
||||
* a preset string may start with a single dash "-".
|
||||
*
|
||||
* A filter chain consists of one or more "filtername:opt1=value1,opt2=value2"
|
||||
* strings separated by one or more spaces. Leading and trailing spaces are
|
||||
* ignored. All names and values must be lower-case. Extra commas in the
|
||||
* option list are ignored. The order of filters is significant: when
|
||||
* encoding, the uncompressed input data goes to the leftmost filter first.
|
||||
* Normally "lzma2" is the last filter in the chain.
|
||||
*
|
||||
* If one wishes to avoid spaces, for example, to avoid shell quoting,
|
||||
* it is possible to use two dashes "--" instead of spaces to separate
|
||||
* the filters.
|
||||
*
|
||||
* For xz command line compatibility, each filter may be prefixed with
|
||||
* two dashes "--" and the colon ":" separating the filter name from
|
||||
* the options may be replaced with an equals sign "=".
|
||||
*
|
||||
* By default, only filters that can be used in the .xz format are accepted.
|
||||
* To allow all filters (LZMA1) use the flag LZMA_STR_ALL_FILTERS.
|
||||
*
|
||||
* By default, very basic validation is done for the filter chain as a whole,
|
||||
* for example, that LZMA2 is only used as the last filter in the chain.
|
||||
* The validation isn't perfect though and it's possible that this function
|
||||
* succeeds but using the filter chain for encoding or decoding will still
|
||||
* result in LZMA_OPTIONS_ERROR. To disable this validation, use the flag
|
||||
* LZMA_STR_NO_VALIDATION.
|
||||
*
|
||||
* The available filter names and their options are available via
|
||||
* lzma_str_list_filters(). See the xz man page for the description
|
||||
* of filter names and options.
|
||||
*
|
||||
* \param str User-supplied string describing a preset or
|
||||
* a filter chain. If a default value is needed and
|
||||
* you don't know what would be good, use "6" since
|
||||
* that is the default preset in xz too.
|
||||
* \param error_pos If this isn't NULL, this value will be set on
|
||||
* both success and on all errors. This tells the
|
||||
* location of the error in the string. This is
|
||||
* an int to make it straightforward to use this
|
||||
* as printf() field width. The value is guaranteed
|
||||
* to be in the range [0, INT_MAX] even if strlen(str)
|
||||
* somehow was greater than INT_MAX.
|
||||
* \param filters An array of lzma_filter structures. There must
|
||||
* be LZMA_FILTERS_MAX + 1 (that is, five) elements
|
||||
* in the array. The old contents are ignored so it
|
||||
* doesn't need to be initialized. This array is
|
||||
* modified only if this function returns LZMA_OK.
|
||||
* Once the allocated filter options are no longer
|
||||
* needed, lzma_filters_free() can be used to free the
|
||||
* options (it doesn't free the filters array itself).
|
||||
* \param flags Bitwise-or of zero or more of the flags
|
||||
* LZMA_STR_ALL_FILTERS and LZMA_STR_NO_VALIDATION.
|
||||
* \param allocator lzma_allocator for custom allocator functions.
|
||||
* Set to NULL to use malloc() and free().
|
||||
*
|
||||
* \return On success, NULL is returned. On error, a statically-allocated
|
||||
* error message is returned which together with the error_pos
|
||||
* should give some idea what is wrong.
|
||||
*
|
||||
* For command line applications, below is an example of how an error message
|
||||
* can be displayed. Note the use of an empty string for the field width.
|
||||
* If "^" was used there it would create an off-by-one error except at
|
||||
* the very beginning of the line.
|
||||
*
|
||||
* \code{.c}
|
||||
* const char *str = ...; // From user
|
||||
* lzma_filter filters[LZMA_FILTERS_MAX + 1];
|
||||
* int pos;
|
||||
* const char *msg = lzma_str_to_filters(str, &pos, filters, 0, NULL);
|
||||
* if (msg != NULL) {
|
||||
* printf("%s: Error in XZ compression options:\n", argv[0]);
|
||||
* printf("%s: %s\n", argv[0], str);
|
||||
* printf("%s: %*s^\n", argv[0], pos, "");
|
||||
* printf("%s: %s\n", argv[0], msg);
|
||||
* }
|
||||
* \endcode
|
||||
*/
|
||||
extern LZMA_API(const char *) lzma_str_to_filters(
|
||||
const char *str, int *error_pos, lzma_filter *filters,
|
||||
uint32_t flags, const lzma_allocator *allocator)
|
||||
lzma_nothrow lzma_attr_warn_unused_result;
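
The remark about the field width in the example above can be checked in isolation. The sketch below uses a hypothetical helper, `format_marker()`, which is not part of liblzma. It formats the marker line both ways: padding an empty string and then printing the caret, or applying the field width to the caret itself.

```c
#include <stdio.h>

/* Hypothetical helper for illustration only. If use_empty_string is
 * nonzero, pos spaces are produced by padding "" and the caret follows,
 * so the caret lands at column pos. Otherwise the field width is applied
 * to "^" itself, which right-aligns the caret inside a field of width pos
 * and puts it one column too far left whenever pos >= 1. */
static void
format_marker(char *buf, size_t buf_size, int pos, int use_empty_string)
{
	if (use_empty_string)
		snprintf(buf, buf_size, "%*s^", pos, "");
	else
		snprintf(buf, buf_size, "%*s", pos, "^");
}
```

With pos = 3 the first form yields three spaces followed by the caret, while the second yields only two spaces before it. With pos = 0 both forms produce just the caret, which is why the mistake is invisible at the very beginning of the line.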
|
||||
|
||||
|
||||
/**
|
||||
* \brief Convert a filter chain to a string
|
||||
*
|
||||
* Use cases:
|
||||
*
|
||||
* - Verbose output showing the full encoder options to the user
|
||||
* (use LZMA_STR_ENCODER in flags)
|
||||
*
|
||||
* - Showing the filters and options that are required to decode a file
|
||||
* (use LZMA_STR_DECODER in flags)
|
||||
*
|
||||
* - Showing the filter names without any options in informational messages
|
||||
* where the technical details aren't important (no flags). In this case
|
||||
* the .options in the filters array are ignored and may be NULL even if
|
||||
* a filter has a mandatory options structure.
|
||||
*
|
||||
* Note that even if the filter chain was specified using a preset,
|
||||
* the resulting filter chain isn't reversed to a preset. So if you
|
||||
* specify "6" to lzma_str_to_filters() then lzma_str_from_filters()
|
||||
* will produce a string containing "lzma2".
|
||||
*
|
||||
* \param str On success *str will be set to point to an
|
||||
* allocated string describing the given filter
|
||||
* chain. Old value is ignored. On error *str is
|
||||
* always set to NULL.
|
||||
* \param filters Array of 1-4 filters and a terminating element
|
||||
* with .id = LZMA_VLI_UNKNOWN.
|
||||
* \param flags Bitwise-or of zero or more of the flags
|
||||
* LZMA_STR_ENCODER, LZMA_STR_DECODER,
|
||||
* LZMA_STR_GETOPT_LONG, and LZMA_STR_NO_SPACES.
|
||||
* \param allocator lzma_allocator for custom allocator functions.
|
||||
* Set to NULL to use malloc() and free().
|
||||
*
|
||||
* \return - LZMA_OK
|
||||
* - LZMA_OPTIONS_ERROR: Empty filter chain
|
||||
* (filters[0].id == LZMA_VLI_UNKNOWN) or the filter chain
|
||||
* includes a Filter ID that is not supported by this function.
|
||||
* - LZMA_MEM_ERROR
|
||||
* - LZMA_PROG_ERROR
|
||||
*/
|
||||
extern LZMA_API(lzma_ret) lzma_str_from_filters(
|
||||
char **str, const lzma_filter *filters, uint32_t flags,
|
||||
const lzma_allocator *allocator)
|
||||
lzma_nothrow lzma_attr_warn_unused_result;
|
||||
|
||||
|
||||
/**
|
||||
* \brief List available filters and/or their options (for help message)
|
||||
*
|
||||
* If a filter_id is given then only one line is created which contains the
|
||||
* filter name. If LZMA_STR_ENCODER or LZMA_STR_DECODER is used then the
|
||||
* options required for encoding or decoding are listed on the same line too.
|
||||
*
|
||||
* If filter_id is LZMA_VLI_UNKNOWN then all supported .xz-compatible filters
|
||||
* are listed:
|
||||
*
|
||||
* - If neither LZMA_STR_ENCODER nor LZMA_STR_DECODER is used then
|
||||
* the supported filter names are listed on a single line separated
|
||||
* by spaces.
|
||||
*
|
||||
* - If LZMA_STR_ENCODER or LZMA_STR_DECODER is used then filters and
|
||||
* the supported options are listed one filter per line. There won't
|
||||
* be a '\n' after the last filter.
|
||||
*
|
||||
* - If LZMA_STR_ALL_FILTERS is used then the list will include also
|
||||
* those filters that cannot be used in the .xz format (LZMA1).
|
||||
*
|
||||
* \param str On success *str will be set to point to an
|
||||
* allocated string listing the filters and options.
|
||||
* Old value is ignored. On error *str is always set
|
||||
* to NULL.
|
||||
* \param filter_id Filter ID or LZMA_VLI_UNKNOWN.
|
||||
* \param flags Bitwise-or of zero or more of the flags
|
||||
* LZMA_STR_ALL_FILTERS, LZMA_STR_ENCODER,
|
||||
* LZMA_STR_DECODER, and LZMA_STR_GETOPT_LONG.
|
||||
* \param allocator lzma_allocator for custom allocator functions.
|
||||
* Set to NULL to use malloc() and free().
|
||||
*
|
||||
* \return - LZMA_OK
|
||||
* - LZMA_OPTIONS_ERROR: Unsupported filter_id or flags
|
||||
* - LZMA_MEM_ERROR
|
||||
* - LZMA_PROG_ERROR
|
||||
*/
|
||||
extern LZMA_API(lzma_ret) lzma_str_list_filters(
|
||||
char **str, lzma_vli filter_id, uint32_t flags,
|
||||
const lzma_allocator *allocator)
|
||||
lzma_nothrow lzma_attr_warn_unused_result;
|
||||
|
|
|
@ -684,3 +684,69 @@ extern LZMA_API(lzma_ret) lzma_index_buffer_decode(lzma_index **i,
|
|||
uint64_t *memlimit, const lzma_allocator *allocator,
|
||||
const uint8_t *in, size_t *in_pos, size_t in_size)
|
||||
lzma_nothrow;
|
||||
|
||||
|
||||
/**
|
||||
* \brief Initialize a .xz file information decoder
|
||||
*
|
||||
* \param strm Pointer to a properly prepared lzma_stream
|
||||
* \param dest_index Pointer to a pointer where the decoder will put
|
||||
* the decoded lzma_index. The old value
|
||||
* of *dest_index is ignored (not freed).
|
||||
* \param memlimit How much memory the resulting lzma_index is
|
||||
* allowed to require. Use UINT64_MAX to
|
||||
* effectively disable the limiter.
|
||||
* \param file_size Size of the input .xz file
|
||||
*
|
||||
* This decoder decodes the Stream Header, Stream Footer, Index, and
|
||||
* Stream Padding field(s) from the input .xz file and stores the resulting
|
||||
* combined index in *dest_index. This information can be used to get the
|
||||
* uncompressed file size with lzma_index_uncompressed_size(*dest_index) or,
|
||||
* for example, to implement random access reading by locating the Blocks
|
||||
* in the Streams.
|
||||
*
|
||||
* To get the required information from the .xz file, lzma_code() may ask
|
||||
* the application to seek in the input file by returning LZMA_SEEK_NEEDED
|
||||
* and having the target file position specified in lzma_stream.seek_pos.
|
||||
* The number of seeks required depends on the input file and on how big the
|
||||
* application's buffers are. When possible, the decoder will seek backward
|
||||
* and forward in the given buffer to avoid useless seek requests. Thus, if
|
||||
* the application provides the whole file at once, no external seeking will
|
||||
* be required (that is, lzma_code() won't return LZMA_SEEK_NEEDED).
|
||||
*
|
||||
* The value in lzma_stream.total_in can be used to estimate how much data
|
||||
* liblzma had to read to get the file information. However, due to seeking
|
||||
* and the way total_in is updated, the value of total_in will be somewhat
|
||||
* inaccurate (a little too big). Thus, total_in is a good estimate but don't
|
||||
* expect to see the same exact value for the same file if you change the
|
||||
* input buffer size or switch to a different liblzma version.
|
||||
*
|
||||
* Valid `action' arguments to lzma_code() are LZMA_RUN and LZMA_FINISH.
|
||||
* You only need to use LZMA_RUN; LZMA_FINISH is only supported because it
|
||||
* might be convenient for some applications. If you use LZMA_FINISH and if
|
||||
* lzma_code() asks the application to seek, remember to reset `action' back
|
||||
* to LZMA_RUN unless you hit the end of the file again.
|
||||
*
|
||||
* Possible return values from lzma_code():
|
||||
* - LZMA_OK: All OK so far, more input needed
|
||||
* - LZMA_SEEK_NEEDED: Provide more input starting from the absolute
|
||||
* file position strm->seek_pos
|
||||
* - LZMA_STREAM_END: Decoding was successful, *dest_index has been set
|
||||
* - LZMA_FORMAT_ERROR: The input file is not in the .xz format (the
|
||||
* expected magic bytes were not found at the beginning of the file)
|
||||
* - LZMA_OPTIONS_ERROR: File looks valid but contains headers that aren't
|
||||
* supported by this version of liblzma
|
||||
* - LZMA_DATA_ERROR: File is corrupt
|
||||
* - LZMA_BUF_ERROR
|
||||
* - LZMA_MEM_ERROR
|
||||
* - LZMA_MEMLIMIT_ERROR
|
||||
* - LZMA_PROG_ERROR
|
||||
*
|
||||
* \return - LZMA_OK
|
||||
* - LZMA_MEM_ERROR
|
||||
* - LZMA_PROG_ERROR
|
||||
*/
|
||||
extern LZMA_API(lzma_ret) lzma_file_info_decoder(
|
||||
lzma_stream *strm, lzma_index **dest_index,
|
||||
uint64_t memlimit, uint64_t file_size)
|
||||
lzma_nothrow;
|
||||
|
|
|
@ -18,17 +18,40 @@
|
|||
|
||||
|
||||
/**
|
||||
* \brief LZMA1 Filter ID
|
||||
* \brief LZMA1 Filter ID (for raw encoder/decoder only, not in .xz)
|
||||
*
|
||||
* LZMA1 is the very same thing as what was called just LZMA in LZMA Utils,
|
||||
* 7-Zip, and LZMA SDK. It's called LZMA1 here to prevent developers from
|
||||
* accidentally using LZMA when they actually want LZMA2.
|
||||
*
|
||||
* LZMA1 shouldn't be used for new applications unless you _really_ know
|
||||
* what you are doing. LZMA2 is almost always a better choice.
|
||||
*/
|
||||
#define LZMA_FILTER_LZMA1 LZMA_VLI_C(0x4000000000000001)
|
||||
|
||||
/**
|
||||
* \brief LZMA1 Filter ID with extended options (for raw encoder/decoder)
|
||||
*
|
||||
* This is like LZMA_FILTER_LZMA1 but with this ID a few extra options
|
||||
* are supported in the lzma_options_lzma structure:
|
||||
*
|
||||
* - A flag to tell the encoder if the end of payload marker (EOPM) alias
|
||||
* end of stream (EOS) marker must be written at the end of the stream.
|
||||
* In contrast, LZMA_FILTER_LZMA1 always writes the end marker.
|
||||
*
|
||||
* - Decoder needs to be told the uncompressed size of the stream
|
||||
* or that it is unknown (using the special value UINT64_MAX).
|
||||
* If the size is known, a flag can be set to allow the presence of
|
||||
* the end marker anyway. In contrast, LZMA_FILTER_LZMA1 always
|
||||
* behaves as if the uncompressed size was unknown.
|
||||
*
|
||||
* This allows handling file formats where LZMA1 streams are used but where
|
||||
* the end marker isn't allowed or where it might not (always) be present.
|
||||
* This extended LZMA1 functionality is provided as a Filter ID for raw
|
||||
* encoder and decoder instead of adding new encoder and decoder initialization
|
||||
* functions because this way it is possible to also use extra filters,
|
||||
* for example, LZMA_FILTER_X86 in a filter chain with LZMA_FILTER_LZMA1EXT,
|
||||
* which might be needed to handle some file formats.
|
||||
*/
|
||||
#define LZMA_FILTER_LZMA1EXT LZMA_VLI_C(0x4000000000000002)
|
||||
|
||||
/**
|
||||
* \brief LZMA2 Filter ID
|
||||
*
|
||||
|
@ -374,6 +397,82 @@ typedef struct {
|
|||
*/
|
||||
uint32_t depth;
|
||||
|
||||
/**
|
||||
* \brief For LZMA_FILTER_LZMA1EXT: Extended flags
|
||||
*
|
||||
* This is used only with LZMA_FILTER_LZMA1EXT.
|
||||
*
|
||||
* Currently only one flag is supported, LZMA_LZMA1EXT_ALLOW_EOPM:
|
||||
*
|
||||
* - Encoder: If the flag is set, then the end marker is written just
|
||||
* like it is with LZMA_FILTER_LZMA1. Without this flag the
|
||||
* end marker isn't written and the application has to store
|
||||
* the uncompressed size somewhere outside the compressed stream.
|
||||
* To decompress streams without the end marker, the application
|
||||
* has to set the correct uncompressed size in ext_size_low and
|
||||
* ext_size_high.
|
||||
*
|
||||
* - Decoder: If the uncompressed size in ext_size_low and
|
||||
* ext_size_high is set to the special value UINT64_MAX
|
||||
* (indicating unknown uncompressed size) then this flag is
|
||||
* ignored and the end marker must always be present, that is,
|
||||
* the behavior is identical to LZMA_FILTER_LZMA1.
|
||||
*
|
||||
* Otherwise, if this flag isn't set, then the input stream
|
||||
* must not have the end marker; if the end marker is detected
|
||||
* then it will result in LZMA_DATA_ERROR. This is useful when
|
||||
* it is known that the stream must not have the end marker and
|
||||
* strict validation is wanted.
|
||||
*
|
||||
* If this flag is set, then it is autodetected if the end marker
|
||||
* is present after the specified number of uncompressed bytes
|
||||
* has been decompressed (ext_size_low and ext_size_high). The
|
||||
* end marker isn't allowed in any other position. This behavior
|
||||
* is useful when uncompressed size is known but the end marker
|
||||
* may or may not be present. This is the case, for example,
|
||||
* in .7z files (valid .7z files that have the end marker in
|
||||
* LZMA1 streams are rare but they do exist).
|
||||
*/
|
||||
uint32_t ext_flags;
|
||||
# define LZMA_LZMA1EXT_ALLOW_EOPM UINT32_C(0x01)
|
||||
|
||||
/**
|
||||
* \brief For LZMA_FILTER_LZMA1EXT: Uncompressed size (low bits)
|
||||
*
|
||||
* The 64-bit uncompressed size is needed for decompression with
|
||||
* LZMA_FILTER_LZMA1EXT. The size is ignored by the encoder.
|
||||
*
|
||||
* The special value UINT64_MAX indicates that the uncompressed size
|
||||
* is unknown and that the end of payload marker (also known as
|
||||
* end of stream marker) must be present to indicate the end of
|
||||
* the LZMA1 stream. Any other value indicates the expected
|
||||
* uncompressed size of the LZMA1 stream. (If LZMA1 was used together
|
||||
* with filters that change the size of the data then the uncompressed
|
||||
* size of the LZMA1 stream could be different than the final
|
||||
* uncompressed size of the filtered stream.)
|
||||
*
|
||||
* ext_size_low holds the least significant 32 bits of the
|
||||
* uncompressed size. The most significant 32 bits must be set
|
||||
* in ext_size_high. The macro lzma_set_ext_size(opt_lzma, u64size)
|
||||
* can be used to set these members.
|
||||
*
|
||||
* The 64-bit uncompressed size is split into two uint32_t variables
|
||||
* because there were no reserved uint64_t members and using the
|
||||
* same options structure for LZMA_FILTER_LZMA1, LZMA_FILTER_LZMA1EXT,
|
||||
* and LZMA_FILTER_LZMA2 was otherwise more convenient than having
|
||||
* a new options structure for LZMA_FILTER_LZMA1EXT. (Replacing two
|
||||
* uint32_t members with one uint64_t changes the ABI on some systems
|
||||
* as the alignment of this struct can increase from 4 bytes to 8.)
|
||||
*/
|
||||
uint32_t ext_size_low;
|
||||
|
||||
/**
|
||||
* \brief For LZMA_FILTER_LZMA1EXT: Uncompressed size (high bits)
|
||||
*
|
||||
* This holds the most significant 32 bits of the uncompressed size.
|
||||
*/
|
||||
uint32_t ext_size_high;
|
||||
|
||||
/*
|
||||
* Reserved space to allow possible future extensions without
|
||||
* breaking the ABI. You should not touch these, because the names
|
||||
|
@ -381,9 +480,6 @@ typedef struct {
|
|||
* with the currently supported options, so it is safe to leave these
|
||||
* uninitialized.
|
||||
*/
|
||||
uint32_t reserved_int1;
|
||||
uint32_t reserved_int2;
|
||||
uint32_t reserved_int3;
|
||||
uint32_t reserved_int4;
|
||||
uint32_t reserved_int5;
|
||||
uint32_t reserved_int6;
|
||||
|
@ -399,6 +495,19 @@ typedef struct {
|
|||
} lzma_options_lzma;
|
||||
|
||||
|
||||
/**
|
||||
* \brief Macro to set the 64-bit uncompressed size in ext_size_*
|
||||
*
|
||||
* This might be convenient when decoding using LZMA_FILTER_LZMA1EXT.
|
||||
* This isn't used with LZMA_FILTER_LZMA1 or LZMA_FILTER_LZMA2.
|
||||
*/
|
||||
#define lzma_set_ext_size(opt_lzma2, u64size) \
|
||||
do { \
|
||||
(opt_lzma2).ext_size_low = (uint32_t)(u64size); \
|
||||
(opt_lzma2).ext_size_high = (uint32_t)((uint64_t)(u64size) >> 32); \
|
||||
} while (0)
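
As a rough illustration of how the split size and the LZMA_LZMA1EXT_ALLOW_EOPM flag interact on the decoder side, the sketch below models the rules from the ext_flags documentation. Everything prefixed with `demo_` or `DEMO_`, and the `EOPM_*` enum, are stand-ins invented for this example so that it compiles without <lzma.h>; the macro body is the same as lzma_set_ext_size().

```c
#include <stdint.h>

/* Stand-in for the three LZMA1EXT members of lzma_options_lzma. */
typedef struct {
	uint32_t ext_flags;
	uint32_t ext_size_low;
	uint32_t ext_size_high;
} demo_options;

/* Mirrors the value of LZMA_LZMA1EXT_ALLOW_EOPM. */
#define DEMO_ALLOW_EOPM UINT32_C(0x01)

/* Same body as lzma_set_ext_size(). */
#define demo_set_ext_size(opt, u64size) \
do { \
	(opt).ext_size_low = (uint32_t)(u64size); \
	(opt).ext_size_high = (uint32_t)((uint64_t)(u64size) >> 32); \
} while (0)

/* Recombine the two 32-bit halves into the 64-bit size. */
static uint64_t
demo_get_ext_size(const demo_options *opt)
{
	return ((uint64_t)opt->ext_size_high << 32) | opt->ext_size_low;
}

typedef enum {
	EOPM_REQUIRED,   /* size unknown: end marker must be present */
	EOPM_FORBIDDEN,  /* size known, flag unset: marker is a data error */
	EOPM_AUTODETECT  /* size known, flag set: marker may or may not exist */
} demo_eopm_policy;

/* Decoder-side rules as described in the ext_flags documentation. */
static demo_eopm_policy
demo_decoder_policy(const demo_options *opt)
{
	if (demo_get_ext_size(opt) == UINT64_MAX)
		return EOPM_REQUIRED;

	return (opt->ext_flags & DEMO_ALLOW_EOPM)
			? EOPM_AUTODETECT : EOPM_FORBIDDEN;
}
```

Note that UINT64_MAX maps to both halves being UINT32_MAX, so an application that leaves the size "unknown" gets the same behavior as with LZMA_FILTER_LZMA1 regardless of the flag.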
|
||||
|
||||
|
||||
/**
|
||||
* \brief Set a compression preset to lzma_options_lzma structure
|
||||
*
|
||||
|
|
|
@ -21,8 +21,8 @@
|
|||
* Version number split into components
|
||||
*/
|
||||
#define LZMA_VERSION_MAJOR 5
|
||||
#define LZMA_VERSION_MINOR 2
|
||||
#define LZMA_VERSION_PATCH 9
|
||||
#define LZMA_VERSION_MINOR 4
|
||||
#define LZMA_VERSION_PATCH 0
|
||||
#define LZMA_VERSION_STABILITY LZMA_VERSION_STABILITY_STABLE
|
||||
|
||||
#ifndef LZMA_VERSION_COMMIT
|
||||
|
|
|
@ -16,6 +16,9 @@
|
|||
uint32_t lzma_crc32_table[1][256];
|
||||
|
||||
|
||||
#ifdef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
|
||||
__attribute__((__constructor__))
|
||||
#endif
|
||||
static void
|
||||
crc32_init(void)
|
||||
{
|
||||
|
@ -37,18 +40,22 @@ crc32_init(void)
|
|||
}
|
||||
|
||||
|
||||
#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
|
||||
extern void
|
||||
lzma_crc32_init(void)
|
||||
{
|
||||
mythread_once(crc32_init);
|
||||
return;
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
extern LZMA_API(uint32_t)
|
||||
lzma_crc32(const uint8_t *buf, size_t size, uint32_t crc)
|
||||
{
|
||||
#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
|
||||
lzma_crc32_init();
|
||||
#endif
|
||||
|
||||
crc = ~crc;
|
||||
|
||||
|
|
|
@ -3,11 +3,25 @@
|
|||
/// \file crc64.c
|
||||
/// \brief CRC64 calculation
|
||||
///
|
||||
/// Calculate the CRC64 using the slice-by-four algorithm. This is the same
|
||||
/// idea that is used in crc32_fast.c, but for CRC64 we use only four tables
|
||||
/// There are two methods in this file. crc64_generic uses the
|
||||
/// slice-by-four algorithm. This is the same idea that is
|
||||
/// used in crc32_fast.c, but for CRC64 we use only four tables
|
||||
/// instead of eight to avoid increasing CPU cache usage.
|
||||
///
|
||||
/// crc64_clmul uses 32/64-bit x86 SSSE3, SSE4.1, and CLMUL instructions.
|
||||
/// It was derived from
|
||||
/// https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
|
||||
/// and the public domain code from https://github.com/rawrunprotected/crc
|
||||
/// (URLs were checked on 2022-11-07).
|
||||
///
|
||||
/// FIXME: Builds for 32-bit x86 use crc64_x86.S by default instead
|
||||
/// of this file and thus CLMUL version isn't available on 32-bit x86
|
||||
/// unless configured with --disable-assembler. Even then the lookup table
|
||||
/// isn't omitted in crc64_table.c since it doesn't know that assembly
|
||||
/// code has been disabled.
|
||||
//
|
||||
// Author: Lasse Collin
|
||||
// Authors: Lasse Collin
|
||||
// Ilya Kurdyukov
|
||||
//
|
||||
// This file has been put into the public domain.
|
||||
// You can do whatever you want with this file.
|
||||
|
@ -15,6 +29,54 @@
|
|||
///////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
#include "check.h"
|
||||
|
||||
#undef CRC_GENERIC
|
||||
#undef CRC_CLMUL
|
||||
#undef CRC_USE_GENERIC_FOR_SMALL_INPUTS
|
||||
|
||||
// If CLMUL cannot be used then only the generic slice-by-four is built.
|
||||
#if !defined(HAVE_USABLE_CLMUL)
|
||||
# define CRC_GENERIC 1
|
||||
|
||||
// If CLMUL is allowed unconditionally in the compiler options then the
|
||||
// generic version can be omitted. Note that this doesn't work with MSVC
|
||||
// as I don't know how to detect the features here.
|
||||
//
|
||||
// NOTE: Keep this in sync with crc64_table.c.
|
||||
#elif (defined(__SSSE3__) && defined(__SSE4_1__) && defined(__PCLMUL__)) \
|
||||
|| (defined(__e2k__) && __iset__ >= 6)
|
||||
# define CRC_CLMUL 1
|
||||
|
||||
// Otherwise build both and detect at runtime which version to use.
|
||||
#else
|
||||
# define CRC_GENERIC 1
|
||||
# define CRC_CLMUL 1
|
||||
|
||||
/*
|
||||
// The generic code is much faster with 1-8-byte inputs and has
|
||||
// similar performance up to 16 bytes at least in microbenchmarks
|
||||
// (it depends on input buffer alignment too). If both versions are
|
||||
// built, this #define will use the generic version for inputs up to
|
||||
// 16 bytes and CLMUL for bigger inputs. It saves a little in code
|
||||
// size since the special cases for 0-16-byte inputs will be omitted
|
||||
// from the CLMUL code.
|
||||
# define CRC_USE_GENERIC_FOR_SMALL_INPUTS 1
|
||||
*/
|
||||
|
||||
# if defined(_MSC_VER)
|
||||
# include <intrin.h>
|
||||
# elif defined(HAVE_CPUID_H)
|
||||
# include <cpuid.h>
|
||||
# endif
|
||||
#endif
|
||||
|
||||
|
||||
/////////////////////////////////
|
||||
// Generic slice-by-four CRC64 //
|
||||
/////////////////////////////////
|
||||
|
||||
#ifdef CRC_GENERIC
|
||||
|
||||
#include "crc_macros.h"
|
||||
|
||||
|
||||
|
@ -26,8 +88,8 @@
|
|||
|
||||
|
||||
// See the comments in crc32_fast.c. They aren't duplicated here.
|
||||
extern LZMA_API(uint64_t)
|
||||
lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
|
||||
static uint64_t
|
||||
crc64_generic(const uint8_t *buf, size_t size, uint64_t crc)
|
||||
{
|
||||
crc = ~crc;
|
||||
|
||||
|
@ -46,10 +108,11 @@ lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
|
|||
|
||||
while (buf < limit) {
|
||||
#ifdef WORDS_BIGENDIAN
|
||||
const uint32_t tmp = (crc >> 32)
|
||||
const uint32_t tmp = (uint32_t)(crc >> 32)
|
||||
^ aligned_read32ne(buf);
|
||||
#else
|
||||
const uint32_t tmp = crc ^ aligned_read32ne(buf);
|
||||
const uint32_t tmp = (uint32_t)crc
|
||||
^ aligned_read32ne(buf);
|
||||
#endif
|
||||
buf += 4;
|
||||
|
||||
|
@ -70,3 +133,380 @@ lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
|
|||
|
||||
return ~crc;
|
||||
}
|
||||
#endif
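
When comparing the generic and CLMUL variants, a slow bit-at-a-time reference can be handy for cross-checking results. The function below is a sketch written for that purpose and is not code from this file; `crc64_bitwise` is a name chosen here. It assumes the reflected polynomial 0xc96c5795d7870f42 that this file lists as the CRC polynomial, and the same initial and final inversion of crc that crc64_generic() performs.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference CRC64 for testing only: processes one bit per iteration.
 * Takes and returns the CRC in the same convention as lzma_crc64(),
 * that is, pass 0 for the first call and the previous return value
 * when continuing a calculation. */
static uint64_t
crc64_bitwise(const uint8_t *buf, size_t size, uint64_t crc)
{
	const uint64_t poly = UINT64_C(0xc96c5795d7870f42);

	crc = ~crc;

	for (size_t i = 0; i < size; ++i) {
		crc ^= buf[i];

		/* Shift out eight bits, reducing by the polynomial
		 * whenever the lowest bit is set. */
		for (int bit = 0; bit < 8; ++bit)
			crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
	}

	return ~crc;
}
```

Feeding the nine ASCII bytes "123456789" should give 0x995dc9bbdf1939fa, the commonly published check value for this CRC, and splitting the input across several calls should give the same result as a single call.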
|
||||
|
||||
|
||||
/////////////////////
|
||||
// x86 CLMUL CRC64 //
|
||||
/////////////////////
|
||||
|
||||
#ifdef CRC_CLMUL
|
||||
|
||||
#include <immintrin.h>
|
||||
|
||||
|
||||
/*
|
||||
// These functions were used to generate the constants
|
||||
// at the top of crc64_clmul().
|
||||
static uint64_t
|
||||
calc_lo(uint64_t poly)
|
||||
{
|
||||
uint64_t a = poly;
|
||||
uint64_t b = 0;
|
||||
|
||||
for (unsigned i = 0; i < 64; ++i) {
|
||||
b = (b >> 1) | (a << 63);
|
||||
a = (a >> 1) ^ (a & 1 ? poly : 0);
|
||||
}
|
||||
|
||||
return b;
|
||||
}
|
||||
|
||||
static uint64_t
|
||||
calc_hi(uint64_t poly, uint64_t a)
|
||||
{
|
||||
for (unsigned i = 0; i < 64; ++i)
|
||||
a = (a >> 1) ^ (a & 1 ? poly : 0);
|
||||
|
||||
return a;
|
||||
}
|
||||
*/
|
||||
|
||||
|
||||
#define MASK_L(in, mask, r) \
|
||||
r = _mm_shuffle_epi8(in, mask)
|
||||
|
||||
#define MASK_H(in, mask, r) \
|
||||
r = _mm_shuffle_epi8(in, _mm_xor_si128(mask, vsign))
|
||||
|
||||
#define MASK_LH(in, mask, low, high) \
|
||||
MASK_L(in, mask, low); \
|
||||
MASK_H(in, mask, high)
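The MASK_L and MASK_H macros rely on how _mm_shuffle_epi8 treats its control bytes: a control byte with the high bit (0x80) set yields zero, otherwise its low four bits select a source byte. Below is a minimal scalar sketch of that behavior, assuming a control mask built as "byte index minus skip count". The helper names make_skip_mask, shuffle_bytes, and shuffle_bytes_inverted are hypothetical and are not part of liblzma.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: mask[i] = i - skip, truncated to 8 bits.
// For i < skip the result is "negative", so the high bit is set.
static void
make_skip_mask(unsigned skip, uint8_t mask[16])
{
	assert(skip < 16);
	for (unsigned i = 0; i < 16; ++i)
		mask[i] = (uint8_t)(i - skip);
}

// Scalar model of the byte shuffle used by MASK_L: a control byte with
// the high bit set produces zero, otherwise the low 4 bits pick a byte.
static void
shuffle_bytes(const uint8_t in[16], const uint8_t mask[16], uint8_t out[16])
{
	for (unsigned i = 0; i < 16; ++i)
		out[i] = (mask[i] & 0x80) ? 0 : in[mask[i] & 0x0F];
}

// Scalar model of MASK_H: flip the high bit of every control byte first
// so that exactly the positions zeroed by shuffle_bytes() get picked.
static void
shuffle_bytes_inverted(const uint8_t in[16], const uint8_t mask[16],
		uint8_t out[16])
{
	uint8_t inverted[16];
	for (unsigned i = 0; i < 16; ++i)
		inverted[i] = mask[i] ^ 0x80;

	shuffle_bytes(in, inverted, out);
}
```

With skip = 3, shuffle_bytes() moves in[0..12] to out[3..15] and zeros out[0..2], while shuffle_bytes_inverted() places in[13..15] into out[0..2] and zeros the rest. Together the two results behave like a 32-byte value shifted by three bytes.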
|
||||
|
||||
|
||||
// EDG-based compilers (Intel's classic compiler and compiler for E2K) can
|
||||
// define __GNUC__ but the attribute must not be used with them.
|
||||
// The new Clang-based ICX needs the attribute.
|
||||
//
|
||||
// NOTE: Build systems check for this too, keep them in sync with this.
|
||||
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__EDG__)
|
||||
__attribute__((__target__("ssse3,sse4.1,pclmul")))
|
||||
#endif
|
||||
static uint64_t
|
||||
crc64_clmul(const uint8_t *buf, size_t size, uint64_t crc)
|
||||
{
|
||||
// The prototypes of the intrinsics use signed types while most of
|
||||
// the values are treated as unsigned here. These warnings in this
|
||||
// function have been checked and found to be harmless so silence them.
|
||||
#if TUKLIB_GNUC_REQ(4, 6) || defined(__clang__)
|
||||
# pragma GCC diagnostic push
|
||||
# pragma GCC diagnostic ignored "-Wsign-conversion"
|
||||
# pragma GCC diagnostic ignored "-Wconversion"
|
||||
#endif
|
||||
|
||||
#ifndef CRC_USE_GENERIC_FOR_SMALL_INPUTS
|
||||
// The code assumes that there is at least one byte of input.
|
||||
if (size == 0)
|
||||
return crc;
|
||||
#endif
|
||||
|
||||
// const uint64_t poly = 0xc96c5795d7870f42; // CRC polynomial
|
||||
const uint64_t p = 0x92d8af2baf0e1e85; // (poly << 1) | 1
|
||||
const uint64_t mu = 0x9c3e466c172963d5; // (calc_lo(poly) << 1) | 1
|
||||
const uint64_t k2 = 0xdabe95afc7875f40; // calc_hi(poly, 1)
|
||||
const uint64_t k1 = 0xe05dd497ca393ae4; // calc_hi(poly, k2)
|
||||
const __m128i vfold0 = _mm_set_epi64x(p, mu);
|
||||
const __m128i vfold1 = _mm_set_epi64x(k2, k1);
|
||||
|
||||
// Create a vector with 8-bit values 0 to 15. This is used to
|
||||
// construct control masks for _mm_blendv_epi8 and _mm_shuffle_epi8.
|
||||
const __m128i vramp = _mm_setr_epi32(
|
||||
0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c);
|
||||
|
||||
// This is used to invert the control mask of _mm_shuffle_epi8
|
||||
// so that bytes that wouldn't be picked with the original mask
|
||||
// will be picked and vice versa.
|
||||
const __m128i vsign = _mm_set1_epi8(0x80);
|
||||
|
||||
// Memory addresses A to D and the distances between them:
|
||||
//
|
||||
// A B C D
|
||||
// [skip_start][size][skip_end]
|
||||
// [ size2 ]
|
||||
//
|
||||
// A and D are 16-byte aligned. B and C are 1-byte aligned.
|
||||
// skip_start and skip_end are 0-15 bytes. size is at least 1 byte.
|
||||
//
|
||||
// A = aligned_buf will initially point to this address.
|
||||
// B = The address pointed to by the caller-supplied buf.
|
||||
// C = buf + size == aligned_buf + size2
|
||||
// D = buf + size + skip_end == aligned_buf + size2 + skip_end
|
||||
const size_t skip_start = (size_t)((uintptr_t)buf & 15);
|
||||
const size_t skip_end = (size_t)(-(uintptr_t)(buf + size) & 15);
|
||||
const __m128i *aligned_buf = (const __m128i *)(
|
||||
(uintptr_t)buf & ~(uintptr_t)15);
|
||||
|
||||
// If size2 <= 16 then the whole input fits into a single 16-byte
|
||||
// vector. If size2 > 16 then at least two 16-byte vectors must
|
||||
// be processed. If size2 > 16 && size <= 16 then there is only
|
||||
// one 16-byte vector's worth of input but it is unaligned in memory.
|
||||
//
|
||||
// NOTE: There is no integer overflow here if the arguments are valid.
|
||||
// If this overflowed, buf + size would too.
|
||||
size_t size2 = skip_start + size;
|
||||
|
||||
// Masks to be used with _mm_blendv_epi8 and _mm_shuffle_epi8:
|
||||
// The first skip_start or skip_end bytes in the vectors will have
|
||||
// the high bit (0x80) set. _mm_blendv_epi8 and _mm_shuffle_epi8
|
||||
// will produce zeros for these positions. (Bitwise-xor of these
|
||||
// masks with vsign will produce the opposite behavior.)
|
||||
const __m128i mask_start
|
||||
= _mm_sub_epi8(vramp, _mm_set1_epi8(skip_start));
|
||||
const __m128i mask_end = _mm_sub_epi8(vramp, _mm_set1_epi8(skip_end));
|
||||
|
||||
// Get the first 1-16 bytes into data0. If loading less than 16 bytes,
|
||||
// the bytes are loaded to the high bits of the vector and the least
|
||||
// significant positions are filled with zeros.
|
||||
const __m128i data0 = _mm_blendv_epi8(_mm_load_si128(aligned_buf),
|
||||
_mm_setzero_si128(), mask_start);
|
||||
++aligned_buf;
|
||||
|
||||
#if defined(__i386__) || defined(_M_IX86)
|
||||
const __m128i initial_crc = _mm_set_epi64x(0, ~crc);
|
||||
#else
|
||||
// GCC and Clang would produce good code with _mm_set_epi64x
|
||||
// but MSVC needs _mm_cvtsi64_si128 on x86-64.
|
||||
const __m128i initial_crc = _mm_cvtsi64_si128(~crc);
|
||||
#endif
|
||||
|
||||
__m128i v0, v1, v2, v3;
|
||||
|
||||
#ifndef CRC_USE_GENERIC_FOR_SMALL_INPUTS
|
||||
if (size <= 16) {
|
||||
// Right-shift initial_crc by 1-16 bytes based on "size"
|
||||
// and store the result in v1 (high bytes) and v0 (low bytes).
|
||||
//
|
||||
// NOTE: The highest 8 bytes of initial_crc are zeros so
|
||||
// v1 will be filled with zeros if size >= 8. The highest 8
|
||||
// bytes of v1 will always become zeros.
|
||||
//
|
||||
// [ v1 ][ v0 ]
|
||||
// [ initial_crc ] size == 1
|
||||
// [ initial_crc ] size == 2
|
||||
// [ initial_crc ] size == 15
|
||||
// [ initial_crc ] size == 16 (all in v0)
|
||||
const __m128i mask_low = _mm_add_epi8(
|
||||
vramp, _mm_set1_epi8(size - 16));
|
||||
MASK_LH(initial_crc, mask_low, v0, v1);
|
||||
|
||||
if (size2 <= 16) {
|
||||
// There are 1-16 bytes of input and it is all
|
||||
// in data0. Copy the input bytes to v3. If there
|
||||
// are fewer than 16 bytes, the low bytes in v3
|
||||
// will be filled with zeros. That is, the input
|
||||
// bytes are stored to the same position as
|
||||
// (part of) initial_crc is in v0.
|
||||
MASK_L(data0, mask_end, v3);
|
||||
} else {
|
||||
// There are 2-16 bytes of input but not all bytes
|
||||
// are in data0.
|
||||
const __m128i data1 = _mm_load_si128(aligned_buf);
|
||||
|
||||
// Collect the 2-16 input bytes from data0 and data1
|
||||
// to v2 and v3, and bitwise-xor them with the
|
||||
// low bits of initial_crc in v0. Note that the
|
||||
// second xor is below this else-block as it
|
||||
// is shared with the other branch.
|
||||
MASK_H(data0, mask_end, v2);
|
||||
MASK_L(data1, mask_end, v3);
|
||||
v0 = _mm_xor_si128(v0, v2);
|
||||
}
|
||||
|
||||
v0 = _mm_xor_si128(v0, v3);
|
||||
v1 = _mm_alignr_epi8(v1, v0, 8);
|
||||
} else
|
||||
#endif
|
||||
{
|
||||
const __m128i data1 = _mm_load_si128(aligned_buf);
|
||||
MASK_LH(initial_crc, mask_start, v0, v1);
|
||||
v0 = _mm_xor_si128(v0, data0);
|
||||
v1 = _mm_xor_si128(v1, data1);
|
||||
|
||||
#define FOLD \
|
||||
v1 = _mm_xor_si128(v1, _mm_clmulepi64_si128(v0, vfold1, 0x00)); \
|
||||
v0 = _mm_xor_si128(v1, _mm_clmulepi64_si128(v0, vfold1, 0x11));
|
||||
|
||||
while (size2 > 32) {
|
||||
++aligned_buf;
|
||||
size2 -= 16;
|
||||
FOLD
|
||||
v1 = _mm_load_si128(aligned_buf);
|
||||
}
|
||||
|
||||
if (size2 < 32) {
|
||||
MASK_H(v0, mask_end, v2);
|
||||
MASK_L(v0, mask_end, v0);
|
||||
MASK_L(v1, mask_end, v3);
|
||||
v1 = _mm_or_si128(v2, v3);
|
||||
}
|
||||
|
||||
FOLD
|
||||
v1 = _mm_srli_si128(v0, 8);
|
||||
#undef FOLD
|
||||
}
|
||||
|
||||
v1 = _mm_xor_si128(_mm_clmulepi64_si128(v0, vfold1, 0x10), v1);
|
||||
v0 = _mm_clmulepi64_si128(v1, vfold0, 0x00);
|
||||
v2 = _mm_clmulepi64_si128(v0, vfold0, 0x10);
|
||||
v0 = _mm_xor_si128(_mm_xor_si128(v2, _mm_slli_si128(v0, 8)), v1);
|
||||
|
||||
#if defined(__i386__) || defined(_M_IX86)
|
||||
return ~(((uint64_t)(uint32_t)_mm_extract_epi32(v0, 3) << 32) |
|
||||
(uint64_t)(uint32_t)_mm_extract_epi32(v0, 2));
|
||||
#else
|
||||
return ~(uint64_t)_mm_extract_epi64(v0, 1);
|
||||
#endif
|
||||
|
||||
#if TUKLIB_GNUC_REQ(4, 6) || defined(__clang__)
|
||||
# pragma GCC diagnostic pop
|
||||
#endif
|
||||
}
|
||||
#endif
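The A/B/C/D diagram in crc64_clmul() comes down to three small calculations on the buffer address. The following is a minimal sketch of them working on a plain integer address, so the values can be checked by hand. The struct align_info and the function calc_align are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical container for the three values described in the comments.
typedef struct {
	size_t skip_start; // Distance from A (aligned start) to B (buf)
	size_t skip_end;   // Distance from C (buf + size) to D (aligned end)
	size_t size2;      // Distance from A to C, that is, skip_start + size
} align_info;

// Hypothetical helper mirroring the skip_start, skip_end, and size2
// calculations for an address given as an integer.
static align_info
calc_align(uintptr_t addr, size_t size)
{
	// The CLMUL code assumes at least one byte of input.
	assert(size >= 1);

	align_info a;
	a.skip_start = (size_t)(addr & 15);
	a.skip_end = (size_t)(-(addr + size) & 15);
	a.size2 = a.skip_start + size;
	return a;
}
```

For example, a 10-byte buffer at address 0x1003 gives skip_start = 3, skip_end = 3, and size2 = 13, so everything fits in one aligned vector. The same 10 bytes at 0x100A give skip_start = 10, skip_end = 12, and size2 = 20, which is the "size2 > 16 && size <= 16" case where a second aligned load is needed.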
|
||||
|
||||
|
||||
////////////////////////
|
||||
// Detect CPU support //
|
||||
////////////////////////
|
||||
|
||||
#if defined(CRC_GENERIC) && defined(CRC_CLMUL)
|
||||
static inline bool
|
||||
is_clmul_supported(void)
|
||||
{
|
||||
int success = 1;
|
||||
uint32_t r[4]; // eax, ebx, ecx, edx
|
||||
|
||||
#if defined(_MSC_VER)
|
||||
// This needs <intrin.h> with MSVC. ICC has it as a built-in
|
||||
// on all platforms.
|
||||
__cpuid(r, 1);
|
||||
#elif defined(HAVE_CPUID_H)
|
||||
// Compared to just using __asm__ to run CPUID, this also checks
|
||||
// that CPUID is supported and saves and restores ebx as that is
|
||||
// needed with GCC < 5 with position-independent code (PIC).
|
||||
success = __get_cpuid(1, &r[0], &r[1], &r[2], &r[3]);
|
||||
#else
|
||||
// Just a fallback that shouldn't be needed.
|
||||
__asm__("cpuid\n\t"
|
||||
: "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3])
|
||||
: "a"(1), "c"(0));
|
||||
#endif
|
||||
|
||||
// Returns true if these are supported:
|
||||
// CLMUL (bit 1 in ecx)
|
||||
// SSSE3 (bit 9 in ecx)
|
||||
// SSE4.1 (bit 19 in ecx)
|
||||
const uint32_t ecx_mask = (1 << 1) | (1 << 9) | (1 << 19);
|
||||
return success && (r[2] & ecx_mask) == ecx_mask;
|
||||
|
||||
// Alternative methods that weren't used:
|
||||
// - ICC's _may_i_use_cpu_feature: the other methods should work too.
|
||||
// - GCC >= 6 / Clang / ICX __builtin_cpu_supports("pclmul")
|
||||
//
|
||||
// CPUID decoding is needed with MSVC anyway and older GCC. This keeps
|
||||
// the feature checks in the build system simpler too. The nice thing
|
||||
// about __builtin_cpu_supports would be that it generates very short
|
||||
// code as it only reads a variable set at startup but a few bytes
|
||||
// don't matter here.
|
||||
}
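The final test in is_clmul_supported() requires all three feature bits at once, which is why the masked value is compared against the mask itself and not merely checked for being nonzero. Here is a minimal sketch of that test on an ECX value passed in as an argument, so it can be exercised without executing CPUID. The function name ecx_has_clmul_features is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: true only if CLMUL, SSSE3, and SSE4.1 are all
// reported in the given ECX value from CPUID leaf 1.
static bool
ecx_has_clmul_features(uint32_t ecx)
{
	const uint32_t ecx_mask = (UINT32_C(1) << 1)    // CLMUL
			| (UINT32_C(1) << 9)            // SSSE3
			| (UINT32_C(1) << 19);          // SSE4.1

	// Comparing against the mask rejects values where only
	// some of the three bits are set.
	return (ecx & ecx_mask) == ecx_mask;
}
```

The mask evaluates to 0x00080202, so an ECX value of 0x00080200 (SSSE3 and SSE4.1 but no CLMUL) is rejected.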
|
||||
|
||||
|
||||
#ifdef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
|
||||
# define CRC64_FUNC_INIT
|
||||
# define CRC64_SET_FUNC_ATTR __attribute__((__constructor__))
|
||||
#else
|
||||
# define CRC64_FUNC_INIT = &crc64_dispatch
|
||||
# define CRC64_SET_FUNC_ATTR
|
||||
static uint64_t crc64_dispatch(const uint8_t *buf, size_t size, uint64_t crc);
|
||||
#endif
|
||||
|
||||
|
||||
// Pointer to the selected CRC64 method.
|
||||
static uint64_t (*crc64_func)(const uint8_t *buf, size_t size, uint64_t crc)
|
||||
CRC64_FUNC_INIT;
|
||||
|
||||
|
||||
CRC64_SET_FUNC_ATTR
|
||||
static void
|
||||
crc64_set_func(void)
|
||||
{
|
||||
crc64_func = is_clmul_supported() ? &crc64_clmul : &crc64_generic;
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
|
||||
static uint64_t
|
||||
crc64_dispatch(const uint8_t *buf, size_t size, uint64_t crc)
|
||||
{
|
||||
// When __attribute__((__constructor__)) isn't supported, set the
|
||||
// function pointer without any locking. If multiple threads run
|
||||
// the detection code in parallel, they will all end up setting
|
||||
// the pointer to the same value. This avoids the use of
|
||||
// mythread_once() on every call to lzma_crc64() but this likely
|
||||
// isn't strictly standards compliant. Let's change it if it breaks.
|
||||
crc64_set_func();
|
||||
return crc64_func(buf, size, crc);
|
||||
}
|
||||
#endif
|
||||
#endif
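The fallback used when __attribute__((__constructor__)) is unavailable is a self-replacing function pointer: it initially points to a dispatcher which picks the real implementation on the first call. The following is a minimal sketch of that pattern with a toy byte-sum in place of CRC64. All names in it (sum_generic, sum_fast, is_fast_supported, sum_dispatch, sum_func, sum_bytes) are hypothetical, and the "fast" variant is only a stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t
sum_generic(const uint8_t *buf, size_t size)
{
	uint32_t sum = 0;
	for (size_t i = 0; i < size; ++i)
		sum += buf[i];

	return sum;
}

// Stand-in for an accelerated version. It must give the same result.
static uint32_t
sum_fast(const uint8_t *buf, size_t size)
{
	return sum_generic(buf, size);
}

// Stand-in for runtime CPU detection.
static bool
is_fast_supported(void)
{
	return false;
}

static uint32_t sum_dispatch(const uint8_t *buf, size_t size);

// Starts out pointing to the dispatcher and is replaced on first use.
static uint32_t (*sum_func)(const uint8_t *buf, size_t size) = &sum_dispatch;

static uint32_t
sum_dispatch(const uint8_t *buf, size_t size)
{
	// Every caller that reaches this point stores the same value,
	// which is the argument the original comment gives for doing
	// this without locking.
	sum_func = is_fast_supported() ? &sum_fast : &sum_generic;
	return sum_func(buf, size);
}

// Public entry point: always calls through the pointer.
static uint32_t
sum_bytes(const uint8_t *buf, size_t size)
{
	return sum_func(buf, size);
}
```

After the first call to sum_bytes(), later calls go directly to the selected implementation and never pass through the dispatcher again.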
|
||||
|
||||
|
||||
extern LZMA_API(uint64_t)
|
||||
lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
|
||||
{
|
||||
#if defined(CRC_GENERIC) && defined(CRC_CLMUL)
|
||||
// If CLMUL is available, it is the best for non-tiny inputs,
|
||||
// being over twice as fast as the generic slice-by-four version.
|
||||
// However, for size <= 16 it's different. In the extreme case
|
||||
// of size == 1 the generic version can be five times faster.
|
||||
// At size >= 8 the CLMUL starts to become reasonable. It
|
||||
// varies depending on the alignment of buf too.
|
||||
//
|
||||
// The above doesn't include the overhead of mythread_once().
|
||||
// At least on x86-64 GNU/Linux, pthread_once() is very fast but
|
||||
// it still makes lzma_crc64(buf, 1, crc) 50-100 % slower. When
|
||||
// size reaches 12-16 bytes the overhead becomes negligible.
|
||||
//
|
||||
// So using the generic version for size <= 16 may give better
|
||||
// performance with tiny inputs but if such inputs happen rarely
|
||||
// it's not so obvious because then the lookup table of the
|
||||
// generic version may not be in the processor cache.
|
||||
#ifdef CRC_USE_GENERIC_FOR_SMALL_INPUTS
|
||||
if (size <= 16)
|
||||
return crc64_generic(buf, size, crc);
|
||||
#endif
|
||||
|
||||
/*
|
||||
#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
|
||||
// See crc64_dispatch(). This would be the alternative which uses
|
||||
// locking and doesn't use crc64_dispatch(). Note that on Windows
|
||||
// this method needs Vista threads.
|
||||
mythread_once(crc64_set_func);
|
||||
#endif
|
||||
*/
|
||||
|
||||
return crc64_func(buf, size, crc);
|
||||
|
||||
#elif defined(CRC_CLMUL)
|
||||
// If CLMUL is used unconditionally without runtime CPU detection
|
||||
// then omitting the generic version and its 8 KiB lookup table
|
||||
// makes the library smaller.
|
||||
//
|
||||
// FIXME: Lookup table isn't currently omitted on 32-bit x86,
|
||||
// see crc64_table.c.
|
||||
return crc64_clmul(buf, size, crc);
|
||||
|
||||
#else
|
||||
return crc64_generic(buf, size, crc);
|
||||
#endif
|
||||
}
|
||||
|
|
|
@ -16,6 +16,9 @@
|
|||
static uint64_t crc64_table[256];
|
||||
|
||||
|
||||
#ifdef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
|
||||
__attribute__((__constructor__))
|
||||
#endif
|
||||
static void
|
||||
crc64_init(void)
|
||||
{
|
||||
|
@ -40,7 +43,9 @@ crc64_init(void)
|
|||
extern LZMA_API(uint64_t)
|
||||
lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
|
||||
{
|
||||
#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
|
||||
mythread_once(crc64_init);
|
||||
#endif
|
||||
|
||||
crc = ~crc;
|
||||
|
||||
|
|
|
@ -12,11 +12,24 @@
|
|||
|
||||
#include "common.h"
|
||||
|
||||
|
||||
// FIXME: Compared to crc64_fast.c this has to check for __x86_64__ too
|
||||
// so that in 32-bit builds crc64_x86.S won't break due to a missing table.
|
||||
#if (defined(__x86_64__) && defined(__SSSE3__) \
|
||||
&& defined(__SSE4_1__) && defined(__PCLMUL__)) \
|
||||
|| (defined(__e2k__) && __iset__ >= 6)
|
||||
// No table needed but something has to be exported to keep some toolchains
|
||||
// happy. Also use a declaration to silence compiler warnings.
|
||||
extern const char lzma_crc64_dummy;
|
||||
const char lzma_crc64_dummy;
|
||||
|
||||
#else
|
||||
// Having the declaration here silences clang -Wmissing-variable-declarations.
|
||||
extern const uint64_t lzma_crc64_table[4][256];
|
||||
|
||||
#ifdef WORDS_BIGENDIAN
|
||||
# include "crc64_table_be.h"
|
||||
#else
|
||||
# include "crc64_table_le.h"
|
||||
# if defined(WORDS_BIGENDIAN)
|
||||
# include "crc64_table_be.h"
|
||||
# else
|
||||
# include "crc64_table_le.h"
|
||||
# endif
|
||||
#endif
|
||||
|
|
|
@ -110,12 +110,24 @@ alone_decode(void *coder_ptr, const lzma_allocator *allocator,
|
|||
// Another hack to ditch false positives: Assume that
|
||||
// if the uncompressed size is known, it must be less
|
||||
// than 256 GiB.
|
||||
//
|
||||
// FIXME? Without picky we allow > LZMA_VLI_MAX which doesn't
|
||||
// really matter in this specific situation (> LZMA_VLI_MAX is
|
||||
// safe in the LZMA decoder) but it's somewhat weird still.
|
||||
if (coder->picky
|
||||
&& coder->uncompressed_size != LZMA_VLI_UNKNOWN
|
||||
&& coder->uncompressed_size
|
||||
>= (LZMA_VLI_C(1) << 38))
|
||||
return LZMA_FORMAT_ERROR;
|
||||
|
||||
// Use LZMA_FILTER_LZMA1EXT features to specify the
|
||||
// uncompressed size and that the end marker is allowed
|
||||
// even when the uncompressed size is known. Both .lzma
|
||||
// header and LZMA1EXT use UINT64_MAX to indicate that size
|
||||
// is unknown.
|
||||
coder->options.ext_flags = LZMA_LZMA1EXT_ALLOW_EOPM;
|
||||
lzma_set_ext_size(coder->options, coder->uncompressed_size);
|
||||
|
||||
// Calculate the memory usage so that it is ready
|
||||
// for SEQ_CODER_INIT.
|
||||
coder->memusage = lzma_lzma_decoder_memusage(&coder->options)
|
||||
|
@ -132,6 +144,7 @@ alone_decode(void *coder_ptr, const lzma_allocator *allocator,
|
|||
|
||||
lzma_filter_info filters[2] = {
|
||||
{
|
||||
.id = LZMA_FILTER_LZMA1EXT,
|
||||
.init = &lzma_lzma_decoder_init,
|
||||
.options = &coder->options,
|
||||
}, {
|
||||
|
@ -139,14 +152,8 @@ alone_decode(void *coder_ptr, const lzma_allocator *allocator,
|
|||
}
|
||||
};
|
||||
|
||||
const lzma_ret ret = lzma_next_filter_init(&coder->next,
|
||||
allocator, filters);
|
||||
if (ret != LZMA_OK)
|
||||
return ret;
|
||||
|
||||
// Use a hack to set the uncompressed size.
|
||||
lzma_lz_decoder_uncompressed(coder->next.coder,
|
||||
coder->uncompressed_size, true);
|
||||
return_if_error(lzma_next_filter_init(&coder->next,
|
||||
allocator, filters));
|
||||
|
||||
coder->sequence = SEQ_CODE;
|
||||
break;
|
||||
|
|
|
@ -129,6 +129,7 @@ alone_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
// Initialize the LZMA encoder.
|
||||
const lzma_filter_info filters[2] = {
|
||||
{
|
||||
.id = LZMA_FILTER_LZMA1,
|
||||
.init = &lzma_lzma_encoder_init,
|
||||
.options = (void *)(options),
|
||||
}, {
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
///////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
/// \file auto_decoder.c
|
||||
/// \brief Autodetect between .xz Stream and .lzma (LZMA_Alone) formats
|
||||
/// \brief Autodetect between .xz, .lzma (LZMA_Alone), and .lz (lzip)
|
||||
//
|
||||
// Author: Lasse Collin
|
||||
//
|
||||
|
@ -12,10 +12,13 @@
|
|||
|
||||
#include "stream_decoder.h"
|
||||
#include "alone_decoder.h"
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
# include "lzip_decoder.h"
|
||||
#endif
|
||||
|
||||
|
||||
typedef struct {
|
||||
/// Stream decoder or LZMA_Alone decoder
|
||||
/// .xz Stream decoder, LZMA_Alone decoder, or lzip decoder
|
||||
lzma_next_coder next;
|
||||
|
||||
uint64_t memlimit;
|
||||
|
@ -46,14 +49,22 @@ auto_decode(void *coder_ptr, const lzma_allocator *allocator,
|
|||
// SEQ_CODE even if we return some LZMA_*_CHECK.
|
||||
coder->sequence = SEQ_CODE;
|
||||
|
||||
// Detect the file format. For now this is simple, since if
|
||||
// it doesn't start with 0xFD (the first magic byte of the
|
||||
// new format), it has to be LZMA_Alone, or something that
|
||||
// we don't support at all.
|
||||
// Detect the file format. .xz files start with 0xFD which
|
||||
// cannot be the first byte of .lzma (LZMA_Alone) format.
|
||||
// The .lz format starts with 0x4C which could be the
|
||||
// first byte of a .lzma file but luckily it would mean
|
||||
// lc/lp/pb being 4/3/1 which liblzma doesn't support because
|
||||
// lc + lp > 4. So using just 0x4C to detect .lz is OK here.
|
||||
if (in[*in_pos] == 0xFD) {
|
||||
return_if_error(lzma_stream_decoder_init(
|
||||
&coder->next, allocator,
|
||||
coder->memlimit, coder->flags));
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
} else if (in[*in_pos] == 0x4C) {
|
||||
return_if_error(lzma_lzip_decoder_init(
|
||||
&coder->next, allocator,
|
||||
coder->memlimit, coder->flags));
|
||||
#endif
|
||||
} else {
|
||||
return_if_error(lzma_alone_decoder_init(&coder->next,
|
||||
allocator, coder->memlimit, true));
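The claim that 0x4C would mean lc/lp/pb = 4/3/1 follows from how the first byte of a .lzma header packs the three values: props = (pb * 5 + lp) * 9 + lc. Below is a minimal sketch that splits such a byte and applies the lc + lp <= 4 restriction mentioned in the comment. The function name props_byte_split is hypothetical and is not the decoder liblzma itself uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: splits an LZMA properties byte into lc/lp/pb.
// Returns true only if the byte is in range and lc + lp <= 4.
// The output values are left untouched if the byte is out of range.
static bool
props_byte_split(uint8_t props, uint32_t *lc, uint32_t *lp, uint32_t *pb)
{
	// Largest encodable byte: lc = 8, lp = 4, pb = 4 gives 224.
	if (props > (4 * 5 + 4) * 9 + 8)
		return false;

	uint32_t rest = props;
	*pb = rest / (9 * 5);
	rest -= *pb * 9 * 5;
	*lp = rest / 9;
	*lc = rest - *lp * 9;

	return *lc + *lp <= 4;
}
```

For 0x4C (76) this gives pb = 1, lp = 3, lc = 4, and the sum lc + lp = 7 fails the check. The byte 0xFD (253) is above the largest encodable value, which is why it cannot begin a .lzma file either. A common value such as 0x5D decodes to lc = 3, lp = 0, pb = 2 and is accepted.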
|
||||
|
|
|
@ -14,22 +14,6 @@
|
|||
#include "check.h"
|
||||
|
||||
|
||||
static void
|
||||
free_properties(lzma_block *block, const lzma_allocator *allocator)
|
||||
{
|
||||
// Free allocated filter options. The last array member is not
|
||||
// touched after the initialization in the beginning of
|
||||
// lzma_block_header_decode(), so we don't need to touch that here.
|
||||
for (size_t i = 0; i < LZMA_FILTERS_MAX; ++i) {
|
||||
lzma_free(block->filters[i].options, allocator);
|
||||
block->filters[i].id = LZMA_VLI_UNKNOWN;
|
||||
block->filters[i].options = NULL;
|
||||
}
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
extern LZMA_API(lzma_ret)
|
||||
lzma_block_header_decode(lzma_block *block,
|
||||
const lzma_allocator *allocator, const uint8_t *in)
|
||||
|
@ -39,6 +23,10 @@ lzma_block_header_decode(lzma_block *block,
|
|||
// are invalid or over 63 bits, or if the header is too small
|
||||
// to contain the claimed information.
|
||||
|
||||
// Catch unexpected NULL pointers.
|
||||
if (block == NULL || block->filters == NULL || in == NULL)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
||||
// Initialize the filter options array. This way the caller can
|
||||
// safely free() the options even if an error occurs in this function.
|
||||
for (size_t i = 0; i <= LZMA_FILTERS_MAX; ++i) {
|
||||
|
@ -67,8 +55,11 @@ lzma_block_header_decode(lzma_block *block,
|
|||
const size_t in_size = block->header_size - 4;
|
||||
|
||||
// Verify CRC32
|
||||
if (lzma_crc32(in, in_size, 0) != read32le(in + in_size))
|
||||
if (lzma_crc32(in, in_size, 0) != read32le(in + in_size)) {
|
||||
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
|
||||
return LZMA_DATA_ERROR;
|
||||
#endif
|
||||
}
|
||||
|
||||
// Check for unsupported flags.
|
||||
if (in[1] & 0x3C)
|
||||
|
@ -104,7 +95,7 @@ lzma_block_header_decode(lzma_block *block,
|
|||
&block->filters[i], allocator,
|
||||
in, &in_pos, in_size);
|
||||
if (ret != LZMA_OK) {
|
||||
free_properties(block, allocator);
|
||||
lzma_filters_free(block->filters, allocator);
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
|
@ -112,7 +103,7 @@ lzma_block_header_decode(lzma_block *block,
|
|||
// Padding
|
||||
while (in_pos < in_size) {
|
||||
if (in[in_pos++] != 0x00) {
|
||||
free_properties(block, allocator);
|
||||
lzma_filters_free(block->filters, allocator);
|
||||
|
||||
// Possibly some new field present so use
|
||||
// LZMA_OPTIONS_ERROR instead of LZMA_DATA_ERROR.
|
||||
|
|
|
@ -211,7 +211,6 @@ lzma_code(lzma_stream *strm, lzma_action action)
|
|||
|| strm->reserved_ptr2 != NULL
|
||||
|| strm->reserved_ptr3 != NULL
|
||||
|| strm->reserved_ptr4 != NULL
|
||||
|| strm->reserved_int1 != 0
|
||||
|| strm->reserved_int2 != 0
|
||||
|| strm->reserved_int3 != 0
|
||||
|| strm->reserved_int4 != 0
|
||||
|
@ -299,9 +298,7 @@ lzma_code(lzma_stream *strm, lzma_action action)
|
|||
|
||||
strm->internal->avail_in = strm->avail_in;
|
||||
|
||||
// Cast is needed to silence a warning about LZMA_TIMED_OUT, which
|
||||
// isn't part of lzma_ret enumeration.
|
||||
switch ((unsigned int)(ret)) {
|
||||
switch (ret) {
|
||||
case LZMA_OK:
|
||||
// Don't return LZMA_BUF_ERROR when it happens the first time.
|
||||
// This is to avoid returning LZMA_BUF_ERROR when avail_out
|
||||
|
@ -322,6 +319,17 @@ lzma_code(lzma_stream *strm, lzma_action action)
|
|||
ret = LZMA_OK;
|
||||
break;
|
||||
|
||||
case LZMA_SEEK_NEEDED:
|
||||
strm->internal->allow_buf_error = false;
|
||||
|
||||
// If LZMA_FINISH was used, reset it back to the
|
||||
// LZMA_RUN-based state so that new input can be supplied
|
||||
// by the application.
|
||||
if (strm->internal->sequence == ISEQ_FINISH)
|
||||
strm->internal->sequence = ISEQ_RUN;
|
||||
|
||||
break;
|
||||
|
||||
case LZMA_STREAM_END:
|
||||
if (strm->internal->sequence == ISEQ_SYNC_FLUSH
|
||||
|| strm->internal->sequence == ISEQ_FULL_FLUSH
|
||||
|
|
|
@ -34,6 +34,14 @@
|
|||
|
||||
#include "lzma.h"
|
||||
|
||||
// This is for detecting modern GCC and Clang attributes
|
||||
// like __symver__ in GCC >= 10.
|
||||
#ifdef __has_attribute
|
||||
# define lzma_has_attribute(attr) __has_attribute(attr)
|
||||
#else
|
||||
# define lzma_has_attribute(attr) 0
|
||||
#endif
|
||||
|
||||
// The extra symbol versioning in the C files may only be used when
|
||||
// building a shared library. If HAVE_SYMBOL_VERSIONS_LINUX is defined
|
||||
// to 2 then symbol versioning is done only if PIC is also defined.
|
||||
|
@ -63,7 +71,12 @@
|
|||
// since 2000). When using @@ instead of @@@, the internal name must not be
|
||||
// the same as the external name to avoid problems in some situations. This
|
||||
// is why "#define foo_52 foo" is needed for the default symbol versions.
|
||||
# if TUKLIB_GNUC_REQ(10, 0) && !defined(__INTEL_COMPILER)
|
||||
//
|
||||
// __has_attribute is supported before GCC 10 and it is supported in Clang 14
|
||||
// too (which doesn't support __symver__) so use it to detect if __symver__
|
||||
// is available. This should be far more reliable than looking at compiler
|
||||
// version macros as nowadays especially __GNUC__ is defined by many compilers.
|
||||
# if lzma_has_attribute(__symver__)
|
||||
# define LZMA_SYMVER_API(extnamever, type, intname) \
|
||||
extern __attribute__((__symver__(extnamever))) \
|
||||
LZMA_API(type) intname
|
||||
|
@ -107,14 +120,15 @@
|
|||
#define LZMA_FILTER_RESERVED_START (LZMA_VLI_C(1) << 62)
|
||||
|
||||
|
||||
/// Supported flags that can be passed to lzma_stream_decoder()
|
||||
/// or lzma_auto_decoder().
|
||||
/// Supported flags that can be passed to lzma_stream_decoder(),
|
||||
/// lzma_auto_decoder(), or lzma_stream_decoder_mt().
|
||||
#define LZMA_SUPPORTED_FLAGS \
|
||||
( LZMA_TELL_NO_CHECK \
|
||||
| LZMA_TELL_UNSUPPORTED_CHECK \
|
||||
| LZMA_TELL_ANY_CHECK \
|
||||
| LZMA_IGNORE_CHECK \
|
||||
| LZMA_CONCATENATED )
|
||||
| LZMA_CONCATENATED \
|
||||
| LZMA_FAIL_FAST )
|
||||
|
||||
|
||||
/// Largest valid lzma_action value as unsigned integer.
|
||||
|
@ -123,9 +137,12 @@
|
|||
|
||||
/// Special return value (lzma_ret) to indicate that a timeout was reached
|
||||
/// and lzma_code() must not return LZMA_BUF_ERROR. This is converted to
|
||||
/// LZMA_OK in lzma_code(). This is not in the lzma_ret enumeration because
|
||||
/// there's no need to have it in the public API.
|
||||
#define LZMA_TIMED_OUT 32
|
||||
/// LZMA_OK in lzma_code().
|
||||
#define LZMA_TIMED_OUT LZMA_RET_INTERNAL1
|
||||
|
||||
/// Special return value (lzma_ret) for use in stream_decoder_mt.c to
|
||||
/// indicate Index was detected instead of a Block Header.
|
||||
#define LZMA_INDEX_DETECTED LZMA_RET_INTERNAL2
|
||||
|
||||
|
||||
typedef struct lzma_next_coder_s lzma_next_coder;
|
||||
|
@ -158,8 +175,11 @@ typedef void (*lzma_end_function)(
|
|||
/// an array of lzma_filter_info structures. This array is used with
|
||||
/// lzma_next_filter_init to initialize the filter chain.
|
||||
struct lzma_filter_info_s {
|
||||
/// Filter ID. This is used only by the encoder
|
||||
/// with lzma_filters_update().
|
||||
/// Filter ID. This can be used to share the same initialization
|
||||
/// function *and* data structures with different Filter IDs
|
||||
/// (LZMA_FILTER_LZMA1EXT does it), and also by the encoder
|
||||
/// with lzma_filters_update() if filter chain is updated
|
||||
/// in the middle of a raw stream or Block (LZMA_SYNC_FLUSH).
|
||||
lzma_vli id;
|
||||
|
||||
/// Pointer to function used to initialize the filter.
|
||||
|
@ -213,6 +233,16 @@ struct lzma_next_coder_s {
|
|||
lzma_ret (*update)(void *coder, const lzma_allocator *allocator,
|
||||
const lzma_filter *filters,
|
||||
const lzma_filter *reversed_filters);
|
||||
|
||||
/// Set how many bytes of output this coder may produce at maximum.
|
||||
/// On success LZMA_OK must be returned.
|
||||
/// If the filter chain as a whole cannot support this feature,
|
||||
/// this must return LZMA_OPTIONS_ERROR.
|
||||
/// If no input has been given to the coder and the requested limit
|
||||
/// is too small, this must return LZMA_BUF_ERROR. If input has been
|
||||
/// seen, LZMA_OK is allowed too.
|
||||
lzma_ret (*set_out_limit)(void *coder, uint64_t *uncomp_size,
|
||||
uint64_t out_limit);
|
||||
};
|
||||
|
||||
|
||||
|
@ -228,6 +258,7 @@ struct lzma_next_coder_s {
|
|||
.get_check = NULL, \
|
||||
.memconfig = NULL, \
|
||||
.update = NULL, \
|
||||
.set_out_limit = NULL, \
|
||||
}
|
||||
|
||||
|
||||
|
|
855
src/liblzma/common/file_info.c
Normal file
|
@ -0,0 +1,855 @@
|
|||
///////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
/// \file file_info.c
|
||||
/// \brief Decode .xz file information into a lzma_index structure
|
||||
//
|
||||
// Author: Lasse Collin
|
||||
//
|
||||
// This file has been put into the public domain.
|
||||
// You can do whatever you want with this file.
|
||||
//
|
||||
///////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
#include "index_decoder.h"
|
||||
|
||||
|
||||
typedef struct {
|
||||
enum {
|
||||
SEQ_MAGIC_BYTES,
|
||||
SEQ_PADDING_SEEK,
|
||||
SEQ_PADDING_DECODE,
|
||||
SEQ_FOOTER,
|
||||
SEQ_INDEX_INIT,
|
||||
SEQ_INDEX_DECODE,
|
||||
SEQ_HEADER_DECODE,
|
||||
SEQ_HEADER_COMPARE,
|
||||
} sequence;
|
||||
|
||||
/// Absolute position of in[*in_pos] in the file. All code that
|
||||
/// modifies *in_pos also updates this. seek_to_pos() needs this
|
||||
/// to determine if we need to request the application to seek for
|
||||
/// us or if we can do the seeking internally by adjusting *in_pos.
|
||||
uint64_t file_cur_pos;
|
||||
|
||||
/// This refers to absolute positions of interesting parts of the
|
||||
/// input file. Sometimes it points to the *beginning* of a specific
|
||||
/// field and sometimes to the *end* of a field. The current target
|
||||
/// position at each moment is explained in the comments.
|
||||
uint64_t file_target_pos;
|
||||
|
||||
/// Size of the .xz file (from the application).
|
||||
uint64_t file_size;
|
||||
|
||||
/// Index decoder
|
||||
lzma_next_coder index_decoder;
|
||||
|
||||
/// Number of bytes remaining in the Index field that is currently
|
||||
/// being decoded.
|
||||
lzma_vli index_remaining;
|
||||
|
||||
/// The Index decoder will store the decoded Index in this pointer.
|
||||
lzma_index *this_index;
|
||||
|
||||
/// Amount of Stream Padding in the current Stream.
|
||||
lzma_vli stream_padding;
|
||||
|
||||
/// The final combined index is collected here.
|
||||
lzma_index *combined_index;
|
||||
|
||||
/// Pointer from the application where to store the index information
|
||||
/// after successful decoding.
|
||||
lzma_index **dest_index;
|
||||
|
||||
/// Pointer to lzma_stream.seek_pos to be used when returning
|
||||
/// LZMA_SEEK_NEEDED. This is set by seek_to_pos() when needed.
|
||||
uint64_t *external_seek_pos;
|
||||
|
||||
/// Memory usage limit
|
||||
uint64_t memlimit;
|
||||
|
||||
/// Stream Flags from the very beginning of the file.
|
||||
lzma_stream_flags first_header_flags;
|
||||
|
||||
/// Stream Flags from Stream Header of the current Stream.
|
||||
lzma_stream_flags header_flags;
|
||||
|
||||
/// Stream Flags from Stream Footer of the current Stream.
|
||||
lzma_stream_flags footer_flags;
|
||||
|
||||
size_t temp_pos;
|
||||
size_t temp_size;
|
||||
uint8_t temp[8192];
|
||||
|
||||
} lzma_file_info_coder;
|
||||
|
||||
|
||||
/// Copies data from in[*in_pos] into coder->temp until
|
||||
/// coder->temp_pos == coder->temp_size. This also keeps coder->file_cur_pos
|
||||
/// in sync with *in_pos. Returns true if more input is needed.
|
||||
static bool
|
||||
fill_temp(lzma_file_info_coder *coder, const uint8_t *restrict in,
|
||||
size_t *restrict in_pos, size_t in_size)
|
||||
{
|
||||
coder->file_cur_pos += lzma_bufcpy(in, in_pos, in_size,
|
||||
coder->temp, &coder->temp_pos, coder->temp_size);
|
||||
return coder->temp_pos < coder->temp_size;
|
||||
}
|
||||
|
||||
|
||||
/// Seeks to the absolute file position specified by target_pos.
|
||||
/// This tries to do the seeking by only modifying *in_pos, if possible.
|
||||
/// The main benefit of this is that if one passes the whole file at once
|
||||
/// to lzma_code(), the decoder will never need to return LZMA_SEEK_NEEDED
|
||||
/// as all the seeking can be done by adjusting *in_pos in this function.
|
||||
///
|
||||
/// Returns true if an external seek is needed and the caller must return
|
||||
/// LZMA_SEEK_NEEDED.
|
||||
static bool
|
||||
seek_to_pos(lzma_file_info_coder *coder, uint64_t target_pos,
|
||||
size_t in_start, size_t *in_pos, size_t in_size)
|
||||
{
|
||||
// The input buffer doesn't extend beyond the end of the file.
|
||||
// This has been checked by file_info_decode() already.
|
||||
assert(coder->file_size - coder->file_cur_pos >= in_size - *in_pos);
|
||||
|
||||
const uint64_t pos_min = coder->file_cur_pos - (*in_pos - in_start);
|
||||
const uint64_t pos_max = coder->file_cur_pos + (in_size - *in_pos);
|
||||
|
||||
bool external_seek_needed;
|
||||
|
||||
if (target_pos >= pos_min && target_pos <= pos_max) {
|
||||
// The requested position is available in the current input
|
||||
// buffer or right after it. That is, in a corner case we
|
||||
// end up setting *in_pos == in_size and thus will immediately
|
||||
// need new input bytes from the application.
|
||||
*in_pos += (size_t)(target_pos - coder->file_cur_pos);
|
||||
external_seek_needed = false;
|
||||
} else {
|
||||
// Ask the application to seek the input file.
|
||||
*coder->external_seek_pos = target_pos;
|
||||
external_seek_needed = true;
|
||||
|
||||
// Mark the whole input buffer as used. This way
|
||||
// lzma_stream.total_in will have a better estimate
|
||||
// of the amount of data read. It still won't be perfect
|
||||
// as the value will depend on the input buffer size that
|
||||
// the application uses, but it should be good enough for
|
||||
// those few who want an estimate.
|
||||
*in_pos = in_size;
|
||||
}
|
||||
|
||||
// After seeking (internal or external) the current position
|
||||
// will match the requested target position.
|
||||
coder->file_cur_pos = target_pos;
|
||||
|
||||
return external_seek_needed;
|
||||
}
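The window check in seek_to_pos() can be looked at in isolation. The following is a minimal sketch, not liblzma code: the helper name target_in_buffer() and its flat parameter list are assumptions made for illustration. It only reproduces the pos_min/pos_max arithmetic that decides between adjusting *in_pos and asking the application to seek.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of liblzma): returns true if target_pos
// can be reached just by moving in_pos inside the current input buffer.
// file_cur_pos is the file offset that corresponds to in[in_pos].
static bool
target_in_buffer(uint64_t file_cur_pos, uint64_t target_pos,
		size_t in_start, size_t in_pos, size_t in_size)
{
	assert(in_start <= in_pos && in_pos <= in_size);
	assert(file_cur_pos >= in_pos - in_start);

	// File offset of in[in_start], the earliest byte still reachable.
	const uint64_t pos_min = file_cur_pos - (in_pos - in_start);

	// File offset of the byte right after the end of the buffer.
	// Reaching it means in_pos == in_size, so new input is needed
	// immediately but no external seek is required.
	const uint64_t pos_max = file_cur_pos + (in_size - in_pos);

	return target_pos >= pos_min && target_pos <= pos_max;
}
```

With file_cur_pos = 100, in_start = 0, in_pos = 10 and in_size = 20, the reachable range is 90 to 110 inclusive; any other target would make the real function return true so that the caller returns LZMA_SEEK_NEEDED.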
|
||||
|
||||
|
||||
/// The caller sets coder->file_target_pos so that it points to the *end*
|
||||
/// of the desired file position. This function then determines how far
|
||||
/// backwards from that position we can seek. After seeking fill_temp()
|
||||
/// can be used to read data into coder->temp. When fill_temp() has finished,
|
||||
/// coder->temp[coder->temp_size] will match coder->file_target_pos.
|
||||
///
|
||||
/// This also validates that coder->file_target_pos is sane in the sense that
|
||||
/// we aren't trying to seek too far backwards (too close or beyond the
|
||||
/// beginning of the file).
|
||||
static lzma_ret
|
||||
reverse_seek(lzma_file_info_coder *coder,
|
||||
size_t in_start, size_t *in_pos, size_t in_size)
|
||||
{
|
||||
// Check that there is enough data before the target position
|
||||
// to contain at least Stream Header and Stream Footer. If there
|
||||
// isn't, the file cannot be valid.
|
||||
if (coder->file_target_pos < 2 * LZMA_STREAM_HEADER_SIZE)
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
coder->temp_pos = 0;
|
||||
|
||||
// The Stream Header at the very beginning of the file gets handled
|
||||
// specially in SEQ_MAGIC_BYTES and thus we will never need to seek
|
||||
// there. By not seeking to the first LZMA_STREAM_HEADER_SIZE bytes
|
||||
// we avoid a useless external seek after SEQ_MAGIC_BYTES if the
|
||||
// application uses an extremely small input buffer and the input
|
||||
// file is very small.
|
||||
if (coder->file_target_pos - LZMA_STREAM_HEADER_SIZE
|
||||
< sizeof(coder->temp))
|
||||
coder->temp_size = (size_t)(coder->file_target_pos
|
||||
- LZMA_STREAM_HEADER_SIZE);
|
||||
else
|
||||
coder->temp_size = sizeof(coder->temp);
|
||||
|
||||
// The above if-statements guarantee this. This is important because
|
||||
// the Stream Header/Footer decoders assume that there's at least
|
||||
// LZMA_STREAM_HEADER_SIZE bytes in coder->temp.
|
||||
assert(coder->temp_size >= LZMA_STREAM_HEADER_SIZE);
|
||||
|
||||
if (seek_to_pos(coder, coder->file_target_pos - coder->temp_size,
|
||||
in_start, in_pos, in_size))
|
||||
return LZMA_SEEK_NEEDED;
|
||||
|
||||
return LZMA_OK;
|
||||
}
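The size calculation in reverse_seek() is easy to check with concrete numbers. This is a rough sketch under stated assumptions: reverse_read_size() is a hypothetical helper, HEADER_SIZE stands in for LZMA_STREAM_HEADER_SIZE (12 bytes), and TEMP_CAPACITY stands in for sizeof(coder->temp) from the struct above. Returning 0 for the error case is also an assumption; the real function returns LZMA_DATA_ERROR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_SIZE 12      // stands in for LZMA_STREAM_HEADER_SIZE
#define TEMP_CAPACITY 8192  // stands in for sizeof(coder->temp)

// Hypothetical helper (not part of liblzma): number of bytes to read
// backwards from target_pos, or 0 if target_pos leaves no room for both
// a Stream Header and a Stream Footer.
static size_t
reverse_read_size(uint64_t target_pos)
{
	if (target_pos < 2 * HEADER_SIZE)
		return 0;

	// Never read the first HEADER_SIZE bytes of the file again;
	// they were already handled at the start of decoding.
	const uint64_t available = target_pos - HEADER_SIZE;
	const size_t size = available < TEMP_CAPACITY
			? (size_t)available : TEMP_CAPACITY;

	// Mirrors the assertion in reverse_seek().
	assert(size >= HEADER_SIZE);
	return size;
}
```

The seek destination is then target_pos minus the returned size, so the data read into the temporary buffer ends exactly at target_pos.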
|
||||
|
||||
|
||||
/// Gets the number of zero-bytes at the end of the buffer.
|
||||
static size_t
|
||||
get_padding_size(const uint8_t *buf, size_t buf_size)
|
||||
{
|
||||
size_t padding = 0;
|
||||
while (buf_size > 0 && buf[--buf_size] == 0x00)
|
||||
++padding;
|
||||
|
||||
return padding;
|
||||
}
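Stream Padding may be longer than one buffer, so the decoder accumulates the count over several backward reads. The sketch below illustrates that idea on an in-memory array. It is an assumption-laden simplification: padding_before() and its chunked loop are hypothetical, and the real decoder spreads this logic over SEQ_PADDING_SEEK and SEQ_PADDING_DECODE instead of using a single loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Same loop as get_padding_size() above, repeated so that this sketch
// compiles on its own.
static size_t
trailing_zeros(const uint8_t *buf, size_t buf_size)
{
	size_t padding = 0;
	while (buf_size > 0 && buf[--buf_size] == 0x00)
		++padding;

	return padding;
}

// Hypothetical helper: counts the zero bytes that precede offset "end"
// by scanning backwards at most "chunk" bytes at a time.
static size_t
padding_before(const uint8_t *file, size_t end, size_t chunk)
{
	assert(chunk > 0);

	size_t total = 0;
	while (end > 0) {
		const size_t n = end < chunk ? end : chunk;
		const size_t zeros = trailing_zeros(file + end - n, n);
		total += zeros;
		end -= zeros;

		// A non-zero byte was found, so the padding ends here.
		if (zeros < n)
			break;
	}

	return total;
}
```

A caller would still need to verify that the total is a multiple of four, as the decoder does before moving on to the Stream Footer.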
|
||||
|
||||
|
||||
/// With the Stream Header at the very beginning of the file, LZMA_FORMAT_ERROR
|
||||
/// is used to tell the application that Magic Bytes didn't match. In other
|
||||
/// Stream Header/Footer fields (in the middle/end of the file) it could be
|
||||
/// a bit confusing to return LZMA_FORMAT_ERROR as we already know that there
|
||||
/// is a valid Stream Header at the beginning of the file. For those cases
|
||||
/// this function is used to convert LZMA_FORMAT_ERROR to LZMA_DATA_ERROR.
|
||||
static lzma_ret
|
||||
hide_format_error(lzma_ret ret)
|
||||
{
|
||||
if (ret == LZMA_FORMAT_ERROR)
|
||||
ret = LZMA_DATA_ERROR;
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
||||
/// Calls the Index decoder and updates coder->index_remaining.
|
||||
/// This is a separate function because the input can be either directly
|
||||
/// from the application or from coder->temp.
|
||||
static lzma_ret
|
||||
decode_index(lzma_file_info_coder *coder, const lzma_allocator *allocator,
|
||||
const uint8_t *restrict in, size_t *restrict in_pos,
|
||||
size_t in_size, bool update_file_cur_pos)
|
||||
{
|
||||
const size_t in_start = *in_pos;
|
||||
|
||||
const lzma_ret ret = coder->index_decoder.code(
|
||||
coder->index_decoder.coder,
|
||||
allocator, in, in_pos, in_size,
|
||||
NULL, NULL, 0, LZMA_RUN);
|
||||
|
||||
coder->index_remaining -= *in_pos - in_start;
|
||||
|
||||
if (update_file_cur_pos)
|
||||
coder->file_cur_pos += *in_pos - in_start;
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
file_info_decode(void *coder_ptr, const lzma_allocator *allocator,
|
||||
const uint8_t *restrict in, size_t *restrict in_pos,
|
||||
size_t in_size,
|
||||
uint8_t *restrict out lzma_attribute((__unused__)),
|
||||
size_t *restrict out_pos lzma_attribute((__unused__)),
|
||||
size_t out_size lzma_attribute((__unused__)),
|
||||
lzma_action action lzma_attribute((__unused__)))
|
||||
{
|
||||
lzma_file_info_coder *coder = coder_ptr;
|
||||
const size_t in_start = *in_pos;
|
||||
|
||||
// If the caller provides input past the end of the file, trim
|
||||
// the extra bytes from the buffer so that we won't read too far.
|
||||
assert(coder->file_size >= coder->file_cur_pos);
|
||||
if (coder->file_size - coder->file_cur_pos < in_size - in_start)
|
||||
in_size = in_start
|
||||
+ (size_t)(coder->file_size - coder->file_cur_pos);
|
||||
|
||||
while (true)
|
||||
switch (coder->sequence) {
|
||||
case SEQ_MAGIC_BYTES:
|
||||
// Decode the Stream Header at the beginning of the file
|
||||
// first to check if the Magic Bytes match. The flags
|
||||
// are stored in coder->first_header_flags so that we
|
||||
// don't need to seek to it again.
|
||||
//
|
||||
// Check that the file is big enough to contain at least
|
||||
// Stream Header.
|
||||
if (coder->file_size < LZMA_STREAM_HEADER_SIZE)
|
||||
return LZMA_FORMAT_ERROR;
|
||||
|
||||
// Read the Stream Header field into coder->temp.
|
||||
if (fill_temp(coder, in, in_pos, in_size))
|
||||
return LZMA_OK;
|
||||
|
||||
// This is the only Stream Header/Footer decoding where we
|
||||
// want to return LZMA_FORMAT_ERROR if the Magic Bytes don't
|
||||
// match. Elsewhere it will be converted to LZMA_DATA_ERROR.
|
||||
return_if_error(lzma_stream_header_decode(
|
||||
&coder->first_header_flags, coder->temp));
|
||||
|
||||
// Now that we know that the Magic Bytes match, check the
|
||||
// file size. It's better to do this here after checking the
|
||||
// Magic Bytes since this way we can give LZMA_FORMAT_ERROR
|
||||
// instead of LZMA_DATA_ERROR when the Magic Bytes don't
|
||||
// match in a file that is too big or isn't a multiple of
|
||||
// four bytes.
|
||||
if (coder->file_size > LZMA_VLI_MAX || (coder->file_size & 3))
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
// Start looking for Stream Padding and Stream Footer
|
||||
// at the end of the file.
|
||||
coder->file_target_pos = coder->file_size;
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_PADDING_SEEK:
|
||||
coder->sequence = SEQ_PADDING_DECODE;
|
||||
return_if_error(reverse_seek(
|
||||
coder, in_start, in_pos, in_size));
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_PADDING_DECODE: {
|
||||
// Copy to coder->temp first. This keeps the code simpler if
|
||||
// the application only provides input a few bytes at a time.
|
||||
if (fill_temp(coder, in, in_pos, in_size))
|
||||
return LZMA_OK;
|
||||
|
||||
// Scan the buffer backwards to get the size of the
|
||||
// Stream Padding field (if any).
|
||||
const size_t new_padding = get_padding_size(
|
||||
coder->temp, coder->temp_size);
|
||||
coder->stream_padding += new_padding;
|
||||
|
||||
// Set the target position to the beginning of Stream Padding
|
||||
// that has been observed so far. If all Stream Padding has
|
||||
// been seen, then the target position will be at the end
|
||||
// of the Stream Footer field.
|
||||
coder->file_target_pos -= new_padding;
|
||||
|
||||
if (new_padding == coder->temp_size) {
|
||||
// The whole buffer was padding. Seek backwards in
|
||||
// the file to get more input.
|
||||
coder->sequence = SEQ_PADDING_SEEK;
|
||||
break;
|
||||
}
|
||||
|
||||
// Size of Stream Padding must be a multiple of 4 bytes.
|
||||
if (coder->stream_padding & 3)
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
coder->sequence = SEQ_FOOTER;
|
||||
|
||||
// Calculate the amount of non-padding data in coder->temp.
|
||||
coder->temp_size -= new_padding;
|
||||
coder->temp_pos = coder->temp_size;
|
||||
|
||||
// We can avoid an external seek if the whole Stream Footer
|
||||
// is already in coder->temp. In that case SEQ_FOOTER won't
|
||||
// read more input and will find the Stream Footer from
|
||||
// coder->temp[coder->temp_size - LZMA_STREAM_HEADER_SIZE].
|
||||
//
|
||||
// Otherwise we will need to seek. The seeking is done so
|
||||
// that Stream Footer will be at the end of coder->temp.
|
||||
// This way it's likely that we also get a complete Index
|
||||
// field into coder->temp without needing a separate seek
|
||||
// for that (unless the Index field is big).
|
||||
if (coder->temp_size < LZMA_STREAM_HEADER_SIZE)
|
||||
return_if_error(reverse_seek(
|
||||
coder, in_start, in_pos, in_size));
|
||||
}
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_FOOTER:
|
||||
// Copy the Stream Footer field into coder->temp.
|
||||
// If Stream Footer was already available in coder->temp
|
||||
// in SEQ_PADDING_DECODE, then this does nothing.
|
||||
if (fill_temp(coder, in, in_pos, in_size))
|
||||
return LZMA_OK;
|
||||
|
||||
// Make coder->file_target_pos and coder->temp_size point
|
||||
// to the beginning of Stream Footer and thus to the end
|
||||
// of the Index field. coder->temp_pos will be updated
|
||||
// a bit later.
|
||||
coder->file_target_pos -= LZMA_STREAM_HEADER_SIZE;
|
||||
coder->temp_size -= LZMA_STREAM_HEADER_SIZE;
|
||||
|
||||
// Decode Stream Footer.
|
||||
return_if_error(hide_format_error(lzma_stream_footer_decode(
|
||||
&coder->footer_flags,
|
||||
coder->temp + coder->temp_size)));
|
||||
|
||||
// Check that we won't seek past the beginning of the file.
|
||||
//
|
||||
// LZMA_STREAM_HEADER_SIZE is added because there must be
|
||||
// space for Stream Header too even though we won't seek
|
||||
// there before decoding the Index field.
|
||||
//
|
||||
// There's no risk of integer overflow here because
|
||||
// Backward Size cannot be greater than 2^34.
|
||||
if (coder->file_target_pos < coder->footer_flags.backward_size
|
||||
+ LZMA_STREAM_HEADER_SIZE)
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
// Set the target position to the beginning of the Index field.
|
||||
coder->file_target_pos -= coder->footer_flags.backward_size;
|
||||
coder->sequence = SEQ_INDEX_INIT;
|
||||
|
||||
// We can avoid an external seek if the whole Index field is
|
||||
// already available in coder->temp.
|
||||
if (coder->temp_size >= coder->footer_flags.backward_size) {
|
||||
// Set coder->temp_pos to point to the beginning
|
||||
// of the Index.
|
||||
coder->temp_pos = coder->temp_size
|
||||
- coder->footer_flags.backward_size;
|
||||
} else {
|
||||
// These are set to zero to indicate that there's no
|
||||
// useful data (Index or anything else) in coder->temp.
|
||||
coder->temp_pos = 0;
|
||||
coder->temp_size = 0;
|
||||
|
||||
// Seek to the beginning of the Index field.
|
||||
if (seek_to_pos(coder, coder->file_target_pos,
|
||||
in_start, in_pos, in_size))
|
||||
return LZMA_SEEK_NEEDED;
|
||||
}
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_INDEX_INIT: {
|
||||
// Calculate the amount of memory already used by the earlier
|
||||
// Indexes so that we know how big a memory limit to pass to
|
||||
// the Index decoder.
|
||||
//
|
||||
// NOTE: When there are multiple Streams, the separate
|
||||
// lzma_index structures can use more RAM (as measured by
|
||||
// lzma_index_memused()) than the final combined lzma_index.
|
||||
// Thus memlimit may need to be slightly higher than the final
|
||||
// calculated memory usage will be. This is perhaps a bit
|
||||
// confusing to the application, but I think it shouldn't
|
||||
// cause problems in practice.
|
||||
uint64_t memused = 0;
|
||||
if (coder->combined_index != NULL) {
|
||||
memused = lzma_index_memused(coder->combined_index);
|
||||
assert(memused <= coder->memlimit);
|
||||
if (memused > coder->memlimit) // Extra sanity check
|
||||
return LZMA_PROG_ERROR;
|
||||
}
|
||||
|
||||
// Initialize the Index decoder.
|
||||
return_if_error(lzma_index_decoder_init(
|
||||
&coder->index_decoder, allocator,
|
||||
&coder->this_index,
|
||||
coder->memlimit - memused));
|
||||
|
||||
coder->index_remaining = coder->footer_flags.backward_size;
|
||||
coder->sequence = SEQ_INDEX_DECODE;
|
||||
}
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_INDEX_DECODE: {
|
||||
// Decode (a part of) the Index. If the whole Index is already
|
||||
// in coder->temp, read it from there. Otherwise read from
|
||||
// in[*in_pos] onwards. Note that decode_index() updates
|
||||
// coder->index_remaining and optionally coder->file_cur_pos.
|
||||
lzma_ret ret;
|
||||
if (coder->temp_size != 0) {
|
||||
assert(coder->temp_size - coder->temp_pos
|
||||
== coder->index_remaining);
|
||||
ret = decode_index(coder, allocator, coder->temp,
|
||||
&coder->temp_pos, coder->temp_size,
|
||||
false);
|
||||
} else {
|
||||
// Don't give the decoder more input than the known
|
||||
// remaining size of the Index field.
|
||||
size_t in_stop = in_size;
|
||||
if (in_size - *in_pos > coder->index_remaining)
|
||||
in_stop = *in_pos
|
||||
+ (size_t)(coder->index_remaining);
|
||||
|
||||
ret = decode_index(coder, allocator,
|
||||
in, in_pos, in_stop, true);
|
||||
}
|
||||
|
||||
switch (ret) {
|
||||
case LZMA_OK:
|
||||
// If the Index decoder asks for more input when we
|
||||
// have already given it as much input as Backward Size
|
||||
// indicated, the file is invalid.
|
||||
if (coder->index_remaining == 0)
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
// We cannot get here if we were reading Index from
|
||||
// coder->temp because when reading from coder->temp
|
||||
// we give the Index decoder exactly
|
||||
// coder->index_remaining bytes of input.
|
||||
assert(coder->temp_size == 0);
|
||||
|
||||
return LZMA_OK;
|
||||
|
||||
case LZMA_STREAM_END:
|
||||
// If the decoding seems to be successful, check also
|
||||
// that the Index decoder consumed as much input as
|
||||
// indicated by the Backward Size field.
|
||||
if (coder->index_remaining != 0)
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
break;
|
||||
|
||||
default:
|
||||
return ret;
|
||||
}
|
||||
|
||||
// Calculate how much the Index tells us to seek backwards
|
||||
// (relative to the beginning of the Index): Total size of
|
||||
// all Blocks plus the size of the Stream Header field.
|
||||
// No integer overflow here because lzma_index_total_size()
|
||||
// cannot return a value greater than LZMA_VLI_MAX.
|
||||
const uint64_t seek_amount
|
||||
= lzma_index_total_size(coder->this_index)
|
||||
+ LZMA_STREAM_HEADER_SIZE;
|
||||
|
||||
// Check that Index is sane in the sense that seek_amount won't
|
||||
// make us seek past the beginning of the file when locating
|
||||
// the Stream Header.
|
||||
//
|
||||
// coder->file_target_pos still points to the beginning of
|
||||
// the Index field.
|
||||
if (coder->file_target_pos < seek_amount)
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
// Set the target to the beginning of Stream Header.
|
||||
coder->file_target_pos -= seek_amount;
|
||||
|
||||
if (coder->file_target_pos == 0) {
|
||||
// We would seek to the beginning of the file, but
|
||||
// since we already decoded that Stream Header in
|
||||
// SEQ_MAGIC_BYTES, we can use the cached value from
|
||||
// coder->first_header_flags to avoid the seek.
|
||||
coder->header_flags = coder->first_header_flags;
|
||||
coder->sequence = SEQ_HEADER_COMPARE;
|
||||
break;
|
||||
}
|
||||
|
||||
coder->sequence = SEQ_HEADER_DECODE;
|
||||
|
||||
// Make coder->file_target_pos point to the end of
|
||||
// the Stream Header field.
|
||||
coder->file_target_pos += LZMA_STREAM_HEADER_SIZE;
|
||||
|
||||
// If coder->temp_size is non-zero, it points to the end
|
||||
// of the Index field. Then the beginning of the Index
|
||||
// field is at coder->temp[coder->temp_size
|
||||
// - coder->footer_flags.backward_size].
|
||||
assert(coder->temp_size == 0 || coder->temp_size
|
||||
>= coder->footer_flags.backward_size);
|
||||
|
||||
// If coder->temp contained the whole Index, see if it has
|
||||
// enough data to contain also the Stream Header. If so,
|
||||
// we avoid an external seek.
|
||||
//
|
||||
// NOTE: This can happen only with small .xz files and only
|
||||
// for the non-first Stream as the Stream Flags of the first
|
||||
// Stream are cached and already handled a few lines above.
|
||||
// So this isn't as useful as the other seek-avoidance cases.
|
||||
if (coder->temp_size != 0 && coder->temp_size
|
||||
- coder->footer_flags.backward_size
|
||||
>= seek_amount) {
|
||||
// Make temp_pos and temp_size point to the *end* of
|
||||
// Stream Header so that SEQ_HEADER_DECODE will find
|
||||
// the start of Stream Header from coder->temp[
|
||||
// coder->temp_size - LZMA_STREAM_HEADER_SIZE].
|
||||
coder->temp_pos = coder->temp_size
|
||||
- coder->footer_flags.backward_size
|
||||
- seek_amount
|
||||
+ LZMA_STREAM_HEADER_SIZE;
|
||||
coder->temp_size = coder->temp_pos;
|
||||
} else {
|
||||
// Seek so that Stream Header will be at the end of
|
||||
// coder->temp. With typical multi-Stream files we
|
||||
// will usually also get the Stream Footer and Index
|
||||
// of the *previous* Stream in coder->temp and thus
|
||||
// won't need a separate seek for them.
|
||||
return_if_error(reverse_seek(coder,
|
||||
in_start, in_pos, in_size));
|
||||
}
|
||||
}
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_HEADER_DECODE:
|
||||
// Copy the Stream Header field into coder->temp.
|
||||
// If Stream Header was already available in coder->temp
|
||||
// in SEQ_INDEX_DECODE, then this does nothing.
|
||||
if (fill_temp(coder, in, in_pos, in_size))
|
||||
return LZMA_OK;
|
||||
|
||||
// Make all these point to the beginning of Stream Header.
|
||||
coder->file_target_pos -= LZMA_STREAM_HEADER_SIZE;
|
||||
coder->temp_size -= LZMA_STREAM_HEADER_SIZE;
|
||||
coder->temp_pos = coder->temp_size;
|
||||
|
||||
// Decode the Stream Header.
|
||||
return_if_error(hide_format_error(lzma_stream_header_decode(
|
||||
&coder->header_flags,
|
||||
coder->temp + coder->temp_size)));
|
||||
|
||||
coder->sequence = SEQ_HEADER_COMPARE;
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_HEADER_COMPARE:
|
||||
// Compare Stream Header against Stream Footer. They must
|
||||
// match.
|
||||
return_if_error(lzma_stream_flags_compare(
|
||||
&coder->header_flags, &coder->footer_flags));
|
||||
|
||||
// Store the decoded Stream Flags into the Index. Use the
|
||||
// Footer Flags because it contains Backward Size, although
|
||||
// it shouldn't matter in practice.
|
||||
if (lzma_index_stream_flags(coder->this_index,
|
||||
&coder->footer_flags) != LZMA_OK)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
||||
// Store also the size of the Stream Padding field. It is
|
||||
// needed to calculate the offsets of the Streams correctly.
|
||||
if (lzma_index_stream_padding(coder->this_index,
|
||||
coder->stream_padding) != LZMA_OK)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
||||
// Reset it so that it's ready for the next Stream.
|
||||
coder->stream_padding = 0;
|
||||
|
||||
// Append the earlier decoded Indexes after this_index.
|
||||
if (coder->combined_index != NULL)
|
||||
return_if_error(lzma_index_cat(coder->this_index,
|
||||
coder->combined_index, allocator));
|
||||
|
||||
coder->combined_index = coder->this_index;
|
||||
coder->this_index = NULL;
|
||||
|
||||
// If the whole file was decoded, tell the caller that we
|
||||
// are finished.
|
||||
if (coder->file_target_pos == 0) {
|
||||
// The combined index must indicate the same file
|
||||
// size as was told to us at initialization.
|
||||
assert(lzma_index_file_size(coder->combined_index)
|
||||
== coder->file_size);
|
||||
|
||||
// Make the combined index available to
|
||||
// the application.
|
||||
*coder->dest_index = coder->combined_index;
|
||||
coder->combined_index = NULL;
|
||||
|
||||
// Mark the input buffer as used since we may have
|
||||
// done internal seeking and thus don't know how
|
||||
// many input bytes were actually used. This way
|
||||
// lzma_stream.total_in gets a slightly better
|
||||
// estimate of the amount of input used.
|
||||
*in_pos = in_size;
|
||||
return LZMA_STREAM_END;
|
||||
}
|
||||
|
||||
// We didn't hit the beginning of the file yet, so continue
|
||||
// reading backwards in the file. If we have unprocessed
|
||||
// data in coder->temp, use it before requesting more data
|
||||
// from the application.
|
||||
//
|
||||
// coder->file_target_pos, coder->temp_size, and
|
||||
// coder->temp_pos all point to the beginning of Stream Header
|
||||
// and thus the end of the previous Stream in the file.
|
||||
coder->sequence = coder->temp_size > 0
|
||||
? SEQ_PADDING_DECODE : SEQ_PADDING_SEEK;
|
||||
break;
|
||||
|
||||
default:
|
||||
assert(0);
|
||||
return LZMA_PROG_ERROR;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
file_info_decoder_memconfig(void *coder_ptr, uint64_t *memusage,
|
||||
uint64_t *old_memlimit, uint64_t new_memlimit)
|
||||
{
|
||||
lzma_file_info_coder *coder = coder_ptr;
|
||||
|
||||
// The memory usage calculation comes from three things:
|
||||
//
|
||||
// (1) The Indexes that have already been decoded and processed into
|
||||
// coder->combined_index.
|
||||
//
|
||||
// (2) The latest Index in coder->this_index that has been decoded but
|
||||
// not yet put into coder->combined_index.
|
||||
//
|
||||
// (3) The latest Index that we have started decoding but haven't
|
||||
// finished and thus isn't available in coder->this_index yet.
|
||||
// Memory usage and limit information needs to be communicated
|
||||
// from/to coder->index_decoder.
|
||||
//
|
||||
// Care has to be taken to not do both (2) and (3) when calculating
|
||||
// the memory usage.
|
||||
uint64_t combined_index_memusage = 0;
|
||||
uint64_t this_index_memusage = 0;
|
||||
|
||||
// (1) If we have already successfully decoded one or more Indexes,
|
||||
// get their memory usage.
|
||||
if (coder->combined_index != NULL)
|
||||
combined_index_memusage = lzma_index_memused(
|
||||
coder->combined_index);
|
||||
|
||||
// Choose between (2), (3), or neither.
|
||||
if (coder->this_index != NULL) {
|
||||
// (2) The latest Index is available. Use its memory usage.
|
||||
this_index_memusage = lzma_index_memused(coder->this_index);
|
||||
|
||||
} else if (coder->sequence == SEQ_INDEX_DECODE) {
|
||||
// (3) The Index decoder is active and hasn't yet stored
|
||||
// the new index in coder->this_index. Get the memory usage
|
||||
// information from the Index decoder.
|
||||
//
|
||||
// NOTE: If the Index decoder doesn't yet know how much memory
|
||||
// it will eventually need, it will return a tiny value here.
|
||||
uint64_t dummy;
|
||||
if (coder->index_decoder.memconfig(coder->index_decoder.coder,
|
||||
&this_index_memusage, &dummy, 0)
|
||||
!= LZMA_OK) {
|
||||
assert(0);
|
||||
return LZMA_PROG_ERROR;
|
||||
}
|
||||
}
|
||||
|
||||
// Now we know the total memory usage/requirement. If we had neither
|
||||
// old Indexes nor a new Index, this will be zero which isn't
|
||||
// acceptable as lzma_memusage() has to return non-zero on success
|
||||
// and even with an empty .xz file we will end up with a lzma_index
|
||||
// that takes some memory.
|
||||
*memusage = combined_index_memusage + this_index_memusage;
|
||||
if (*memusage == 0)
|
||||
*memusage = lzma_index_memusage(1, 0);
|
||||
|
||||
*old_memlimit = coder->memlimit;
|
||||
|
||||
// If requested, set a new memory usage limit.
|
||||
if (new_memlimit != 0) {
|
||||
if (new_memlimit < *memusage)
|
||||
return LZMA_MEMLIMIT_ERROR;
|
||||
|
||||
// In the condition (3) we need to tell the Index decoder
|
||||
// its new memory usage limit.
|
||||
if (coder->this_index == NULL
|
||||
&& coder->sequence == SEQ_INDEX_DECODE) {
|
||||
const uint64_t idec_new_memlimit = new_memlimit
|
||||
- combined_index_memusage;
|
||||
|
||||
assert(this_index_memusage > 0);
|
||||
assert(idec_new_memlimit > 0);
|
||||
|
||||
uint64_t dummy1;
|
||||
uint64_t dummy2;
|
||||
|
||||
if (coder->index_decoder.memconfig(
|
||||
coder->index_decoder.coder,
|
||||
&dummy1, &dummy2, idec_new_memlimit)
|
||||
!= LZMA_OK) {
|
||||
assert(0);
|
||||
return LZMA_PROG_ERROR;
|
||||
}
|
||||
}
|
||||
|
||||
coder->memlimit = new_memlimit;
|
||||
}
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
|
||||
static void
|
||||
file_info_decoder_end(void *coder_ptr, const lzma_allocator *allocator)
|
||||
{
|
||||
lzma_file_info_coder *coder = coder_ptr;
|
||||
|
||||
lzma_next_end(&coder->index_decoder, allocator);
|
||||
lzma_index_end(coder->this_index, allocator);
|
||||
lzma_index_end(coder->combined_index, allocator);
|
||||
|
||||
lzma_free(coder, allocator);
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
lzma_file_info_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator, uint64_t *seek_pos,
|
||||
lzma_index **dest_index,
|
||||
uint64_t memlimit, uint64_t file_size)
|
||||
{
|
||||
lzma_next_coder_init(&lzma_file_info_decoder_init, next, allocator);
|
||||
|
||||
if (dest_index == NULL)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
||||
lzma_file_info_coder *coder = next->coder;
|
||||
if (coder == NULL) {
|
||||
coder = lzma_alloc(sizeof(lzma_file_info_coder), allocator);
|
||||
if (coder == NULL)
|
||||
return LZMA_MEM_ERROR;
|
||||
|
||||
next->coder = coder;
|
||||
next->code = &file_info_decode;
|
||||
next->end = &file_info_decoder_end;
|
||||
next->memconfig = &file_info_decoder_memconfig;
|
||||
|
||||
coder->index_decoder = LZMA_NEXT_CODER_INIT;
|
||||
coder->this_index = NULL;
|
||||
coder->combined_index = NULL;
|
||||
}
|
||||
|
||||
coder->sequence = SEQ_MAGIC_BYTES;
|
||||
coder->file_cur_pos = 0;
|
||||
coder->file_target_pos = 0;
|
||||
coder->file_size = file_size;
|
||||
|
||||
lzma_index_end(coder->this_index, allocator);
|
||||
coder->this_index = NULL;
|
||||
|
||||
lzma_index_end(coder->combined_index, allocator);
|
||||
coder->combined_index = NULL;
|
||||
|
||||
coder->stream_padding = 0;
|
||||
|
||||
coder->dest_index = dest_index;
|
||||
coder->external_seek_pos = seek_pos;
|
||||
|
||||
// If memlimit is 0, make it 1 to ensure that lzma_memlimit_get()
|
||||
// won't return 0 (which would indicate an error).
|
||||
coder->memlimit = my_max(1, memlimit);
|
||||
|
||||
// Prepare these for reading the first Stream Header into coder->temp.
|
||||
coder->temp_pos = 0;
|
||||
coder->temp_size = LZMA_STREAM_HEADER_SIZE;
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
|
||||
extern LZMA_API(lzma_ret)
|
||||
lzma_file_info_decoder(lzma_stream *strm, lzma_index **dest_index,
|
||||
uint64_t memlimit, uint64_t file_size)
|
||||
{
|
||||
lzma_next_strm_init(lzma_file_info_decoder_init, strm, &strm->seek_pos,
|
||||
dest_index, memlimit, file_size);
|
||||
|
||||
// We allow LZMA_FINISH in addition to LZMA_RUN for convenience.
|
||||
// lzma_code() is able to handle the LZMA_FINISH + LZMA_SEEK_NEEDED
|
||||
// combination in a sane way. Applications still need to be careful
|
||||
// if they use LZMA_FINISH so that they remember to reset it back
|
||||
// to LZMA_RUN after seeking if needed.
|
||||
strm->internal->supported_actions[LZMA_RUN] = true;
|
||||
strm->internal->supported_actions[LZMA_FINISH] = true;
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
|
@ -42,6 +42,13 @@ static const struct {
|
|||
.last_ok = true,
|
||||
.changes_size = true,
|
||||
},
|
||||
{
|
||||
.id = LZMA_FILTER_LZMA1EXT,
|
||||
.options_size = sizeof(lzma_options_lzma),
|
||||
.non_last_ok = false,
|
||||
.last_ok = true,
|
||||
.changes_size = true,
|
||||
},
|
||||
#endif
|
||||
#if defined(HAVE_ENCODER_LZMA2) || defined(HAVE_DECODER_LZMA2)
|
||||
{
|
||||
|
@ -97,6 +104,15 @@ static const struct {
|
|||
.changes_size = false,
|
||||
},
|
||||
#endif
|
||||
#if defined(HAVE_ENCODER_ARM64) || defined(HAVE_DECODER_ARM64)
|
||||
{
|
||||
.id = LZMA_FILTER_ARM64,
|
||||
.options_size = sizeof(lzma_options_bcj),
|
||||
.non_last_ok = true,
|
||||
.last_ok = false,
|
||||
.changes_size = false,
|
||||
},
|
||||
#endif
|
||||
#if defined(HAVE_ENCODER_SPARC) || defined(HAVE_DECODER_SPARC)
|
||||
{
|
||||
.id = LZMA_FILTER_SPARC,
|
||||
|
@ -196,8 +212,34 @@ lzma_filters_copy(const lzma_filter *src, lzma_filter *real_dest,
|
|||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
validate_chain(const lzma_filter *filters, size_t *count)
|
||||
extern LZMA_API(void)
|
||||
lzma_filters_free(lzma_filter *filters, const lzma_allocator *allocator)
|
||||
{
|
||||
if (filters == NULL)
|
||||
return;
|
||||
|
||||
for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) {
|
||||
if (i == LZMA_FILTERS_MAX) {
|
||||
// The API says that LZMA_FILTERS_MAX + 1 is the
|
||||
// maximum allowed size including the terminating
|
||||
// element. Thus, we should never get here but in
|
||||
// case there is a bug and we do anyway, don't go
|
||||
// past the (probable) end of the array.
|
||||
assert(0);
|
||||
break;
|
||||
}
|
||||
|
||||
lzma_free(filters[i].options, allocator);
|
||||
filters[i].options = NULL;
|
||||
filters[i].id = LZMA_VLI_UNKNOWN;
|
||||
}
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
extern lzma_ret
|
||||
lzma_validate_chain(const lzma_filter *filters, size_t *count)
|
||||
{
|
||||
// There must be at least one filter.
|
||||
if (filters == NULL || filters[0].id == LZMA_VLI_UNKNOWN)
|
||||
|
@ -251,7 +293,7 @@ lzma_raw_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
{
|
||||
// Do some basic validation and get the number of filters.
|
||||
size_t count;
|
||||
return_if_error(validate_chain(options, &count));
|
||||
return_if_error(lzma_validate_chain(options, &count));
|
||||
|
||||
// Set the filter functions and copy the options pointer.
|
||||
lzma_filter_info filters[LZMA_FILTERS_MAX + 1];
|
||||
|
@ -304,7 +346,7 @@ lzma_raw_coder_memusage(lzma_filter_find coder_find,
|
|||
// The chain has to have at least one filter.
|
||||
{
|
||||
size_t tmp;
|
||||
if (validate_chain(filters, &tmp) != LZMA_OK)
|
||||
if (lzma_validate_chain(filters, &tmp) != LZMA_OK)
|
||||
return UINT64_MAX;
|
||||
}
|
||||
|
||||
|
|
|
@ -35,6 +35,9 @@ typedef struct {
|
|||
typedef const lzma_filter_coder *(*lzma_filter_find)(lzma_vli id);
|
||||
|
||||
|
||||
extern lzma_ret lzma_validate_chain(const lzma_filter *filters, size_t *count);
|
||||
|
||||
|
||||
extern lzma_ret lzma_raw_coder_init(
|
||||
lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
const lzma_filter *filters,
|
||||
|
|
|
@ -50,6 +50,12 @@ static const lzma_filter_decoder decoders[] = {
|
|||
.memusage = &lzma_lzma_decoder_memusage,
|
||||
.props_decode = &lzma_lzma_props_decode,
|
||||
},
|
||||
{
|
||||
.id = LZMA_FILTER_LZMA1EXT,
|
||||
.init = &lzma_lzma_decoder_init,
|
||||
.memusage = &lzma_lzma_decoder_memusage,
|
||||
.props_decode = &lzma_lzma_props_decode,
|
||||
},
|
||||
#endif
|
||||
#ifdef HAVE_DECODER_LZMA2
|
||||
{
|
||||
|
@ -99,6 +105,14 @@ static const lzma_filter_decoder decoders[] = {
|
|||
.props_decode = &lzma_simple_props_decode,
|
||||
},
|
||||
#endif
|
||||
#ifdef HAVE_DECODER_ARM64
|
||||
{
|
||||
.id = LZMA_FILTER_ARM64,
|
||||
.init = &lzma_simple_arm64_decoder_init,
|
||||
.memusage = NULL,
|
||||
.props_decode = &lzma_simple_props_decode,
|
||||
},
|
||||
#endif
|
||||
#ifdef HAVE_DECODER_SPARC
|
||||
{
|
||||
.id = LZMA_FILTER_SPARC,
|
||||
|
|
|
@ -64,6 +64,15 @@ static const lzma_filter_encoder encoders[] = {
|
|||
.props_size_fixed = 5,
|
||||
.props_encode = &lzma_lzma_props_encode,
|
||||
},
|
||||
{
|
||||
.id = LZMA_FILTER_LZMA1EXT,
|
||||
.init = &lzma_lzma_encoder_init,
|
||||
.memusage = &lzma_lzma_encoder_memusage,
|
||||
.block_size = NULL, // Not needed for LZMA1
|
||||
.props_size_get = NULL,
|
||||
.props_size_fixed = 5,
|
||||
.props_encode = &lzma_lzma_props_encode,
|
||||
},
|
||||
#endif
|
||||
#ifdef HAVE_ENCODER_LZMA2
|
||||
{
|
||||
|
@ -126,6 +135,16 @@ static const lzma_filter_encoder encoders[] = {
|
|||
.props_encode = &lzma_simple_props_encode,
|
||||
},
|
||||
#endif
|
||||
#ifdef HAVE_ENCODER_ARM64
|
||||
{
|
||||
.id = LZMA_FILTER_ARM64,
|
||||
.init = &lzma_simple_arm64_encoder_init,
|
||||
.memusage = NULL,
|
||||
.block_size = NULL,
|
||||
.props_size_get = &lzma_simple_props_size,
|
||||
.props_encode = &lzma_simple_props_encode,
|
||||
},
|
||||
#endif
|
||||
#ifdef HAVE_ENCODER_SPARC
|
||||
{
|
||||
.id = LZMA_FILTER_SPARC,
|
||||
|
|
|
@ -10,7 +10,7 @@
|
|||
//
|
||||
///////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
#include "index.h"
|
||||
#include "index_decoder.h"
|
||||
#include "check.h"
|
||||
|
||||
|
||||
|
@ -180,8 +180,11 @@ index_decode(void *coder_ptr, const lzma_allocator *allocator,
|
|||
return LZMA_OK;
|
||||
|
||||
if (((coder->crc32 >> (coder->pos * 8)) & 0xFF)
|
||||
!= in[(*in_pos)++])
|
||||
!= in[(*in_pos)++]) {
|
||||
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
|
||||
return LZMA_DATA_ERROR;
|
||||
#endif
|
||||
}
|
||||
|
||||
} while (++coder->pos < 4);
|
||||
|
||||
|
@ -265,11 +268,11 @@ index_decoder_reset(lzma_index_coder *coder, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
index_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
extern lzma_ret
|
||||
lzma_index_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
lzma_index **i, uint64_t memlimit)
|
||||
{
|
||||
lzma_next_coder_init(&index_decoder_init, next, allocator);
|
||||
lzma_next_coder_init(&lzma_index_decoder_init, next, allocator);
|
||||
|
||||
if (i == NULL)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
@ -296,7 +299,7 @@ index_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
extern LZMA_API(lzma_ret)
|
||||
lzma_index_decoder(lzma_stream *strm, lzma_index **i, uint64_t memlimit)
|
||||
{
|
||||
lzma_next_strm_init(index_decoder_init, strm, i, memlimit);
|
||||
lzma_next_strm_init(lzma_index_decoder_init, strm, i, memlimit);
|
||||
|
||||
strm->internal->supported_actions[LZMA_RUN] = true;
|
||||
strm->internal->supported_actions[LZMA_FINISH] = true;
|
||||
|
|
24
src/liblzma/common/index_decoder.h
Normal file
|
@ -0,0 +1,24 @@
|
|||
///////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
/// \file index_decoder.h
|
||||
/// \brief Decodes the Index field
|
||||
//
|
||||
// Author: Lasse Collin
|
||||
//
|
||||
// This file has been put into the public domain.
|
||||
// You can do whatever you want with this file.
|
||||
//
|
||||
///////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
#ifndef LZMA_INDEX_DECODER_H
|
||||
#define LZMA_INDEX_DECODER_H
|
||||
|
||||
#include "index.h"
|
||||
|
||||
|
||||
extern lzma_ret lzma_index_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
lzma_index **i, uint64_t memlimit);
|
||||
|
||||
|
||||
#endif
|
|
@ -312,8 +312,11 @@ lzma_index_hash_decode(lzma_index_hash *index_hash, const uint8_t *in,
|
|||
return LZMA_OK;
|
||||
|
||||
if (((index_hash->crc32 >> (index_hash->pos * 8))
|
||||
& 0xFF) != in[(*in_pos)++])
|
||||
& 0xFF) != in[(*in_pos)++]) {
|
||||
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
|
||||
return LZMA_DATA_ERROR;
|
||||
#endif
|
||||
}
|
||||
|
||||
} while (++index_hash->pos < 4);
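Both CRC32 hunks compare the stored field one byte at a time, least significant byte first, which lets the decoder resume when the input ends in the middle of the field. The fuzzing macro only skips the error return, so a fuzzing build accepts mismatching CRC32 values. A minimal non-resumable sketch of the same byte extraction follows; crc32_field_matches is a hypothetical helper name, not a liblzma function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Returns true if the four bytes at `stored` hold `crc32` in
// little-endian byte order. Byte `pos` of the field is compared with
// bits [8 * pos, 8 * pos + 7] of the computed value.
static bool
crc32_field_matches(uint32_t crc32, const uint8_t stored[4])
{
	for (size_t pos = 0; pos < 4; ++pos) {
		if (((crc32 >> (pos * 8)) & 0xFF) != stored[pos])
			return false;
	}

	return true;
}
```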
|
||||
|
||||
|
|
414
src/liblzma/common/lzip_decoder.c
Normal file
|
@ -0,0 +1,414 @@
|
|||
///////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
/// \file lzip_decoder.c
|
||||
/// \brief Decodes .lz (lzip) files
|
||||
//
|
||||
// Author: Michał Górny
|
||||
// Lasse Collin
|
||||
//
|
||||
// This file has been put into the public domain.
|
||||
// You can do whatever you want with this file.
|
||||
//
|
||||
///////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
#include "lzip_decoder.h"
|
||||
#include "lzma_decoder.h"
|
||||
#include "check.h"
|
||||
|
||||
|
||||
// .lz format version 0 lacks the 64-bit Member size field in the footer.
|
||||
#define LZIP_V0_FOOTER_SIZE 12
|
||||
#define LZIP_V1_FOOTER_SIZE 20
|
||||
#define LZIP_FOOTER_SIZE_MAX LZIP_V1_FOOTER_SIZE
|
||||
|
||||
// lc/lp/pb are hardcoded in the .lz format.
|
||||
#define LZIP_LC 3
|
||||
#define LZIP_LP 0
|
||||
#define LZIP_PB 2
|
||||
|
||||
|
||||
typedef struct {
|
||||
enum {
|
||||
SEQ_ID_STRING,
|
||||
SEQ_VERSION,
|
||||
SEQ_DICT_SIZE,
|
||||
SEQ_CODER_INIT,
|
||||
SEQ_LZMA_STREAM,
|
||||
SEQ_MEMBER_FOOTER,
|
||||
} sequence;
|
||||
|
||||
/// .lz member format version
|
||||
uint32_t version;
|
||||
|
||||
/// CRC32 of the uncompressed data in the .lz member
|
||||
uint32_t crc32;
|
||||
|
||||
/// Uncompressed size of the .lz member
|
||||
uint64_t uncompressed_size;
|
||||
|
||||
/// Compressed size of the .lz member
|
||||
uint64_t member_size;
|
||||
|
||||
/// Memory usage limit
|
||||
uint64_t memlimit;
|
||||
|
||||
/// Amount of memory actually needed
|
||||
uint64_t memusage;
|
||||
|
||||
/// If true, LZMA_GET_CHECK is returned after decoding the header
|
||||
/// fields. As all files use CRC32 this is redundant but it's
|
||||
/// implemented anyway since the initialization functions supports
|
||||
/// all other flags in addition to LZMA_TELL_ANY_CHECK.
|
||||
bool tell_any_check;
|
||||
|
||||
/// If true, we won't calculate or verify the CRC32 of
|
||||
/// the uncompressed data.
|
||||
bool ignore_check;
|
||||
|
||||
/// If true, we will decode concatenated .lz members and stop if
|
||||
/// non-.lz data is seen after at least one member has been
|
||||
/// successfully decoded.
|
||||
bool concatenated;
|
||||
|
||||
/// When decoding concatenated .lz members, this is true as long as
|
||||
/// we are decoding the first .lz member. This is needed to avoid
|
||||
/// incorrect LZMA_FORMAT_ERROR in case there is non-.lz data at
|
||||
/// the end of the file.
|
||||
bool first_member;
|
||||
|
||||
/// Reading position in the header and footer fields
|
||||
size_t pos;
|
||||
|
||||
/// Buffer to hold the .lz footer fields
|
||||
uint8_t buffer[LZIP_FOOTER_SIZE_MAX];
|
||||
|
||||
/// Options decoded from the .lz header that needed to initialize
|
||||
/// the LZMA1 decoder.
|
||||
lzma_options_lzma options;
|
||||
|
||||
/// LZMA1 decoder
|
||||
lzma_next_coder lzma_decoder;
|
||||
|
||||
} lzma_lzip_coder;
|
||||
|
||||
|
||||
static lzma_ret
|
||||
lzip_decode(void *coder_ptr, const lzma_allocator *allocator,
|
||||
const uint8_t *restrict in, size_t *restrict in_pos,
|
||||
size_t in_size, uint8_t *restrict out,
|
||||
size_t *restrict out_pos, size_t out_size, lzma_action action)
|
||||
{
|
||||
lzma_lzip_coder *coder = coder_ptr;
|
||||
|
||||
while (true)
|
||||
switch (coder->sequence) {
|
||||
case SEQ_ID_STRING: {
|
||||
// The "ID string" or magic bytes are "LZIP" in US-ASCII.
|
||||
const uint8_t lzip_id_string[4] = { 0x4C, 0x5A, 0x49, 0x50 };
|
||||
|
||||
while (coder->pos < sizeof(lzip_id_string)) {
|
||||
if (*in_pos >= in_size) {
|
||||
// If we are on the 2nd+ concatenated member
|
||||
// and the input ends before we can read
|
||||
// the magic bytes, we discard the bytes that
|
||||
// were already read (up to 3) and finish.
|
||||
// See the reasoning below.
|
||||
return !coder->first_member
|
||||
&& action == LZMA_FINISH
|
||||
? LZMA_STREAM_END : LZMA_OK;
|
||||
}
|
||||
|
||||
if (in[*in_pos] != lzip_id_string[coder->pos]) {
|
||||
// The .lz format allows putting non-.lz data
|
||||
// at the end of the file. If we have seen
|
||||
// at least one valid .lz member already,
|
||||
// then we won't consume the byte at *in_pos
|
||||
// and will return LZMA_STREAM_END. This way
|
||||
// apps can easily locate and read the non-.lz
|
||||
// data after the .lz member(s).
|
||||
//
|
||||
// NOTE: If the first 1-3 bytes of the non-.lz
|
||||
// data match the .lz ID string then the first
|
||||
// 1-3 bytes of the junk will get ignored by
|
||||
// us. If apps want to properly locate the
|
||||
// trailing data they must ensure that the
|
||||
// first byte of their custom data isn't the
|
||||
// same as the first byte of .lz ID string.
|
||||
// With the liblzma API we cannot rewind the
|
||||
// input position across calls to lzma_code().
|
||||
return !coder->first_member
|
||||
? LZMA_STREAM_END : LZMA_FORMAT_ERROR;
|
||||
}
|
||||
|
||||
++*in_pos;
|
||||
++coder->pos;
|
||||
}
|
||||
|
||||
coder->pos = 0;
|
||||
|
||||
coder->crc32 = 0;
|
||||
coder->uncompressed_size = 0;
|
||||
coder->member_size = sizeof(lzip_id_string);
|
||||
|
||||
coder->sequence = SEQ_VERSION;
|
||||
}
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_VERSION:
|
||||
if (*in_pos >= in_size)
|
||||
return LZMA_OK;
|
||||
|
||||
coder->version = in[(*in_pos)++];
|
||||
|
||||
// We support version 0 and unextended version 1.
|
||||
if (coder->version > 1)
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
++coder->member_size;
|
||||
coder->sequence = SEQ_DICT_SIZE;
|
||||
|
||||
// .lz versions 0 and 1 use CRC32 as the integrity check
|
||||
// so if the application wanted to know that
|
||||
// (LZMA_TELL_ANY_CHECK) we can tell it now.
|
||||
if (coder->tell_any_check)
|
||||
return LZMA_GET_CHECK;
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_DICT_SIZE: {
|
||||
if (*in_pos >= in_size)
|
||||
return LZMA_OK;
|
||||
|
||||
const uint32_t ds = in[(*in_pos)++];
|
||||
++coder->member_size;
|
||||
|
||||
// The five lowest bits are for the base-2 logarithm of
|
||||
// the dictionary size and the highest three bits are
|
||||
// the fractional part (0/16 to 7/16) that will be
|
||||
// subtracted to get the final value.
|
||||
//
|
||||
// For example, with 0xB5:
|
||||
// b2log = 21
|
||||
// fracnum = 5
|
||||
// dict_size = 2^21 - 2^21 * 5 / 16 = 1408 KiB
|
||||
const uint32_t b2log = ds & 0x1F;
|
||||
const uint32_t fracnum = ds >> 5;
|
||||
|
||||
// The format versions 0 and 1 allow dictionary size in the
|
||||
// range [4 KiB, 512 MiB].
|
||||
if (b2log < 12 || b2log > 29 || (b2log == 12 && fracnum > 0))
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
// 2^[b2log] - 2^[b2log] * [fracnum] / 16
|
||||
// = 2^[b2log] - [fracnum] * 2^([b2log] - 4)
|
||||
coder->options.dict_size = (UINT32_C(1) << b2log)
|
||||
- (fracnum << (b2log - 4));
|
||||
|
||||
assert(coder->options.dict_size >= 4096);
|
||||
assert(coder->options.dict_size <= (UINT32_C(512) << 20));
|
||||
|
||||
coder->options.preset_dict = NULL;
|
||||
coder->options.lc = LZIP_LC;
|
||||
coder->options.lp = LZIP_LP;
|
||||
coder->options.pb = LZIP_PB;
|
||||
|
||||
// Calculate the memory usage.
|
||||
coder->memusage = lzma_lzma_decoder_memusage(&coder->options)
|
||||
+ LZMA_MEMUSAGE_BASE;
|
||||
|
||||
// Initialization is a separate step because if we return
|
||||
// LZMA_MEMLIMIT_ERROR we need to be able to restart after
|
||||
// the memlimit has been increased.
|
||||
coder->sequence = SEQ_CODER_INIT;
|
||||
}
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_CODER_INIT: {
|
||||
if (coder->memusage > coder->memlimit)
|
||||
return LZMA_MEMLIMIT_ERROR;
|
||||
|
||||
const lzma_filter_info filters[2] = {
|
||||
{
|
||||
.id = LZMA_FILTER_LZMA1,
|
||||
.init = &lzma_lzma_decoder_init,
|
||||
.options = &coder->options,
|
||||
}, {
|
||||
.init = NULL,
|
||||
}
|
||||
};
|
||||
|
||||
return_if_error(lzma_next_filter_init(&coder->lzma_decoder,
|
||||
allocator, filters));
|
||||
|
||||
coder->crc32 = 0;
|
||||
coder->sequence = SEQ_LZMA_STREAM;
|
||||
}
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_LZMA_STREAM: {
|
||||
const size_t in_start = *in_pos;
|
||||
const size_t out_start = *out_pos;
|
||||
|
||||
const lzma_ret ret = coder->lzma_decoder.code(
|
||||
coder->lzma_decoder.coder, allocator,
|
||||
in, in_pos, in_size, out, out_pos, out_size,
|
||||
action);
|
||||
|
||||
const size_t out_used = *out_pos - out_start;
|
||||
|
||||
coder->member_size += *in_pos - in_start;
|
||||
coder->uncompressed_size += out_used;
|
||||
|
||||
if (!coder->ignore_check)
|
||||
coder->crc32 = lzma_crc32(out + out_start, out_used,
|
||||
coder->crc32);
|
||||
|
||||
if (ret != LZMA_STREAM_END)
|
||||
return ret;
|
||||
|
||||
coder->sequence = SEQ_MEMBER_FOOTER;
|
||||
}
|
||||
|
||||
// Fall through
|
||||
|
||||
case SEQ_MEMBER_FOOTER: {
|
||||
// The footer of .lz version 0 lacks the Member size field.
|
||||
// This is the only difference between version 0 and
|
||||
// unextended version 1 formats.
|
||||
const size_t footer_size = coder->version == 0
|
||||
? LZIP_V0_FOOTER_SIZE
|
||||
: LZIP_V1_FOOTER_SIZE;
|
||||
|
||||
// Copy the CRC32, Data size, and Member size fields to
|
||||
// the internal buffer.
|
||||
lzma_bufcpy(in, in_pos, in_size, coder->buffer, &coder->pos,
|
||||
footer_size);
|
||||
|
||||
// Return if we didn't get the whole footer yet.
|
||||
if (coder->pos < footer_size)
|
||||
return LZMA_OK;
|
||||
|
||||
coder->pos = 0;
|
||||
coder->member_size += footer_size;
|
||||
|
||||
// Check that the footer fields match the observed data.
|
||||
if (!coder->ignore_check
|
||||
&& coder->crc32 != read32le(&coder->buffer[0]))
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
if (coder->uncompressed_size != read64le(&coder->buffer[4]))
|
||||
return LZMA_DATA_ERROR;
|
||||
|
||||
if (coder->version > 0) {
|
||||
// .lz version 0 has no Member size field.
|
||||
if (coder->member_size != read64le(&coder->buffer[12]))
|
||||
return LZMA_DATA_ERROR;
|
||||
}
|
||||
|
||||
// Decoding is finished if we weren't requested to decode
|
||||
// more than one .lz member.
|
||||
if (!coder->concatenated)
|
||||
return LZMA_STREAM_END;
|
||||
|
||||
coder->first_member = false;
|
||||
coder->sequence = SEQ_ID_STRING;
|
||||
break;
|
||||
}
|
||||
|
||||
default:
|
||||
assert(0);
|
||||
return LZMA_PROG_ERROR;
|
||||
}
|
||||
|
||||
// Never reached
|
||||
}
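The SEQ_DICT_SIZE case in lzip_decode() packs the dictionary size into one byte: the low five bits hold the base-2 logarithm and the high three bits hold a numerator of sixteenths to subtract. The sketch below isolates that arithmetic so that it can be checked against the 0xB5 example in the comment. The helper name lzip_dict_size_decode is hypothetical and it returns a bool instead of a lzma_ret.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Decode the .lz dictionary size byte. Returns false for values outside
// the [4 KiB, 512 MiB] range accepted by the decoder in this diff.
static bool
lzip_dict_size_decode(uint8_t ds, uint32_t *dict_size)
{
	const uint32_t b2log = ds & 0x1F;
	const uint32_t fracnum = ds >> 5;

	if (b2log < 12 || b2log > 29 || (b2log == 12 && fracnum > 0))
		return false;

	// 2^b2log - fracnum * 2^(b2log - 4)
	*dict_size = (UINT32_C(1) << b2log) - (fracnum << (b2log - 4));
	return true;
}
```

For 0xB5 this gives b2log = 21 and fracnum = 5, so the result is 2097152 - 5 * 131072 = 1441792 bytes, which is the 1408 KiB stated in the source comment.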
|
||||
|
||||
|
||||
static void
|
||||
lzip_decoder_end(void *coder_ptr, const lzma_allocator *allocator)
|
||||
{
|
||||
lzma_lzip_coder *coder = coder_ptr;
|
||||
lzma_next_end(&coder->lzma_decoder, allocator);
|
||||
lzma_free(coder, allocator);
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
static lzma_check
|
||||
lzip_decoder_get_check(const void *coder_ptr lzma_attribute((__unused__)))
|
||||
{
|
||||
return LZMA_CHECK_CRC32;
|
||||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
lzip_decoder_memconfig(void *coder_ptr, uint64_t *memusage,
|
||||
uint64_t *old_memlimit, uint64_t new_memlimit)
|
||||
{
|
||||
lzma_lzip_coder *coder = coder_ptr;
|
||||
|
||||
*memusage = coder->memusage;
|
||||
*old_memlimit = coder->memlimit;
|
||||
|
||||
if (new_memlimit != 0) {
|
||||
if (new_memlimit < coder->memusage)
|
||||
return LZMA_MEMLIMIT_ERROR;
|
||||
|
||||
coder->memlimit = new_memlimit;
|
||||
}
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
|
||||
extern lzma_ret
|
||||
lzma_lzip_decoder_init(
|
||||
lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
uint64_t memlimit, uint32_t flags)
|
||||
{
|
||||
lzma_next_coder_init(&lzma_lzip_decoder_init, next, allocator);
|
||||
|
||||
if (flags & ~LZMA_SUPPORTED_FLAGS)
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
lzma_lzip_coder *coder = next->coder;
|
||||
if (coder == NULL) {
|
||||
coder = lzma_alloc(sizeof(lzma_lzip_coder), allocator);
|
||||
if (coder == NULL)
|
||||
return LZMA_MEM_ERROR;
|
||||
|
||||
next->coder = coder;
|
||||
next->code = &lzip_decode;
|
||||
next->end = &lzip_decoder_end;
|
||||
next->get_check = &lzip_decoder_get_check;
|
||||
next->memconfig = &lzip_decoder_memconfig;
|
||||
|
||||
coder->lzma_decoder = LZMA_NEXT_CODER_INIT;
|
||||
}
|
||||
|
||||
coder->sequence = SEQ_ID_STRING;
|
||||
coder->memlimit = my_max(1, memlimit);
|
||||
coder->memusage = LZMA_MEMUSAGE_BASE;
|
||||
coder->tell_any_check = (flags & LZMA_TELL_ANY_CHECK) != 0;
|
||||
coder->ignore_check = (flags & LZMA_IGNORE_CHECK) != 0;
|
||||
coder->concatenated = (flags & LZMA_CONCATENATED) != 0;
|
||||
coder->first_member = true;
|
||||
coder->pos = 0;
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
|
||||
extern LZMA_API(lzma_ret)
|
||||
lzma_lzip_decoder(lzma_stream *strm, uint64_t memlimit, uint32_t flags)
|
||||
{
|
||||
lzma_next_strm_init(lzma_lzip_decoder_init, strm, memlimit, flags);
|
||||
|
||||
strm->internal->supported_actions[LZMA_RUN] = true;
|
||||
strm->internal->supported_actions[LZMA_FINISH] = true;
|
||||
|
||||
return LZMA_OK;
|
||||
}
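The footer handling in SEQ_MEMBER_FOOTER reads three little-endian fields from fixed offsets: CRC32 at 0, Data size at 4 and, for version 1 only, Member size at 12. The sketch below shows that layout in isolation. The type demo_lzip_footer and both helper functions are hypothetical, and demo_read_le stands in for the read32le()/read64le() helpers used in the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint32_t crc32;
	uint64_t data_size;
	uint64_t member_size; // Set to 0 for version 0, which lacks the field
} demo_lzip_footer;

// Read a little-endian unsigned integer of len bytes (len <= 8).
static uint64_t
demo_read_le(const uint8_t *buf, size_t len)
{
	uint64_t value = 0;
	for (size_t i = 0; i < len; ++i)
		value |= (uint64_t)buf[i] << (i * 8);

	return value;
}

// Parse the footer and return its size in bytes:
// 12 for version 0 and 20 for version 1.
static size_t
demo_lzip_footer_parse(const uint8_t *buf, uint32_t version,
		demo_lzip_footer *footer)
{
	footer->crc32 = (uint32_t)demo_read_le(buf, 4);
	footer->data_size = demo_read_le(buf + 4, 8);
	footer->member_size = version == 0 ? 0 : demo_read_le(buf + 12, 8);

	return version == 0 ? 12 : 20;
}
```

A caller would compare the parsed values with the CRC32, output byte count and input byte count observed while decoding, as the decoder above does before it returns LZMA_STREAM_END or moves to the next member.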
|
22
src/liblzma/common/lzip_decoder.h
Normal file
|
@ -0,0 +1,22 @@
|
|||
///////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
/// \file lzip_decoder.h
|
||||
/// \brief Decodes .lz (lzip) files
|
||||
//
|
||||
// Author: Michał Górny
|
||||
//
|
||||
// This file has been put into the public domain.
|
||||
// You can do whatever you want with this file.
|
||||
//
|
||||
///////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
#ifndef LZMA_LZIP_DECODER_H
|
||||
#define LZMA_LZIP_DECODER_H
|
||||
|
||||
#include "common.h"
|
||||
|
||||
extern lzma_ret lzma_lzip_decoder_init(
|
||||
lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
uint64_t memlimit, uint32_t flags);
|
||||
|
||||
#endif
|
|
@ -51,10 +51,6 @@ lzma_memcmplen(const uint8_t *buf1, const uint8_t *buf2,
|
|||
|| (defined(__INTEL_COMPILER) && defined(__x86_64__)) \
|
||||
|| (defined(__INTEL_COMPILER) && defined(_M_X64)) \
|
||||
|| (defined(_MSC_VER) && defined(_M_X64)))
|
||||
// NOTE: This will use 64-bit unaligned access which
|
||||
// TUKLIB_FAST_UNALIGNED_ACCESS wasn't meant to permit, but
|
||||
// it's convenient here at least as long as it's x86-64 only.
|
||||
//
|
||||
// I keep this x86-64 only for now since that's where I know this
|
||||
// to be a good method. This may be fine on other 64-bit CPUs too.
|
||||
// On big endian one should use xor instead of subtraction and switch
|
||||
|
@ -83,8 +79,9 @@ lzma_memcmplen(const uint8_t *buf1, const uint8_t *buf2,
|
|||
&& (defined(__SSE2__) \
|
||||
|| (defined(_MSC_VER) && defined(_M_IX86_FP) \
|
||||
&& _M_IX86_FP >= 2))
|
||||
// NOTE: Like above, this will use 128-bit unaligned access which
|
||||
// TUKLIB_FAST_UNALIGNED_ACCESS wasn't meant to permit.
|
||||
// NOTE: This will use 128-bit unaligned access which
|
||||
// TUKLIB_FAST_UNALIGNED_ACCESS wasn't meant to permit,
|
||||
// but it's convenient here since this is x86-only.
|
||||
//
|
||||
// SSE2 version for 32-bit and 64-bit x86. On x86-64 the above
|
||||
// version is sometimes significantly faster and sometimes
|
||||
|
|
221
src/liblzma/common/microlzma_decoder.c
Normal file
|
@ -0,0 +1,221 @@
|
|||
///////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
/// \file microlzma_decoder.c
|
||||
/// \brief Decode MicroLZMA format
|
||||
//
|
||||
// Author: Lasse Collin
|
||||
//
|
||||
// This file has been put into the public domain.
|
||||
// You can do whatever you want with this file.
|
||||
//
|
||||
///////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
#include "lzma_decoder.h"
|
||||
#include "lz_decoder.h"
|
||||
|
||||
|
||||
typedef struct {
|
||||
/// LZMA1 decoder
|
||||
lzma_next_coder lzma;
|
||||
|
||||
/// Compressed size of the stream as given by the application.
|
||||
/// This must be exactly correct.
|
||||
///
|
||||
/// This will be decremented when input is read.
|
||||
uint64_t comp_size;
|
||||
|
||||
/// Uncompressed size of the stream as given by the application.
|
||||
/// This may be less than the actual uncompressed size if
|
||||
/// uncomp_size_is_exact is false.
|
||||
///
|
||||
/// This will be decremented when output is produced.
|
||||
lzma_vli uncomp_size;
|
||||
|
||||
/// LZMA dictionary size as given by the application
|
||||
uint32_t dict_size;
|
||||
|
||||
/// If true, the exact uncompressed size is known. If false,
|
||||
/// uncomp_size may be smaller than the real uncompressed size;
|
||||
/// uncomp_size may never be bigger than the real uncompressed size.
|
||||
bool uncomp_size_is_exact;
|
||||
|
||||
/// True once the first byte of the MicroLZMA stream
|
||||
/// has been processed.
|
||||
bool props_decoded;
|
||||
} lzma_microlzma_coder;
|
||||
|
||||
|
||||
static lzma_ret
|
||||
microlzma_decode(void *coder_ptr, const lzma_allocator *allocator,
|
||||
const uint8_t *restrict in, size_t *restrict in_pos,
|
||||
size_t in_size, uint8_t *restrict out,
|
||||
size_t *restrict out_pos, size_t out_size, lzma_action action)
|
||||
{
|
||||
lzma_microlzma_coder *coder = coder_ptr;
|
||||
|
||||
// Remember the in start position so that we can update comp_size.
|
||||
const size_t in_start = *in_pos;
|
||||
|
||||
// Remember the out start position so that we can update uncomp_size.
|
||||
const size_t out_start = *out_pos;
|
||||
|
||||
// Limit the amount of input so that the decoder won't read more than
|
||||
// comp_size. This is required when uncomp_size isn't exact because
|
||||
// in that case the LZMA decoder will try to decode more input even
|
||||
// when it has no output space (it can be looking for EOPM).
|
||||
if (in_size - *in_pos > coder->comp_size)
|
||||
in_size = *in_pos + (size_t)(coder->comp_size);
|
||||
|
||||
// When the exact uncompressed size isn't known, we must limit
|
||||
// the available output space to prevent the LZMA decoder from
|
||||
// trying to decode too much.
|
||||
if (!coder->uncomp_size_is_exact
|
||||
&& out_size - *out_pos > coder->uncomp_size)
|
||||
out_size = *out_pos + (size_t)(coder->uncomp_size);
|
||||
|
||||
if (!coder->props_decoded) {
|
||||
// There must be at least one byte of input to decode
|
||||
// the properties byte.
|
||||
if (*in_pos >= in_size)
|
||||
return LZMA_OK;
|
||||
|
||||
lzma_options_lzma options = {
|
||||
.dict_size = coder->dict_size,
|
||||
.preset_dict = NULL,
|
||||
.preset_dict_size = 0,
|
||||
.ext_flags = 0, // EOPM not allowed when size is known
|
||||
.ext_size_low = UINT32_MAX, // Unknown size by default
|
||||
.ext_size_high = UINT32_MAX,
|
||||
};
|
||||
|
||||
if (coder->uncomp_size_is_exact)
|
||||
lzma_set_ext_size(options, coder->uncomp_size);
|
||||
|
||||
// The properties are stored as bitwise-negation
|
||||
// of the typical encoding.
|
||||
if (lzma_lzma_lclppb_decode(&options, ~in[*in_pos]))
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
++*in_pos;
|
||||
|
||||
// Initialize the decoder.
|
||||
lzma_filter_info filters[2] = {
|
||||
{
|
||||
.id = LZMA_FILTER_LZMA1EXT,
|
||||
.init = &lzma_lzma_decoder_init,
|
||||
.options = &options,
|
||||
}, {
|
||||
.init = NULL,
|
||||
}
|
||||
};
|
||||
|
||||
return_if_error(lzma_next_filter_init(&coder->lzma,
|
||||
allocator, filters));
|
||||
|
||||
// Pass one dummy 0x00 byte to the LZMA decoder since that
|
||||
// is what it expects the first byte to be.
|
||||
const uint8_t dummy_in = 0;
|
||||
size_t dummy_in_pos = 0;
|
||||
if (coder->lzma.code(coder->lzma.coder, allocator,
|
||||
&dummy_in, &dummy_in_pos, 1,
|
||||
out, out_pos, out_size, LZMA_RUN) != LZMA_OK)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
||||
assert(dummy_in_pos == 1);
|
||||
coder->props_decoded = true;
|
||||
}
|
||||
|
||||
// The rest is normal LZMA decoding.
|
||||
lzma_ret ret = coder->lzma.code(coder->lzma.coder, allocator,
|
||||
in, in_pos, in_size,
|
||||
out, out_pos, out_size, action);
|
||||
|
||||
// Update the remaining compressed size.
|
||||
assert(coder->comp_size >= *in_pos - in_start);
|
||||
coder->comp_size -= *in_pos - in_start;
|
||||
|
||||
if (coder->uncomp_size_is_exact) {
|
||||
// After successful decompression of the complete stream
|
||||
// the compressed size must match.
|
||||
if (ret == LZMA_STREAM_END && coder->comp_size != 0)
|
||||
ret = LZMA_DATA_ERROR;
|
||||
} else {
|
||||
// Update the amount of output remaining.
|
||||
assert(coder->uncomp_size >= *out_pos - out_start);
|
||||
coder->uncomp_size -= *out_pos - out_start;
|
||||
|
||||
// - We must not get LZMA_STREAM_END because the stream
|
||||
// shouldn't have EOPM.
|
||||
// - We must use uncomp_size to determine when to
|
||||
// return LZMA_STREAM_END.
|
||||
if (ret == LZMA_STREAM_END)
|
||||
ret = LZMA_DATA_ERROR;
|
||||
else if (coder->uncomp_size == 0)
|
||||
ret = LZMA_STREAM_END;
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
||||
static void
|
||||
microlzma_decoder_end(void *coder_ptr, const lzma_allocator *allocator)
|
||||
{
|
||||
lzma_microlzma_coder *coder = coder_ptr;
|
||||
lzma_next_end(&coder->lzma, allocator);
|
||||
lzma_free(coder, allocator);
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
microlzma_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
uint64_t comp_size,
|
||||
uint64_t uncomp_size, bool uncomp_size_is_exact,
|
||||
uint32_t dict_size)
|
||||
{
|
||||
lzma_next_coder_init(&microlzma_decoder_init, next, allocator);
|
||||
|
||||
lzma_microlzma_coder *coder = next->coder;
|
||||
|
||||
if (coder == NULL) {
|
||||
coder = lzma_alloc(sizeof(lzma_microlzma_coder), allocator);
|
||||
if (coder == NULL)
|
||||
return LZMA_MEM_ERROR;
|
||||
|
||||
next->coder = coder;
|
||||
next->code = &microlzma_decode;
|
||||
next->end = &microlzma_decoder_end;
|
||||
|
||||
coder->lzma = LZMA_NEXT_CODER_INIT;
|
||||
}
|
||||
|
||||
// The public API is uint64_t but the internal LZ decoder API uses
|
||||
// lzma_vli.
|
||||
if (uncomp_size > LZMA_VLI_MAX)
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
coder->comp_size = comp_size;
|
||||
coder->uncomp_size = uncomp_size;
|
||||
coder->uncomp_size_is_exact = uncomp_size_is_exact;
|
||||
coder->dict_size = dict_size;
|
||||
|
||||
coder->props_decoded = false;
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
|
||||
extern LZMA_API(lzma_ret)
|
||||
lzma_microlzma_decoder(lzma_stream *strm, uint64_t comp_size,
|
||||
uint64_t uncomp_size, lzma_bool uncomp_size_is_exact,
|
||||
uint32_t dict_size)
|
||||
{
|
||||
lzma_next_strm_init(microlzma_decoder_init, strm, comp_size,
|
||||
uncomp_size, uncomp_size_is_exact, dict_size);
|
||||
|
||||
strm->internal->supported_actions[LZMA_RUN] = true;
|
||||
strm->internal->supported_actions[LZMA_FINISH] = true;
|
||||
|
||||
return LZMA_OK;
|
||||
}
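MicroLZMA stores the LZMA properties byte bitwise-negated as the first byte of the stream, which is why the decoder passes ~in[*in_pos] to lzma_lzma_lclppb_decode(). The sketch below shows that step on its own, assuming the usual LZMA1 packing props = (pb * 5 + lp) * 9 + lc and the liblzma restriction lc + lp <= 4. The function name demo_microlzma_props_decode is hypothetical, and it returns true on success, whereas the liblzma helper returns true on error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Recover lc/lp/pb from the first byte of a MicroLZMA stream.
// Returns false if the byte does not describe a supported combination.
static bool
demo_microlzma_props_decode(uint8_t first_byte,
		uint32_t *lc, uint32_t *lp, uint32_t *pb)
{
	// Undo the bitwise negation; the cast drops the bits that
	// integer promotion adds above bit 7.
	uint32_t props = (uint8_t)~first_byte;

	// Largest valid value: pb = 4, lp = 4, lc = 8.
	if (props > (4 * 5 + 4) * 9 + 8)
		return false;

	*pb = props / (9 * 5);
	props -= *pb * 9 * 5;
	*lp = props / 9;
	*lc = props - *lp * 9;

	// Assumed liblzma limit: lc + lp must not exceed 4.
	return *lc + *lp <= 4;
}
```

With the common settings lc = 3, lp = 0, pb = 2 the properties byte is 0x5D, so a MicroLZMA stream for those settings starts with 0xA2.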
|
140
src/liblzma/common/microlzma_encoder.c
Normal file
|
@ -0,0 +1,140 @@
|
|||
///////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
/// \file microlzma_encoder.c
|
||||
/// \brief Encode into MicroLZMA format
|
||||
//
|
||||
// Author: Lasse Collin
|
||||
//
|
||||
// This file has been put into the public domain.
|
||||
// You can do whatever you want with this file.
|
||||
//
|
||||
///////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
#include "lzma_encoder.h"
|
||||
|
||||
|
||||
typedef struct {
|
||||
/// LZMA1 encoder
|
||||
lzma_next_coder lzma;
|
||||
|
||||
/// LZMA properties byte (lc/lp/pb)
|
||||
uint8_t props;
|
||||
} lzma_microlzma_coder;
|
||||
|
||||
|
||||
static lzma_ret
|
||||
microlzma_encode(void *coder_ptr, const lzma_allocator *allocator,
|
||||
const uint8_t *restrict in, size_t *restrict in_pos,
|
||||
size_t in_size, uint8_t *restrict out,
|
||||
size_t *restrict out_pos, size_t out_size, lzma_action action)
|
||||
{
|
||||
lzma_microlzma_coder *coder = coder_ptr;
|
||||
|
||||
// Remember *out_pos so that we can overwrite the first byte with
|
||||
// the LZMA properties byte.
|
||||
const size_t out_start = *out_pos;
|
||||
|
||||
// Remember *in_pos so that we can set it based on how many
|
||||
// uncompressed bytes were actually encoded.
|
||||
const size_t in_start = *in_pos;
|
||||
|
||||
// Set the output size limit based on the available output space.
|
||||
// We know that the encoder supports set_out_limit() so
|
||||
// LZMA_OPTIONS_ERROR isn't possible. LZMA_BUF_ERROR is possible
|
||||
// but lzma_code() has an assertion to not allow it to be returned
|
||||
// from here and I don't want to change that for now, so
|
||||
// LZMA_BUF_ERROR becomes LZMA_PROG_ERROR.
|
||||
uint64_t uncomp_size;
|
||||
if (coder->lzma.set_out_limit(coder->lzma.coder,
|
||||
&uncomp_size, out_size - *out_pos) != LZMA_OK)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
||||
// set_out_limit fails if this isn't true.
|
||||
assert(out_size - *out_pos >= 6);
|
||||
|
||||
// Encode as much as possible.
|
||||
const lzma_ret ret = coder->lzma.code(coder->lzma.coder, allocator,
|
||||
in, in_pos, in_size, out, out_pos, out_size, action);
|
||||
|
||||
if (ret != LZMA_STREAM_END) {
|
||||
if (ret == LZMA_OK) {
|
||||
assert(0);
|
||||
return LZMA_PROG_ERROR;
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
// The first output byte is bitwise-negation of the properties byte.
|
||||
// We know that there is space for this byte because set_out_limit
|
||||
// and the actual encoding succeeded.
|
||||
out[out_start] = (uint8_t)(~coder->props);
|
||||
|
||||
// The LZMA encoder likely read more input than it was able to encode.
|
||||
// Set *in_pos based on uncomp_size.
|
||||
assert(uncomp_size <= in_size - in_start);
|
||||
*in_pos = in_start + (size_t)(uncomp_size);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
||||
static void
|
||||
microlzma_encoder_end(void *coder_ptr, const lzma_allocator *allocator)
|
||||
{
|
||||
lzma_microlzma_coder *coder = coder_ptr;
|
||||
lzma_next_end(&coder->lzma, allocator);
|
||||
lzma_free(coder, allocator);
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
microlzma_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
const lzma_options_lzma *options)
|
||||
{
|
||||
lzma_next_coder_init(&microlzma_encoder_init, next, allocator);
|
||||
|
||||
lzma_microlzma_coder *coder = next->coder;
|
||||
|
||||
if (coder == NULL) {
|
||||
coder = lzma_alloc(sizeof(lzma_microlzma_coder), allocator);
|
||||
if (coder == NULL)
|
||||
return LZMA_MEM_ERROR;
|
||||
|
||||
next->coder = coder;
|
||||
next->code = &microlzma_encode;
|
||||
next->end = &microlzma_encoder_end;
|
||||
|
||||
coder->lzma = LZMA_NEXT_CODER_INIT;
|
||||
}
|
||||
|
||||
// Encode the properties byte. Bitwise-negation of it will be the
|
||||
// first output byte.
|
||||
return_if_error(lzma_lzma_lclppb_encode(options, &coder->props));
|
||||
|
||||
// Initialize the LZMA encoder.
|
||||
const lzma_filter_info filters[2] = {
|
||||
{
|
||||
.id = LZMA_FILTER_LZMA1,
|
||||
.init = &lzma_lzma_encoder_init,
|
||||
.options = (void *)(options),
|
||||
}, {
|
||||
.init = NULL,
|
||||
}
|
||||
};
|
||||
|
||||
return lzma_next_filter_init(&coder->lzma, allocator, filters);
|
||||
}
|
||||
|
||||
|
||||
extern LZMA_API(lzma_ret)
|
||||
lzma_microlzma_encoder(lzma_stream *strm, const lzma_options_lzma *options)
|
||||
{
|
||||
lzma_next_strm_init(microlzma_encoder_init, strm, options);
|
||||
|
||||
strm->internal->supported_actions[LZMA_FINISH] = true;
|
||||
|
||||
return LZMA_OK;
|
||||
|
||||
}
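The relationship between the properties byte and the first MicroLZMA output byte can be sketched in isolation. This is a minimal illustration, not the liblzma implementation: `sketch_props_encode()` is a hypothetical stand-in for `lzma_lzma_lclppb_encode()`, and the validity limits it checks are assumptions based on the usual LZMA constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for lzma_lzma_lclppb_encode(): packs lc/lp/pb
// into a single byte as (pb * 5 + lp) * 9 + lc.
// Returns false if the values are outside the assumed limits.
static bool
sketch_props_encode(uint32_t lc, uint32_t lp, uint32_t pb, uint8_t *props)
{
	// Assumed limits: lc, lp, and lc + lp at most 4; pb at most 4.
	if (lc > 4 || lp > 4 || lc + lp > 4 || pb > 4)
		return false;

	*props = (uint8_t)((pb * 5 + lp) * 9 + lc);
	return true;
}

// MicroLZMA stores the bitwise negation of the properties byte
// as the first byte of the output.
static uint8_t
sketch_microlzma_first_byte(uint8_t props)
{
	return (uint8_t)(~props);
}
```

With the common settings lc=3, lp=0, pb=2 the properties byte is 0x5D, so the first output byte in this sketch is 0xA2.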
|
|
@ -13,84 +13,121 @@
|
|||
#include "outqueue.h"
|
||||
|
||||
|
||||
/// This is to ease integer overflow checking: We may allocate up to
|
||||
/// 2 * LZMA_THREADS_MAX buffers and we need some extra memory for other
|
||||
/// data structures (that's the second /2).
|
||||
#define BUF_SIZE_MAX (UINT64_MAX / LZMA_THREADS_MAX / 2 / 2)
|
||||
|
||||
|
||||
static lzma_ret
|
||||
get_options(uint64_t *bufs_alloc_size, uint32_t *bufs_count,
|
||||
uint64_t buf_size_max, uint32_t threads)
|
||||
{
|
||||
if (threads > LZMA_THREADS_MAX || buf_size_max > BUF_SIZE_MAX)
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
// The number of buffers is twice the number of threads.
|
||||
// This wastes RAM but keeps the threads busy when buffers
|
||||
// finish out of order.
|
||||
//
|
||||
// NOTE: If this is changed, update BUF_SIZE_MAX too.
|
||||
*bufs_count = threads * 2;
|
||||
*bufs_alloc_size = *bufs_count * buf_size_max;
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
||||
/// Get the maximum number of buffers that may be allocated based
|
||||
/// on the number of threads. For now this is twice the number of threads.
|
||||
/// It's a compromise between RAM usage and keeping the worker threads busy
|
||||
/// when buffers finish out of order.
|
||||
#define GET_BUFS_LIMIT(threads) (2 * (threads))
|
||||
|
||||
|
||||
extern uint64_t
|
||||
lzma_outq_memusage(uint64_t buf_size_max, uint32_t threads)
|
||||
{
|
||||
uint64_t bufs_alloc_size;
|
||||
uint32_t bufs_count;
|
||||
// This is to ease integer overflow checking: We may allocate up to
|
||||
// GET_BUFS_LIMIT(LZMA_THREADS_MAX) buffers and we need some extra
|
||||
// memory for other data structures too (that's the /2).
|
||||
//
|
||||
// lzma_outq_prealloc_buf() will still accept bigger buffers than this.
|
||||
const uint64_t limit
|
||||
= UINT64_MAX / GET_BUFS_LIMIT(LZMA_THREADS_MAX) / 2;
|
||||
|
||||
if (get_options(&bufs_alloc_size, &bufs_count, buf_size_max, threads)
|
||||
!= LZMA_OK)
|
||||
if (threads > LZMA_THREADS_MAX || buf_size_max > limit)
|
||||
return UINT64_MAX;
|
||||
|
||||
return sizeof(lzma_outq) + bufs_count * sizeof(lzma_outbuf)
|
||||
+ bufs_alloc_size;
|
||||
return GET_BUFS_LIMIT(threads)
|
||||
* lzma_outq_outbuf_memusage(buf_size_max);
|
||||
}
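The overflow guard in the new lzma_outq_memusage() can be tried out on its own. The sketch below assumes a thread maximum of 16384 and a 64-byte per-buffer header; both constants are placeholders chosen for illustration, not values taken from this diff.

```c
#include <assert.h>
#include <stdint.h>

// Assumed value for illustration; liblzma defines its own LZMA_THREADS_MAX.
#define SKETCH_THREADS_MAX 16384

// Two buffers per thread, mirroring GET_BUFS_LIMIT().
#define SKETCH_BUFS_LIMIT(threads) (2 * (threads))

// Hypothetical header size standing in for sizeof(lzma_outbuf).
#define SKETCH_OUTBUF_HEADER 64

static uint64_t
sketch_outq_memusage(uint64_t buf_size_max, uint32_t threads)
{
	// Dividing by the largest possible buffer count, and then by two
	// for extra headroom, guarantees that the multiplication below
	// cannot wrap around for any accepted buf_size_max.
	const uint64_t limit
			= UINT64_MAX / SKETCH_BUFS_LIMIT(SKETCH_THREADS_MAX) / 2;

	if (threads > SKETCH_THREADS_MAX || buf_size_max > limit)
		return UINT64_MAX;

	return SKETCH_BUFS_LIMIT(threads)
			* (SKETCH_OUTBUF_HEADER + buf_size_max);
}
```

Returning UINT64_MAX for rejected arguments lets a caller treat "too much memory" and "invalid options" the same way when comparing against a memory limit.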
|
||||
|
||||
|
||||
static void
|
||||
move_head_to_cache(lzma_outq *outq, const lzma_allocator *allocator)
|
||||
{
|
||||
assert(outq->head != NULL);
|
||||
assert(outq->tail != NULL);
|
||||
assert(outq->bufs_in_use > 0);
|
||||
|
||||
lzma_outbuf *buf = outq->head;
|
||||
outq->head = buf->next;
|
||||
if (outq->head == NULL)
|
||||
outq->tail = NULL;
|
||||
|
||||
if (outq->cache != NULL && outq->cache->allocated != buf->allocated)
|
||||
lzma_outq_clear_cache(outq, allocator);
|
||||
|
||||
buf->next = outq->cache;
|
||||
outq->cache = buf;
|
||||
|
||||
--outq->bufs_in_use;
|
||||
outq->mem_in_use -= lzma_outq_outbuf_memusage(buf->allocated);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
static void
|
||||
free_one_cached_buffer(lzma_outq *outq, const lzma_allocator *allocator)
|
||||
{
|
||||
assert(outq->cache != NULL);
|
||||
|
||||
lzma_outbuf *buf = outq->cache;
|
||||
outq->cache = buf->next;
|
||||
|
||||
--outq->bufs_allocated;
|
||||
outq->mem_allocated -= lzma_outq_outbuf_memusage(buf->allocated);
|
||||
|
||||
lzma_free(buf, allocator);
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
extern void
|
||||
lzma_outq_clear_cache(lzma_outq *outq, const lzma_allocator *allocator)
|
||||
{
|
||||
while (outq->cache != NULL)
|
||||
free_one_cached_buffer(outq, allocator);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
extern void
|
||||
lzma_outq_clear_cache2(lzma_outq *outq, const lzma_allocator *allocator,
|
||||
size_t keep_size)
|
||||
{
|
||||
if (outq->cache == NULL)
|
||||
return;
|
||||
|
||||
// Free all but one.
|
||||
while (outq->cache->next != NULL)
|
||||
free_one_cached_buffer(outq, allocator);
|
||||
|
||||
// Free the last one only if its size isn't equal to keep_size.
|
||||
if (outq->cache->allocated != keep_size)
|
||||
free_one_cached_buffer(outq, allocator);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
extern lzma_ret
|
||||
lzma_outq_init(lzma_outq *outq, const lzma_allocator *allocator,
|
||||
uint64_t buf_size_max, uint32_t threads)
|
||||
uint32_t threads)
|
||||
{
|
||||
uint64_t bufs_alloc_size;
|
||||
uint32_t bufs_count;
|
||||
if (threads > LZMA_THREADS_MAX)
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
// Set bufs_count and bufs_alloc_size.
|
||||
return_if_error(get_options(&bufs_alloc_size, &bufs_count,
|
||||
buf_size_max, threads));
|
||||
const uint32_t bufs_limit = GET_BUFS_LIMIT(threads);
|
||||
|
||||
// Allocate memory if needed.
|
||||
if (outq->buf_size_max != buf_size_max
|
||||
|| outq->bufs_allocated != bufs_count) {
|
||||
lzma_outq_end(outq, allocator);
|
||||
// Clear head/tail.
|
||||
while (outq->head != NULL)
|
||||
move_head_to_cache(outq, allocator);
|
||||
|
||||
#if SIZE_MAX < UINT64_MAX
|
||||
if (bufs_alloc_size > SIZE_MAX)
|
||||
return LZMA_MEM_ERROR;
|
||||
#endif
|
||||
// If the new bufs_limit is lower than the old one, we may need to free
|
||||
// a few cached buffers.
|
||||
while (bufs_limit < outq->bufs_allocated)
|
||||
free_one_cached_buffer(outq, allocator);
|
||||
|
||||
outq->bufs = lzma_alloc(bufs_count * sizeof(lzma_outbuf),
|
||||
allocator);
|
||||
outq->bufs_mem = lzma_alloc((size_t)(bufs_alloc_size),
|
||||
allocator);
|
||||
|
||||
if (outq->bufs == NULL || outq->bufs_mem == NULL) {
|
||||
lzma_outq_end(outq, allocator);
|
||||
return LZMA_MEM_ERROR;
|
||||
}
|
||||
}
|
||||
|
||||
// Initialize the rest of the main structure. Initialization of
|
||||
// outq->bufs[] is done when they are actually needed.
|
||||
outq->buf_size_max = (size_t)(buf_size_max);
|
||||
outq->bufs_allocated = bufs_count;
|
||||
outq->bufs_pos = 0;
|
||||
outq->bufs_used = 0;
|
||||
outq->bufs_limit = bufs_limit;
|
||||
outq->read_pos = 0;
|
||||
|
||||
return LZMA_OK;
|
||||
|
@ -100,33 +137,81 @@ lzma_outq_init(lzma_outq *outq, const lzma_allocator *allocator,
|
|||
extern void
|
||||
lzma_outq_end(lzma_outq *outq, const lzma_allocator *allocator)
|
||||
{
|
||||
lzma_free(outq->bufs, allocator);
|
||||
outq->bufs = NULL;
|
||||
|
||||
lzma_free(outq->bufs_mem, allocator);
|
||||
outq->bufs_mem = NULL;
|
||||
while (outq->head != NULL)
|
||||
move_head_to_cache(outq, allocator);
|
||||
|
||||
lzma_outq_clear_cache(outq, allocator);
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
extern lzma_outbuf *
|
||||
lzma_outq_get_buf(lzma_outq *outq)
|
||||
extern lzma_ret
|
||||
lzma_outq_prealloc_buf(lzma_outq *outq, const lzma_allocator *allocator,
|
||||
size_t size)
|
||||
{
|
||||
// Caller must have checked it with lzma_outq_has_buf().
|
||||
assert(outq->bufs_used < outq->bufs_allocated);
|
||||
assert(outq->bufs_in_use < outq->bufs_limit);
|
||||
|
||||
// Initialize the new buffer.
|
||||
lzma_outbuf *buf = &outq->bufs[outq->bufs_pos];
|
||||
buf->buf = outq->bufs_mem + outq->bufs_pos * outq->buf_size_max;
|
||||
buf->size = 0;
|
||||
// If there already is an appropriately-sized buffer in the cache,
|
||||
// we need to do nothing.
|
||||
if (outq->cache != NULL && outq->cache->allocated == size)
|
||||
return LZMA_OK;
|
||||
|
||||
if (size > SIZE_MAX - sizeof(lzma_outbuf))
|
||||
return LZMA_MEM_ERROR;
|
||||
|
||||
const size_t alloc_size = lzma_outq_outbuf_memusage(size);
|
||||
|
||||
// The cache may have buffers but their size is wrong.
|
||||
lzma_outq_clear_cache(outq, allocator);
|
||||
|
||||
outq->cache = lzma_alloc(alloc_size, allocator);
|
||||
if (outq->cache == NULL)
|
||||
return LZMA_MEM_ERROR;
|
||||
|
||||
outq->cache->next = NULL;
|
||||
outq->cache->allocated = size;
|
||||
|
||||
++outq->bufs_allocated;
|
||||
outq->mem_allocated += alloc_size;
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
|
||||
extern lzma_outbuf *
|
||||
lzma_outq_get_buf(lzma_outq *outq, void *worker)
|
||||
{
|
||||
// Caller must have used lzma_outq_prealloc_buf() to ensure these.
|
||||
assert(outq->bufs_in_use < outq->bufs_limit);
|
||||
assert(outq->bufs_in_use < outq->bufs_allocated);
|
||||
assert(outq->cache != NULL);
|
||||
|
||||
lzma_outbuf *buf = outq->cache;
|
||||
outq->cache = buf->next;
|
||||
buf->next = NULL;
|
||||
|
||||
if (outq->tail != NULL) {
|
||||
assert(outq->head != NULL);
|
||||
outq->tail->next = buf;
|
||||
} else {
|
||||
assert(outq->head == NULL);
|
||||
outq->head = buf;
|
||||
}
|
||||
|
||||
outq->tail = buf;
|
||||
|
||||
buf->worker = worker;
|
||||
buf->finished = false;
|
||||
buf->finish_ret = LZMA_STREAM_END;
|
||||
buf->pos = 0;
|
||||
buf->decoder_in_pos = 0;
|
||||
|
||||
// Update the queue state.
|
||||
if (++outq->bufs_pos == outq->bufs_allocated)
|
||||
outq->bufs_pos = 0;
|
||||
buf->unpadded_size = 0;
|
||||
buf->uncompressed_size = 0;
|
||||
|
||||
++outq->bufs_used;
|
||||
++outq->bufs_in_use;
|
||||
outq->mem_in_use += lzma_outq_outbuf_memusage(buf->allocated);
|
||||
|
||||
return buf;
|
||||
}
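The buffer life cycle introduced here (preallocate into the cache, move to the tail of the in-use list, return the head to the cache) can be modeled in a single-threaded sketch. All `sketch_` names are hypothetical, malloc()/free() replace the lzma_allocator, and the memory accounting fields and mutex handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct sketch_buf_s sketch_buf;
struct sketch_buf_s {
	sketch_buf *next;
	size_t allocated;   // size of buf[] in bytes
	bool finished;
	uint8_t buf[];      // flexible array member
};

typedef struct {
	sketch_buf *head;   // oldest buffer in use; output is read from here
	sketch_buf *tail;   // newest buffer in use
	sketch_buf *cache;  // unused buffers, all of the same size
	uint32_t bufs_in_use;
	uint32_t bufs_allocated;
} sketch_outq;

static void
sketch_clear_cache(sketch_outq *q)
{
	while (q->cache != NULL) {
		sketch_buf *buf = q->cache;
		q->cache = buf->next;
		--q->bufs_allocated;
		free(buf);
	}
}

// Ensure that the cache holds a buffer of exactly "size" bytes.
static bool
sketch_prealloc(sketch_outq *q, size_t size)
{
	if (q->cache != NULL && q->cache->allocated == size)
		return true;

	if (size > SIZE_MAX - sizeof(sketch_buf))
		return false;

	// Cached buffers of the wrong size are useless now.
	sketch_clear_cache(q);

	q->cache = malloc(sizeof(sketch_buf) + size);
	if (q->cache == NULL)
		return false;

	q->cache->next = NULL;
	q->cache->allocated = size;
	++q->bufs_allocated;
	return true;
}

// Move a preallocated buffer from the cache to the tail of the queue.
static sketch_buf *
sketch_get_buf(sketch_outq *q)
{
	assert(q->cache != NULL);
	sketch_buf *buf = q->cache;
	q->cache = buf->next;
	buf->next = NULL;

	if (q->tail != NULL)
		q->tail->next = buf;
	else
		q->head = buf;

	q->tail = buf;
	buf->finished = false;
	++q->bufs_in_use;
	return buf;
}

// Return the head buffer to the cache once it has been fully read.
static void
sketch_release_head(sketch_outq *q)
{
	assert(q->head != NULL);
	sketch_buf *buf = q->head;
	q->head = buf->next;
	if (q->head == NULL)
		q->tail = NULL;

	// Keep the invariant that all cached buffers have the same size.
	if (q->cache != NULL && q->cache->allocated != buf->allocated)
		sketch_clear_cache(q);

	buf->next = q->cache;
	q->cache = buf;
	--q->bufs_in_use;
}
```

Splitting allocation (`sketch_prealloc`) from hand-out (`sketch_get_buf`) means the only step that can fail happens before any queue state is changed, which is the property the diff relies on when it asserts that the cache is non-empty.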
|
||||
|
@ -135,50 +220,68 @@ lzma_outq_get_buf(lzma_outq *outq)
|
|||
extern bool
|
||||
lzma_outq_is_readable(const lzma_outq *outq)
|
||||
{
|
||||
uint32_t i = outq->bufs_pos - outq->bufs_used;
|
||||
if (outq->bufs_pos < outq->bufs_used)
|
||||
i += outq->bufs_allocated;
|
||||
if (outq->head == NULL)
|
||||
return false;
|
||||
|
||||
return outq->bufs[i].finished;
|
||||
return outq->read_pos < outq->head->pos || outq->head->finished;
|
||||
}
|
||||
|
||||
|
||||
extern lzma_ret
|
||||
lzma_outq_read(lzma_outq *restrict outq, uint8_t *restrict out,
|
||||
size_t *restrict out_pos, size_t out_size,
|
||||
lzma_outq_read(lzma_outq *restrict outq,
|
||||
const lzma_allocator *restrict allocator,
|
||||
uint8_t *restrict out, size_t *restrict out_pos,
|
||||
size_t out_size,
|
||||
lzma_vli *restrict unpadded_size,
|
||||
lzma_vli *restrict uncompressed_size)
|
||||
{
|
||||
// There must be at least one buffer from which to read.
|
||||
if (outq->bufs_used == 0)
|
||||
if (outq->bufs_in_use == 0)
|
||||
return LZMA_OK;
|
||||
|
||||
// Get the buffer.
|
||||
uint32_t i = outq->bufs_pos - outq->bufs_used;
|
||||
if (outq->bufs_pos < outq->bufs_used)
|
||||
i += outq->bufs_allocated;
|
||||
|
||||
lzma_outbuf *buf = &outq->bufs[i];
|
||||
|
||||
// If it isn't finished yet, we cannot read from it.
|
||||
if (!buf->finished)
|
||||
return LZMA_OK;
|
||||
lzma_outbuf *buf = outq->head;
|
||||
|
||||
// Copy from the buffer to output.
|
||||
lzma_bufcpy(buf->buf, &outq->read_pos, buf->size,
|
||||
//
|
||||
// FIXME? In threaded decoder it may be bad to do this copy while
|
||||
// the mutex is being held.
|
||||
lzma_bufcpy(buf->buf, &outq->read_pos, buf->pos,
|
||||
out, out_pos, out_size);
|
||||
|
||||
// Return if we didn't get all the data from the buffer.
|
||||
if (outq->read_pos < buf->size)
|
||||
if (!buf->finished || outq->read_pos < buf->pos)
|
||||
return LZMA_OK;
|
||||
|
||||
// The buffer was finished. Tell the caller its size information.
|
||||
*unpadded_size = buf->unpadded_size;
|
||||
*uncompressed_size = buf->uncompressed_size;
|
||||
if (unpadded_size != NULL)
|
||||
*unpadded_size = buf->unpadded_size;
|
||||
|
||||
if (uncompressed_size != NULL)
|
||||
*uncompressed_size = buf->uncompressed_size;
|
||||
|
||||
// Remember the return value.
|
||||
const lzma_ret finish_ret = buf->finish_ret;
|
||||
|
||||
// Free this buffer for further use.
|
||||
--outq->bufs_used;
|
||||
move_head_to_cache(outq, allocator);
|
||||
outq->read_pos = 0;
|
||||
|
||||
return LZMA_STREAM_END;
|
||||
return finish_ret;
|
||||
}
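The partial-read behavior of the new lzma_outq_read() (copy whatever the worker has produced so far, but release the buffer only when it is both finished and fully drained) can be sketched without the queue itself. `sketch_bufcpy()` is a hypothetical helper modeled on how lzma_bufcpy() is used above, and `sketch_read()` collapses the buffer fields into plain parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Copy as much as fits from in[*in_pos..in_size) to
// out[*out_pos..out_size) and advance both positions.
static size_t
sketch_bufcpy(const uint8_t *in, size_t *in_pos, size_t in_size,
		uint8_t *out, size_t *out_pos, size_t out_size)
{
	const size_t in_avail = in_size - *in_pos;
	const size_t out_avail = out_size - *out_pos;
	const size_t copy_size = in_avail < out_avail ? in_avail : out_avail;

	if (copy_size > 0)
		memcpy(out + *out_pos, in + *in_pos, copy_size);

	*in_pos += copy_size;
	*out_pos += copy_size;
	return copy_size;
}

// buf_pos is the amount of data the worker has made available so far.
// Returns true only when the buffer is finished and everything in it
// has been copied out, that is, when the caller could release it.
static bool
sketch_read(const uint8_t *buf, size_t buf_pos, bool finished,
		size_t *read_pos, uint8_t *out, size_t *out_pos,
		size_t out_size)
{
	sketch_bufcpy(buf, read_pos, buf_pos, out, out_pos, out_size);
	return finished && *read_pos == buf_pos;
}
```

In the real code the worker updates `pos` and `finished` from another thread, so both the copy and the check happen while the coder mutex is held; the sketch ignores that.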
|
||||
|
||||
|
||||
extern void
|
||||
lzma_outq_enable_partial_output(lzma_outq *outq,
|
||||
void (*enable_partial_output)(void *worker))
|
||||
{
|
||||
if (outq->head != NULL && !outq->head->finished
|
||||
&& outq->head->worker != NULL) {
|
||||
enable_partial_output(outq->head->worker);
|
||||
|
||||
// Set it to NULL since calling it twice is pointless.
|
||||
outq->head->worker = NULL;
|
||||
}
|
||||
|
||||
return;
|
||||
}
|
||||
|
|
|
@ -14,16 +14,36 @@
|
|||
|
||||
|
||||
/// Output buffer for a single thread
|
||||
typedef struct {
|
||||
/// Pointer to the output buffer of lzma_outq.buf_size_max bytes
|
||||
uint8_t *buf;
|
||||
typedef struct lzma_outbuf_s lzma_outbuf;
|
||||
struct lzma_outbuf_s {
|
||||
/// Pointer to the next buffer. This is used for the cached buffers.
|
||||
/// The worker thread must not modify this.
|
||||
lzma_outbuf *next;
|
||||
|
||||
/// Amount of data written to buf
|
||||
size_t size;
|
||||
/// This is initialized by lzma_outq_get_buf() and
|
||||
/// is used by lzma_outq_enable_partial_output().
|
||||
/// The worker thread must not modify this.
|
||||
void *worker;
|
||||
|
||||
/// Additional size information
|
||||
lzma_vli unpadded_size;
|
||||
lzma_vli uncompressed_size;
|
||||
/// Amount of memory allocated for buf[].
|
||||
/// The worker thread must not modify this.
|
||||
size_t allocated;
|
||||
|
||||
/// Writing position in the worker thread or, in other words, the
|
||||
/// amount of finished data written to buf[] which can be copied out
|
||||
///
|
||||
/// \note This is read by another thread and thus access
|
||||
/// to this variable needs a mutex.
|
||||
size_t pos;
|
||||
|
||||
/// Decompression: Position in the input buffer in the worker thread
|
||||
/// that matches the output "pos" above. This is used to detect if
|
||||
/// more output might be possible from the worker thread: if it has
|
||||
/// consumed all its input, then more output isn't possible.
|
||||
///
|
||||
/// \note This is read by another thread and thus access
|
||||
/// to this variable needs a mutex.
|
||||
size_t decoder_in_pos;
|
||||
|
||||
/// True when no more data will be written into this buffer.
|
||||
///
|
||||
|
@ -31,32 +51,55 @@ typedef struct {
|
|||
/// to this variable needs a mutex.
|
||||
bool finished;
|
||||
|
||||
} lzma_outbuf;
|
||||
/// Return value for lzma_outq_read() when the last byte from
|
||||
/// a finished buffer has been read. Defaults to LZMA_STREAM_END.
|
||||
/// This must *not* be LZMA_OK. The idea is to allow a decoder to
|
||||
/// pass an error code to the main thread, setting the code here
|
||||
/// together with finished = true.
|
||||
lzma_ret finish_ret;
|
||||
|
||||
/// Additional size information. lzma_outq_read() may read these
|
||||
/// when "finished" is true.
|
||||
lzma_vli unpadded_size;
|
||||
lzma_vli uncompressed_size;
|
||||
|
||||
/// Buffer of "allocated" bytes
|
||||
uint8_t buf[];
|
||||
};
|
||||
|
||||
|
||||
typedef struct {
|
||||
/// Array of buffers that are used cyclically.
|
||||
lzma_outbuf *bufs;
|
||||
/// Linked list of buffers in use. The next output byte will be
|
||||
/// read from the head and buffers for the next thread will be
|
||||
/// appended to the tail. tail->next is always NULL.
|
||||
lzma_outbuf *head;
|
||||
lzma_outbuf *tail;
|
||||
|
||||
/// Memory allocated for all the buffers
|
||||
uint8_t *bufs_mem;
|
||||
|
||||
/// Amount of buffer space available in each buffer
|
||||
size_t buf_size_max;
|
||||
|
||||
/// Number of buffers allocated
|
||||
uint32_t bufs_allocated;
|
||||
|
||||
/// Position in the bufs array. The next buffer to be taken
|
||||
/// into use is bufs[bufs_pos].
|
||||
uint32_t bufs_pos;
|
||||
|
||||
/// Number of buffers in use
|
||||
uint32_t bufs_used;
|
||||
|
||||
/// Position in the buffer in lzma_outq_read()
|
||||
/// Number of bytes read from head->buf[] in lzma_outq_read()
|
||||
size_t read_pos;
|
||||
|
||||
/// Linked list of allocated buffers that aren't currently used.
|
||||
/// This way buffers of similar size can be reused and don't
|
||||
/// need to be reallocated every time. For simplicity, all
|
||||
/// cached buffers in the list have the same allocated size.
|
||||
lzma_outbuf *cache;
|
||||
|
||||
/// Total amount of memory allocated for buffers
|
||||
uint64_t mem_allocated;
|
||||
|
||||
/// Amount of memory used by the buffers that are in use in
|
||||
/// the head...tail linked list.
|
||||
uint64_t mem_in_use;
|
||||
|
||||
/// Number of buffers in use in the head...tail list. If and only if
|
||||
/// this is zero, the pointers head and tail above are NULL.
|
||||
uint32_t bufs_in_use;
|
||||
|
||||
/// Number of buffers allocated (in use + cached)
|
||||
uint32_t bufs_allocated;
|
||||
|
||||
/// Maximum allowed number of allocated buffers
|
||||
uint32_t bufs_limit;
|
||||
} lzma_outq;
|
||||
|
||||
|
||||
|
@ -76,32 +119,60 @@ extern uint64_t lzma_outq_memusage(uint64_t buf_size_max, uint32_t threads);
|
|||
/// function knows that there are no previous
|
||||
/// allocations to free.
|
||||
/// \param allocator Pointer to allocator or NULL
|
||||
/// \param buf_size_max Maximum amount of data that a single buffer
|
||||
/// in the queue may need to store.
|
||||
/// \param threads Number of buffers that may be in use
|
||||
/// concurrently. Note that more than this number
|
||||
/// of buffers will actually get allocated to
|
||||
/// of buffers may actually get allocated to
|
||||
/// improve performance when buffers finish
|
||||
/// out of order.
|
||||
/// out of order. The actual maximum number of
|
||||
/// allocated buffers is derived from the number
|
||||
/// of threads.
|
||||
///
|
||||
/// \return - LZMA_OK
|
||||
/// - LZMA_MEM_ERROR
|
||||
///
|
||||
extern lzma_ret lzma_outq_init(
|
||||
lzma_outq *outq, const lzma_allocator *allocator,
|
||||
uint64_t buf_size_max, uint32_t threads);
|
||||
extern lzma_ret lzma_outq_init(lzma_outq *outq,
|
||||
const lzma_allocator *allocator, uint32_t threads);
|
||||
|
||||
|
||||
/// \brief Free the memory associated with the output queue
|
||||
extern void lzma_outq_end(lzma_outq *outq, const lzma_allocator *allocator);
|
||||
|
||||
|
||||
/// \brief Free all cached buffers that consume memory but aren't in use
|
||||
extern void lzma_outq_clear_cache(
|
||||
lzma_outq *outq, const lzma_allocator *allocator);
|
||||
|
||||
|
||||
/// \brief Like lzma_outq_clear_cache() but might keep one buffer
|
||||
///
|
||||
/// One buffer is not freed if its size is equal to keep_size.
|
||||
/// This is useful if the caller knows that it will soon need a buffer of
|
||||
/// keep_size bytes. This way it won't be freed and immediately reallocated.
|
||||
extern void lzma_outq_clear_cache2(
|
||||
lzma_outq *outq, const lzma_allocator *allocator,
|
||||
size_t keep_size);
|
||||
|
||||
|
||||
/// \brief Preallocate a new buffer into cache
|
||||
///
|
||||
/// Splitting the buffer allocation into a separate function makes it
|
||||
/// possible to ensure that lzma_outq_get_buf() cannot fail.
|
||||
/// If the preallocated buffer isn't actually used (for example, some
|
||||
/// other error occurs), the caller has to do nothing as the buffer will
|
||||
/// be used later or cleared from the cache when not needed.
|
||||
///
|
||||
/// \return LZMA_OK on success, LZMA_MEM_ERROR if allocation fails
|
||||
///
|
||||
extern lzma_ret lzma_outq_prealloc_buf(
|
||||
lzma_outq *outq, const lzma_allocator *allocator, size_t size);
|
||||
|
||||
|
||||
/// \brief Get a new buffer
|
||||
///
|
||||
/// lzma_outq_has_buf() must be used to check that there is a buffer
|
||||
/// lzma_outq_prealloc_buf() must be used to ensure that there is a buffer
|
||||
/// available before calling lzma_outq_get_buf().
|
||||
///
|
||||
extern lzma_outbuf *lzma_outq_get_buf(lzma_outq *outq);
|
||||
extern lzma_outbuf *lzma_outq_get_buf(lzma_outq *outq, void *worker);
|
||||
|
||||
|
||||
/// \brief Test if there is data ready to be read
|
||||
|
@ -126,17 +197,32 @@ extern bool lzma_outq_is_readable(const lzma_outq *outq);
|
|||
/// \return - LZMA_OK: All OK. Either no data was available or the buffer
|
||||
/// being read didn't become empty yet.
|
||||
/// - LZMA_STREAM_END: The buffer being read was finished.
|
||||
/// *unpadded_size and *uncompressed_size were set.
|
||||
/// *unpadded_size and *uncompressed_size were set if they
|
||||
/// were not NULL.
|
||||
///
|
||||
/// \note This reads lzma_outbuf.finished variables and thus call
|
||||
/// to this function needs to be protected with a mutex.
|
||||
/// \note This reads lzma_outbuf.finished and .pos variables and thus
|
||||
/// calls to this function need to be protected with a mutex.
|
||||
///
|
||||
extern lzma_ret lzma_outq_read(lzma_outq *restrict outq,
|
||||
const lzma_allocator *restrict allocator,
|
||||
uint8_t *restrict out, size_t *restrict out_pos,
|
||||
size_t out_size, lzma_vli *restrict unpadded_size,
|
||||
lzma_vli *restrict uncompressed_size);
|
||||
|
||||
|
||||
/// \brief Enable partial output from a worker thread
|
||||
///
|
||||
/// If the buffer at the head of the output queue isn't finished,
|
||||
/// this will call enable_partial_output on the worker associated with
|
||||
/// that output buffer.
|
||||
///
|
||||
/// \note This reads a lzma_outbuf.finished variable and thus
|
||||
/// calls to this function need to be protected with a mutex.
|
||||
///
|
||||
extern void lzma_outq_enable_partial_output(lzma_outq *outq,
|
||||
void (*enable_partial_output)(void *worker));
|
||||
|
||||
|
||||
/// \brief Test if there is at least one buffer free
|
||||
///
|
||||
/// This must be used before getting a new buffer with lzma_outq_get_buf().
|
||||
|
@ -144,7 +230,7 @@ extern lzma_ret lzma_outq_read(lzma_outq *restrict outq,
|
|||
static inline bool
|
||||
lzma_outq_has_buf(const lzma_outq *outq)
|
||||
{
|
||||
return outq->bufs_used < outq->bufs_allocated;
|
||||
return outq->bufs_in_use < outq->bufs_limit;
|
||||
}
|
||||
|
||||
|
||||
|
@ -152,5 +238,17 @@ lzma_outq_has_buf(const lzma_outq *outq)
|
|||
static inline bool
|
||||
lzma_outq_is_empty(const lzma_outq *outq)
|
||||
{
|
||||
return outq->bufs_used == 0;
|
||||
return outq->bufs_in_use == 0;
|
||||
}
|
||||
|
||||
|
||||
/// \brief Get the amount of memory needed for a single lzma_outbuf
|
||||
///
|
||||
/// \note Caller must check that the argument is significantly less
|
||||
/// than SIZE_MAX to avoid an integer overflow!
|
||||
static inline uint64_t
|
||||
lzma_outq_outbuf_memusage(size_t buf_size)
|
||||
{
|
||||
assert(buf_size <= SIZE_MAX - sizeof(lzma_outbuf));
|
||||
return sizeof(lzma_outbuf) + buf_size;
|
||||
}
|
||||
|
|
|
@ -243,9 +243,7 @@ stream_decode(void *coder_ptr, const lzma_allocator *allocator,
|
|||
|
||||
// Free the allocated filter options since they are needed
|
||||
// only to initialize the Block decoder.
|
||||
for (size_t i = 0; i < LZMA_FILTERS_MAX; ++i)
|
||||
lzma_free(filters[i].options, allocator);
|
||||
|
||||
lzma_filters_free(filters, allocator);
|
||||
coder->block_options.filters = NULL;
|
||||
|
||||
// Check if memory usage calculation and Block decoder
|
||||
|
|
2016
src/liblzma/common/stream_decoder_mt.c
Normal file
File diff suppressed because it is too large
|
@ -219,8 +219,7 @@ stream_encoder_end(void *coder_ptr, const lzma_allocator *allocator)
|
|||
lzma_next_end(&coder->index_encoder, allocator);
|
||||
lzma_index_end(coder->index, allocator);
|
||||
|
||||
for (size_t i = 0; coder->filters[i].id != LZMA_VLI_UNKNOWN; ++i)
|
||||
lzma_free(coder->filters[i].options, allocator);
|
||||
lzma_filters_free(coder->filters, allocator);
|
||||
|
||||
lzma_free(coder, allocator);
|
||||
return;
|
||||
|
@ -271,22 +270,15 @@ stream_encoder_update(void *coder_ptr, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
// Free the options of the old chain.
|
||||
for (size_t i = 0; coder->filters[i].id != LZMA_VLI_UNKNOWN; ++i)
|
||||
lzma_free(coder->filters[i].options, allocator);
|
||||
lzma_filters_free(coder->filters, allocator);
|
||||
|
||||
// Copy the new filter chain in place.
|
||||
size_t j = 0;
|
||||
do {
|
||||
coder->filters[j].id = temp[j].id;
|
||||
coder->filters[j].options = temp[j].options;
|
||||
} while (temp[j++].id != LZMA_VLI_UNKNOWN);
|
||||
memcpy(coder->filters, temp, sizeof(temp));
|
||||
|
||||
return LZMA_OK;
|
||||
|
||||
error:
|
||||
for (size_t i = 0; temp[i].id != LZMA_VLI_UNKNOWN; ++i)
|
||||
lzma_free(temp[i].options, allocator);
|
||||
|
||||
lzma_filters_free(temp, allocator);
|
||||
return ret;
|
||||
}
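The "free the old chain, then copy the new one in place" pattern used above can be illustrated with a small stand-alone model. The types and names below are hypothetical simplifications of lzma_filter and lzma_filters_free(), free() replaces the allocator, and marking the source as empty after the copy follows the filters_cache handling in get_thread() rather than stream_encoder_update(), where the temporary array simply goes out of scope.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_FILTERS_MAX 4
#define SKETCH_ID_UNKNOWN UINT64_MAX  // terminator, like LZMA_VLI_UNKNOWN

typedef struct {
	uint64_t id;
	void *options;
} sketch_filter;

// Free the options of every filter up to the terminator and
// leave the chain empty so that freeing it twice is harmless.
static void
sketch_filters_free(sketch_filter *filters)
{
	for (size_t i = 0; filters[i].id != SKETCH_ID_UNKNOWN; ++i) {
		// Guard against a chain that lacks a terminator.
		if (i == SKETCH_FILTERS_MAX)
			break;

		free(filters[i].options);
		filters[i].options = NULL;
		filters[i].id = SKETCH_ID_UNKNOWN;
	}
}

// Replace dest with src. Ownership of the options pointers moves to
// dest, so src is marked empty instead of being freed.
static void
sketch_filters_replace(sketch_filter *dest, sketch_filter *src)
{
	sketch_filters_free(dest);
	memcpy(dest, src, sizeof(sketch_filter) * (SKETCH_FILTERS_MAX + 1));
	src[0].id = SKETCH_ID_UNKNOWN;
}
```

Copying the whole fixed-size array with memcpy() avoids a per-element loop; it works because the terminator entry is part of the array and is copied along with the filters.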
|
||||
|
||||
|
|
|
@ -85,6 +85,11 @@ struct worker_thread_s {
|
|||
/// Compression options for this Block
|
||||
lzma_block block_options;
|
||||
|
||||
/// Filter chain for this thread. By copying the filters array
|
||||
/// to each thread it is possible to change the filter chain
|
||||
/// between Blocks using lzma_filters_update().
|
||||
lzma_filter filters[LZMA_FILTERS_MAX + 1];
|
||||
|
||||
/// Next structure in the stack of free worker threads.
|
||||
worker_thread *next;
|
||||
|
||||
|
@ -109,9 +114,22 @@ struct lzma_stream_coder_s {
|
|||
/// LZMA_FULL_FLUSH or LZMA_FULL_BARRIER is used earlier.
|
||||
size_t block_size;
|
||||
|
||||
/// The filter chain currently in use
|
||||
/// The filter chain to use for the next Block.
|
||||
/// This can be updated using lzma_filters_update()
|
||||
/// after LZMA_FULL_BARRIER or LZMA_FULL_FLUSH.
|
||||
lzma_filter filters[LZMA_FILTERS_MAX + 1];
|
||||
|
||||
/// A copy of filters[] will be put here when attempting to get
|
||||
/// a new worker thread. This will be copied to a worker thread
|
||||
/// when a thread becomes free and then this cache is marked as
|
||||
/// empty by setting [0].id = LZMA_VLI_UNKNOWN. Without this cache
|
||||
/// the filter options from filters[] would get uselessly copied
|
||||
/// multiple times (allocated and freed) when waiting for a new free
|
||||
/// worker thread.
|
||||
///
|
||||
/// This is freed if filters[] is updated via lzma_filters_update().
|
||||
lzma_filter filters_cache[LZMA_FILTERS_MAX + 1];
|
||||
|
||||
|
||||
/// Index to hold sizes of the Blocks
|
||||
lzma_index *index;
|
||||
|
@ -133,6 +151,9 @@ struct lzma_stream_coder_s {
|
|||
/// Output buffer queue for compressed data
|
||||
lzma_outq outq;
|
||||
|
||||
/// How much memory to allocate for each lzma_outbuf.buf
|
||||
size_t outbuf_alloc_size;
|
||||
|
||||
|
||||
/// Maximum wait time if cannot use all the input and cannot
|
||||
/// fill the output buffer. This is in milliseconds.
|
||||
|
@ -196,7 +217,7 @@ worker_error(worker_thread *thr, lzma_ret ret)
|
|||
|
||||
|
||||
static worker_state
|
||||
worker_encode(worker_thread *thr, worker_state state)
|
||||
worker_encode(worker_thread *thr, size_t *out_pos, worker_state state)
|
||||
{
|
||||
assert(thr->progress_in == 0);
|
||||
assert(thr->progress_out == 0);
|
||||
|
@ -205,12 +226,9 @@ worker_encode(worker_thread *thr, worker_state state)
|
|||
thr->block_options = (lzma_block){
|
||||
.version = 0,
|
||||
.check = thr->coder->stream_flags.check,
|
||||
.compressed_size = thr->coder->outq.buf_size_max,
|
||||
.compressed_size = thr->outbuf->allocated,
|
||||
.uncompressed_size = thr->coder->block_size,
|
||||
|
||||
// TODO: To allow changing the filter chain, the filters
|
||||
// array must be copied to each worker_thread.
|
||||
.filters = thr->coder->filters,
|
||||
.filters = thr->filters,
|
||||
};
|
||||
|
||||
// Calculate maximum size of the Block Header. This amount is
|
||||
|
@ -234,12 +252,12 @@ worker_encode(worker_thread *thr, worker_state state)
|
|||
size_t in_pos = 0;
|
||||
size_t in_size = 0;
|
||||
|
||||
thr->outbuf->size = thr->block_options.header_size;
|
||||
const size_t out_size = thr->coder->outq.buf_size_max;
|
||||
*out_pos = thr->block_options.header_size;
|
||||
const size_t out_size = thr->outbuf->allocated;
|
||||
|
||||
do {
|
||||
mythread_sync(thr->mutex) {
|
||||
// Store in_pos and out_pos into *thr so that
|
||||
// Store in_pos and *out_pos into *thr so that
|
||||
// an application may read them via
|
||||
// lzma_get_progress() to get progress information.
|
||||
//
|
||||
|
@ -247,7 +265,7 @@ worker_encode(worker_thread *thr, worker_state state)
|
|||
// finishes. Instead, the final values are taken
|
||||
// later from thr->outbuf.
|
||||
thr->progress_in = in_pos;
|
||||
thr->progress_out = thr->outbuf->size;
|
||||
thr->progress_out = *out_pos;
|
||||
|
||||
while (in_size == thr->in_size
|
||||
&& thr->state == THR_RUN)
|
||||
|
@ -277,8 +295,8 @@ worker_encode(worker_thread *thr, worker_state state)
|
|||
ret = thr->block_encoder.code(
|
||||
thr->block_encoder.coder, thr->allocator,
|
||||
thr->in, &in_pos, in_limit, thr->outbuf->buf,
|
||||
&thr->outbuf->size, out_size, action);
|
||||
} while (ret == LZMA_OK && thr->outbuf->size < out_size);
|
||||
out_pos, out_size, action);
|
||||
} while (ret == LZMA_OK && *out_pos < out_size);
|
||||
|
||||
switch (ret) {
|
||||
case LZMA_STREAM_END:
|
||||
|
@ -313,10 +331,10 @@ worker_encode(worker_thread *thr, worker_state state)
|
|||
return state;
|
||||
|
||||
// Do the encoding. This takes care of the Block Header too.
|
||||
thr->outbuf->size = 0;
|
||||
*out_pos = 0;
|
||||
ret = lzma_block_uncomp_encode(&thr->block_options,
|
||||
thr->in, in_size, thr->outbuf->buf,
|
||||
&thr->outbuf->size, out_size);
|
||||
out_pos, out_size);
|
||||
|
||||
// It shouldn't fail.
|
||||
if (ret != LZMA_OK) {
|
||||
|
@ -367,11 +385,13 @@ worker_start(void *thr_ptr)
|
|||
}
|
||||
}
|
||||
|
||||
size_t out_pos = 0;
|
||||
|
||||
assert(state != THR_IDLE);
|
||||
assert(state != THR_STOP);
|
||||
|
||||
if (state <= THR_FINISH)
|
||||
state = worker_encode(thr, state);
|
||||
state = worker_encode(thr, &out_pos, state);
|
||||
|
||||
if (state == THR_EXIT)
|
||||
break;
|
||||
|
@ -387,14 +407,17 @@ worker_start(void *thr_ptr)
|
|||
}
|
||||
|
||||
mythread_sync(thr->coder->mutex) {
|
||||
// Mark the output buffer as finished if
|
||||
// no errors occurred.
|
||||
thr->outbuf->finished = state == THR_FINISH;
|
||||
// If no errors occurred, make the encoded data
|
||||
// available to be copied out.
|
||||
if (state == THR_FINISH) {
|
||||
thr->outbuf->pos = out_pos;
|
||||
thr->outbuf->finished = true;
|
||||
}
|
||||
|
||||
// Update the main progress info.
|
||||
thr->coder->progress_in
|
||||
+= thr->outbuf->uncompressed_size;
|
||||
thr->coder->progress_out += thr->outbuf->size;
|
||||
thr->coder->progress_out += out_pos;
|
||||
thr->progress_in = 0;
|
||||
thr->progress_out = 0;
|
||||
|
||||
|
@ -407,6 +430,8 @@ worker_start(void *thr_ptr)
|
|||
}
|
||||
|
||||
// Exiting, free the resources.
|
||||
lzma_filters_free(thr->filters, thr->allocator);
|
||||
|
||||
mythread_mutex_destroy(&thr->mutex);
|
||||
mythread_cond_destroy(&thr->cond);
|
||||
|
||||
|
@ -490,6 +515,7 @@ initialize_new_thread(lzma_stream_coder *coder,
|
|||
thr->progress_in = 0;
|
||||
thr->progress_out = 0;
|
||||
thr->block_encoder = LZMA_NEXT_CODER_INIT;
|
||||
thr->filters[0].id = LZMA_VLI_UNKNOWN;
|
||||
|
||||
if (mythread_create(&thr->thread_id, &worker_start, thr))
|
||||
goto error_thread;
|
||||
|
@ -519,6 +545,18 @@ get_thread(lzma_stream_coder *coder, const lzma_allocator *allocator)
|
|||
if (!lzma_outq_has_buf(&coder->outq))
|
||||
return LZMA_OK;
|
||||
|
||||
// That's also true if we cannot allocate memory for the output
|
||||
// buffer in the output queue.
|
||||
return_if_error(lzma_outq_prealloc_buf(&coder->outq, allocator,
|
||||
coder->outbuf_alloc_size));
|
||||
|
||||
// Make a thread-specific copy of the filter chain. Put it in
|
||||
// the cache array first so that if we cannot get a new thread yet,
|
||||
// the allocation is ready when we try again.
|
||||
if (coder->filters_cache[0].id == LZMA_VLI_UNKNOWN)
|
||||
return_if_error(lzma_filters_copy(
|
||||
coder->filters, coder->filters_cache, allocator));
|
||||
|
||||
// If there is a free structure on the stack, use it.
|
||||
mythread_sync(coder->mutex) {
|
||||
if (coder->threads_free != NULL) {
|
||||
|
@ -541,7 +579,16 @@ get_thread(lzma_stream_coder *coder, const lzma_allocator *allocator)
|
|||
mythread_sync(coder->thr->mutex) {
|
||||
coder->thr->state = THR_RUN;
|
||||
coder->thr->in_size = 0;
|
||||
coder->thr->outbuf = lzma_outq_get_buf(&coder->outq);
|
||||
coder->thr->outbuf = lzma_outq_get_buf(&coder->outq, NULL);
|
||||
|
||||
// Free the old thread-specific filter options and replace
|
||||
// them with the already-allocated new options from
|
||||
// coder->filters_cache[]. Then mark the cache as empty.
|
||||
lzma_filters_free(coder->thr->filters, allocator);
|
||||
memcpy(coder->thr->filters, coder->filters_cache,
|
||||
sizeof(coder->filters_cache));
|
||||
coder->filters_cache[0].id = LZMA_VLI_UNKNOWN;
|
||||
|
||||
mythread_cond_signal(&coder->thr->cond);
|
||||
}
|
||||
|
||||
|
@ -627,9 +674,13 @@ wait_for_work(lzma_stream_coder *coder, mythread_condtime *wait_abs,
|
|||
// to true here and calculate the absolute time when
|
||||
// we must return if there's nothing to do.
|
||||
//
|
||||
// The idea of *has_blocked is to avoid unneeded calls
|
||||
// to mythread_condtime_set(), which may do a syscall
|
||||
// depending on the operating system.
|
||||
// This way if we block multiple times for short moments
|
||||
// less than "timeout" milliseconds, we will return once
|
||||
// "timeout" amount of time has passed since the *first*
|
||||
// blocking occurred. If the absolute time was calculated
|
||||
// again every time we block, "timeout" would effectively
|
||||
// be meaningless if we never consecutively block longer
|
||||
// than "timeout" ms.
|
||||
*has_blocked = true;
|
||||
mythread_condtime_set(wait_abs, &coder->cond, coder->timeout);
|
||||
}
|
||||
|
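The comment in wait_for_work() above says the absolute deadline is computed only when the thread blocks for the first time. A minimal sketch of that idea follows. It uses plain millisecond counters instead of mythread_condtime, and the names `wait_state` and `deadline_for_block` are hypothetical, not part of liblzma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for the has_blocked/wait_abs pair. Times are
// plain milliseconds instead of mythread_condtime.
typedef struct {
	bool has_blocked;      // true once the deadline has been computed
	uint64_t deadline_ms;  // absolute time at which waiting must stop
} wait_state;

// Return the absolute deadline to use when blocking at time now_ms.
// The deadline is computed only on the first block, so a series of
// short waits cannot keep pushing it into the future.
static uint64_t
deadline_for_block(wait_state *ws, uint64_t now_ms, uint32_t timeout_ms)
{
	if (!ws->has_blocked) {
		ws->has_blocked = true;
		ws->deadline_ms = now_ms + timeout_ms;
	}

	return ws->deadline_ms;
}
```

With a 300 ms timeout, a first block at t=1000 and a second block at t=1200 both get the deadline 1300. Recomputing it each time would have moved the second deadline to 1500.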
@ -704,7 +755,7 @@ stream_encode_mt(void *coder_ptr, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
// Try to read compressed data to out[].
|
||||
ret = lzma_outq_read(&coder->outq,
|
||||
ret = lzma_outq_read(&coder->outq, allocator,
|
||||
out, out_pos, out_size,
|
||||
&unpadded_size,
|
||||
&uncompressed_size);
|
||||
|
@ -849,8 +900,8 @@ stream_encoder_mt_end(void *coder_ptr, const lzma_allocator *allocator)
|
|||
threads_end(coder, allocator);
|
||||
lzma_outq_end(&coder->outq, allocator);
|
||||
|
||||
for (size_t i = 0; coder->filters[i].id != LZMA_VLI_UNKNOWN; ++i)
|
||||
lzma_free(coder->filters[i].options, allocator);
|
||||
lzma_filters_free(coder->filters, allocator);
|
||||
lzma_filters_free(coder->filters_cache, allocator);
|
||||
|
||||
lzma_next_end(&coder->index_encoder, allocator);
|
||||
lzma_index_end(coder->index, allocator);
|
||||
|
@ -863,6 +914,45 @@ stream_encoder_mt_end(void *coder_ptr, const lzma_allocator *allocator)
|
|||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
stream_encoder_mt_update(void *coder_ptr, const lzma_allocator *allocator,
|
||||
const lzma_filter *filters,
|
||||
const lzma_filter *reversed_filters
|
||||
lzma_attribute((__unused__)))
|
||||
{
|
||||
lzma_stream_coder *coder = coder_ptr;
|
||||
|
||||
// Applications shouldn't attempt to change the options when
|
||||
// we are already encoding the Index or Stream Footer.
|
||||
if (coder->sequence > SEQ_BLOCK)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
||||
// For now the threaded encoder doesn't support changing
|
||||
// the options in the middle of a Block.
|
||||
if (coder->thr != NULL)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
||||
// Check if the filter chain seems mostly valid. See the comment
|
||||
// in stream_encoder_mt_init().
|
||||
if (lzma_raw_encoder_memusage(filters) == UINT64_MAX)
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
// Make a copy to a temporary buffer first. This way the encoder
|
||||
// state stays unchanged if an error occurs in lzma_filters_copy().
|
||||
lzma_filter temp[LZMA_FILTERS_MAX + 1];
|
||||
return_if_error(lzma_filters_copy(filters, temp, allocator));
|
||||
|
||||
// Free the options of the old chain as well as the cache.
|
||||
lzma_filters_free(coder->filters, allocator);
|
||||
lzma_filters_free(coder->filters_cache, allocator);
|
||||
|
||||
// Copy the new filter chain in place.
|
||||
memcpy(coder->filters, temp, sizeof(temp));
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
|
||||
/// Options handling for lzma_stream_encoder_mt_init() and
|
||||
/// lzma_stream_encoder_mt_memusage()
|
||||
static lzma_ret
|
||||
|
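stream_encoder_mt_update() above copies the new filter chain into a temporary array first and replaces the old chain only after the copy has succeeded. The sketch below shows that copy-then-commit pattern in isolation. The `filter` struct, `CHAIN_MAX`, and the `chain_*` helpers are simplified, hypothetical stand-ins for lzma_filter, LZMA_FILTERS_MAX, lzma_filters_copy() and lzma_filters_free(). Both chain arrays are assumed to have CHAIN_MAX + 1 elements.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHAIN_MAX 4

// Hypothetical simplified filter: an ID plus one heap-allocated string.
typedef struct {
	int id;         // 0 terminates the chain
	char *options;  // owned by the chain, may be NULL
} filter;

// Free the options of every filter and mark the chain as empty.
static void
chain_free(filter *chain)
{
	for (size_t i = 0; i < CHAIN_MAX && chain[i].id != 0; ++i) {
		free(chain[i].options);
		chain[i].options = NULL;
		chain[i].id = 0;
	}
}

// Deep-copy src to dest. Returns 0 on success. On failure returns -1
// and leaves dest empty.
static int
chain_copy(const filter *src, filter *dest)
{
	size_t i;
	for (i = 0; src[i].id != 0; ++i) {
		if (i == CHAIN_MAX)
			goto error; // Too many filters

		dest[i].id = src[i].id;
		dest[i].options = NULL;
		if (src[i].options != NULL) {
			const size_t len = strlen(src[i].options) + 1;
			dest[i].options = malloc(len);
			if (dest[i].options == NULL)
				goto error;
			memcpy(dest[i].options, src[i].options, len);
		}
	}

	dest[i].id = 0;
	dest[i].options = NULL;
	return 0;

error:
	while (i-- > 0)
		free(dest[i].options);
	dest[0].id = 0;
	return -1;
}

// Replace current with a copy of replacement. If copying fails,
// current is left exactly as it was.
static int
chain_update(filter *current, const filter *replacement)
{
	filter temp[CHAIN_MAX + 1] = { { 0, NULL } };
	if (chain_copy(replacement, temp) != 0)
		return -1;

	chain_free(current);
	memcpy(current, temp, sizeof(temp));
	return 0;
}
```

The memcpy() transfers ownership of the option pointers from the temporary array to the live chain, so the temporary array must not be freed afterwards.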
@ -954,14 +1044,16 @@ stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
&block_size, &outbuf_size_max));
|
||||
|
||||
#if SIZE_MAX < UINT64_MAX
|
||||
if (block_size > SIZE_MAX)
|
||||
if (block_size > SIZE_MAX || outbuf_size_max > SIZE_MAX)
|
||||
return LZMA_MEM_ERROR;
|
||||
#endif
|
||||
|
||||
// Validate the filter chain so that we can give an error in this
|
||||
// function instead of delaying it to the first call to lzma_code().
|
||||
// The memory usage calculation verifies the filter chain as
|
||||
// a side effect so we take advantage of that.
|
||||
// a side effect so we take advantage of that. It's not a perfect
|
||||
// check though as raw encoder allows LZMA1 too but such problems
|
||||
// will be caught eventually with Block Header encoder.
|
||||
if (lzma_raw_encoder_memusage(filters) == UINT64_MAX)
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
|
@ -1001,9 +1093,10 @@ stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
next->code = &stream_encode_mt;
|
||||
next->end = &stream_encoder_mt_end;
|
||||
next->get_progress = &get_progress;
|
||||
// next->update = &stream_encoder_mt_update;
|
||||
next->update = &stream_encoder_mt_update;
|
||||
|
||||
coder->filters[0].id = LZMA_VLI_UNKNOWN;
|
||||
coder->filters_cache[0].id = LZMA_VLI_UNKNOWN;
|
||||
coder->index_encoder = LZMA_NEXT_CODER_INIT;
|
||||
coder->index = NULL;
|
||||
memzero(&coder->outq, sizeof(coder->outq));
|
||||
|
@ -1015,6 +1108,7 @@ stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
// Basic initializations
|
||||
coder->sequence = SEQ_STREAM_HEADER;
|
||||
coder->block_size = (size_t)(block_size);
|
||||
coder->outbuf_alloc_size = (size_t)(outbuf_size_max);
|
||||
coder->thread_error = LZMA_OK;
|
||||
coder->thr = NULL;
|
||||
|
||||
|
@ -1044,19 +1138,16 @@ stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
|
||||
// Output queue
|
||||
return_if_error(lzma_outq_init(&coder->outq, allocator,
|
||||
outbuf_size_max, options->threads));
|
||||
options->threads));
|
||||
|
||||
// Timeout
|
||||
coder->timeout = options->timeout;
|
||||
|
||||
// Free the old filter chain and copy the new one.
|
||||
for (size_t i = 0; coder->filters[i].id != LZMA_VLI_UNKNOWN; ++i)
|
||||
lzma_free(coder->filters[i].options, allocator);
|
||||
|
||||
// Mark it as empty so that it is in a safe state in case
|
||||
// lzma_filters_copy() fails.
|
||||
coder->filters[0].id = LZMA_VLI_UNKNOWN;
|
||||
// Free the old filter chain and the cache.
|
||||
lzma_filters_free(coder->filters, allocator);
|
||||
lzma_filters_free(coder->filters_cache, allocator);
|
||||
|
||||
// Copy the new filter chain.
|
||||
return_if_error(lzma_filters_copy(
|
||||
filters, coder->filters, allocator));
|
||||
|
||||
|
|
|
@ -39,8 +39,11 @@ lzma_stream_header_decode(lzma_stream_flags *options, const uint8_t *in)
|
|||
const uint32_t crc = lzma_crc32(in + sizeof(lzma_header_magic),
|
||||
LZMA_STREAM_FLAGS_SIZE, 0);
|
||||
if (crc != read32le(in + sizeof(lzma_header_magic)
|
||||
+ LZMA_STREAM_FLAGS_SIZE))
|
||||
+ LZMA_STREAM_FLAGS_SIZE)) {
|
||||
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
|
||||
return LZMA_DATA_ERROR;
|
||||
#endif
|
||||
}
|
||||
|
||||
// Stream Flags
|
||||
if (stream_flags_decode(options, in + sizeof(lzma_header_magic)))
|
||||
|
@ -67,8 +70,11 @@ lzma_stream_footer_decode(lzma_stream_flags *options, const uint8_t *in)
|
|||
// CRC32
|
||||
const uint32_t crc = lzma_crc32(in + sizeof(uint32_t),
|
||||
sizeof(uint32_t) + LZMA_STREAM_FLAGS_SIZE, 0);
|
||||
if (crc != read32le(in))
|
||||
if (crc != read32le(in)) {
|
||||
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
|
||||
return LZMA_DATA_ERROR;
|
||||
#endif
|
||||
}
|
||||
|
||||
// Stream Flags
|
||||
if (stream_flags_decode(options, in + sizeof(uint32_t) * 2))
|
||||
|
|
1317
src/liblzma/common/string_conversion.c
Normal file
File diff suppressed because it is too large
|
@ -106,3 +106,16 @@ global:
|
|||
lzma_stream_encoder_mt;
|
||||
lzma_stream_encoder_mt_memusage;
|
||||
} XZ_5.0;
|
||||
|
||||
XZ_5.4 {
|
||||
global:
|
||||
lzma_file_info_decoder;
|
||||
lzma_filters_free;
|
||||
lzma_lzip_decoder;
|
||||
lzma_microlzma_decoder;
|
||||
lzma_microlzma_encoder;
|
||||
lzma_stream_decoder_mt;
|
||||
lzma_str_from_filters;
|
||||
lzma_str_list_filters;
|
||||
lzma_str_to_filters;
|
||||
} XZ_5.2;
|
||||
|
|
|
@ -121,3 +121,16 @@ global:
|
|||
lzma_stream_encoder_mt;
|
||||
lzma_stream_encoder_mt_memusage;
|
||||
} XZ_5.1.2alpha;
|
||||
|
||||
XZ_5.4 {
|
||||
global:
|
||||
lzma_file_info_decoder;
|
||||
lzma_filters_free;
|
||||
lzma_lzip_decoder;
|
||||
lzma_microlzma_decoder;
|
||||
lzma_microlzma_encoder;
|
||||
lzma_stream_decoder_mt;
|
||||
lzma_str_from_filters;
|
||||
lzma_str_list_filters;
|
||||
lzma_str_to_filters;
|
||||
} XZ_5.2;
|
||||
|
|
|
@ -212,7 +212,8 @@ extern lzma_ret
|
|||
lzma_lz_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters,
|
||||
lzma_ret (*lz_init)(lzma_lz_decoder *lz,
|
||||
const lzma_allocator *allocator, const void *options,
|
||||
const lzma_allocator *allocator,
|
||||
lzma_vli id, const void *options,
|
||||
lzma_lz_options *lz_options))
|
||||
{
|
||||
// Allocate the base structure if it isn't already allocated.
|
||||
|
@ -236,7 +237,7 @@ lzma_lz_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
// us the dictionary size.
|
||||
lzma_lz_options lz_options;
|
||||
return_if_error(lz_init(&coder->lz, allocator,
|
||||
filters[0].options, &lz_options));
|
||||
filters[0].id, filters[0].options, &lz_options));
|
||||
|
||||
// If the dictionary size is very small, increase it to 4096 bytes.
|
||||
// This is to prevent constant wrapping of the dictionary, which
|
||||
|
@ -301,17 +302,3 @@ lzma_lz_decoder_memusage(size_t dictionary_size)
|
|||
{
|
||||
return sizeof(lzma_coder) + (uint64_t)(dictionary_size);
|
||||
}
|
||||
|
||||
|
||||
extern void
|
||||
lzma_lz_decoder_uncompressed(void *coder_ptr, lzma_vli uncompressed_size,
|
||||
bool allow_eopm)
|
||||
{
|
||||
lzma_coder *coder = coder_ptr;
|
||||
|
||||
if (uncompressed_size == LZMA_VLI_UNKNOWN)
|
||||
allow_eopm = true;
|
||||
|
||||
coder->lz.set_uncompressed(coder->lz.coder, uncompressed_size,
|
||||
allow_eopm);
|
||||
}
|
||||
|
|
|
@ -87,14 +87,12 @@ extern lzma_ret lzma_lz_decoder_init(lzma_next_coder *next,
|
|||
const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters,
|
||||
lzma_ret (*lz_init)(lzma_lz_decoder *lz,
|
||||
const lzma_allocator *allocator, const void *options,
|
||||
const lzma_allocator *allocator,
|
||||
lzma_vli id, const void *options,
|
||||
lzma_lz_options *lz_options));
|
||||
|
||||
extern uint64_t lzma_lz_decoder_memusage(size_t dictionary_size);
|
||||
|
||||
extern void lzma_lz_decoder_uncompressed(
|
||||
void *coder, lzma_vli uncompressed_size, bool allow_eopm);
|
||||
|
||||
|
||||
//////////////////////
|
||||
// Inline functions //
|
||||
|
|
|
@ -293,11 +293,15 @@ lz_encoder_prepare(lzma_mf *mf, const lzma_allocator *allocator,
|
|||
return true;
|
||||
}
|
||||
|
||||
// Calculate the sizes of mf->hash and mf->son and check that
|
||||
// nice_len is big enough for the selected match finder.
|
||||
const uint32_t hash_bytes = lz_options->match_finder & 0x0F;
|
||||
if (hash_bytes > mf->nice_len)
|
||||
return true;
|
||||
// Calculate the sizes of mf->hash and mf->son.
|
||||
//
|
||||
// NOTE: Since 5.3.5beta the LZMA encoder ensures that nice_len
|
||||
// is big enough for the selected match finder. This makes it
|
||||
// easier for applications as nice_len = 2 will always be accepted
|
||||
// even though the effective value can be slightly bigger.
|
||||
const uint32_t hash_bytes
|
||||
= mf_get_hash_bytes(lz_options->match_finder);
|
||||
assert(hash_bytes <= mf->nice_len);
|
||||
|
||||
const bool is_bt = (lz_options->match_finder & 0x10) != 0;
|
||||
uint32_t hs;
|
||||
|
@ -521,14 +525,30 @@ lz_encoder_update(void *coder_ptr, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
lz_encoder_set_out_limit(void *coder_ptr, uint64_t *uncomp_size,
|
||||
uint64_t out_limit)
|
||||
{
|
||||
lzma_coder *coder = coder_ptr;
|
||||
|
||||
// This is supported only if there are no other filters chained.
|
||||
if (coder->next.code == NULL && coder->lz.set_out_limit != NULL)
|
||||
return coder->lz.set_out_limit(
|
||||
coder->lz.coder, uncomp_size, out_limit);
|
||||
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
}
|
||||
|
||||
|
||||
extern lzma_ret
|
||||
lzma_lz_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters,
|
||||
lzma_ret (*lz_init)(lzma_lz_encoder *lz,
|
||||
const lzma_allocator *allocator, const void *options,
|
||||
const lzma_allocator *allocator,
|
||||
lzma_vli id, const void *options,
|
||||
lzma_lz_options *lz_options))
|
||||
{
|
||||
#ifdef HAVE_SMALL
|
||||
#if defined(HAVE_SMALL) && !defined(HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR)
|
||||
// We need that the CRC32 table has been initialized.
|
||||
lzma_crc32_init();
|
||||
#endif
|
||||
|
@ -544,6 +564,7 @@ lzma_lz_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
next->code = &lz_encode;
|
||||
next->end = &lz_encoder_end;
|
||||
next->update = &lz_encoder_update;
|
||||
next->set_out_limit = &lz_encoder_set_out_limit;
|
||||
|
||||
coder->lz.coder = NULL;
|
||||
coder->lz.code = NULL;
|
||||
|
@ -565,7 +586,7 @@ lzma_lz_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
// Initialize the LZ-based encoder.
|
||||
lzma_lz_options lz_options;
|
||||
return_if_error(lz_init(&coder->lz, allocator,
|
||||
filters[0].options, &lz_options));
|
||||
filters[0].id, filters[0].options, &lz_options));
|
||||
|
||||
// Setup the size information into coder->mf and deallocate
|
||||
// old buffers if they have wrong size.
|
||||
|
|
|
@ -204,6 +204,10 @@ typedef struct {
|
|||
/// Update the options in the middle of the encoding.
|
||||
lzma_ret (*options_update)(void *coder, const lzma_filter *filter);
|
||||
|
||||
/// Set maximum allowed output size
|
||||
lzma_ret (*set_out_limit)(void *coder, uint64_t *uncomp_size,
|
||||
uint64_t out_limit);
|
||||
|
||||
} lzma_lz_encoder;
|
||||
|
||||
|
||||
|
@ -216,6 +220,15 @@ typedef struct {
|
|||
// are called `read ahead'.
|
||||
|
||||
|
||||
/// Get how many bytes the match finder hashes in its initial step.
|
||||
/// This is also the minimum nice_len value with the match finder.
|
||||
static inline uint32_t
|
||||
mf_get_hash_bytes(lzma_match_finder match_finder)
|
||||
{
|
||||
return (uint32_t)match_finder & 0x0F;
|
||||
}
|
||||
|
||||
|
||||
/// Get pointer to the first byte not run through the match finder
|
||||
static inline const uint8_t *
|
||||
mf_ptr(const lzma_mf *mf)
|
||||
|
@ -298,7 +311,8 @@ extern lzma_ret lzma_lz_encoder_init(
|
|||
lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters,
|
||||
lzma_ret (*lz_init)(lzma_lz_encoder *lz,
|
||||
const lzma_allocator *allocator, const void *options,
|
||||
const lzma_allocator *allocator,
|
||||
lzma_vli id, const void *options,
|
||||
lzma_lz_options *lz_options));
|
||||
|
||||
|
||||
|
|
|
@ -226,7 +226,8 @@ lzma2_decoder_end(void *coder_ptr, const lzma_allocator *allocator)
|
|||
|
||||
static lzma_ret
|
||||
lzma2_decoder_init(lzma_lz_decoder *lz, const lzma_allocator *allocator,
|
||||
const void *opt, lzma_lz_options *lz_options)
|
||||
lzma_vli id lzma_attribute((__unused__)), const void *opt,
|
||||
lzma_lz_options *lz_options)
|
||||
{
|
||||
lzma_lzma2_coder *coder = lz->coder;
|
||||
if (coder == NULL) {
|
||||
|
|
|
@ -310,7 +310,8 @@ lzma2_encoder_options_update(void *coder_ptr, const lzma_filter *filter)
|
|||
|
||||
static lzma_ret
|
||||
lzma2_encoder_init(lzma_lz_encoder *lz, const lzma_allocator *allocator,
|
||||
const void *options, lzma_lz_options *lz_options)
|
||||
lzma_vli id lzma_attribute((__unused__)), const void *options,
|
||||
lzma_lz_options *lz_options)
|
||||
{
|
||||
if (options == NULL)
|
||||
return LZMA_PROG_ERROR;
|
||||
|
@ -340,7 +341,7 @@ lzma2_encoder_init(lzma_lz_encoder *lz, const lzma_allocator *allocator,
|
|||
|
||||
// Initialize LZMA encoder
|
||||
return_if_error(lzma_lzma_encoder_create(&coder->lzma, allocator,
|
||||
&coder->opt_cur, lz_options));
|
||||
LZMA_FILTER_LZMA2, &coder->opt_cur, lz_options));
|
||||
|
||||
// Make sure that we will always have enough history available in
|
||||
// case we need to use uncompressed chunks. They are used when the
|
||||
|
|
|
@ -986,7 +986,7 @@ lzma_decoder_reset(void *coder_ptr, const void *opt)
|
|||
|
||||
extern lzma_ret
|
||||
lzma_lzma_decoder_create(lzma_lz_decoder *lz, const lzma_allocator *allocator,
|
||||
const void *opt, lzma_lz_options *lz_options)
|
||||
const lzma_options_lzma *options, lzma_lz_options *lz_options)
|
||||
{
|
||||
if (lz->coder == NULL) {
|
||||
lz->coder = lzma_alloc(sizeof(lzma_lzma1_decoder), allocator);
|
||||
|
@ -1000,7 +1000,6 @@ lzma_lzma_decoder_create(lzma_lz_decoder *lz, const lzma_allocator *allocator,
|
|||
|
||||
// All dictionary sizes are OK here. LZ decoder will take care of
|
||||
// the special cases.
|
||||
const lzma_options_lzma *options = opt;
|
||||
lz_options->dict_size = options->dict_size;
|
||||
lz_options->preset_dict = options->preset_dict;
|
||||
lz_options->preset_dict_size = options->preset_dict_size;
|
||||
|
@ -1014,16 +1013,40 @@ lzma_lzma_decoder_create(lzma_lz_decoder *lz, const lzma_allocator *allocator,
|
|||
/// the LZ initialization).
|
||||
static lzma_ret
|
||||
lzma_decoder_init(lzma_lz_decoder *lz, const lzma_allocator *allocator,
|
||||
const void *options, lzma_lz_options *lz_options)
|
||||
lzma_vli id, const void *options, lzma_lz_options *lz_options)
|
||||
{
|
||||
if (!is_lclppb_valid(options))
|
||||
return LZMA_PROG_ERROR;
|
||||
|
||||
lzma_vli uncomp_size = LZMA_VLI_UNKNOWN;
|
||||
bool allow_eopm = true;
|
||||
|
||||
if (id == LZMA_FILTER_LZMA1EXT) {
|
||||
const lzma_options_lzma *opt = options;
|
||||
|
||||
// Only one flag is supported.
|
||||
if (opt->ext_flags & ~LZMA_LZMA1EXT_ALLOW_EOPM)
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
// FIXME? Using lzma_vli instead of uint64_t is weird because
|
||||
// this has nothing to do with .xz headers and variable-length
|
||||
// integer encoding. On the other hand, using LZMA_VLI_UNKNOWN
|
||||
// instead of UINT64_MAX is clearer when unknown size is
|
||||
// meant. A problem with using lzma_vli is that now we
|
||||
// allow > LZMA_VLI_MAX which is fine in this file but
|
||||
// it's still confusing. Note that alone_decoder.c also
|
||||
// allows > LZMA_VLI_MAX when setting uncompressed size.
|
||||
uncomp_size = opt->ext_size_low
|
||||
+ ((uint64_t)(opt->ext_size_high) << 32);
|
||||
allow_eopm = (opt->ext_flags & LZMA_LZMA1EXT_ALLOW_EOPM) != 0
|
||||
|| uncomp_size == LZMA_VLI_UNKNOWN;
|
||||
}
|
||||
|
||||
return_if_error(lzma_lzma_decoder_create(
|
||||
lz, allocator, options, lz_options));
|
||||
|
||||
lzma_decoder_reset(lz->coder, options);
|
||||
lzma_decoder_uncompressed(lz->coder, LZMA_VLI_UNKNOWN, true);
|
||||
lzma_decoder_uncompressed(lz->coder, uncomp_size, allow_eopm);
|
||||
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
|
|
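In lzma_decoder_init() above, the LZMA1EXT path builds a 64-bit uncompressed size from ext_size_low and ext_size_high, and allows the end marker either when the flag is set or when the size is unknown. The sketch below isolates those two expressions. The helper names are hypothetical, and the constants are local stand-ins assumed to match LZMA_VLI_UNKNOWN (all bits set) and LZMA_LZMA1EXT_ALLOW_EOPM (bit 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Local stand-ins, assumed to mirror the liblzma constants.
#define SIZE_UNKNOWN UINT64_MAX
#define FLAG_ALLOW_EOPM UINT32_C(0x01)

// Join the two 32-bit halves into one 64-bit uncompressed size.
static uint64_t
join_ext_size(uint32_t low, uint32_t high)
{
	return low + ((uint64_t)high << 32);
}

// The end marker is accepted if the flag asks for it or if the size
// is unknown, since then the marker is the only way to find the end.
static bool
eopm_allowed(uint32_t ext_flags, uint64_t uncomp_size)
{
	return (ext_flags & FLAG_ALLOW_EOPM) != 0
			|| uncomp_size == SIZE_UNKNOWN;
}
```

Setting both halves to UINT32_MAX yields the all-ones value, which is how an application would express an unknown size under these assumptions.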
@ -42,7 +42,7 @@ extern bool lzma_lzma_lclppb_decode(
|
|||
/// LZMA2 decoders.
|
||||
extern lzma_ret lzma_lzma_decoder_create(
|
||||
lzma_lz_decoder *lz, const lzma_allocator *allocator,
|
||||
const void *opt, lzma_lz_options *lz_options);
|
||||
const lzma_options_lzma *opt, lzma_lz_options *lz_options);
|
||||
|
||||
/// Gets memory usage without validating lc/lp/pb. This is used by LZMA2
|
||||
/// decoder, because raw LZMA2 decoding doesn't need lc/lp/pb.
|
||||
|
|
|
@ -268,6 +268,7 @@ static bool
|
|||
encode_init(lzma_lzma1_encoder *coder, lzma_mf *mf)
|
||||
{
|
||||
assert(mf_position(mf) == 0);
|
||||
assert(coder->uncomp_size == 0);
|
||||
|
||||
if (mf->read_pos == mf->read_limit) {
|
||||
if (mf->action == LZMA_RUN)
|
||||
|
@ -283,6 +284,7 @@ encode_init(lzma_lzma1_encoder *coder, lzma_mf *mf)
|
|||
mf->read_ahead = 0;
|
||||
rc_bit(&coder->rc, &coder->is_match[0][0], 0);
|
||||
rc_bittree(&coder->rc, coder->literal[0], 8, mf->buffer[0]);
|
||||
++coder->uncomp_size;
|
||||
}
|
||||
|
||||
// Initialization is done (except if empty file).
|
||||
|
@ -317,21 +319,28 @@ lzma_lzma_encode(lzma_lzma1_encoder *restrict coder, lzma_mf *restrict mf,
|
|||
if (!coder->is_initialized && !encode_init(coder, mf))
|
||||
return LZMA_OK;
|
||||
|
||||
// Get the lowest bits of the uncompressed offset from the LZ layer.
|
||||
uint32_t position = mf_position(mf);
|
||||
// Encode pending output bytes from the range encoder.
|
||||
// At the start of the stream, encode_init() encodes one literal.
|
||||
// Later there can be pending output only with LZMA1 because LZMA2
|
||||
// ensures that there is always enough output space. Thus when using
|
||||
// LZMA2, rc_encode() calls in this function will always return false.
|
||||
if (rc_encode(&coder->rc, out, out_pos, out_size)) {
|
||||
// We don't get here with LZMA2.
|
||||
assert(limit == UINT32_MAX);
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
// If the range encoder was flushed in an earlier call to this
|
||||
// function but there wasn't enough output buffer space, those
|
||||
// bytes would have now been encoded by the above rc_encode() call
|
||||
// and the stream has now been finished. This can only happen with
|
||||
// LZMA1 as LZMA2 always provides enough output buffer space.
|
||||
if (coder->is_flushed) {
|
||||
assert(limit == UINT32_MAX);
|
||||
return LZMA_STREAM_END;
|
||||
}
|
||||
|
||||
while (true) {
|
||||
// Encode pending bits, if any. Calling this before encoding
|
||||
// the next symbol is needed only with plain LZMA, since
|
||||
// LZMA2 always provides big enough buffer to flush
|
||||
// everything out from the range encoder. For the same reason,
|
||||
// rc_encode() never returns true when this function is used
|
||||
// as part of LZMA2 encoder.
|
||||
if (rc_encode(&coder->rc, out, out_pos, out_size)) {
|
||||
assert(limit == UINT32_MAX);
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
// With LZMA2 we need to take care that compressed size of
|
||||
// a chunk doesn't get too big.
|
||||
// FIXME? Check if this could be improved.
|
||||
|
@ -365,37 +374,64 @@ lzma_lzma_encode(lzma_lzma1_encoder *restrict coder, lzma_mf *restrict mf,
|
|||
if (coder->fast_mode)
|
||||
lzma_lzma_optimum_fast(coder, mf, &back, &len);
|
||||
else
|
||||
lzma_lzma_optimum_normal(
|
||||
coder, mf, &back, &len, position);
|
||||
lzma_lzma_optimum_normal(coder, mf, &back, &len,
|
||||
(uint32_t)(coder->uncomp_size));
|
||||
|
||||
encode_symbol(coder, mf, back, len, position);
|
||||
encode_symbol(coder, mf, back, len,
|
||||
(uint32_t)(coder->uncomp_size));
|
||||
|
||||
position += len;
|
||||
}
|
||||
// If output size limiting is active (out_limit != 0), check
|
||||
// if encoding this LZMA symbol would make the output size
|
||||
// exceed the specified limit.
|
||||
if (coder->out_limit != 0 && rc_encode_dummy(
|
||||
&coder->rc, coder->out_limit)) {
|
||||
// The most recent LZMA symbol would make the output
|
||||
// too big. Throw it away.
|
||||
rc_forget(&coder->rc);
|
||||
|
||||
if (!coder->is_flushed) {
|
||||
coder->is_flushed = true;
|
||||
// FIXME: Tell the LZ layer to not read more input as
|
||||
// it would be a waste of time. This doesn't matter if
|
||||
// output-size-limited encoding is done with a single
|
||||
// call though.
|
||||
|
||||
// We don't support encoding plain LZMA streams without EOPM,
|
||||
// and LZMA2 doesn't use EOPM at LZMA level.
|
||||
if (limit == UINT32_MAX)
|
||||
encode_eopm(coder, position);
|
||||
break;
|
||||
}
|
||||
|
||||
// Flush the remaining bytes from the range encoder.
|
||||
rc_flush(&coder->rc);
|
||||
// This symbol will be encoded so update the uncompressed size.
|
||||
coder->uncomp_size += len;
|
||||
|
||||
// Copy the remaining bytes to the output buffer. If there
|
||||
// isn't enough output space, we will copy out the remaining
|
||||
// bytes on the next call to this function by using
|
||||
// the rc_encode() call in the encoding loop above.
|
||||
// Encode the LZMA symbol.
|
||||
if (rc_encode(&coder->rc, out, out_pos, out_size)) {
|
||||
// Once again, this can only happen with LZMA1.
|
||||
assert(limit == UINT32_MAX);
|
||||
return LZMA_OK;
|
||||
}
|
||||
}
|
||||
|
||||
// Make it ready for the next LZMA2 chunk.
|
||||
coder->is_flushed = false;
|
||||
// Make the uncompressed size available to the application.
|
||||
if (coder->uncomp_size_ptr != NULL)
|
||||
*coder->uncomp_size_ptr = coder->uncomp_size;
|
||||
|
||||
// LZMA2 doesn't use EOPM at LZMA level.
|
||||
//
|
||||
// Plain LZMA streams without EOPM aren't supported except when
|
||||
// output size limiting is enabled.
|
||||
if (coder->use_eopm)
|
||||
encode_eopm(coder, (uint32_t)(coder->uncomp_size));
|
||||
|
||||
// Flush the remaining bytes from the range encoder.
|
||||
rc_flush(&coder->rc);
|
||||
|
||||
// Copy the remaining bytes to the output buffer. If there
|
||||
// isn't enough output space, we will copy out the remaining
|
||||
// bytes on the next call to this function.
|
||||
if (rc_encode(&coder->rc, out, out_pos, out_size)) {
|
||||
// This cannot happen with LZMA2.
|
||||
assert(limit == UINT32_MAX);
|
||||
|
||||
coder->is_flushed = true;
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
return LZMA_STREAM_END;
|
||||
}
|
||||
|
@ -414,6 +450,23 @@ lzma_encode(void *coder, lzma_mf *restrict mf,
|
|||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
lzma_lzma_set_out_limit(
|
||||
void *coder_ptr, uint64_t *uncomp_size, uint64_t out_limit)
|
||||
{
|
||||
// Minimum output size is 5 bytes but that cannot hold any output
|
||||
// so we use 6 bytes.
|
||||
if (out_limit < 6)
|
||||
return LZMA_BUF_ERROR;
|
||||
|
||||
lzma_lzma1_encoder *coder = coder_ptr;
|
||||
coder->out_limit = out_limit;
|
||||
coder->uncomp_size_ptr = uncomp_size;
|
||||
coder->use_eopm = false;
|
||||
return LZMA_OK;
|
||||
}
|
||||
|
||||
|
||||
////////////////////
|
||||
// Initialization //
|
||||
////////////////////
|
||||
|
@ -440,7 +493,8 @@ set_lz_options(lzma_lz_options *lz_options, const lzma_options_lzma *options)
|
|||
lz_options->dict_size = options->dict_size;
|
||||
lz_options->after_size = LOOP_INPUT_MAX;
|
||||
lz_options->match_len_max = MATCH_LEN_MAX;
|
||||
lz_options->nice_len = options->nice_len;
|
||||
lz_options->nice_len = my_max(mf_get_hash_bytes(options->mf),
|
||||
options->nice_len);
|
||||
lz_options->match_finder = options->mf;
|
||||
lz_options->depth = options->depth;
|
||||
lz_options->preset_dict = options->preset_dict;
|
||||
|
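set_lz_options() above raises nice_len to at least the number of bytes the match finder hashes, which mf_get_hash_bytes() reads from the low four bits of the match finder ID. A small sketch of that clamp follows. The enum values are copies of the lzma_match_finder IDs, and `effective_nice_len` is a hypothetical name for the my_max() expression.

```c
#include <assert.h>
#include <stdint.h>

// Match finder IDs: the low four bits give the number of hashed bytes
// and bit 0x10 marks a binary tree finder.
enum {
	MF_HC3 = 0x03,
	MF_HC4 = 0x04,
	MF_BT2 = 0x12,
	MF_BT3 = 0x13,
	MF_BT4 = 0x14
};

// Number of bytes the match finder hashes in its initial step.
static uint32_t
hash_bytes(uint32_t match_finder)
{
	return match_finder & 0x0F;
}

// nice_len as the encoder will actually use it: never smaller than
// the number of hashed bytes.
static uint32_t
effective_nice_len(uint32_t match_finder, uint32_t nice_len)
{
	const uint32_t min_len = hash_bytes(match_finder);
	return nice_len < min_len ? min_len : nice_len;
}
```

This is why nice_len = 2 is accepted with every match finder: with MF_BT4 the effective value becomes 4, while larger values such as 273 pass through unchanged.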
@ -546,10 +600,13 @@ lzma_lzma_encoder_reset(lzma_lzma1_encoder *coder,
|
|||
|
||||
|
||||
extern lzma_ret
|
||||
lzma_lzma_encoder_create(void **coder_ptr,
|
||||
const lzma_allocator *allocator,
|
||||
const lzma_options_lzma *options, lzma_lz_options *lz_options)
|
||||
lzma_lzma_encoder_create(void **coder_ptr, const lzma_allocator *allocator,
|
||||
lzma_vli id, const lzma_options_lzma *options,
|
||||
lzma_lz_options *lz_options)
|
||||
{
|
||||
assert(id == LZMA_FILTER_LZMA1 || id == LZMA_FILTER_LZMA1EXT
|
||||
|| id == LZMA_FILTER_LZMA2);
|
||||
|
||||
// Allocate lzma_lzma1_encoder if it wasn't already allocated.
|
||||
if (*coder_ptr == NULL) {
|
||||
*coder_ptr = lzma_alloc(sizeof(lzma_lzma1_encoder), allocator);
|
||||
|
@ -591,10 +648,14 @@ lzma_lzma_encoder_create(void **coder_ptr,
|
|||
coder->dist_table_size = log_size * 2;
|
||||
|
||||
// Length encoders' price table size
|
||||
const uint32_t nice_len = my_max(
|
||||
mf_get_hash_bytes(options->mf),
|
||||
options->nice_len);
|
||||
|
||||
coder->match_len_encoder.table_size
|
||||
= options->nice_len + 1 - MATCH_LEN_MIN;
|
||||
= nice_len + 1 - MATCH_LEN_MIN;
|
||||
coder->rep_len_encoder.table_size
|
||||
= options->nice_len + 1 - MATCH_LEN_MIN;
|
||||
= nice_len + 1 - MATCH_LEN_MIN;
|
||||
break;
|
||||
}
|
||||
|
||||
|
@ -609,6 +670,37 @@ lzma_lzma_encoder_create(void **coder_ptr,
|
|||
coder->is_initialized = options->preset_dict != NULL
|
||||
&& options->preset_dict_size > 0;
|
||||
coder->is_flushed = false;
|
||||
coder->uncomp_size = 0;
|
||||
coder->uncomp_size_ptr = NULL;
|
||||
|
||||
// Output size limiting is disabled by default.
|
||||
coder->out_limit = 0;
|
||||
|
||||
// Determine if end marker is wanted:
|
||||
// - It is never used with LZMA2.
|
||||
// - It is always used with LZMA_FILTER_LZMA1 (unless
|
||||
// lzma_lzma_set_out_limit() is called later).
|
||||
// - LZMA_FILTER_LZMA1EXT has a flag for it in the options.
|
||||
coder->use_eopm = (id == LZMA_FILTER_LZMA1);
|
||||
if (id == LZMA_FILTER_LZMA1EXT) {
|
||||
// Check if unsupported flags are present.
|
||||
if (options->ext_flags & ~LZMA_LZMA1EXT_ALLOW_EOPM)
|
||||
return LZMA_OPTIONS_ERROR;
|
||||
|
||||
coder->use_eopm = (options->ext_flags
|
||||
& LZMA_LZMA1EXT_ALLOW_EOPM) != 0;
|
||||
|
||||
// TODO? As long as there are no filters that change the size
|
||||
// of the data, it is enough to look at lzma_stream.total_in
|
||||
// after encoding has been finished to know the uncompressed
|
||||
// size of the LZMA1 stream. But in the future there could be
|
||||
// filters that change the size of the data and then total_in
|
||||
// doesn't work as the LZMA1 stream size might be different
|
||||
// due to another filter in the chain. The problem is simple
|
||||
// to solve: Add another flag to ext_flags and then set
|
||||
// coder->uncomp_size_ptr to the address stored in
|
||||
// lzma_options_lzma.reserved_ptr2 (or _ptr1).
|
||||
}
|
||||
|
||||
set_lz_options(lz_options, options);
|
||||
|
||||
|
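The hunk above decides whether the encoder writes an end of payload marker: always for LZMA1 (unless an output limit is set later), never for LZMA2, and according to a flag for LZMA1EXT, where unknown flag bits are rejected. The sketch below restates that decision as a standalone function. `filter_kind`, `EXT_ALLOW_EOPM` and `decide_eopm` are hypothetical names, and the -1 return stands in for LZMA_OPTIONS_ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical local names. Only the decision logic follows the diff.
typedef enum { KIND_LZMA1, KIND_LZMA1EXT, KIND_LZMA2 } filter_kind;
#define EXT_ALLOW_EOPM UINT32_C(0x01)

// Set *use_eopm and return 0, or return -1 if ext_flags contains
// bits other than EXT_ALLOW_EOPM.
static int
decide_eopm(filter_kind kind, uint32_t ext_flags, bool *use_eopm)
{
	// Default: LZMA1 ends with the marker, LZMA2 never uses it.
	*use_eopm = (kind == KIND_LZMA1);

	if (kind == KIND_LZMA1EXT) {
		// Reject flags this version doesn't know about.
		if (ext_flags & ~EXT_ALLOW_EOPM)
			return -1;

		*use_eopm = (ext_flags & EXT_ALLOW_EOPM) != 0;
	}

	return 0;
}
```

Rejecting unknown bits up front keeps the remaining bits of ext_flags free for future extensions without older code silently ignoring them.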
@ -618,11 +710,12 @@ lzma_lzma_encoder_create(void **coder_ptr,
|
|||
|
||||
static lzma_ret
|
||||
lzma_encoder_init(lzma_lz_encoder *lz, const lzma_allocator *allocator,
|
||||
const void *options, lzma_lz_options *lz_options)
|
||||
lzma_vli id, const void *options, lzma_lz_options *lz_options)
|
||||
{
|
||||
lz->code = &lzma_encode;
|
||||
lz->set_out_limit = &lzma_lzma_set_out_limit;
|
||||
return lzma_lzma_encoder_create(
|
||||
&lz->coder, allocator, options, lz_options);
|
||||
&lz->coder, allocator, id, options, lz_options);
|
||||
}
|
||||
|
||||
|
||||
|
|
|
@ -40,7 +40,8 @@ extern bool lzma_lzma_lclppb_encode(
|
|||
/// Initializes raw LZMA encoder; this is used by LZMA2.
|
||||
extern lzma_ret lzma_lzma_encoder_create(
|
||||
void **coder_ptr, const lzma_allocator *allocator,
|
||||
const lzma_options_lzma *options, lzma_lz_options *lz_options);
|
||||
lzma_vli id, const lzma_options_lzma *options,
|
||||
lzma_lz_options *lz_options);
|
||||
|
||||
|
||||
/// Resets an already initialized LZMA encoder; this is used by LZMA2.
|
||||
|
|
|
@ -72,6 +72,18 @@ struct lzma_lzma1_encoder_s {
|
|||
/// Range encoder
|
||||
lzma_range_encoder rc;
|
||||
|
||||
/// Uncompressed size (doesn't include possible preset dictionary)
|
||||
uint64_t uncomp_size;
|
||||
|
||||
/// If non-zero, produce at most this much output.
|
||||
/// Some input may then be missing from the output.
|
||||
uint64_t out_limit;
|
||||
|
||||
/// If the above out_limit is non-zero, *uncomp_size_ptr is set to
|
||||
/// the amount of uncompressed data that we were able to fit
|
||||
/// in the output buffer.
|
||||
uint64_t *uncomp_size_ptr;
|
||||
|
||||
/// State
|
||||
lzma_lzma_state state;
|
||||
|
||||
|
@ -99,6 +111,9 @@ struct lzma_lzma1_encoder_s {
|
|||
/// have been written to the output buffer yet.
|
||||
bool is_flushed;
|
||||
|
||||
/// True if end of payload marker will be written.
|
||||
bool use_eopm;
|
||||
|
||||
uint32_t pos_mask; ///< (1 << pos_bits) - 1
|
||||
uint32_t literal_context_bits;
|
||||
uint32_t literal_pos_mask;
|
||||
|
|
|
@ -19,9 +19,9 @@
|
|||
|
||||
|
||||
/// Maximum number of symbols that can be put pending into lzma_range_encoder
|
||||
/// structure between calls to lzma_rc_encode(). For LZMA, 52+5 is enough
|
||||
/// structure between calls to lzma_rc_encode(). For LZMA, 48+5 is enough
|
||||
/// (match with big distance and length followed by range encoder flush).
|
||||
#define RC_SYMBOLS_MAX 58
|
||||
#define RC_SYMBOLS_MAX 53
|
||||
|
||||
|
||||
typedef struct {
|
||||
|
@ -30,6 +30,9 @@ typedef struct {
|
|||
uint32_t range;
|
||||
uint8_t cache;
|
||||
|
||||
/// Number of bytes written out by rc_encode() -> rc_shift_low()
|
||||
uint64_t out_total;
|
||||
|
||||
/// Number of symbols in the tables
|
||||
size_t count;
|
||||
|
||||
|
@ -58,11 +61,21 @@ rc_reset(lzma_range_encoder *rc)
|
|||
rc->cache_size = 1;
|
||||
rc->range = UINT32_MAX;
|
||||
rc->cache = 0;
|
||||
rc->out_total = 0;
|
||||
rc->count = 0;
|
||||
rc->pos = 0;
|
||||
}
|
||||
|
||||
|
||||
static inline void
|
||||
rc_forget(lzma_range_encoder *rc)
|
||||
{
|
||||
// This must not be called when rc_encode() is partially done.
|
||||
assert(rc->pos == 0);
|
||||
rc->count = 0;
|
||||
}
|
||||
|
||||
|
||||
static inline void
|
||||
rc_bit(lzma_range_encoder *rc, probability *prob, uint32_t bit)
|
||||
{
|
||||
|
@ -132,6 +145,7 @@ rc_shift_low(lzma_range_encoder *rc,
|
|||
|
||||
out[*out_pos] = rc->cache + (uint8_t)(rc->low >> 32);
|
||||
++*out_pos;
|
||||
++rc->out_total;
|
||||
rc->cache = 0xFF;
|
||||
|
||||
} while (--rc->cache_size != 0);
|
||||
|
@ -146,6 +160,34 @@ rc_shift_low(lzma_range_encoder *rc,
|
|||
}
|
||||
|
||||
|
||||
// NOTE: The last two arguments are uint64_t instead of size_t because in
|
||||
// the dummy version these refer to the size of the whole range-encoded
|
||||
// output stream, not just to the currently available output buffer space.
|
||||
static inline bool
|
||||
rc_shift_low_dummy(uint64_t *low, uint64_t *cache_size, uint8_t *cache,
|
||||
uint64_t *out_pos, uint64_t out_size)
|
||||
{
|
||||
if ((uint32_t)(*low) < (uint32_t)(0xFF000000)
|
||||
|| (uint32_t)(*low >> 32) != 0) {
|
||||
do {
|
||||
if (*out_pos == out_size)
|
||||
return true;
|
||||
|
||||
++*out_pos;
|
||||
*cache = 0xFF;
|
||||
|
||||
} while (--*cache_size != 0);
|
||||
|
||||
*cache = (*low >> 24) & 0xFF;
|
||||
}
|
||||
|
||||
++*cache_size;
|
||||
*low = (*low & 0x00FFFFFF) << RC_SHIFT_BITS;
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
|
||||
static inline bool
|
||||
rc_encode(lzma_range_encoder *rc,
|
||||
uint8_t *out, size_t *out_pos, size_t out_size)
|
||||
|
@ -222,6 +264,83 @@ rc_encode(lzma_range_encoder *rc,
|
|||
}
|
||||
|
||||
|
||||
static inline bool
|
||||
rc_encode_dummy(const lzma_range_encoder *rc, uint64_t out_limit)
|
||||
{
|
||||
assert(rc->count <= RC_SYMBOLS_MAX);
|
||||
|
||||
uint64_t low = rc->low;
|
||||
uint64_t cache_size = rc->cache_size;
|
||||
uint32_t range = rc->range;
|
||||
uint8_t cache = rc->cache;
|
||||
uint64_t out_pos = rc->out_total;
|
||||
|
||||
size_t pos = rc->pos;
|
||||
|
||||
while (true) {
|
||||
// Normalize
|
||||
if (range < RC_TOP_VALUE) {
|
||||
if (rc_shift_low_dummy(&low, &cache_size, &cache,
|
||||
&out_pos, out_limit))
|
||||
return true;
|
||||
|
||||
range <<= RC_SHIFT_BITS;
|
||||
}
|
||||
|
||||
// This check is here because the normalization above must
|
||||
// be done before flushing the last bytes.
|
||||
if (pos == rc->count)
|
||||
break;
|
||||
|
||||
// Encode a bit
|
||||
switch (rc->symbols[pos]) {
|
||||
case RC_BIT_0: {
|
||||
probability prob = *rc->probs[pos];
|
||||
range = (range >> RC_BIT_MODEL_TOTAL_BITS)
|
||||
* prob;
|
||||
break;
|
||||
}
|
||||
|
||||
case RC_BIT_1: {
|
||||
probability prob = *rc->probs[pos];
|
||||
const uint32_t bound = prob * (range
|
||||
>> RC_BIT_MODEL_TOTAL_BITS);
|
||||
low += bound;
|
||||
range -= bound;
|
||||
break;
|
||||
}
|
||||
|
||||
case RC_DIRECT_0:
|
||||
range >>= 1;
|
||||
break;
|
||||
|
||||
case RC_DIRECT_1:
|
||||
range >>= 1;
|
||||
low += range;
|
||||
break;
|
||||
|
||||
case RC_FLUSH:
|
||||
default:
|
||||
assert(0);
|
||||
break;
|
||||
}
|
||||
|
||||
++pos;
|
||||
}
|
||||
|
||||
// Flush the last bytes. This isn't in rc->symbols[] so we do
|
||||
// it after the above loop to take into account the size of
|
||||
// the flushing that will be done at the end of the stream.
|
||||
for (pos = 0; pos < 5; ++pos) {
|
||||
if (rc_shift_low_dummy(&low, &cache_size,
|
||||
&cache, &out_pos, out_limit))
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
|
||||
static inline uint64_t
|
||||
rc_pending(const lzma_range_encoder *rc)
|
||||
{
|
||||
|
|
|
@ -53,6 +53,7 @@ arm_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
|
||||
#ifdef HAVE_ENCODER_ARM
|
||||
extern lzma_ret
|
||||
lzma_simple_arm_encoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -60,8 +61,10 @@ lzma_simple_arm_encoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return arm_coder_init(next, allocator, filters, true);
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef HAVE_DECODER_ARM
|
||||
extern lzma_ret
|
||||
lzma_simple_arm_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -69,3 +72,4 @@ lzma_simple_arm_decoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return arm_coder_init(next, allocator, filters, false);
|
||||
}
|
||||
#endif
|
||||
|
|
136
src/liblzma/simple/arm64.c
Normal file
|
@ -0,0 +1,136 @@
|
|||
///////////////////////////////////////////////////////////////////////////////
|
||||
//
|
||||
/// \file arm64.c
|
||||
/// \brief Filter for ARM64 binaries
|
||||
///
|
||||
/// This converts ARM64 relative addresses in the BL and ADRP immediates
|
||||
/// to absolute values to increase redundancy of ARM64 code.
|
||||
///
|
||||
/// Converting B or ADR instructions was also tested but it's not useful.
|
||||
/// A majority of the jumps for the B instruction are very small (+/- 0xFF).
|
||||
/// These are typical for loops and if-statements. Encoding them to their
|
||||
/// absolute address reduces redundancy since many of the small relative
|
||||
/// jump values are repeated, but very few of the absolute addresses are.
|
||||
//
|
||||
// Authors: Lasse Collin
|
||||
// Jia Tan
|
||||
//
|
||||
// This file has been put into the public domain.
|
||||
// You can do whatever you want with this file.
|
||||
//
|
||||
///////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
#include "simple_private.h"
|
||||
|
||||
|
||||
static size_t
|
||||
arm64_code(void *simple lzma_attribute((__unused__)),
|
||||
uint32_t now_pos, bool is_encoder,
|
||||
uint8_t *buffer, size_t size)
|
||||
{
|
||||
size_t i;
|
||||
|
||||
// Clang 14.0.6 on x86-64 makes this four times bigger and 40 % slower
|
||||
// with auto-vectorization that is enabled by default with -O2.
|
||||
// Such vectorization bloat happens with -O2 when targeting ARM64 too
|
||||
// but performance hasn't been tested.
|
||||
#ifdef __clang__
|
||||
# pragma clang loop vectorize(disable)
|
||||
#endif
|
||||
for (i = 0; i + 4 <= size; i += 4) {
|
||||
uint32_t pc = (uint32_t)(now_pos + i);
|
||||
uint32_t instr = read32le(buffer + i);
|
||||
|
||||
if ((instr >> 26) == 0x25) {
|
||||
// BL instruction:
|
||||
// The full 26-bit immediate is converted.
|
||||
// The range is +/-128 MiB.
|
||||
//
|
||||
// Using the full range helps quite a lot with
|
||||
// big executables. Smaller range would reduce false
|
||||
// positives in non-code sections of the input though
|
||||
// so this is a compromise that slightly favors big
|
||||
// files. With the full range only six bits of the 32
|
||||
// need to match to trigger a conversion.
|
||||
const uint32_t src = instr;
|
||||
instr = 0x94000000;
|
||||
|
||||
pc >>= 2;
|
||||
if (!is_encoder)
|
||||
pc = 0U - pc;
|
||||
|
||||
instr |= (src + pc) & 0x03FFFFFF;
|
||||
write32le(buffer + i, instr);
|
||||
|
||||
} else if ((instr & 0x9F000000) == 0x90000000) {
|
||||
// ADRP instruction:
|
||||
// Only values in the range +/-512 MiB are converted.
|
||||
//
|
||||
// Using less than the full +/-4 GiB range reduces
|
||||
// false positives on non-code sections of the input
|
||||
// while being excellent for executables up to 512 MiB.
|
||||
// The positive effect of ADRP conversion is smaller
|
||||
// than that of BL but it also doesn't hurt so much in
|
||||
// non-code sections of input because, with +/-512 MiB
|
||||
// range, nine bits of 32 need to match to trigger a
|
||||
// conversion (two 10-bit match choices = 9 bits).
|
||||
const uint32_t src = ((instr >> 29) & 3)
|
||||
| ((instr >> 3) & 0x001FFFFC);
|
||||
|
||||
// With the addition only one branch is needed to
|
||||
// check the +/- range. This is usually false when
|
||||
// processing ARM64 code so branch prediction will
|
||||
// handle it well in terms of performance.
|
||||
//
|
||||
//if ((src & 0x001E0000) != 0
|
||||
// && (src & 0x001E0000) != 0x001E0000)
|
||||
if ((src + 0x00020000) & 0x001C0000)
|
||||
continue;
|
||||
|
||||
instr &= 0x9000001F;
|
||||
|
||||
pc >>= 12;
|
||||
if (!is_encoder)
|
||||
pc = 0U - pc;
|
||||
|
||||
const uint32_t dest = src + pc;
|
||||
instr |= (dest & 3) << 29;
|
||||
instr |= (dest & 0x0003FFFC) << 3;
|
||||
instr |= (0U - (dest & 0x00020000)) & 0x00E00000;
|
||||
write32le(buffer + i, instr);
|
||||
}
|
||||
}
|
||||
|
||||
return i;
|
||||
}
|
||||
|
||||
|
||||
static lzma_ret
|
||||
arm64_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters, bool is_encoder)
|
||||
{
|
||||
return lzma_simple_coder_init(next, allocator, filters,
|
||||
&arm64_code, 0, 4, 4, is_encoder);
|
||||
}
|
||||
|
||||
|
||||
#ifdef HAVE_ENCODER_ARM64
|
||||
extern lzma_ret
|
||||
lzma_simple_arm64_encoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters)
|
||||
{
|
||||
return arm64_coder_init(next, allocator, filters, true);
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef HAVE_DECODER_ARM64
|
||||
extern lzma_ret
|
||||
lzma_simple_arm64_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters)
|
||||
{
|
||||
return arm64_coder_init(next, allocator, filters, false);
|
||||
}
|
||||
#endif
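To see why converting BL immediates increases redundancy, the BL branch of `arm64_code()` can be isolated into a helper that handles one instruction. This is a sketch under assumptions: `bl_convert()` is a hypothetical name, and it takes the instruction word directly instead of reading it from a buffer with `read32le()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Convert the 26-bit immediate of one BL instruction between its
// relative form (decoder output) and absolute form (encoder output).
// pc is the byte offset of the instruction in the stream.
static uint32_t
bl_convert(uint32_t instr, uint32_t pc, bool is_encoder)
{
	assert((instr >> 26) == 0x25); // Only BL instructions are accepted.
	assert((pc & 3) == 0);         // ARM64 instructions are 4-byte aligned.

	// The immediate counts 4-byte words, not bytes.
	uint32_t word_pc = pc >> 2;

	// Decoding subtracts what encoding added. The arithmetic is
	// modulo 2^26 because of the mask below.
	if (!is_encoder)
		word_pc = 0U - word_pc;

	return 0x94000000 | ((instr + word_pc) & 0x03FFFFFF);
}
```

Two calls to the same target at byte offset 0x3000, one from offset 0x1000 (`0x94000800`) and one from offset 0x2000 (`0x94000400`), both encode to `0x94000C00`. The identical words are what the LZMA stage can then match.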
|
|
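The ADRP branch replaces a two-comparison range check (left commented out in the source) with a single add-and-mask test. The sketch below checks that the two forms agree for every 21-bit immediate. The function names are hypothetical; the constants are copied from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Two-comparison form, as in the commented-out lines of arm64_code():
// out of range unless bits 17..20 are all zeros or all ones.
static bool
out_of_range_two_tests(uint32_t src)
{
	return (src & 0x001E0000) != 0
			&& (src & 0x001E0000) != 0x001E0000;
}

// Single-test form used by the filter: adding 1 at bit 17 turns
// "all zeros" into 0001 and "all ones" into a carry out of bit 20,
// so bits 18..20 end up zero exactly in those two cases.
static bool
out_of_range_one_test(uint32_t src)
{
	return ((src + 0x00020000) & 0x001C0000) != 0;
}

// Hypothetical checker: walks every 21-bit immediate, asserts that both
// forms agree, and returns how many immediates are accepted.
static uint32_t
count_in_range(void)
{
	uint32_t accepted = 0;

	for (uint32_t src = 0; src <= 0x001FFFFF; ++src) {
		assert(out_of_range_one_test(src)
				== out_of_range_two_tests(src));

		if (!out_of_range_one_test(src))
			++accepted;
	}

	return accepted;
}
```

The accepted set is 2^18 page offsets (2^17 non-negative and 2^17 negative). With 4 KiB pages that is the +/-512 MiB range named in the source comment.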
@ -58,6 +58,7 @@ armthumb_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
|
||||
#ifdef HAVE_ENCODER_ARMTHUMB
|
||||
extern lzma_ret
|
||||
lzma_simple_armthumb_encoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -65,8 +66,10 @@ lzma_simple_armthumb_encoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return armthumb_coder_init(next, allocator, filters, true);
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef HAVE_DECODER_ARMTHUMB
|
||||
extern lzma_ret
|
||||
lzma_simple_armthumb_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -74,3 +77,4 @@ lzma_simple_armthumb_decoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return armthumb_coder_init(next, allocator, filters, false);
|
||||
}
|
||||
#endif
|
||||
|
|
|
@ -94,6 +94,7 @@ ia64_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
|
||||
#ifdef HAVE_ENCODER_IA64
|
||||
extern lzma_ret
|
||||
lzma_simple_ia64_encoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -101,8 +102,10 @@ lzma_simple_ia64_encoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return ia64_coder_init(next, allocator, filters, true);
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef HAVE_DECODER_IA64
|
||||
extern lzma_ret
|
||||
lzma_simple_ia64_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -110,3 +113,4 @@ lzma_simple_ia64_decoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return ia64_coder_init(next, allocator, filters, false);
|
||||
}
|
||||
#endif
|
||||
|
|
|
@ -58,6 +58,7 @@ powerpc_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
|
||||
#ifdef HAVE_ENCODER_POWERPC
|
||||
extern lzma_ret
|
||||
lzma_simple_powerpc_encoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -65,8 +66,10 @@ lzma_simple_powerpc_encoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return powerpc_coder_init(next, allocator, filters, true);
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef HAVE_DECODER_POWERPC
|
||||
extern lzma_ret
|
||||
lzma_simple_powerpc_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -74,3 +77,4 @@ lzma_simple_powerpc_decoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return powerpc_coder_init(next, allocator, filters, false);
|
||||
}
|
||||
#endif
|
||||
|
|
|
@ -61,6 +61,15 @@ extern lzma_ret lzma_simple_armthumb_decoder_init(lzma_next_coder *next,
|
|||
const lzma_filter_info *filters);
|
||||
|
||||
|
||||
extern lzma_ret lzma_simple_arm64_encoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters);
|
||||
|
||||
extern lzma_ret lzma_simple_arm64_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters);
|
||||
|
||||
|
||||
extern lzma_ret lzma_simple_sparc_encoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
const lzma_filter_info *filters);
|
||||
|
|
|
@ -65,6 +65,7 @@ sparc_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
|
||||
#ifdef HAVE_ENCODER_SPARC
|
||||
extern lzma_ret
|
||||
lzma_simple_sparc_encoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -72,8 +73,10 @@ lzma_simple_sparc_encoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return sparc_coder_init(next, allocator, filters, true);
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef HAVE_DECODER_SPARC
|
||||
extern lzma_ret
|
||||
lzma_simple_sparc_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -81,3 +84,4 @@ lzma_simple_sparc_decoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return sparc_coder_init(next, allocator, filters, false);
|
||||
}
|
||||
#endif
|
||||
|
|
|
@ -141,6 +141,7 @@ x86_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
|
|||
}
|
||||
|
||||
|
||||
#ifdef HAVE_ENCODER_X86
|
||||
extern lzma_ret
|
||||
lzma_simple_x86_encoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -148,8 +149,10 @@ lzma_simple_x86_encoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return x86_coder_init(next, allocator, filters, true);
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
#ifdef HAVE_DECODER_X86
|
||||
extern lzma_ret
|
||||
lzma_simple_x86_decoder_init(lzma_next_coder *next,
|
||||
const lzma_allocator *allocator,
|
||||
|
@ -157,3 +160,4 @@ lzma_simple_x86_decoder_init(lzma_next_coder *next,
|
|||
{
|
||||
return x86_coder_init(next, allocator, filters, false);
|
||||
}
|
||||
#endif
|
||||
|
|
|
@ -29,19 +29,29 @@ bool opt_ignore_check = false;
|
|||
const char stdin_filename[] = "(stdin)";
|
||||
|
||||
|
||||
/// Parse and set the memory usage limit for compression and/or decompression.
|
||||
/// Parse and set the memory usage limit for compression, decompression,
|
||||
/// and/or multithreaded decompression.
|
||||
static void
|
||||
parse_memlimit(const char *name, const char *name_percentage, char *str,
|
||||
bool set_compress, bool set_decompress)
|
||||
parse_memlimit(const char *name, const char *name_percentage, const char *str,
|
||||
bool set_compress, bool set_decompress, bool set_mtdec)
|
||||
{
|
||||
bool is_percentage = false;
|
||||
uint64_t value;
|
||||
|
||||
const size_t len = strlen(str);
|
||||
if (len > 0 && str[len - 1] == '%') {
|
||||
str[len - 1] = '\0';
|
||||
// Make a copy so that we can get rid of %.
|
||||
//
|
||||
// In the past str wasn't const and we modified it directly
|
||||
// but that modified argv[] and thus affected what was visible
|
||||
// in "ps auxf" or similar tools which was confusing. For
|
||||
// example, --memlimit=50% would show up as --memlimit=50
|
||||
// since the percent sign was overwritten here.
|
||||
char *s = xstrdup(str);
|
||||
s[len - 1] = '\0';
|
||||
is_percentage = true;
|
||||
value = str_to_uint64(name_percentage, str, 1, 100);
|
||||
value = str_to_uint64(name_percentage, s, 1, 100);
|
||||
free(s);
|
||||
} else {
|
||||
// On 32-bit systems, SIZE_MAX would make more sense than
|
||||
// UINT64_MAX. But use UINT64_MAX still so that scripts
|
||||
|
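The `parse_memlimit()` change above parses the percentage from a private copy so that `argv[]` is never modified. A minimal standalone sketch of the same idea follows. It assumes plain `malloc()` and `strtoul()` in place of xz's `xstrdup()` and `str_to_uint64()`, and `parse_percentage()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

// Parse a string such as "50%" into 50 without writing to the input.
// Returns 1..100 on success and -1 on any error.
static int
parse_percentage(const char *str)
{
	assert(str != NULL);

	const size_t len = strlen(str);
	if (len < 2 || str[len - 1] != '%')
		return -1;

	// Copy everything except the trailing '%'; str itself stays intact,
	// so tools like ps keep showing the original argument.
	char *copy = malloc(len);
	if (copy == NULL)
		return -1;

	memcpy(copy, str, len - 1);
	copy[len - 1] = '\0';

	int result = -1;

	// Require a leading digit so strtoul() cannot accept a sign
	// or whitespace on its own.
	if (copy[0] >= '0' && copy[0] <= '9') {
		char *end;
		errno = 0;
		const unsigned long value = strtoul(copy, &end, 10);
		if (errno == 0 && *end == '\0' && value >= 1 && value <= 100)
			result = (int)value;
	}

	free(copy);
	return result;
}
```

Taking `const char *` lets the compiler enforce the "do not touch argv" rule, which is why the diff changes the parameter type along with the body.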
@ -49,15 +59,19 @@ parse_memlimit(const char *name, const char *name_percentage, char *str,
|
|||
value = str_to_uint64(name, str, 0, UINT64_MAX);
|
||||
}
|
||||
|
||||
hardware_memlimit_set(
|
||||
value, set_compress, set_decompress, is_percentage);
|
||||
hardware_memlimit_set(value, set_compress, set_decompress, set_mtdec,
|
||||
is_percentage);
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
static void
|
||||
parse_block_list(char *str)
|
||||
parse_block_list(const char *str_const)
|
||||
{
|
||||
// We need a modifiable string in the for-loop.
|
||||
char *str_start = xstrdup(str_const);
|
||||
char *str = str_start;
|
||||
|
||||
// It must be non-empty and not begin with a comma.
|
||||
if (str[0] == '\0' || str[0] == ',')
|
||||
message_fatal(_("%s: Invalid argument to --block-list"), str);
|
||||
|
@ -112,6 +126,8 @@ parse_block_list(char *str)
|
|||
|
||||
// Terminate the array.
|
||||
opt_block_list[count] = 0;
|
||||
|
||||
free(str_start);
|
||||
return;
|
||||
}
|
||||
|
||||
|
@ -125,6 +141,7 @@ parse_real(args_info *args, int argc, char **argv)
|
|||
OPT_IA64,
|
||||
OPT_ARM,
|
||||
OPT_ARMTHUMB,
|
||||
OPT_ARM64,
|
||||
OPT_SPARC,
|
||||
OPT_DELTA,
|
||||
OPT_LZMA1,
|
||||
|
@ -138,6 +155,7 @@ parse_real(args_info *args, int argc, char **argv)
|
|||
OPT_BLOCK_LIST,
|
||||
OPT_MEM_COMPRESS,
|
||||
OPT_MEM_DECOMPRESS,
|
||||
OPT_MEM_MT_DECOMPRESS,
|
||||
OPT_NO_ADJUST,
|
||||
OPT_INFO_MEMORY,
|
||||
OPT_ROBOT,
|
||||
|
@ -176,6 +194,7 @@ parse_real(args_info *args, int argc, char **argv)
|
|||
{ "block-list", required_argument, NULL, OPT_BLOCK_LIST },
|
||||
{ "memlimit-compress", required_argument, NULL, OPT_MEM_COMPRESS },
|
||||
{ "memlimit-decompress", required_argument, NULL, OPT_MEM_DECOMPRESS },
|
||||
{ "memlimit-mt-decompress", required_argument, NULL, OPT_MEM_MT_DECOMPRESS },
|
||||
{ "memlimit", required_argument, NULL, 'M' },
|
||||
{ "memory", required_argument, NULL, 'M' }, // Old alias
|
||||
{ "no-adjust", no_argument, NULL, OPT_NO_ADJUST },
|
||||
|
@ -194,6 +213,7 @@ parse_real(args_info *args, int argc, char **argv)
|
|||
{ "ia64", optional_argument, NULL, OPT_IA64 },
|
||||
{ "arm", optional_argument, NULL, OPT_ARM },
|
||||
{ "armthumb", optional_argument, NULL, OPT_ARMTHUMB },
|
||||
{ "arm64", optional_argument, NULL, OPT_ARM64 },
|
||||
{ "sparc", optional_argument, NULL, OPT_SPARC },
|
||||
{ "delta", optional_argument, NULL, OPT_DELTA },
|
||||
|
||||
|
@ -225,20 +245,27 @@ parse_real(args_info *args, int argc, char **argv)
|
|||
case OPT_MEM_COMPRESS:
|
||||
parse_memlimit("memlimit-compress",
|
||||
"memlimit-compress%", optarg,
|
||||
true, false);
|
||||
true, false, false);
|
||||
break;
|
||||
|
||||
// --memlimit-decompress
|
||||
case OPT_MEM_DECOMPRESS:
|
||||
parse_memlimit("memlimit-decompress",
|
||||
"memlimit-decompress%", optarg,
|
||||
false, true);
|
||||
false, true, false);
|
||||
break;
|
||||
|
||||
// --memlimit-mt-decompress
|
||||
case OPT_MEM_MT_DECOMPRESS:
|
||||
parse_memlimit("memlimit-mt-decompress",
|
||||
"memlimit-mt-decompress%", optarg,
|
||||
false, false, true);
|
||||
break;
|
||||
|
||||
// --memlimit
|
||||
case 'M':
|
||||
parse_memlimit("memlimit", "memlimit%", optarg,
|
||||
true, true);
|
||||
true, true, true);
|
||||
break;
|
||||
|
||||
// --suffix
|
||||
|
@ -246,11 +273,23 @@ parse_real(args_info *args, int argc, char **argv)
|
|||
suffix_set(optarg);
|
||||
break;
|
||||
|
||||
case 'T':
|
||||
case 'T': {
|
||||
// Since xz 5.4.0: Ignore leading '+' first.
|
||||
const char *s = optarg;
|
||||
if (optarg[0] == '+')
|
||||
++s;
|
||||
|
||||
// The max is from src/liblzma/common/common.h.
|
||||
hardware_threads_set(str_to_uint64("threads",
|
||||
optarg, 0, 16384));
|
||||
uint32_t t = str_to_uint64("threads", s, 0, 16384);
|
||||
|
||||
// If leading '+' was used then use multi-threaded
|
||||
// mode even if exactly one thread was specified.
|
||||
if (t == 1 && optarg[0] == '+')
|
||||
t = UINT32_MAX;
|
||||
|
||||
hardware_threads_set(t);
|
||||
break;
|
||||
}
|
||||
|
||||
// --version
|
||||
case 'V':
|
||||
|
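The new `-T` handling in the hunk above accepts an optional leading `+` and maps `+1` to a marker value that requests multithreaded mode with one worker. The sketch below mirrors that logic with standard library calls only. `parse_threads_arg()` is a hypothetical name, `strtoul()` stands in for xz's `str_to_uint64()` (which also understands size suffixes), and errors are returned as -1 instead of being reported and treated as fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

// Returns the value that would be handed to hardware_threads_set():
// 0..16384, or UINT32_MAX for "+1". Returns -1 on invalid input.
static int64_t
parse_threads_arg(const char *arg)
{
	assert(arg != NULL);

	// Skip one leading '+'; arg[0] still remembers that it was there.
	const char *s = arg;
	if (arg[0] == '+')
		++s;

	// Require a digit first so strtoul() cannot accept a second sign
	// or leading whitespace.
	if (s[0] < '0' || s[0] > '9')
		return -1;

	char *end;
	errno = 0;
	const unsigned long value = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || value > 16384)
		return -1;

	// "+1" asks for multithreaded mode even with a single thread.
	if (value == 1 && arg[0] == '+')
		return (int64_t)UINT32_MAX;

	return (int64_t)value;
}
```

Only `+1` needs the marker: for any count above one the mode is already multithreaded, so `+4` and `4` produce the same value.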
@ -360,6 +399,11 @@ parse_real(args_info *args, int argc, char **argv)
|
|||
options_bcj(optarg));
|
||||
break;
|
||||
|
||||
case OPT_ARM64:
|
||||
coder_add_filter(LZMA_FILTER_ARM64,
|
||||
options_bcj(optarg));
|
||||
break;
|
||||
|
||||
case OPT_SPARC:
|
||||
coder_add_filter(LZMA_FILTER_SPARC,
|
||||
options_bcj(optarg));
|
||||
|
@ -395,8 +439,9 @@ parse_real(args_info *args, int argc, char **argv)
|
|||
{ "xz", FORMAT_XZ },
|
||||
{ "lzma", FORMAT_LZMA },
|
||||
{ "alone", FORMAT_LZMA },
|
||||
// { "gzip", FORMAT_GZIP },
|
||||
// { "gz", FORMAT_GZIP },
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
{ "lzip", FORMAT_LZIP },
|
||||
#endif
|
||||
{ "raw", FORMAT_RAW },
|
||||
};
|
||||
|
||||
|
@ -475,7 +520,7 @@ parse_real(args_info *args, int argc, char **argv)
|
|||
"or `--files0'."));
|
||||
|
||||
if (optarg == NULL) {
|
||||
args->files_name = (char *)stdin_filename;
|
||||
args->files_name = stdin_filename;
|
||||
args->files_file = stdin;
|
||||
} else {
|
||||
args->files_name = optarg;
|
||||
|
@ -651,6 +696,12 @@ args_parse(args_info *args, int argc, char **argv)
|
|||
"at build time"));
|
||||
#endif
|
||||
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
if (opt_mode == MODE_COMPRESS && opt_format == FORMAT_LZIP)
|
||||
message_fatal(_("Compression of lzip files (.lz) "
|
||||
"is not supported"));
|
||||
#endif
|
||||
|
||||
// Never remove the source file when the destination is not on disk.
|
||||
// In test mode the data is written nowhere, but setting opt_stdout
|
||||
// will make the rest of the code behave well.
|
||||
|
|
|
@ -19,7 +19,7 @@ typedef struct {
|
|||
|
||||
/// Name of the file from which to read filenames. This is NULL
|
||||
/// if --files or --files0 was not used.
|
||||
char *files_name;
|
||||
const char *files_name;
|
||||
|
||||
/// File opened for reading from which filenames are read. This is
|
||||
/// non-NULL only if files_name is non-NULL.
|
||||
|
|
199
src/xz/coder.c
|
@ -51,7 +51,12 @@ static lzma_check check;
|
|||
/// This becomes false if the --check=CHECK option is used.
|
||||
static bool check_default = true;
|
||||
|
||||
#if defined(HAVE_ENCODERS) && defined(MYTHREAD_ENABLED)
|
||||
/// Indicates if unconsumed input is allowed to remain after
|
||||
/// decoding has successfully finished. This is set for each file
|
||||
/// in coder_init().
|
||||
static bool allow_trailing_input;
|
||||
|
||||
#ifdef MYTHREAD_ENABLED
|
||||
static lzma_mt mt_options = {
|
||||
.flags = 0,
|
||||
.timeout = 300,
|
||||
|
@ -136,6 +141,11 @@ memlimit_too_small(uint64_t memory_usage)
|
|||
extern void
|
||||
coder_set_compression_settings(void)
|
||||
{
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
// .lz compression isn't supported.
|
||||
assert(opt_format != FORMAT_LZIP);
|
||||
#endif
|
||||
|
||||
// The default check type is CRC64, but fallback to CRC32
|
||||
// if CRC64 isn't supported by the copy of liblzma we are
|
||||
// using. CRC32 is always supported.
|
||||
|
@ -211,7 +221,7 @@ coder_set_compression_settings(void)
|
|||
}
|
||||
}
|
||||
|
||||
if (hardware_threads_get() > 1) {
|
||||
if (hardware_threads_is_mt()) {
|
||||
message(V_WARNING, _("Switching to single-threaded "
|
||||
"mode due to --flush-timeout"));
|
||||
hardware_threads_set(1);
|
||||
|
@ -220,12 +230,16 @@ coder_set_compression_settings(void)
|
|||
|
||||
// Get the memory usage. Note that if --format=raw was used,
|
||||
// we can be decompressing.
|
||||
const uint64_t memory_limit = hardware_memlimit_get(opt_mode);
|
||||
//
|
||||
// If multithreaded .xz compression is done, this value will be
|
||||
// replaced.
|
||||
uint64_t memory_limit = hardware_memlimit_get(opt_mode);
|
||||
uint64_t memory_usage = UINT64_MAX;
|
||||
if (opt_mode == MODE_COMPRESS) {
|
||||
#ifdef HAVE_ENCODERS
|
||||
# ifdef MYTHREAD_ENABLED
|
||||
if (opt_format == FORMAT_XZ && hardware_threads_get() > 1) {
|
||||
if (opt_format == FORMAT_XZ && hardware_threads_is_mt()) {
|
||||
memory_limit = hardware_memlimit_mtenc_get();
|
||||
mt_options.threads = hardware_threads_get();
|
||||
mt_options.block_size = opt_block_size;
|
||||
mt_options.check = check;
|
||||
|
@ -269,47 +283,90 @@ coder_set_compression_settings(void)
|
|||
if (memory_usage <= memory_limit)
|
||||
return;
|
||||
|
||||
// If --no-adjust was used or we didn't find LZMA1 or
|
||||
// LZMA2 as the last filter, give an error immediately.
|
||||
// --format=raw implies --no-adjust.
|
||||
if (!opt_auto_adjust || opt_format == FORMAT_RAW)
|
||||
// With --format=raw settings are never adjusted to meet
|
||||
// the memory usage limit.
|
||||
if (opt_format == FORMAT_RAW)
|
||||
memlimit_too_small(memory_usage);
|
||||
|
||||
assert(opt_mode == MODE_COMPRESS);
|
||||
|
||||
#ifdef HAVE_ENCODERS
|
||||
# ifdef MYTHREAD_ENABLED
|
||||
if (opt_format == FORMAT_XZ && mt_options.threads > 1) {
|
||||
if (opt_format == FORMAT_XZ && hardware_threads_is_mt()) {
|
||||
// Try to reduce the number of threads before
|
||||
// adjusting the compression settings down.
|
||||
do {
|
||||
// FIXME? The real single-threaded mode has
|
||||
// lower memory usage, but it's not comparable
|
||||
// because it doesn't write the size info
|
||||
// into Block Headers.
|
||||
if (--mt_options.threads == 0)
|
||||
memlimit_too_small(memory_usage);
|
||||
|
||||
while (mt_options.threads > 1) {
|
||||
// Reduce the number of threads by one and check
|
||||
// the memory usage.
|
||||
--mt_options.threads;
|
||||
memory_usage = lzma_stream_encoder_mt_memusage(
|
||||
&mt_options);
|
||||
if (memory_usage == UINT64_MAX)
|
||||
message_bug();
|
||||
|
||||
} while (memory_usage > memory_limit);
|
||||
if (memory_usage <= memory_limit) {
|
||||
// The memory usage is now low enough.
|
||||
message(V_WARNING, _("Reduced the number of "
|
||||
"threads from %s to %s to not exceed "
|
||||
"the memory usage limit of %s MiB"),
|
||||
uint64_to_str(
|
||||
hardware_threads_get(), 0),
|
||||
uint64_to_str(mt_options.threads, 1),
|
||||
uint64_to_str(round_up_to_mib(
|
||||
memory_limit), 2));
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
message(V_WARNING, _("Adjusted the number of threads "
|
||||
"from %s to %s to not exceed "
|
||||
"the memory usage limit of %s MiB"),
|
||||
uint64_to_str(hardware_threads_get(), 0),
|
||||
uint64_to_str(mt_options.threads, 1),
|
||||
uint64_to_str(round_up_to_mib(
|
||||
memory_limit), 2));
|
||||
// If the memory usage limit is only a soft limit (automatic
|
||||
// number of threads and no --memlimit-compress), the limit
|
||||
// is only used to reduce the number of threads and once at
|
||||
// just one thread, the limit is completely ignored. This
|
||||
// way -T0 won't use an insane amount of memory but at the same
|
||||
// time the soft limit will never make xz fail and never make
|
||||
// xz change settings that would affect the compressed output.
|
||||
if (hardware_memlimit_mtenc_is_default()) {
|
||||
message(V_WARNING, _("Reduced the number of threads "
|
||||
"from %s to one. The automatic memory usage "
|
||||
"limit of %s MiB is still being exceeded. "
|
||||
"%s MiB of memory is required. "
|
||||
"Continuing anyway."),
|
||||
uint64_to_str(hardware_threads_get(), 0),
|
||||
uint64_to_str(
|
||||
round_up_to_mib(memory_limit), 1),
|
||||
uint64_to_str(
|
||||
round_up_to_mib(memory_usage), 2));
|
||||
return;
|
||||
}
|
||||
|
||||
// If --no-adjust was used, we cannot drop to single-threaded
|
||||
// mode since it produces different compressed output.
|
||||
//
|
||||
// NOTE: In xz 5.2.x, --no-adjust also prevented reducing
|
||||
// the number of threads. This changed in 5.3.3alpha.
|
||||
if (!opt_auto_adjust)
|
||||
memlimit_too_small(memory_usage);
|
||||
|
||||
// Switch to single-threaded mode. It uses
|
||||
// less memory than using one thread in
|
||||
// the multithreaded mode but the output
|
||||
// is also different.
|
||||
hardware_threads_set(1);
|
||||
memory_usage = lzma_raw_encoder_memusage(filters);
|
||||
message(V_WARNING, _("Switching to single-threaded mode "
|
||||
"to not exceed the memory usage limit of %s MiB"),
|
||||
uint64_to_str(round_up_to_mib(memory_limit), 0));
|
||||
}
|
||||
# endif
|
||||
|
||||
if (memory_usage <= memory_limit)
|
||||
return;
|
||||
|
||||
// Don't adjust LZMA2 or LZMA1 dictionary size if --no-adjust
|
||||
// was specified as that would change the compressed output.
|
||||
if (!opt_auto_adjust)
|
||||
memlimit_too_small(memory_usage);
|
||||
|
||||
// Look for the last filter if it is LZMA2 or LZMA1, so we can make
|
||||
// it use less RAM. With other filters we don't know what to do.
|
||||
size_t i = 0;
|
||||
|
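The thread-reduction step in the hunk above lowers the thread count one at a time until the estimated memory usage fits the limit or only one thread remains. The sketch below shows that loop against an invented cost model. `model_memusage()` is a hypothetical stand-in for `lzma_stream_encoder_mt_memusage()`, and its numbers (in MiB) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical cost model in MiB: a fixed overhead plus a per-thread cost.
// The real estimate depends on the filter chain and block size.
static uint64_t
model_memusage(uint32_t threads)
{
	return 10 + UINT64_C(100) * threads;
}

// Reduce the thread count until the modelled usage fits under limit
// or one thread is left. A result of 1 does not guarantee that the
// limit is met; the caller has to check the usage again.
static uint32_t
reduce_threads(uint32_t threads, uint64_t limit)
{
	assert(threads >= 1);

	while (threads > 1 && model_memusage(threads) > limit)
		--threads;

	return threads;
}
```

When this loop bottoms out at one thread while still over the limit, the diff continues with three further cases: a soft (default) limit only warns and continues, `--no-adjust` fails, and otherwise xz switches to true single-threaded mode.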
@ -423,6 +480,18 @@ is_format_lzma(void)
|
|||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
/// Return true if the data in in_buf seems to be in the .lz format.
|
||||
static bool
|
||||
is_format_lzip(void)
|
||||
{
|
||||
static const uint8_t magic[4] = { 0x4C, 0x5A, 0x49, 0x50 };
|
||||
return strm.avail_in >= sizeof(magic)
|
||||
&& memcmp(in_buf.u8, magic, sizeof(magic)) == 0;
|
||||
}
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
||||
|
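The `.lz` detection added above is a plain magic-byte comparison. A standalone sketch follows, assuming the caller passes a buffer and its length in place of the file-scope `in_buf` and `strm.avail_in`; `looks_like_lzip()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Return true if the buffer starts with the four magic bytes "LZIP".
static bool
looks_like_lzip(const uint8_t *buf, size_t size)
{
	assert(buf != NULL || size == 0);

	static const uint8_t magic[4] = { 0x4C, 0x5A, 0x49, 0x50 };

	// Check the length first so memcmp() never reads past the input.
	return size >= sizeof(magic)
			&& memcmp(buf, magic, sizeof(magic)) == 0;
}
```

Because `.lz` has magic bytes and `.lzma` does not, `coder_init()` tries this check before the more involved `.lzma` heuristics.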
@ -436,6 +505,12 @@ coder_init(file_pair *pair)
|
|||
{
|
||||
lzma_ret ret = LZMA_PROG_ERROR;
|
||||
|
||||
// In most cases if there is input left when coding finishes,
|
||||
// something has gone wrong. Exceptions are --single-stream
|
||||
// and decoding .lz files which can contain trailing non-.lz data.
|
||||
// These will be handled later in this function.
|
||||
allow_trailing_input = false;
|
||||
|
||||
if (opt_mode == MODE_COMPRESS) {
|
||||
#ifdef HAVE_ENCODERS
|
||||
switch (opt_format) {
|
||||
|
@ -446,7 +521,7 @@ coder_init(file_pair *pair)
|
|||
|
||||
case FORMAT_XZ:
|
||||
# ifdef MYTHREAD_ENABLED
|
||||
if (hardware_threads_get() > 1)
|
||||
if (hardware_threads_is_mt())
|
||||
ret = lzma_stream_encoder_mt(
|
||||
&strm, &mt_options);
|
||||
else
|
||||
|
@ -459,6 +534,14 @@ coder_init(file_pair *pair)
|
|||
ret = lzma_alone_encoder(&strm, filters[0].options);
|
||||
break;
|
||||
|
||||
# ifdef HAVE_LZIP_DECODER
|
||||
case FORMAT_LZIP:
|
||||
// args.c should disallow this.
|
||||
assert(0);
|
||||
ret = LZMA_PROG_ERROR;
|
||||
break;
|
||||
# endif
|
||||
|
||||
case FORMAT_RAW:
|
||||
ret = lzma_raw_encoder(&strm, filters);
|
||||
break;
|
||||
|
@ -475,7 +558,9 @@ coder_init(file_pair *pair)
|
|||
else
|
||||
flags |= LZMA_TELL_UNSUPPORTED_CHECK;
|
||||
|
||||
if (!opt_single_stream)
|
||||
if (opt_single_stream)
|
||||
allow_trailing_input = true;
|
||||
else
|
||||
flags |= LZMA_CONCATENATED;
|
||||
|
||||
// We abuse FORMAT_AUTO to indicate unknown file format,
|
||||
|
@ -484,8 +569,14 @@ coder_init(file_pair *pair)
|
|||
|
||||
switch (opt_format) {
|
||||
case FORMAT_AUTO:
|
||||
// .lz is checked before .lzma since .lzma detection
|
||||
// is more complicated (no magic bytes).
|
||||
if (is_format_xz())
|
||||
init_format = FORMAT_XZ;
|
||||
# ifdef HAVE_LZIP_DECODER
|
||||
else if (is_format_lzip())
|
||||
init_format = FORMAT_LZIP;
|
||||
# endif
|
||||
else if (is_format_lzma())
|
||||
init_format = FORMAT_LZMA;
|
||||
break;
|
||||
|
@ -500,6 +591,13 @@ coder_init(file_pair *pair)
|
|||
init_format = FORMAT_LZMA;
|
||||
break;
|
||||
|
||||
# ifdef HAVE_LZIP_DECODER
|
||||
case FORMAT_LZIP:
|
||||
if (is_format_lzip())
|
||||
init_format = FORMAT_LZIP;
|
||||
break;
|
||||
# endif
|
||||
|
||||
case FORMAT_RAW:
|
||||
init_format = FORMAT_RAW;
|
||||
break;
|
||||
|
@ -524,9 +622,31 @@ coder_init(file_pair *pair)
|
|||
break;
|
||||
|
||||
case FORMAT_XZ:
|
||||
# ifdef MYTHREAD_ENABLED
|
||||
mt_options.flags = flags;
|
||||
|
||||
mt_options.threads = hardware_threads_get();
|
||||
mt_options.memlimit_stop
|
||||
= hardware_memlimit_get(MODE_DECOMPRESS);
|
||||
|
||||
// If single-threaded mode was requested, set the
|
||||
// memlimit for threading to zero. This forces the
|
||||
// decoder to use single-threaded mode which matches
|
||||
// the behavior of lzma_stream_decoder().
|
||||
//
|
||||
// Otherwise use the limit for threaded decompression
|
||||
// which has a sane default (users are still free to
|
||||
// make it insanely high though).
|
||||
mt_options.memlimit_threading
|
||||
= mt_options.threads == 1
|
||||
? 0 : hardware_memlimit_mtdec_get();
|
||||
|
||||
ret = lzma_stream_decoder_mt(&strm, &mt_options);
|
||||
# else
|
||||
ret = lzma_stream_decoder(&strm,
|
||||
hardware_memlimit_get(
|
||||
MODE_DECOMPRESS), flags);
|
||||
# endif
|
||||
break;
|
||||
|
||||
case FORMAT_LZMA:
|
||||
|
@ -535,6 +655,15 @@ coder_init(file_pair *pair)
|
|||
MODE_DECOMPRESS));
|
||||
break;
|
||||
|
||||
# ifdef HAVE_LZIP_DECODER
|
||||
case FORMAT_LZIP:
|
||||
allow_trailing_input = true;
|
||||
ret = lzma_lzip_decoder(&strm,
|
||||
hardware_memlimit_get(
|
||||
MODE_DECOMPRESS), flags);
|
||||
break;
|
||||
# endif
|
||||
|
||||
case FORMAT_RAW:
|
||||
// Memory usage has already been checked in
|
||||
// coder_set_compression_settings().
|
||||
|
@ -598,7 +727,7 @@ split_block(uint64_t *block_remaining,
|
|||
{
|
||||
if (*next_block_remaining > 0) {
|
||||
// The Block at *list_pos has previously been split up.
|
||||
assert(hardware_threads_get() == 1);
|
||||
assert(!hardware_threads_is_mt());
|
||||
assert(opt_block_size > 0);
|
||||
assert(opt_block_list != NULL);
|
||||
|
||||
|
@ -626,7 +755,7 @@ split_block(uint64_t *block_remaining,
|
|||
// If in single-threaded mode, split up the Block if needed.
|
||||
// This is not needed in multi-threaded mode because liblzma
|
||||
// will do this due to how threaded encoding works.
|
||||
if (hardware_threads_get() == 1 && opt_block_size > 0
|
||||
if (!hardware_threads_is_mt() && opt_block_size > 0
|
||||
&& *block_remaining > opt_block_size) {
|
||||
*next_block_remaining
|
||||
= *block_remaining - opt_block_size;
|
||||
|
@ -686,7 +815,7 @@ coder_normal(file_pair *pair)
|
|||
// --block-size doesn't do anything here in threaded mode,
|
||||
// because the threaded encoder will take care of splitting
|
||||
// to fixed-sized Blocks.
|
||||
if (hardware_threads_get() == 1 && opt_block_size > 0)
|
||||
if (!hardware_threads_is_mt() && opt_block_size > 0)
|
||||
block_remaining = opt_block_size;
|
||||
|
||||
// If --block-list was used, start with the first size.
|
||||
|
@ -700,7 +829,7 @@ coder_normal(file_pair *pair)
|
|||
// mode the size info isn't written into Block Headers.
|
||||
if (opt_block_list != NULL) {
|
||||
if (block_remaining < opt_block_list[list_pos]) {
|
||||
assert(hardware_threads_get() == 1);
|
||||
assert(!hardware_threads_is_mt());
|
||||
next_block_remaining = opt_block_list[list_pos]
|
||||
- block_remaining;
|
||||
} else {
|
||||
|
@ -764,7 +893,7 @@ coder_normal(file_pair *pair)
|
|||
} else {
|
||||
// Start a new Block after LZMA_FULL_BARRIER.
|
||||
if (opt_block_list == NULL) {
|
||||
assert(hardware_threads_get() == 1);
|
||||
assert(!hardware_threads_is_mt());
|
||||
assert(opt_block_size > 0);
|
||||
block_remaining = opt_block_size;
|
||||
} else {
|
||||
|
@ -795,7 +924,7 @@ coder_normal(file_pair *pair)
|
|||
}
|
||||
|
||||
if (ret == LZMA_STREAM_END) {
|
||||
if (opt_single_stream) {
|
||||
if (allow_trailing_input) {
|
||||
io_fix_src_pos(pair, strm.avail_in);
|
||||
success = true;
|
||||
break;
|
||||
|
@ -803,7 +932,9 @@ coder_normal(file_pair *pair)
|
|||
|
||||
// Check that there is no trailing garbage.
|
||||
// This is needed for LZMA_Alone and raw
|
||||
// streams.
|
||||
// streams. This is *not* done with .lz files
|
||||
// as that format specifically requires
|
||||
// allowing trailing garbage.
|
||||
if (strm.avail_in == 0 && !pair->src_eof) {
|
||||
// Try reading one more byte.
|
||||
// Hopefully we don't get any more
|
||||
|
|
|
@ -23,7 +23,9 @@ enum format_type {
|
|||
FORMAT_AUTO,
|
||||
FORMAT_XZ,
|
||||
FORMAT_LZMA,
|
||||
// HEADER_GZIP,
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
FORMAT_LZIP,
|
||||
#endif
|
||||
FORMAT_RAW,
|
||||
};
|
||||
|
||||
|
|
|
@ -212,6 +212,17 @@ io_sandbox_enter(int src_fd)
|
|||
if (cap_enter())
|
||||
goto error;
|
||||
|
||||
#elif defined(HAVE_PLEDGE)
|
||||
// pledge() was introduced in OpenBSD 5.9.
|
||||
//
|
||||
// main() unconditionally calls pledge() with fairly relaxed
|
||||
// promises which work in all situations. Here we make the
|
||||
// sandbox more strict.
|
||||
if (pledge("stdio", ""))
|
||||
goto error;
|
||||
|
||||
(void)src_fd;
|
||||
|
||||
#else
|
||||
# error ENABLE_SANDBOX is defined but no sandboxing method was found.
|
||||
#endif
|
||||
|
@ -221,7 +232,7 @@ io_sandbox_enter(int src_fd)
|
|||
return;
|
||||
|
||||
error:
|
||||
message(V_DEBUG, _("Failed to enable the sandbox"));
|
||||
message_fatal(_("Failed to enable the sandbox"));
|
||||
}
|
||||
#endif // ENABLE_SANDBOX
|
||||
|
||||
|
@ -748,8 +759,10 @@ io_open_src_real(file_pair *pair)
|
|||
extern file_pair *
|
||||
io_open_src(const char *src_name)
|
||||
{
|
||||
if (is_empty_filename(src_name))
|
||||
if (src_name[0] == '\0') {
|
||||
message_error(_("Empty filename, skipping"));
|
||||
return NULL;
|
||||
}
|
||||
|
||||
// Since we have only one file open at a time, we can use
|
||||
// a statically allocated structure.
|
||||
|
@ -1195,16 +1208,36 @@ io_read(file_pair *pair, io_buf *buf, size_t size)
|
|||
|
||||
|
||||
extern bool
|
||||
io_pread(file_pair *pair, io_buf *buf, size_t size, off_t pos)
|
||||
io_seek_src(file_pair *pair, uint64_t pos)
|
||||
{
|
||||
// Using lseek() and read() is more portable than pread() and
|
||||
// for us it is as good as real pread().
|
||||
if (lseek(pair->src_fd, pos, SEEK_SET) != pos) {
|
||||
// Caller must not attempt to seek past the end of the input file
|
||||
// (seeking to 100 in a 100-byte file is seeking to the end of
|
||||
// the file, not past the end of the file, and thus that is allowed).
|
||||
//
|
||||
// This also validates that pos can be safely cast to off_t.
|
||||
if (pos > (uint64_t)(pair->src_st.st_size))
|
||||
message_bug();
|
||||
|
||||
if (lseek(pair->src_fd, (off_t)(pos), SEEK_SET) == -1) {
|
||||
message_error(_("%s: Error seeking the file: %s"),
|
||||
pair->src_name, strerror(errno));
|
||||
return true;
|
||||
}
|
||||
|
||||
pair->src_eof = false;
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
|
||||
extern bool
|
||||
io_pread(file_pair *pair, io_buf *buf, size_t size, uint64_t pos)
|
||||
{
|
||||
// Using lseek() and read() is more portable than pread() and
|
||||
// for us it is as good as real pread().
|
||||
if (io_seek_src(pair, pos))
|
||||
return true;
|
||||
|
||||
const size_t amount = io_read(pair, buf, size);
|
||||
if (amount == SIZE_MAX)
|
||||
return true;
|
||||
|
|
|
@ -139,6 +139,19 @@ extern size_t io_read(file_pair *pair, io_buf *buf, size_t size);
|
|||
extern void io_fix_src_pos(file_pair *pair, size_t rewind_size);
|
||||
|
||||
|
||||
/// \brief Seek to the given absolute position in the source file
|
||||
///
|
||||
/// This calls lseek() and also clears pair->src_eof.
|
||||
///
|
||||
/// \param pair Seekable source file
|
||||
/// \param pos Offset relative to the beginning of the file,
|
||||
/// from which the data should be read.
|
||||
///
|
||||
/// \return On success, false is returned. On error, error message
|
||||
/// is printed and true is returned.
|
||||
extern bool io_seek_src(file_pair *pair, uint64_t pos);
|
||||
|
||||
|
||||
/// \brief Read from source file from given offset to a buffer
|
||||
///
|
||||
/// This is remotely similar to standard pread(). This uses lseek() though,
|
||||
|
@ -152,7 +165,7 @@ extern void io_fix_src_pos(file_pair *pair, size_t rewind_size);
|
|||
///
|
||||
/// \return On success, false is returned. On error, error message
|
||||
/// is printed and true is returned.
|
||||
extern bool io_pread(file_pair *pair, io_buf *buf, size_t size, off_t pos);
|
||||
extern bool io_pread(file_pair *pair, io_buf *buf, size_t size, uint64_t pos);
|
||||
|
||||
|
||||
/// \brief Writes a buffer to the destination file
|
||||
|
|
|
@ -17,11 +17,42 @@
|
|||
/// the --threads=NUM command line option.
|
||||
static uint32_t threads_max = 1;
|
||||
|
||||
/// True when the number of threads is automatically determined based
|
||||
/// on the available hardware threads.
|
||||
static bool threads_are_automatic = false;
|
||||
|
||||
/// If true, then try to use multi-threaded mode (if memlimit allows)
|
||||
/// even if only one thread was requested explicitly (-T+1).
|
||||
static bool use_mt_mode_with_one_thread = false;
|
||||
|
||||
/// Memory usage limit for compression
|
||||
static uint64_t memlimit_compress;
|
||||
static uint64_t memlimit_compress = 0;
|
||||
|
||||
/// Memory usage limit for decompression
|
||||
static uint64_t memlimit_decompress;
|
||||
static uint64_t memlimit_decompress = 0;
|
||||
|
||||
/// Default memory usage for multithreaded modes:
|
||||
///
|
||||
/// - Default value for --memlimit-compress when automatic number of threads
|
||||
/// is used. However, if the limit wouldn't allow even one thread then
|
||||
/// the limit is ignored in coder.c and one thread will be used anyway.
|
||||
/// This mess is a compromise: we wish to prevent -T0 from using too
|
||||
/// many threads but we also don't want xz to give an error due to
|
||||
/// a memlimit that the user didn't explicitly set.
|
||||
///
|
||||
/// - Default value for --memlimit-mt-decompress
|
||||
///
|
||||
/// This value is calculated in hardware_init() and cannot be changed later.
|
||||
static uint64_t memlimit_mt_default;
|
||||
|
||||
/// Memory usage limit for multithreaded decompression. This is a soft limit:
|
||||
/// if reducing the number of threads to one isn't enough to keep memory
|
||||
/// usage below this limit, then one thread is used and this limit is ignored.
|
||||
/// memlimit_decompress is still obeyed.
|
||||
///
|
||||
/// This can be set with --memlimit-mt-decompress. The default value for
|
||||
/// this is memlimit_mt_default.
|
||||
static uint64_t memlimit_mtdec;
|
||||
|
||||
/// Total amount of physical RAM
|
||||
static uint64_t total_ram;
|
||||
|
@ -30,8 +61,17 @@ static uint64_t total_ram;
|
|||
extern void
|
||||
hardware_threads_set(uint32_t n)
|
||||
{
|
||||
// Reset these to false first and set them to true when appropriate.
|
||||
threads_are_automatic = false;
|
||||
use_mt_mode_with_one_thread = false;
|
||||
|
||||
if (n == 0) {
|
||||
// Automatic number of threads was requested.
|
||||
// If there is only one hardware thread, multi-threaded
|
||||
// mode will still be used if memory limit allows.
|
||||
threads_are_automatic = true;
|
||||
use_mt_mode_with_one_thread = true;
|
||||
|
||||
// If threading support was enabled at build time,
|
||||
// use the number of available CPU cores. Otherwise
|
||||
// use one thread since disabling threading support
|
||||
|
@ -43,6 +83,9 @@ hardware_threads_set(uint32_t n)
|
|||
#else
|
||||
threads_max = 1;
|
||||
#endif
|
||||
} else if (n == UINT32_MAX) {
|
||||
use_mt_mode_with_one_thread = true;
|
||||
threads_max = 1;
|
||||
} else {
|
||||
threads_max = n;
|
||||
}
|
||||
|
@ -58,9 +101,21 @@ hardware_threads_get(void)
|
|||
}
|
||||
|
||||
|
||||
extern bool
|
||||
hardware_threads_is_mt(void)
|
||||
{
|
||||
#ifdef MYTHREAD_ENABLED
|
||||
return threads_max > 1 || use_mt_mode_with_one_thread;
|
||||
#else
|
||||
return false;
|
||||
#endif
|
||||
}
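The relationship between the requested thread count and the multi-threaded mode flag can be modelled without xz's static state. This is a sketch under stated assumptions: `struct thread_cfg`, `thread_cfg_from_request` and `thread_cfg_is_mt` are hypothetical names, the CPU count is passed in by the caller instead of being queried, and threading support is assumed to be compiled in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, self-contained model of the thread settings; xz itself
 * keeps these in static variables in hardware.c. */
struct thread_cfg {
	uint32_t threads_max;
	bool automatic;
	bool mt_with_one_thread;
};

/* Mirrors the mapping shown in the diff: 0 requests an automatic thread
 * count (supplied here by the caller as cpu_threads, an assumption),
 * UINT32_MAX requests one thread in multi-threaded mode, and any other
 * value is used as is. */
static struct thread_cfg
thread_cfg_from_request(uint32_t n, uint32_t cpu_threads)
{
	struct thread_cfg cfg = { 1, false, false };

	if (n == 0) {
		cfg.automatic = true;
		cfg.mt_with_one_thread = true;
		cfg.threads_max = cpu_threads > 0 ? cpu_threads : 1;
	} else if (n == UINT32_MAX) {
		cfg.mt_with_one_thread = true;
		cfg.threads_max = 1;
	} else {
		cfg.threads_max = n;
	}

	return cfg;
}

/* Multi-threaded mode is used with more than one thread, or with one
 * thread when it was requested via the automatic or UINT32_MAX paths. */
static bool
thread_cfg_is_mt(const struct thread_cfg *cfg)
{
	return cfg->threads_max > 1 || cfg->mt_with_one_thread;
}
```

In this model an explicit request for one thread stays single-threaded, while an automatic request on a single-core machine still selects multi-threaded mode, which is why the callers in coder.c test the mode flag instead of comparing the thread count to one.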
|
||||
|
||||
|
||||
extern void
|
||||
hardware_memlimit_set(uint64_t new_memlimit,
|
||||
bool set_compress, bool set_decompress, bool is_percentage)
|
||||
bool set_compress, bool set_decompress, bool set_mtdec,
|
||||
bool is_percentage)
|
||||
{
|
||||
if (is_percentage) {
|
||||
assert(new_memlimit > 0);
|
||||
|
@ -110,6 +165,9 @@ hardware_memlimit_set(uint64_t new_memlimit,
|
|||
if (set_decompress)
|
||||
memlimit_decompress = new_memlimit;
|
||||
|
||||
if (set_mtdec)
|
||||
memlimit_mtdec = new_memlimit;
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
|
@ -117,32 +175,69 @@ hardware_memlimit_set(uint64_t new_memlimit,
|
|||
extern uint64_t
|
||||
hardware_memlimit_get(enum operation_mode mode)
|
||||
{
|
||||
// Zero is a special value that indicates the default. Currently
|
||||
// the default simply disables the limit. Once there is threading
|
||||
// support, this might be a little more complex, because there will
|
||||
// probably be a special case where a user asks for "optimal" number
|
||||
// of threads instead of a specific number (this might even become
|
||||
// the default mode). Each thread may use a significant amount of
|
||||
// memory. When there are no memory usage limits set, we need some
|
||||
// default soft limit for calculating the "optimal" number of
|
||||
// threads.
|
||||
// 0 is a special value that indicates the default.
|
||||
// It disables the limit in single-threaded mode.
|
||||
//
|
||||
// NOTE: For multithreaded decompression, this is the hard limit
|
||||
// (memlimit_stop). hardware_memlimit_mtdec_get() gives the
|
||||
// soft limit (memlimit_threading).
|
||||
const uint64_t memlimit = mode == MODE_COMPRESS
|
||||
? memlimit_compress : memlimit_decompress;
|
||||
return memlimit != 0 ? memlimit : UINT64_MAX;
|
||||
}
|
||||
|
||||
|
||||
extern uint64_t
|
||||
hardware_memlimit_mtenc_get(void)
|
||||
{
|
||||
return hardware_memlimit_mtenc_is_default()
|
||||
? memlimit_mt_default
|
||||
: hardware_memlimit_get(MODE_COMPRESS);
|
||||
}
|
||||
|
||||
|
||||
extern bool
|
||||
hardware_memlimit_mtenc_is_default(void)
|
||||
{
|
||||
return memlimit_compress == 0 && threads_are_automatic;
|
||||
}
|
||||
|
||||
|
||||
extern uint64_t
|
||||
hardware_memlimit_mtdec_get(void)
|
||||
{
|
||||
uint64_t m = memlimit_mtdec != 0
|
||||
? memlimit_mtdec
|
||||
: memlimit_mt_default;
|
||||
|
||||
// Cap the value to memlimit_decompress if it has been specified.
|
||||
// This is nice for --info-memory. It wouldn't be needed for liblzma
|
||||
// since it does this anyway.
|
||||
if (memlimit_decompress != 0 && m > memlimit_decompress)
|
||||
m = memlimit_decompress;
|
||||
|
||||
return m;
|
||||
}
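The capping rule for the soft limit can be written as a pure function. A minimal sketch, assuming all three values are passed in explicitly instead of being read from the static variables in hardware.c; `effective_mtdec_limit` is a hypothetical name, and 0 means "not set" as in the code above.

```c
#include <stdint.h>

/* Hypothetical stand-alone version of the capping rule.
 * A value of 0 means "not set" for mtdec and decompress. */
static uint64_t
effective_mtdec_limit(uint64_t mtdec, uint64_t mt_default,
		uint64_t decompress)
{
	/* Use the explicit soft limit if set, otherwise the default. */
	uint64_t m = mtdec != 0 ? mtdec : mt_default;

	/* The soft limit never exceeds an explicitly set hard limit. */
	if (decompress != 0 && m > decompress)
		m = decompress;

	return m;
}
```

With a default of 1000 and no other settings the result is 1000; setting the hard decompression limit to 300 lowers the result to 300, which matches what --info-memory would then report.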
|
||||
|
||||
|
||||
/// Helper for hardware_memlimit_show() to print one human-readable info line.
|
||||
static void
|
||||
memlimit_show(const char *str, uint64_t value)
|
||||
memlimit_show(const char *str, size_t str_columns, uint64_t value)
|
||||
{
|
||||
// Calculate the field width so that str will be padded to take
|
||||
// str_columns on the terminal.
|
||||
//
|
||||
// NOTE: If the string is invalid, this will be -1. Using -1 as
|
||||
// the field width is fine here so it's not handled specially.
|
||||
const int fw = tuklib_mbstr_fw(str, (int)(str_columns));
|
||||
|
||||
// The memory usage limit is considered to be disabled if value
|
||||
// is 0 or UINT64_MAX. This might get a bit more complex once there
|
||||
// is threading support. See the comment in hardware_memlimit_get().
|
||||
if (value == 0 || value == UINT64_MAX)
|
||||
printf("%s %s\n", str, _("Disabled"));
|
||||
printf(" %-*s %s\n", fw, str, _("Disabled"));
|
||||
else
|
||||
printf("%s %s MiB (%s B)\n", str,
|
||||
printf(" %-*s %s MiB (%s B)\n", fw, str,
|
||||
uint64_to_str(round_up_to_mib(value), 0),
|
||||
uint64_to_str(value, 1));
|
||||
|
||||
|
@ -153,18 +248,60 @@ memlimit_show(const char *str, uint64_t value)
|
|||
extern void
|
||||
hardware_memlimit_show(void)
|
||||
{
|
||||
uint32_t cputhreads = 1;
|
||||
#ifdef MYTHREAD_ENABLED
|
||||
cputhreads = lzma_cputhreads();
|
||||
if (cputhreads == 0)
|
||||
cputhreads = 1;
|
||||
#endif
|
||||
|
||||
if (opt_robot) {
|
||||
printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\n", total_ram,
|
||||
memlimit_compress, memlimit_decompress);
|
||||
printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64
|
||||
"\t%" PRIu64 "\t%" PRIu32 "\n",
|
||||
total_ram,
|
||||
memlimit_compress,
|
||||
memlimit_decompress,
|
||||
hardware_memlimit_mtdec_get(),
|
||||
memlimit_mt_default,
|
||||
cputhreads);
|
||||
} else {
|
||||
// TRANSLATORS: Test with "xz --info-memory" to see if
|
||||
// the alignment looks nice.
|
||||
memlimit_show(_("Total amount of physical memory (RAM): "),
|
||||
total_ram);
|
||||
memlimit_show(_("Memory usage limit for compression: "),
|
||||
memlimit_compress);
|
||||
memlimit_show(_("Memory usage limit for decompression: "),
|
||||
memlimit_decompress);
|
||||
const char *msgs[] = {
|
||||
_("Amount of physical memory (RAM):"),
|
||||
_("Number of processor threads:"),
|
||||
_("Compression:"),
|
||||
_("Decompression:"),
|
||||
_("Multi-threaded decompression:"),
|
||||
_("Default for -T0:"),
|
||||
};
|
||||
|
||||
size_t width_max = 1;
|
||||
for (unsigned i = 0; i < ARRAY_SIZE(msgs); ++i) {
|
||||
size_t w = tuklib_mbstr_width(msgs[i], NULL);
|
||||
|
||||
// When debugging, catch invalid strings with
|
||||
// an assertion. Otherwise fall back to 1 so
|
||||
// that the columns just won't be aligned.
|
||||
assert(w != (size_t)-1);
|
||||
if (w == (size_t)-1)
|
||||
w = 1;
|
||||
|
||||
if (width_max < w)
|
||||
width_max = w;
|
||||
}
|
||||
|
||||
puts(_("Hardware information:"));
|
||||
memlimit_show(msgs[0], width_max, total_ram);
|
||||
printf(" %-*s %" PRIu32 "\n",
|
||||
tuklib_mbstr_fw(msgs[1], (int)(width_max)),
|
||||
msgs[1], cputhreads);
|
||||
|
||||
putchar('\n');
|
||||
puts(_("Memory usage limits:"));
|
||||
memlimit_show(msgs[2], width_max, memlimit_compress);
|
||||
memlimit_show(msgs[3], width_max, memlimit_decompress);
|
||||
memlimit_show(msgs[4], width_max,
|
||||
hardware_memlimit_mtdec_get());
|
||||
memlimit_show(msgs[5], width_max, memlimit_mt_default);
|
||||
}
|
||||
|
||||
tuklib_exit(E_SUCCESS, E_ERROR, message_verbosity_get() != V_SILENT);
|
||||
|
@ -180,7 +317,22 @@ hardware_init(void)
|
|||
if (total_ram == 0)
|
||||
total_ram = (uint64_t)(ASSUME_RAM) * 1024 * 1024;
|
||||
|
||||
// Set the defaults.
|
||||
hardware_memlimit_set(0, true, true, false);
|
||||
// FIXME? There may be better methods to determine the default value.
|
||||
// One Linux-specific suggestion is to use MemAvailable from
|
||||
// /proc/meminfo as the starting point.
|
||||
memlimit_mt_default = total_ram / 4;
|
||||
|
||||
#if SIZE_MAX == UINT32_MAX
|
||||
// A too high value may cause 32-bit xz to run out of address space.
|
||||
// Use a conservative maximum value here. A few typical address space
|
||||
// sizes with Linux:
|
||||
// - x86-64 with 32-bit xz: 4 GiB
|
||||
// - x86: 3 GiB
|
||||
// - MIPS32: 2 GiB
|
||||
const size_t mem_ceiling = 1400U << 20;
|
||||
if (memlimit_mt_default > mem_ceiling)
|
||||
memlimit_mt_default = mem_ceiling;
|
||||
#endif
|
||||
|
||||
return;
|
||||
}
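The default computed in hardware_init() is a quarter of the RAM, with a ceiling on 32-bit builds. The sketch below restates that arithmetic as a function; `default_mt_memlimit` is a hypothetical name, and the `is_32bit` parameter stands in for the `SIZE_MAX == UINT32_MAX` preprocessor test, which is an assumption made so both branches can be exercised in one build.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical restatement of the default: one quarter of total RAM,
 * capped at 1400 MiB when the address space is 32 bits wide. */
static uint64_t
default_mt_memlimit(uint64_t total_ram, bool is_32bit)
{
	const uint64_t ceiling = UINT64_C(1400) << 20;
	uint64_t limit = total_ram / 4;

	if (is_32bit && limit > ceiling)
		limit = ceiling;

	return limit;
}
```

For 16 GiB of RAM this gives 4 GiB on a 64-bit build and 1400 MiB on a 32-bit build; for 1 GiB of RAM both builds get 256 MiB because the ceiling is not reached.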
|
||||
|
|
|
@ -16,22 +16,59 @@ extern void hardware_init(void);
|
|||
|
||||
|
||||
/// Set the maximum number of worker threads.
|
||||
/// A special value of UINT32_MAX sets one thread in multi-threaded mode.
|
||||
extern void hardware_threads_set(uint32_t threadlimit);
|
||||
|
||||
/// Get the maximum number of worker threads.
|
||||
extern uint32_t hardware_threads_get(void);
|
||||
|
||||
/// Returns true if multithreaded mode should be used for .xz compression.
|
||||
/// This can be true even if the number of threads is one.
|
||||
extern bool hardware_threads_is_mt(void);
|
||||
|
||||
/// Set the memory usage limit. There are separate limits for compression
|
||||
/// and decompression (the latter includes also --list), one or both can
|
||||
/// be set with a single call to this function. Zero indicates resetting
|
||||
/// the limit back to the defaults. The limit can also be set as a percentage
|
||||
/// of installed RAM; the percentage must be in the range [1, 100].
|
||||
|
||||
/// Set the memory usage limit. There are separate limits for compression,
|
||||
/// decompression (also includes --list), and multithreaded decompression.
|
||||
/// Any combination of these can be set with a single call to this function.
|
||||
/// Zero indicates resetting the limit back to the defaults.
|
||||
/// The limit can also be set as a percentage of installed RAM; the
|
||||
/// percentage must be in the range [1, 100].
|
||||
extern void hardware_memlimit_set(uint64_t new_memlimit,
|
||||
bool set_compress, bool set_decompress, bool is_percentage);
|
||||
bool set_compress, bool set_decompress, bool set_mtdec,
|
||||
bool is_percentage);
|
||||
|
||||
/// Get the current memory usage limit for compression or decompression.
|
||||
/// This is a hard limit that will not be exceeded. This is obeyed in
|
||||
/// both single-threaded and multithreaded modes.
|
||||
extern uint64_t hardware_memlimit_get(enum operation_mode mode);
|
||||
|
||||
/// This returns a system-specific default value if all of the following
|
||||
/// conditions are true:
|
||||
///
|
||||
/// - An automatic number of threads was requested (--threads=0).
|
||||
///
|
||||
/// - --memlimit-compress wasn't used or it was reset to the default
|
||||
/// value by setting it to 0.
|
||||
///
|
||||
/// Otherwise this is identical to hardware_memlimit_get(MODE_COMPRESS).
|
||||
///
|
||||
/// The idea is to keep automatic thread count reasonable so that too
|
||||
/// high memory usage is avoided and, with 32-bit xz, running out of
|
||||
/// address space is avoided.
|
||||
extern uint64_t hardware_memlimit_mtenc_get(void);
|
||||
|
||||
/// Returns true if the value returned by hardware_memlimit_mtenc_get() is
|
||||
/// a system-specific default value. coder.c uses this to ignore the default
|
||||
/// memlimit in case it's too small even for a single thread in multithreaded
|
||||
/// mode. This way the default limit will never make xz fail or affect the
|
||||
/// compressed output; it will only make xz reduce the number of threads.
|
||||
extern bool hardware_memlimit_mtenc_is_default(void);
|
||||
|
||||
/// Get the current memory usage limit for multithreaded decompression.
|
||||
/// This is only used to reduce the number of threads. This limit can be
|
||||
/// exceeded if the number of threads is reduced to one. Then the value
|
||||
/// from hardware_memlimit_get() will be honored like in single-threaded mode.
|
||||
extern uint64_t hardware_memlimit_mtdec_get(void);
|
||||
|
||||
/// Display the amount of RAM and memory usage limits and exit.
|
||||
extern void hardware_memlimit_show(void) lzma_attribute((__noreturn__));
|
||||
|
|
723
src/xz/list.c
|
@ -52,23 +52,126 @@ typedef struct {
|
|||
uint64_t memusage;
|
||||
|
||||
/// The filter chain of this Block in human-readable form
|
||||
char filter_chain[FILTERS_STR_SIZE];
|
||||
char *filter_chain;
|
||||
|
||||
} block_header_info;
|
||||
|
||||
#define BLOCK_HEADER_INFO_INIT { .filter_chain = NULL }
|
||||
#define block_header_info_end(bhi) free((bhi)->filter_chain)
|
||||
|
||||
|
||||
/// Strings ending in a colon. These are used for lines like
|
||||
/// " Foo: 123 MiB". These are grouped because translated strings
|
||||
/// may have different maximum string length, and we want to pad all
|
||||
/// strings so that the values are aligned nicely.
|
||||
static const char *colon_strs[] = {
|
||||
N_("Streams:"),
|
||||
N_("Blocks:"),
|
||||
N_("Compressed size:"),
|
||||
N_("Uncompressed size:"),
|
||||
N_("Ratio:"),
|
||||
N_("Check:"),
|
||||
N_("Stream Padding:"),
|
||||
N_("Memory needed:"),
|
||||
N_("Sizes in headers:"),
|
||||
// This won't be aligned because it's so long:
|
||||
//N_("Minimum XZ Utils version:"),
|
||||
N_("Number of files:"),
|
||||
};
|
||||
|
||||
/// Enum matching the above strings.
|
||||
enum {
|
||||
COLON_STR_STREAMS,
|
||||
COLON_STR_BLOCKS,
|
||||
COLON_STR_COMPRESSED_SIZE,
|
||||
COLON_STR_UNCOMPRESSED_SIZE,
|
||||
COLON_STR_RATIO,
|
||||
COLON_STR_CHECK,
|
||||
COLON_STR_STREAM_PADDING,
|
||||
COLON_STR_MEMORY_NEEDED,
|
||||
COLON_STR_SIZES_IN_HEADERS,
|
||||
//COLON_STR_MINIMUM_XZ_VERSION,
|
||||
COLON_STR_NUMBER_OF_FILES,
|
||||
};
|
||||
|
||||
/// Field widths to use with printf to pad the strings to use the same number
|
||||
/// of columns on a terminal.
|
||||
static int colon_strs_fw[ARRAY_SIZE(colon_strs)];
|
||||
|
||||
/// Convenience macro to get the translated string and its field width
|
||||
/// using a COLON_STR_foo enum.
|
||||
#define COLON_STR(num) colon_strs_fw[num], _(colon_strs[num])
|
||||
|
||||
|
||||
/// Column headings
|
||||
static struct {
|
||||
/// Table column heading string
|
||||
const char *str;
|
||||
|
||||
/// Number of terminal-columns to use for this table-column.
|
||||
/// If a translated string is longer than the initial value,
|
||||
/// this value will be increased in init_headings().
|
||||
int columns;
|
||||
|
||||
/// Field width to use for printf() to pad "str" to use "columns"
|
||||
/// number of columns on a terminal. This is calculated in
|
||||
/// init_headings().
|
||||
int fw;
|
||||
|
||||
} headings[] = {
|
||||
{ N_("Stream"), 6, 0 },
|
||||
{ N_("Block"), 9, 0 },
|
||||
{ N_("Blocks"), 9, 0 },
|
||||
{ N_("CompOffset"), 15, 0 },
|
||||
{ N_("UncompOffset"), 15, 0 },
|
||||
{ N_("CompSize"), 15, 0 },
|
||||
{ N_("UncompSize"), 15, 0 },
|
||||
{ N_("TotalSize"), 15, 0 },
|
||||
{ N_("Ratio"), 5, 0 },
|
||||
{ N_("Check"), 10, 0 },
|
||||
{ N_("CheckVal"), 1, 0 },
|
||||
{ N_("Padding"), 7, 0 },
|
||||
{ N_("Header"), 5, 0 },
|
||||
{ N_("Flags"), 2, 0 },
|
||||
{ N_("MemUsage"), 7 + 4, 0 }, // +4 is for " MiB"
|
||||
{ N_("Filters"), 1, 0 },
|
||||
};
|
||||
|
||||
/// Enum matching the above strings.
|
||||
enum {
|
||||
HEADING_STREAM,
|
||||
HEADING_BLOCK,
|
||||
HEADING_BLOCKS,
|
||||
HEADING_COMPOFFSET,
|
||||
HEADING_UNCOMPOFFSET,
|
||||
HEADING_COMPSIZE,
|
||||
HEADING_UNCOMPSIZE,
|
||||
HEADING_TOTALSIZE,
|
||||
HEADING_RATIO,
|
||||
HEADING_CHECK,
|
||||
HEADING_CHECKVAL,
|
||||
HEADING_PADDING,
|
||||
HEADING_HEADERSIZE,
|
||||
HEADING_HEADERFLAGS,
|
||||
HEADING_MEMUSAGE,
|
||||
HEADING_FILTERS,
|
||||
};
|
||||
|
||||
#define HEADING_STR(num) headings[num].fw, _(headings[num].str)
|
||||
|
||||
|
||||
/// Check ID to string mapping
|
||||
static const char check_names[LZMA_CHECK_ID_MAX + 1][12] = {
|
||||
// TRANSLATORS: Indicates that there is no integrity check.
|
||||
// This string is used in tables, so the width must not
|
||||
// exceed ten columns with a fixed-width font.
|
||||
// This string is used in tables. In older xz versions this
|
||||
// string was limited to ten columns in a fixed-width font, but
|
||||
// nowadays there is no strict length restriction anymore.
|
||||
N_("None"),
|
||||
"CRC32",
|
||||
// TRANSLATORS: Indicates that integrity check name is not known,
|
||||
// but the Check ID is known (here 2). This and other "Unknown-N"
|
||||
// strings are used in tables, so the width must not exceed ten
|
||||
// columns with a fixed-width font. It's OK to omit the dash if
|
||||
// you need space for one extra letter, but don't use spaces.
|
||||
// but the Check ID is known (here 2). In older xz versions these
|
||||
// strings were limited to ten columns in a fixed-width font, but
|
||||
// nowadays there is no strict length restriction anymore.
|
||||
N_("Unknown-2"),
|
||||
N_("Unknown-3"),
|
||||
"CRC64",
|
||||
|
@ -112,6 +215,104 @@ static struct {
|
|||
} totals = { 0, 0, 0, 0, 0, 0, 0, 0, 50000002, true };
|
||||
|
||||
|
||||
/// Initialize colon_strs_fw[].
|
||||
static void
|
||||
init_colon_strs(void)
|
||||
{
|
||||
// Lengths of translated strings as bytes.
|
||||
size_t lens[ARRAY_SIZE(colon_strs)];
|
||||
|
||||
// Lengths of translated strings as columns.
|
||||
size_t widths[ARRAY_SIZE(colon_strs)];
|
||||
|
||||
// Maximum number of columns needed by a translated string.
|
||||
size_t width_max = 0;
|
||||
|
||||
for (unsigned i = 0; i < ARRAY_SIZE(colon_strs); ++i) {
|
||||
widths[i] = tuklib_mbstr_width(_(colon_strs[i]), &lens[i]);
|
||||
|
||||
// If debugging is enabled, catch invalid strings with
|
||||
// an assertion. However, when not debugging, use the
|
||||
// byte count as the fallback width. This shouldn't
|
||||
// ever happen unless there is a bad string in the
|
||||
// translations, but in such case I guess it's better
|
||||
// to try to print something useful instead of failing
|
||||
// completely.
|
||||
assert(widths[i] != (size_t)-1);
|
||||
if (widths[i] == (size_t)-1)
|
||||
widths[i] = lens[i];
|
||||
|
||||
if (widths[i] > width_max)
|
||||
width_max = widths[i];
|
||||
}
|
||||
|
||||
// Calculate the field width for printf("%*s") so that the strings
|
||||
// will use width_max columns on a terminal.
|
||||
for (unsigned i = 0; i < ARRAY_SIZE(colon_strs); ++i)
|
||||
colon_strs_fw[i] = (int)(lens[i] + width_max - widths[i]);
|
||||
|
||||
return;
|
||||
}
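The field width arithmetic in init_colon_strs() exists because printf() counts the width of `%s` in bytes, while alignment on a terminal depends on columns. A minimal sketch of that one calculation, assuming the byte length and column width are already known (in xz they come from tuklib_mbstr_width()); `padded_field_width` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: compute the printf field width (counted in bytes)
 * that pads a string of len_bytes bytes and width_cols terminal columns
 * so that it occupies target_cols columns.
 * Assumes target_cols >= width_cols. */
static int
padded_field_width(size_t len_bytes, size_t width_cols, size_t target_cols)
{
	return (int)(len_bytes + target_cols - width_cols);
}
```

As an example, a UTF-8 string that is 8 bytes long but only 6 columns wide needs a field width of 12 to fill 10 columns, whereas a 5-byte ASCII string needs a field width of 10 for the same result.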
|
||||
|
||||
|
/// Initialize headings[].
static void
init_headings(void)
{
	// Before going through the heading strings themselves, treat
	// the Check heading specially: Look at the widths of the various
	// check names and increase the width of the Check column if needed.
	// The width of the heading name "Check" will then be handled normally
	// with other heading names in the second loop in this function.
	for (unsigned i = 0; i < ARRAY_SIZE(check_names); ++i) {
		size_t len;
		size_t w = tuklib_mbstr_width(_(check_names[i]), &len);

		// Error handling like in init_colon_strs().
		assert(w != (size_t)-1);
		if (w == (size_t)-1)
			w = len;

		// If the translated string is wider than the minimum width
		// set at compile time, increase the width.
		if ((size_t)(headings[HEADING_CHECK].columns) < w)
			headings[HEADING_CHECK].columns = w;
	}

	for (unsigned i = 0; i < ARRAY_SIZE(headings); ++i) {
		size_t len;
		size_t w = tuklib_mbstr_width(_(headings[i].str), &len);

		// Error handling like in init_colon_strs().
		assert(w != (size_t)-1);
		if (w == (size_t)-1)
			w = len;

		// If the translated string is wider than the minimum width
		// set at compile time, increase the width.
		if ((size_t)(headings[i].columns) < w)
			headings[i].columns = w;

		// Calculate the field width for printf("%*s") so that
		// the string uses .columns number of columns on a terminal.
		headings[i].fw = (int)(len + (size_t)headings[i].columns - w);
	}

	return;
}

/// Initialize the printf field widths that are needed to get nicely aligned
/// output with translated strings.
static void
init_field_widths(void)
{
	init_colon_strs();
	init_headings();
	return;
}

||||
/// Convert XZ Utils version number to a string.
|
||||
static const char *
|
||||
xz_ver_to_str(uint32_t ver)
|
||||
|
@ -143,9 +344,6 @@ xz_ver_to_str(uint32_t ver)
|
|||
///
|
||||
/// \return On success, false is returned. On error, true is returned.
|
||||
///
|
||||
// TODO: This function is pretty big. liblzma should have a function that
|
||||
// takes a callback function to parse the Index(es) from a .xz file to make
|
||||
// it easy for applications.
|
||||
static bool
|
||||
parse_indexes(xz_file_info *xfi, file_pair *pair)
|
||||
{
|
||||
|
@ -161,238 +359,74 @@ parse_indexes(xz_file_info *xfi, file_pair *pair)
|
|||
}
|
||||
|
||||
io_buf buf;
|
||||
lzma_stream_flags header_flags;
|
||||
lzma_stream_flags footer_flags;
|
||||
lzma_ret ret;
|
||||
|
||||
// lzma_stream for the Index decoder
|
||||
lzma_stream strm = LZMA_STREAM_INIT;
|
||||
lzma_index *idx = NULL;
|
||||
|
||||
// All Indexes decoded so far
|
||||
lzma_index *combined_index = NULL;
|
||||
|
||||
// The Index currently being decoded
|
||||
lzma_index *this_index = NULL;
|
||||
|
||||
// Current position in the file. We parse the file backwards so
|
||||
// initialize it to point to the end of the file.
|
||||
off_t pos = pair->src_st.st_size;
|
||||
|
||||
// Each loop iteration decodes one Index.
|
||||
do {
|
||||
// Check that there is enough data left to contain at least
|
||||
// the Stream Header and Stream Footer. This check cannot
|
||||
// fail in the first pass of this loop.
|
||||
if (pos < 2 * LZMA_STREAM_HEADER_SIZE) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(LZMA_DATA_ERROR));
|
||||
goto error;
|
||||
}
|
||||
|
||||
pos -= LZMA_STREAM_HEADER_SIZE;
|
||||
lzma_vli stream_padding = 0;
|
||||
|
||||
// Locate the Stream Footer. There may be Stream Padding which
|
||||
// we must skip when reading backwards.
|
||||
while (true) {
|
||||
if (pos < LZMA_STREAM_HEADER_SIZE) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(
|
||||
LZMA_DATA_ERROR));
|
||||
goto error;
|
||||
}
|
||||
|
||||
if (io_pread(pair, &buf,
|
||||
LZMA_STREAM_HEADER_SIZE, pos))
|
||||
goto error;
|
||||
|
||||
// Stream Padding is always a multiple of four bytes.
|
||||
int i = 2;
|
||||
if (buf.u32[i] != 0)
|
||||
break;
|
||||
|
||||
// To avoid calling io_pread() for every four bytes
|
||||
// of Stream Padding, take advantage that we read
|
||||
// 12 bytes (LZMA_STREAM_HEADER_SIZE) already and
|
||||
// check them too before calling io_pread() again.
|
||||
do {
|
||||
stream_padding += 4;
|
||||
pos -= 4;
|
||||
--i;
|
||||
} while (i >= 0 && buf.u32[i] == 0);
|
||||
}
|
||||
|
||||
// Decode the Stream Footer.
|
||||
ret = lzma_stream_footer_decode(&footer_flags, buf.u8);
|
||||
if (ret != LZMA_OK) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(ret));
|
||||
goto error;
|
||||
}
|
||||
|
||||
// Check that the Stream Footer doesn't specify something
|
||||
// that we don't support. This can only happen if the xz
|
||||
// version is older than liblzma and liblzma supports
|
||||
// something new.
|
||||
//
|
||||
// It is enough to check Stream Footer. Stream Header must
|
||||
// match when it is compared against Stream Footer with
|
||||
// lzma_stream_flags_compare().
|
||||
if (footer_flags.version != 0) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(LZMA_OPTIONS_ERROR));
|
||||
goto error;
|
||||
}
|
||||
|
||||
// Check that the size of the Index field looks sane.
|
||||
lzma_vli index_size = footer_flags.backward_size;
|
||||
if ((lzma_vli)(pos) < index_size + LZMA_STREAM_HEADER_SIZE) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(LZMA_DATA_ERROR));
|
||||
goto error;
|
||||
}
|
||||
|
||||
// Set pos to the beginning of the Index.
|
||||
pos -= index_size;
|
||||
|
||||
// See how much memory we can use for decoding this Index.
|
||||
uint64_t memlimit = hardware_memlimit_get(MODE_LIST);
|
||||
uint64_t memused = 0;
|
||||
if (combined_index != NULL) {
|
||||
memused = lzma_index_memused(combined_index);
|
||||
if (memused > memlimit)
|
||||
message_bug();
|
||||
|
||||
memlimit -= memused;
|
||||
}
|
||||
|
||||
// Decode the Index.
|
||||
ret = lzma_index_decoder(&strm, &this_index, memlimit);
|
||||
if (ret != LZMA_OK) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(ret));
|
||||
goto error;
|
||||
}
|
||||
|
||||
do {
|
||||
// Don't give the decoder more input than the
|
||||
// Index size.
|
||||
strm.avail_in = my_min(IO_BUFFER_SIZE, index_size);
|
||||
if (io_pread(pair, &buf, strm.avail_in, pos))
|
||||
goto error;
|
||||
|
||||
pos += strm.avail_in;
|
||||
index_size -= strm.avail_in;
|
||||
lzma_ret ret = lzma_file_info_decoder(&strm, &idx,
|
||||
hardware_memlimit_get(MODE_LIST),
|
||||
(uint64_t)(pair->src_st.st_size));
|
||||
if (ret != LZMA_OK) {
|
||||
message_error("%s: %s", pair->src_name, message_strm(ret));
|
||||
return true;
|
||||
}
|
||||
|
||||
while (true) {
|
||||
if (strm.avail_in == 0) {
|
||||
strm.next_in = buf.u8;
|
||||
ret = lzma_code(&strm, LZMA_RUN);
|
||||
strm.avail_in = io_read(pair, &buf, IO_BUFFER_SIZE);
|
||||
if (strm.avail_in == SIZE_MAX)
|
||||
goto error;
|
||||
}
|
||||
|
||||
} while (ret == LZMA_OK);
|
||||
ret = lzma_code(&strm, LZMA_RUN);
|
||||
|
||||
// If the decoding seems to be successful, check also that
|
||||
// the Index decoder consumed as much input as indicated
|
||||
// by the Backward Size field.
|
||||
if (ret == LZMA_STREAM_END)
|
||||
if (index_size != 0 || strm.avail_in != 0)
|
||||
ret = LZMA_DATA_ERROR;
|
||||
switch (ret) {
|
||||
case LZMA_OK:
|
||||
break;
|
||||
|
||||
if (ret != LZMA_STREAM_END) {
|
||||
// LZMA_BUFFER_ERROR means that the Index decoder
|
||||
// would have liked more input than what the Index
|
||||
// size should be according to Stream Footer.
|
||||
// The message for LZMA_DATA_ERROR makes more
|
||||
// sense in that case.
|
||||
if (ret == LZMA_BUF_ERROR)
|
||||
ret = LZMA_DATA_ERROR;
|
||||
case LZMA_SEEK_NEEDED:
|
||||
// liblzma won't ask us to seek past the known size
|
||||
// of the input file.
|
||||
assert(strm.seek_pos
|
||||
<= (uint64_t)(pair->src_st.st_size));
|
||||
if (io_seek_src(pair, strm.seek_pos))
|
||||
goto error;
|
||||
|
||||
// avail_in must be zero so that we will read new
|
||||
// input.
|
||||
strm.avail_in = 0;
|
||||
break;
|
||||
|
||||
case LZMA_STREAM_END: {
|
||||
lzma_end(&strm);
|
||||
xfi->idx = idx;
|
||||
|
||||
// Calculate xfi->stream_padding.
|
||||
lzma_index_iter iter;
|
||||
lzma_index_iter_init(&iter, xfi->idx);
|
||||
while (!lzma_index_iter_next(&iter,
|
||||
LZMA_INDEX_ITER_STREAM))
|
||||
xfi->stream_padding += iter.stream.padding;
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
default:
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(ret));
|
||||
|
||||
// If the error was too low memory usage limit,
|
||||
// show also how much memory would have been needed.
|
||||
if (ret == LZMA_MEMLIMIT_ERROR) {
|
||||
uint64_t needed = lzma_memusage(&strm);
|
||||
if (UINT64_MAX - needed < memused)
|
||||
needed = UINT64_MAX;
|
||||
else
|
||||
needed += memused;
|
||||
|
||||
message_mem_needed(V_ERROR, needed);
|
||||
}
|
||||
if (ret == LZMA_MEMLIMIT_ERROR)
|
||||
message_mem_needed(V_ERROR,
|
||||
lzma_memusage(&strm));
|
||||
|
||||
goto error;
|
||||
}
|
||||
|
||||
// Decode the Stream Header and check that its Stream Flags
|
||||
// match the Stream Footer.
|
||||
pos -= footer_flags.backward_size + LZMA_STREAM_HEADER_SIZE;
|
||||
if ((lzma_vli)(pos) < lzma_index_total_size(this_index)) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(LZMA_DATA_ERROR));
|
||||
goto error;
|
||||
}
|
||||
|
||||
pos -= lzma_index_total_size(this_index);
|
||||
if (io_pread(pair, &buf, LZMA_STREAM_HEADER_SIZE, pos))
|
||||
goto error;
|
||||
|
||||
ret = lzma_stream_header_decode(&header_flags, buf.u8);
|
||||
if (ret != LZMA_OK) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(ret));
|
||||
goto error;
|
||||
}
|
||||
|
||||
ret = lzma_stream_flags_compare(&header_flags, &footer_flags);
|
||||
if (ret != LZMA_OK) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(ret));
|
||||
goto error;
|
||||
}
|
||||
|
||||
// Store the decoded Stream Flags into this_index. This is
|
||||
// needed so that we can print which Check is used in each
|
||||
// Stream.
|
||||
ret = lzma_index_stream_flags(this_index, &footer_flags);
|
||||
if (ret != LZMA_OK)
|
||||
message_bug();
|
||||
|
||||
// Store also the size of the Stream Padding field. It is
|
||||
// needed to show the offsets of the Streams correctly.
|
||||
ret = lzma_index_stream_padding(this_index, stream_padding);
|
||||
if (ret != LZMA_OK)
|
||||
message_bug();
|
||||
|
||||
if (combined_index != NULL) {
|
||||
// Append the earlier decoded Indexes
|
||||
// after this_index.
|
||||
ret = lzma_index_cat(
|
||||
this_index, combined_index, NULL);
|
||||
if (ret != LZMA_OK) {
|
||||
message_error("%s: %s", pair->src_name,
|
||||
message_strm(ret));
|
||||
goto error;
|
||||
}
|
||||
}
|
||||
|
||||
combined_index = this_index;
|
||||
this_index = NULL;
|
||||
|
||||
xfi->stream_padding += stream_padding;
|
||||
|
||||
} while (pos > 0);
|
||||
|
||||
lzma_end(&strm);
|
||||
|
||||
// All OK. Make combined_index available to the caller.
|
||||
xfi->idx = combined_index;
|
||||
return false;
|
||||
}
|
||||
|
||||
error:
|
||||
// Something went wrong, free the allocated memory.
|
||||
lzma_end(&strm);
|
||||
lzma_index_end(combined_index, NULL);
|
||||
lzma_index_end(this_index, NULL);
|
||||
return true;
|
||||
}
|
||||
|
||||
|
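The hand-written backward parser in this hunk skips Stream Padding by stepping back four bytes at a time while the 32-bit word it just read is zero. A minimal sketch of that scan on an in-memory buffer follows. `count_stream_padding()` is a hypothetical helper, not part of xz or liblzma, and it leaves out the file I/O (`io_pread()`) and the 12-byte read window that the code in the hunk works with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: return the number of Stream Padding bytes at the
// end of buf[0 .. size). Stream Padding consists of null bytes and its
// size is a multiple of four, so the scan moves backwards one 32-bit
// word at a time and stops at the first word that is not all zeros.
static size_t
count_stream_padding(const uint8_t *buf, size_t size)
{
	static const uint8_t zeros[4] = { 0, 0, 0, 0 };
	size_t padding = 0;

	// padding never exceeds size because it only grows when at
	// least four unscanned bytes remain.
	while (size - padding >= 4
			&& memcmp(buf + size - padding - 4, zeros, 4) == 0)
		padding += 4;

	return padding;
}
```

For a buffer that ends with eight null bytes after a non-zero word, the helper reports 8, and the Stream Footer would then be expected to end `size - 8` bytes into the buffer.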
@ -454,6 +488,10 @@ parse_block_header(file_pair *pair, const lzma_index_iter *iter,
|
|||
// Check the Block Flags. These must be done before calling
|
||||
// lzma_block_compressed_size(), because it overwrites
|
||||
// block.compressed_size.
|
||||
//
|
||||
// NOTE: If you add new characters here, update the minimum number of
|
||||
// columns in headings[HEADING_HEADERFLAGS] to match the number of
|
||||
// characters used here.
|
||||
bhi->flags[0] = block.compressed_size != LZMA_VLI_UNKNOWN
|
||||
? 'c' : '-';
|
||||
bhi->flags[1] = block.uncompressed_size != LZMA_VLI_UNKNOWN
|
||||
|
@ -488,9 +526,7 @@ parse_block_header(file_pair *pair, const lzma_index_iter *iter,
|
|||
|
||||
case LZMA_DATA_ERROR:
|
||||
// Free the memory allocated by lzma_block_header_decode().
|
||||
for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i)
|
||||
free(filters[i].options);
|
||||
|
||||
lzma_filters_free(filters, NULL);
|
||||
goto data_error;
|
||||
|
||||
default:
|
||||
|
@ -509,25 +545,42 @@ parse_block_header(file_pair *pair, const lzma_index_iter *iter,

 	// Determine the minimum XZ Utils version that supports this Block.
 	//
-	// Currently the only thing that 5.0.0 doesn't support is empty
-	// LZMA2 Block. This decoder bug was fixed in 5.0.2.
-	{
+	// - ARM64 filter needs 5.4.0.
+	//
+	// - 5.0.0 doesn't support empty LZMA2 streams and thus empty
+	// Blocks that use LZMA2. This decoder bug was fixed in 5.0.2.
+	if (xfi->min_version < 50040002U) {
+		for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) {
+			if (filters[i].id == LZMA_FILTER_ARM64) {
+				xfi->min_version = 50040002U;
+				break;
+			}
+		}
+	}
+
+	if (xfi->min_version < 50000022U) {
 		size_t i = 0;
 		while (filters[i + 1].id != LZMA_VLI_UNKNOWN)
 			++i;

 		if (filters[i].id == LZMA_FILTER_LZMA2
-				&& iter->block.uncompressed_size == 0
-				&& xfi->min_version < 50000022U)
+				&& iter->block.uncompressed_size == 0)
 			xfi->min_version = 50000022U;
 	}

 	// Convert the filter chain to human readable form.
-	message_filters_to_str(bhi->filter_chain, filters, false);
+	const lzma_ret str_ret = lzma_str_from_filters(
+			&bhi->filter_chain, filters,
+			LZMA_STR_DECODER | LZMA_STR_GETOPT_LONG, NULL);

 	// Free the memory allocated by lzma_block_header_decode().
-	for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i)
-		free(filters[i].options);
+	lzma_filters_free(filters, NULL);
+
+	// Check if the stringification succeeded.
+	if (str_ret != LZMA_OK) {
+		message_error("%s: %s", pair->src_name, message_strm(str_ret));
+		return true;
+	}

 	return false;

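The constants 50040002U and 50000022U in this hunk follow the packed layout used by liblzma's LZMA_VERSION: major * 10000000 + minor * 10000 + patch * 10 + stability, where stability 2 means a stable release. The following sketch shows how such a value splits into its parts. `split_version()` and `format_version()` are hypothetical helpers written for illustration only; the actual conversion in this file is done by `xz_ver_to_str()`, whose body is not shown here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical type holding the four fields of a packed version number.
typedef struct {
	uint32_t major;
	uint32_t minor;
	uint32_t patch;
	uint32_t stability; // 0 = alpha, 1 = beta, 2 = stable
} version_parts;

// Hypothetical helper: split e.g. 50040002U into 5, 4, 0, and 2.
static version_parts
split_version(uint32_t ver)
{
	version_parts v;
	v.major = ver / 10000000U;
	ver %= 10000000U;
	v.minor = ver / 10000U;
	ver %= 10000U;
	v.patch = ver / 10U;
	v.stability = ver % 10U;
	return v;
}

// Hypothetical helper: write "MAJOR.MINOR.PATCH" plus an "alpha" or
// "beta" suffix for non-stable versions into buf.
static void
format_version(char *buf, size_t size, uint32_t ver)
{
	static const char *const suffixes[] = { "alpha", "beta", "" };
	const version_parts v = split_version(ver);
	const char *suffix = v.stability < 3 ? suffixes[v.stability] : "";

	snprintf(buf, size, "%" PRIu32 ".%" PRIu32 ".%" PRIu32 "%s",
			v.major, v.minor, v.patch, suffix);
}
```

Read this way, 50040002U is 5.4.0 (stable), which matches the "ARM64 filter needs 5.4.0" comment, and 50000022U is 5.0.2 (stable).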
||||
|
@ -553,7 +606,7 @@ parse_check_value(file_pair *pair, const lzma_index_iter *iter)

 	// Locate and read the Check field.
 	const uint32_t size = lzma_check_size(iter->stream.flags->check);
-	const off_t offset = iter->block.compressed_file_offset
+	const uint64_t offset = iter->block.compressed_file_offset
 			+ iter->block.total_size - size;
 	io_buf buf;
 	if (io_pread(pair, &buf, size, offset))
||||
|
@ -714,20 +767,20 @@ print_adv_helper(uint64_t stream_count, uint64_t block_count,
|
|||
char checks_str[CHECKS_STR_SIZE];
|
||||
get_check_names(checks_str, checks, true);
|
||||
|
||||
printf(_(" Streams: %s\n"),
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_STREAMS),
|
||||
uint64_to_str(stream_count, 0));
|
||||
printf(_(" Blocks: %s\n"),
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_BLOCKS),
|
||||
uint64_to_str(block_count, 0));
|
||||
printf(_(" Compressed size: %s\n"),
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_COMPRESSED_SIZE),
|
||||
uint64_to_nicestr(compressed_size,
|
||||
NICESTR_B, NICESTR_TIB, true, 0));
|
||||
printf(_(" Uncompressed size: %s\n"),
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_UNCOMPRESSED_SIZE),
|
||||
uint64_to_nicestr(uncompressed_size,
|
||||
NICESTR_B, NICESTR_TIB, true, 0));
|
||||
printf(_(" Ratio: %s\n"),
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_RATIO),
|
||||
get_ratio(compressed_size, uncompressed_size));
|
||||
printf(_(" Check: %s\n"), checks_str);
|
||||
printf(_(" Stream padding: %s\n"),
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_CHECK), checks_str);
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_STREAM_PADDING),
|
||||
uint64_to_nicestr(stream_padding,
|
||||
NICESTR_B, NICESTR_TIB, true, 0));
|
||||
return;
|
||||
|
@ -752,13 +805,19 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
|
|||
|
||||
// Print information about the Streams.
|
||||
//
|
||||
// TRANSLATORS: The second line is column headings. All except
|
||||
// Check are right aligned; Check is left aligned. Test with
|
||||
// "xz -lv foo.xz".
|
||||
puts(_(" Streams:\n Stream Blocks"
|
||||
" CompOffset UncompOffset"
|
||||
" CompSize UncompSize Ratio"
|
||||
" Check Padding"));
|
||||
// All except Check are right aligned; Check is left aligned.
|
||||
// Test with "xz -lv foo.xz".
|
||||
printf(" %s\n %*s %*s %*s %*s %*s %*s %*s %-*s %*s\n",
|
||||
_(colon_strs[COLON_STR_STREAMS]),
|
||||
HEADING_STR(HEADING_STREAM),
|
||||
HEADING_STR(HEADING_BLOCKS),
|
||||
HEADING_STR(HEADING_COMPOFFSET),
|
||||
HEADING_STR(HEADING_UNCOMPOFFSET),
|
||||
HEADING_STR(HEADING_COMPSIZE),
|
||||
HEADING_STR(HEADING_UNCOMPSIZE),
|
||||
HEADING_STR(HEADING_RATIO),
|
||||
HEADING_STR(HEADING_CHECK),
|
||||
HEADING_STR(HEADING_PADDING));
|
||||
|
||||
lzma_index_iter iter;
|
||||
lzma_index_iter_init(&iter, xfi->idx);
|
||||
|
@ -771,10 +830,18 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
|
|||
uint64_to_str(iter.stream.uncompressed_offset, 3),
|
||||
};
|
||||
printf(" %*s %*s %*s %*s ",
|
||||
tuklib_mbstr_fw(cols1[0], 6), cols1[0],
|
||||
tuklib_mbstr_fw(cols1[1], 9), cols1[1],
|
||||
tuklib_mbstr_fw(cols1[2], 15), cols1[2],
|
||||
tuklib_mbstr_fw(cols1[3], 15), cols1[3]);
|
||||
tuklib_mbstr_fw(cols1[0],
|
||||
headings[HEADING_STREAM].columns),
|
||||
cols1[0],
|
||||
tuklib_mbstr_fw(cols1[1],
|
||||
headings[HEADING_BLOCKS].columns),
|
||||
cols1[1],
|
||||
tuklib_mbstr_fw(cols1[2],
|
||||
headings[HEADING_COMPOFFSET].columns),
|
||||
cols1[2],
|
||||
tuklib_mbstr_fw(cols1[3],
|
||||
headings[HEADING_UNCOMPOFFSET].columns),
|
||||
cols1[3]);
|
||||
|
||||
const char *cols2[5] = {
|
||||
uint64_to_str(iter.stream.compressed_size, 0),
|
||||
|
@ -785,11 +852,21 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
|
|||
uint64_to_str(iter.stream.padding, 2),
|
||||
};
|
||||
printf("%*s %*s %*s %-*s %*s\n",
|
||||
tuklib_mbstr_fw(cols2[0], 15), cols2[0],
|
||||
tuklib_mbstr_fw(cols2[1], 15), cols2[1],
|
||||
tuklib_mbstr_fw(cols2[2], 5), cols2[2],
|
||||
tuklib_mbstr_fw(cols2[3], 10), cols2[3],
|
||||
tuklib_mbstr_fw(cols2[4], 7), cols2[4]);
|
||||
tuklib_mbstr_fw(cols2[0],
|
||||
headings[HEADING_COMPSIZE].columns),
|
||||
cols2[0],
|
||||
tuklib_mbstr_fw(cols2[1],
|
||||
headings[HEADING_UNCOMPSIZE].columns),
|
||||
cols2[1],
|
||||
tuklib_mbstr_fw(cols2[2],
|
||||
headings[HEADING_RATIO].columns),
|
||||
cols2[2],
|
||||
tuklib_mbstr_fw(cols2[3],
|
||||
headings[HEADING_CHECK].columns),
|
||||
cols2[3],
|
||||
tuklib_mbstr_fw(cols2[4],
|
||||
headings[HEADING_PADDING].columns),
|
||||
cols2[4]);
|
||||
|
||||
// Update the maximum Check size.
|
||||
if (lzma_check_size(iter.stream.flags->check) > check_max)
|
||||
|
@ -799,32 +876,47 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
|
|||
// Cache the verbosity level to a local variable.
|
||||
const bool detailed = message_verbosity_get() >= V_DEBUG;
|
||||
|
||||
// Information collected from Block Headers
|
||||
block_header_info bhi;
|
||||
|
||||
// Print information about the Blocks but only if there is
|
||||
// at least one Block.
|
||||
if (lzma_index_block_count(xfi->idx) > 0) {
|
||||
// Calculate the width of the CheckVal field.
|
||||
const int checkval_width = my_max(8, 2 * check_max);
|
||||
// Calculate the width of the CheckVal column. This can be
|
||||
// used as is as the field width for printf() when printing
|
||||
// the actual check value as it is hexadecimal. However, to
|
||||
// print the column heading, further calculation is needed
|
||||
// to handle a translated string (it's done a few lines later).
|
||||
assert(check_max <= LZMA_CHECK_SIZE_MAX);
|
||||
const int checkval_width = my_max(
|
||||
headings[HEADING_CHECKVAL].columns,
|
||||
(int)(2 * check_max));
|
||||
|
||||
// TRANSLATORS: The second line is column headings. All
|
||||
// except Check are right aligned; Check is left aligned.
|
||||
printf(_(" Blocks:\n Stream Block"
|
||||
" CompOffset UncompOffset"
|
||||
" TotalSize UncompSize Ratio Check"));
|
||||
// All except Check are right aligned; Check is left aligned.
|
||||
printf(" %s\n %*s %*s %*s %*s %*s %*s %*s %-*s",
|
||||
_(colon_strs[COLON_STR_BLOCKS]),
|
||||
HEADING_STR(HEADING_STREAM),
|
||||
HEADING_STR(HEADING_BLOCK),
|
||||
HEADING_STR(HEADING_COMPOFFSET),
|
||||
HEADING_STR(HEADING_UNCOMPOFFSET),
|
||||
HEADING_STR(HEADING_TOTALSIZE),
|
||||
HEADING_STR(HEADING_UNCOMPSIZE),
|
||||
HEADING_STR(HEADING_RATIO),
|
||||
detailed ? headings[HEADING_CHECK].fw : 1,
|
||||
_(headings[HEADING_CHECK].str));
|
||||
|
||||
if (detailed) {
|
||||
// TRANSLATORS: These are additional column headings
|
||||
// for the most verbose listing mode. CheckVal
|
||||
// (Check value), Flags, and Filters are left aligned.
|
||||
// Header (Block Header Size), CompSize, and MemUsage
|
||||
// are right aligned. %*s is replaced with 0-120
|
||||
// spaces to make the CheckVal column wide enough.
|
||||
// Test with "xz -lvv foo.xz".
|
||||
printf(_(" CheckVal %*s Header Flags "
|
||||
"CompSize MemUsage Filters"),
|
||||
checkval_width - 8, "");
|
||||
// CheckVal (Check value), Flags, and Filters are
|
||||
// left aligned. Block Header Size, CompSize, and
|
||||
// MemUsage are right aligned. Test with
|
||||
// "xz -lvv foo.xz".
|
||||
printf(" %-*s %*s %-*s %*s %*s %s",
|
||||
headings[HEADING_CHECKVAL].fw
|
||||
+ checkval_width
|
||||
- headings[HEADING_CHECKVAL].columns,
|
||||
_(headings[HEADING_CHECKVAL].str),
|
||||
HEADING_STR(HEADING_HEADERSIZE),
|
||||
HEADING_STR(HEADING_HEADERFLAGS),
|
||||
HEADING_STR(HEADING_COMPSIZE),
|
||||
HEADING_STR(HEADING_MEMUSAGE),
|
||||
_(headings[HEADING_FILTERS].str));
|
||||
}
|
||||
|
||||
putchar('\n');
|
||||
|
@ -833,8 +925,11 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
|
|||
|
||||
// Iterate over the Blocks.
|
||||
while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_BLOCK)) {
|
||||
// If in detailed mode, collect the information from
|
||||
// Block Header before starting to print the next line.
|
||||
block_header_info bhi = BLOCK_HEADER_INFO_INIT;
|
||||
if (detailed && parse_details(pair, &iter, &bhi, xfi))
|
||||
return true;
|
||||
return true;
|
||||
|
||||
const char *cols1[4] = {
|
||||
uint64_to_str(iter.stream.number, 0),
|
||||
|
@ -846,10 +941,18 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
|
|||
iter.block.uncompressed_file_offset, 3)
|
||||
};
|
||||
printf(" %*s %*s %*s %*s ",
|
||||
tuklib_mbstr_fw(cols1[0], 6), cols1[0],
|
||||
tuklib_mbstr_fw(cols1[1], 9), cols1[1],
|
||||
tuklib_mbstr_fw(cols1[2], 15), cols1[2],
|
||||
tuklib_mbstr_fw(cols1[3], 15), cols1[3]);
|
||||
tuklib_mbstr_fw(cols1[0],
|
||||
headings[HEADING_STREAM].columns),
|
||||
cols1[0],
|
||||
tuklib_mbstr_fw(cols1[1],
|
||||
headings[HEADING_BLOCK].columns),
|
||||
cols1[1],
|
||||
tuklib_mbstr_fw(cols1[2],
|
||||
headings[HEADING_COMPOFFSET].columns),
|
||||
cols1[2],
|
||||
tuklib_mbstr_fw(cols1[3], headings[
|
||||
HEADING_UNCOMPOFFSET].columns),
|
||||
cols1[3]);
|
||||
|
||||
const char *cols2[4] = {
|
||||
uint64_to_str(iter.block.total_size, 0),
|
||||
|
@ -860,11 +963,18 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
|
|||
_(check_names[iter.stream.flags->check])
|
||||
};
|
||||
printf("%*s %*s %*s %-*s",
|
||||
tuklib_mbstr_fw(cols2[0], 15), cols2[0],
|
||||
tuklib_mbstr_fw(cols2[1], 15), cols2[1],
|
||||
tuklib_mbstr_fw(cols2[2], 5), cols2[2],
|
||||
tuklib_mbstr_fw(cols2[3], detailed ? 11 : 1),
|
||||
cols2[3]);
|
||||
tuklib_mbstr_fw(cols2[0],
|
||||
headings[HEADING_TOTALSIZE].columns),
|
||||
cols2[0],
|
||||
tuklib_mbstr_fw(cols2[1],
|
||||
headings[HEADING_UNCOMPSIZE].columns),
|
||||
cols2[1],
|
||||
tuklib_mbstr_fw(cols2[2],
|
||||
headings[HEADING_RATIO].columns),
|
||||
cols2[2],
|
||||
tuklib_mbstr_fw(cols2[3], detailed
|
||||
? headings[HEADING_CHECK].columns : 1),
|
||||
cols2[3]);
|
||||
|
||||
if (detailed) {
|
||||
const lzma_vli compressed_size
|
||||
|
@ -885,25 +995,35 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
|
|||
};
|
||||
// Show MiB for memory usage, because it
|
||||
// is the only size which is not in bytes.
|
||||
printf("%-*s %*s %-5s %*s %*s MiB %s",
|
||||
printf(" %-*s %*s %-*s %*s %*s MiB %s",
|
||||
checkval_width, cols3[0],
|
||||
tuklib_mbstr_fw(cols3[1], 6), cols3[1],
|
||||
tuklib_mbstr_fw(cols3[1], headings[
|
||||
HEADING_HEADERSIZE].columns),
|
||||
cols3[1],
|
||||
tuklib_mbstr_fw(cols3[2], headings[
|
||||
HEADING_HEADERFLAGS].columns),
|
||||
cols3[2],
|
||||
tuklib_mbstr_fw(cols3[3], 15),
|
||||
cols3[3],
|
||||
tuklib_mbstr_fw(cols3[4], 7), cols3[4],
|
||||
tuklib_mbstr_fw(cols3[3], headings[
|
||||
HEADING_COMPSIZE].columns),
|
||||
cols3[3],
|
||||
tuklib_mbstr_fw(cols3[4], headings[
|
||||
HEADING_MEMUSAGE].columns - 4),
|
||||
cols3[4],
|
||||
cols3[5]);
|
||||
}
|
||||
|
||||
putchar('\n');
|
||||
block_header_info_end(&bhi);
|
||||
}
|
||||
}
|
||||
|
||||
if (detailed) {
|
||||
printf(_(" Memory needed: %s MiB\n"), uint64_to_str(
|
||||
printf(" %-*s %s MiB\n", COLON_STR(COLON_STR_MEMORY_NEEDED),
|
||||
uint64_to_str(
|
||||
round_up_to_mib(xfi->memusage_max), 0));
|
||||
printf(_(" Sizes in headers: %s\n"),
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_SIZES_IN_HEADERS),
|
||||
xfi->all_have_sizes ? _("Yes") : _("No"));
|
||||
//printf(" %-*s %s\n", COLON_STR(COLON_STR_MINIMUM_XZ_VERSION),
|
||||
printf(_(" Minimum XZ Utils version: %s\n"),
|
||||
xz_ver_to_str(xfi->min_version));
|
||||
}
|
||||
|
@ -951,9 +1071,9 @@ print_info_robot(xz_file_info *xfi, file_pair *pair)
|
|||
iter.stream.padding);
|
||||
|
||||
lzma_index_iter_rewind(&iter);
|
||||
block_header_info bhi;
|
||||
|
||||
while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_BLOCK)) {
|
||||
block_header_info bhi = BLOCK_HEADER_INFO_INIT;
|
||||
if (message_verbosity_get() >= V_DEBUG
|
||||
&& parse_details(
|
||||
pair, &iter, &bhi, xfi))
|
||||
|
@ -984,6 +1104,7 @@ print_info_robot(xz_file_info *xfi, file_pair *pair)
|
|||
bhi.filter_chain);
|
||||
|
||||
putchar('\n');
|
||||
block_header_info_end(&bhi);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -1068,17 +1189,19 @@ print_totals_adv(void)
|
|||
{
|
||||
putchar('\n');
|
||||
puts(_("Totals:"));
|
||||
printf(_(" Number of files: %s\n"),
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_NUMBER_OF_FILES),
|
||||
uint64_to_str(totals.files, 0));
|
||||
print_adv_helper(totals.streams, totals.blocks,
|
||||
totals.compressed_size, totals.uncompressed_size,
|
||||
totals.checks, totals.stream_padding);
|
||||
|
||||
if (message_verbosity_get() >= V_DEBUG) {
|
||||
printf(_(" Memory needed: %s MiB\n"), uint64_to_str(
|
||||
printf(" %-*s %s MiB\n", COLON_STR(COLON_STR_MEMORY_NEEDED),
|
||||
uint64_to_str(
|
||||
round_up_to_mib(totals.memusage_max), 0));
|
||||
printf(_(" Sizes in headers: %s\n"),
|
||||
printf(" %-*s %s\n", COLON_STR(COLON_STR_SIZES_IN_HEADERS),
|
||||
totals.all_have_sizes ? _("Yes") : _("No"));
|
||||
//printf(" %-*s %s\n", COLON_STR(COLON_STR_MINIMUM_XZ_VERSION),
|
||||
printf(_(" Minimum XZ Utils version: %s\n"),
|
||||
xz_ver_to_str(totals.min_version));
|
||||
}
|
||||
|
@ -1154,6 +1277,8 @@ list_file(const char *filename)
|
|||
return;
|
||||
}
|
||||
|
||||
init_field_widths();
|
||||
|
||||
// Unset opt_stdout so that io_open_src() won't accept special files.
|
||||
// Set opt_force so that io_open_src() will follow symlinks.
|
||||
opt_stdout = false;
|
||||
|
|
|
@ -142,6 +142,20 @@ read_name(const args_info *args)
|
|||
int
|
||||
main(int argc, char **argv)
|
||||
{
|
||||
#ifdef HAVE_PLEDGE
|
||||
// OpenBSD's pledge(2) sandbox
|
||||
//
|
||||
// Unconditionally enable sandboxing with fairly relaxed promises.
|
||||
// This is still way better than having no sandbox at all. :-)
|
||||
// More strict promises will be made later in file_io.c if possible.
|
||||
if (pledge("stdio rpath wpath cpath fattr", "")) {
|
||||
// Don't translate the string or use message_fatal() as
|
||||
// those haven't been initialized yet.
|
||||
fprintf(stderr, "%s: Failed to enable the sandbox\n", argv[0]);
|
||||
return E_ERROR;
|
||||
}
|
||||
#endif
|
||||
|
||||
#if defined(_WIN32) && !defined(__CYGWIN__)
|
||||
InitializeCriticalSection(&exit_status_cs);
|
||||
#endif
|
||||
|
|
181
src/xz/message.c
|
@ -829,6 +829,15 @@ message_strm(lzma_ret code)
|
|||
case LZMA_STREAM_END:
|
||||
case LZMA_GET_CHECK:
|
||||
case LZMA_PROG_ERROR:
|
||||
case LZMA_SEEK_NEEDED:
|
||||
case LZMA_RET_INTERNAL1:
|
||||
case LZMA_RET_INTERNAL2:
|
||||
case LZMA_RET_INTERNAL3:
|
||||
case LZMA_RET_INTERNAL4:
|
||||
case LZMA_RET_INTERNAL5:
|
||||
case LZMA_RET_INTERNAL6:
|
||||
case LZMA_RET_INTERNAL7:
|
||||
case LZMA_RET_INTERNAL8:
|
||||
// Without "default", compiler will warn if new constants
|
||||
// are added to lzma_ret, it is not too easy to forget to
|
||||
// add the new constants to this function.
|
||||
|
@ -891,167 +900,20 @@ message_mem_needed(enum message_verbosity v, uint64_t memusage)
|
|||
}
|
||||
|
||||
|
||||
/// \brief Convert uint32_t to a nice string for --lzma[12]=dict=SIZE
|
||||
///
|
||||
/// The idea is to use KiB or MiB suffix when possible.
|
||||
static const char *
|
||||
uint32_to_optstr(uint32_t num)
|
||||
{
|
||||
static char buf[16];
|
||||
|
||||
if ((num & ((UINT32_C(1) << 20) - 1)) == 0)
|
||||
snprintf(buf, sizeof(buf), "%" PRIu32 "MiB", num >> 20);
|
||||
else if ((num & ((UINT32_C(1) << 10) - 1)) == 0)
|
||||
snprintf(buf, sizeof(buf), "%" PRIu32 "KiB", num >> 10);
|
||||
else
|
||||
snprintf(buf, sizeof(buf), "%" PRIu32, num);
|
||||
|
||||
return buf;
|
||||
}
|
||||
|
||||
|
||||
extern void
|
||||
message_filters_to_str(char buf[FILTERS_STR_SIZE],
|
||||
const lzma_filter *filters, bool all_known)
|
||||
{
|
||||
char *pos = buf;
|
||||
size_t left = FILTERS_STR_SIZE;
|
||||
|
||||
for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) {
|
||||
// Add the dashes for the filter option. A space is
|
||||
// needed after the first and later filters.
|
||||
my_snprintf(&pos, &left, "%s", i == 0 ? "--" : " --");
|
||||
|
||||
switch (filters[i].id) {
|
||||
case LZMA_FILTER_LZMA1:
|
||||
case LZMA_FILTER_LZMA2: {
|
||||
const lzma_options_lzma *opt = filters[i].options;
|
||||
const char *mode = NULL;
|
||||
const char *mf = NULL;
|
||||
|
||||
if (all_known) {
|
||||
switch (opt->mode) {
|
||||
case LZMA_MODE_FAST:
|
||||
mode = "fast";
|
||||
break;
|
||||
|
||||
case LZMA_MODE_NORMAL:
|
||||
mode = "normal";
|
||||
break;
|
||||
|
||||
default:
|
||||
mode = "UNKNOWN";
|
||||
break;
|
||||
}
|
||||
|
||||
switch (opt->mf) {
|
||||
case LZMA_MF_HC3:
|
||||
mf = "hc3";
|
||||
break;
|
||||
|
||||
case LZMA_MF_HC4:
|
||||
mf = "hc4";
|
||||
break;
|
||||
|
||||
case LZMA_MF_BT2:
|
||||
mf = "bt2";
|
||||
break;
|
||||
|
||||
case LZMA_MF_BT3:
|
||||
mf = "bt3";
|
||||
break;
|
||||
|
||||
case LZMA_MF_BT4:
|
||||
mf = "bt4";
|
||||
break;
|
||||
|
||||
default:
|
||||
mf = "UNKNOWN";
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// Add the filter name and dictionary size, which
|
||||
// is always known.
|
||||
my_snprintf(&pos, &left, "lzma%c=dict=%s",
|
||||
filters[i].id == LZMA_FILTER_LZMA2
|
||||
? '2' : '1',
|
||||
uint32_to_optstr(opt->dict_size));
|
||||
|
||||
// With LZMA1 also lc/lp/pb are known when
|
||||
// decompressing, but this function is never
|
||||
// used to print information about .lzma headers.
|
||||
assert(filters[i].id == LZMA_FILTER_LZMA2
|
||||
|| all_known);
|
||||
|
||||
// Print the rest of the options, which are known
|
||||
// only when compressing.
|
||||
if (all_known)
|
||||
my_snprintf(&pos, &left,
|
||||
",lc=%" PRIu32 ",lp=%" PRIu32
|
||||
",pb=%" PRIu32
|
||||
",mode=%s,nice=%" PRIu32 ",mf=%s"
|
||||
",depth=%" PRIu32,
|
||||
opt->lc, opt->lp, opt->pb,
|
||||
mode, opt->nice_len, mf, opt->depth);
|
||||
break;
|
||||
}
|
||||
|
||||
case LZMA_FILTER_X86:
|
||||
case LZMA_FILTER_POWERPC:
|
||||
case LZMA_FILTER_IA64:
|
||||
case LZMA_FILTER_ARM:
|
||||
case LZMA_FILTER_ARMTHUMB:
|
||||
case LZMA_FILTER_SPARC: {
|
||||
static const char bcj_names[][9] = {
|
||||
"x86",
|
||||
"powerpc",
|
||||
"ia64",
|
||||
"arm",
|
||||
"armthumb",
|
||||
"sparc",
|
||||
};
|
||||
|
||||
const lzma_options_bcj *opt = filters[i].options;
|
||||
my_snprintf(&pos, &left, "%s", bcj_names[filters[i].id
|
||||
- LZMA_FILTER_X86]);
|
||||
|
||||
// Show the start offset only when really needed.
|
||||
if (opt != NULL && opt->start_offset != 0)
|
||||
my_snprintf(&pos, &left, "=start=%" PRIu32,
|
||||
opt->start_offset);
|
||||
|
||||
break;
|
||||
}
|
||||
|
||||
case LZMA_FILTER_DELTA: {
|
||||
const lzma_options_delta *opt = filters[i].options;
|
||||
my_snprintf(&pos, &left, "delta=dist=%" PRIu32,
|
||||
opt->dist);
|
||||
break;
|
||||
}
|
||||
|
||||
default:
|
||||
// This should be possible only if liblzma is
|
||||
// newer than the xz tool.
|
||||
my_snprintf(&pos, &left, "UNKNOWN");
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return;
|
||||
}
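
The removed code above finds the BCJ filter name by subtracting the first BCJ filter ID from the current one and indexing a fixed-width string table. A minimal standalone sketch of that lookup follows. The `ex_` names are hypothetical, and the numeric IDs are assumptions made for illustration; real code should use the `LZMA_FILTER_*` macros from `<lzma.h>`. Unlike the original, the sketch also bounds-checks the ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed ID range for illustration only; real code should use
 * LZMA_FILTER_X86 and LZMA_FILTER_SPARC from <lzma.h>. */
#define EX_FILTER_X86   UINT64_C(0x04)
#define EX_FILTER_SPARC UINT64_C(0x09)

/* Return the BCJ filter name for an ID in the contiguous range
 * x86..sparc, or NULL if the ID is outside that range. */
static const char *
ex_bcj_name(uint64_t id)
{
	/* 9 bytes per entry: "armthumb" is 8 characters plus the NUL. */
	static const char names[][9] = {
		"x86",
		"powerpc",
		"ia64",
		"arm",
		"armthumb",
		"sparc",
	};

	if (id < EX_FILTER_X86 || id > EX_FILTER_SPARC)
		return NULL;

	return names[id - EX_FILTER_X86];
}
```

The table only works while the IDs stay contiguous and in the same order as the names, which is why the lookup is guarded by the range check here.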
|
||||
|
||||
|
||||
extern void
|
||||
message_filters_show(enum message_verbosity v, const lzma_filter *filters)
|
||||
{
|
||||
if (v > verbosity)
|
||||
return;
|
||||
|
||||
char buf[FILTERS_STR_SIZE];
|
||||
message_filters_to_str(buf, filters, true);
|
||||
char *buf;
|
||||
const lzma_ret ret = lzma_str_from_filters(&buf, filters,
|
||||
LZMA_STR_ENCODER | LZMA_STR_GETOPT_LONG, NULL);
|
||||
if (ret != LZMA_OK)
|
||||
message_fatal("%s", message_strm(ret));
|
||||
|
||||
fprintf(stderr, _("%s: Filter chain: %s\n"), progname, buf);
|
||||
free(buf);
|
||||
return;
|
||||
}
|
||||
|
||||
|
@ -1134,7 +996,7 @@ message_help(bool long_help)
|
|||
puts(_("\n Basic file format and compression options:\n"));
|
||||
puts(_(
|
||||
" -F, --format=FMT file format to encode or decode; possible values are\n"
|
||||
" `auto' (default), `xz', `lzma', and `raw'\n"
|
||||
" `auto' (default), `xz', `lzma', `lzip', and `raw'\n"
|
||||
" -C, --check=CHECK integrity check type: `none' (use with caution),\n"
|
||||
" `crc32', `crc64' (default), or `sha256'"));
|
||||
puts(_(
|
||||
|
@ -1171,9 +1033,11 @@ message_help(bool long_help)
|
|||
puts(_( // xgettext:no-c-format
|
||||
" --memlimit-compress=LIMIT\n"
|
||||
" --memlimit-decompress=LIMIT\n"
|
||||
" --memlimit-mt-decompress=LIMIT\n"
|
||||
" -M, --memlimit=LIMIT\n"
|
||||
" set memory usage limit for compression, decompression,\n"
|
||||
" or both; LIMIT is in bytes, % of RAM, or 0 for defaults"));
|
||||
" threaded decompression, or all of these; LIMIT is in\n"
|
||||
" bytes, % of RAM, or 0 for defaults"));
|
||||
|
||||
puts(_(
|
||||
" --no-adjust if compression settings exceed the memory usage limit,\n"
|
||||
|
@ -1208,10 +1072,11 @@ message_help(bool long_help)
|
|||
puts(_(
|
||||
"\n"
|
||||
" --x86[=OPTS] x86 BCJ filter (32-bit and 64-bit)\n"
|
||||
" --arm[=OPTS] ARM BCJ filter\n"
|
||||
" --armthumb[=OPTS] ARM-Thumb BCJ filter\n"
|
||||
" --arm64[=OPTS] ARM64 BCJ filter\n"
|
||||
" --powerpc[=OPTS] PowerPC BCJ filter (big endian only)\n"
|
||||
" --ia64[=OPTS] IA-64 (Itanium) BCJ filter\n"
|
||||
" --arm[=OPTS] ARM BCJ filter (little endian only)\n"
|
||||
" --armthumb[=OPTS] ARM-Thumb BCJ filter (little endian only)\n"
|
||||
" --sparc[=OPTS] SPARC BCJ filter\n"
|
||||
" Valid OPTS for all BCJ filters:\n"
|
||||
" start=NUM start offset for conversions (default=0)"));
|
||||
|
|
|
@ -90,22 +90,6 @@ extern const char *message_strm(lzma_ret code);
|
|||
extern void message_mem_needed(enum message_verbosity v, uint64_t memusage);
|
||||
|
||||
|
||||
/// Buffer size for message_filters_to_str()
|
||||
#define FILTERS_STR_SIZE 512
|
||||
|
||||
|
||||
/// \brief Get the filter chain as a string
|
||||
///
|
||||
/// \param buf Pointer to caller allocated buffer to hold
|
||||
/// the filter chain string
|
||||
/// \param filters Pointer to the filter chain
|
||||
/// \param all_known If true, all filter options are printed.
|
||||
/// If false, only the options that get stored
|
||||
/// into .xz headers are printed.
|
||||
extern void message_filters_to_str(char buf[FILTERS_STR_SIZE],
|
||||
const lzma_filter *filters, bool all_known);
|
||||
|
||||
|
||||
/// Print the filter chain.
|
||||
extern void message_filters_show(
|
||||
enum message_verbosity v, const lzma_filter *filters);
|
||||
|
|
|
@ -354,10 +354,5 @@ options_lzma(const char *str)
|
|||
if (options->lc + options->lp > LZMA_LCLP_MAX)
|
||||
message_fatal(_("The sum of lc and lp must not exceed 4"));
|
||||
|
||||
const uint32_t nice_len_min = options->mf & 0x0F;
|
||||
if (options->nice_len < nice_len_min)
|
||||
message_fatal(_("The selected match finder requires at "
|
||||
"least nice=%" PRIu32), nice_len_min);
|
||||
|
||||
return options;
|
||||
}
|
||||
|
|
|
@ -45,7 +45,7 @@
|
|||
# define STDERR_FILENO (fileno(stderr))
|
||||
#endif
|
||||
|
||||
#ifdef HAVE_CAPSICUM
|
||||
#if defined(HAVE_CAPSICUM) || defined(HAVE_PLEDGE)
|
||||
# define ENABLE_SANDBOX 1
|
||||
#endif
|
||||
|
||||
|
|
|
@ -119,9 +119,10 @@ uncompressed_name(const char *src_name, const size_t src_len)
|
|||
#ifdef __DJGPP__
|
||||
{ ".lzm", "" },
|
||||
#endif
|
||||
{ ".tlz", ".tar" },
|
||||
// { ".gz", "" },
|
||||
// { ".tgz", ".tar" },
|
||||
{ ".tlz", ".tar" }, // Both .tar.lzma and .tar.lz
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
{ ".lz", "" },
|
||||
#endif
|
||||
};
|
||||
|
||||
const char *new_suffix = "";
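
The suffix table in this hunk maps a compressed-file suffix to the suffix that replaces it in the target name. The following is a rough standalone sketch of that idea, not the code from suffix.c: the `ex_` names are hypothetical, the table is abbreviated, and the `.xz`, `.txz`, and `.lzma` entries are assumed from the man page rather than taken from this hunk.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, abbreviated table: compressed suffix and the text
 * that replaces it in the target filename. */
static const struct {
	const char *compressed;
	const char *uncompressed;
} ex_suffixes[] = {
	{ ".xz",   "" },
	{ ".txz",  ".tar" },
	{ ".lzma", "" },
	{ ".tlz",  ".tar" },	/* both .tar.lzma and .tar.lz */
	{ ".lz",   "" },
};

/* Write the target name for src into dest. Returns 0 on success and
 * -1 if no suffix matches or dest is too small. */
static int
ex_uncompressed_name(const char *src, char *dest, size_t dest_size)
{
	const size_t src_len = strlen(src);
	const size_t count = sizeof(ex_suffixes) / sizeof(ex_suffixes[0]);

	for (size_t i = 0; i < count; ++i) {
		const size_t suf_len = strlen(ex_suffixes[i].compressed);

		/* Require a non-empty base name before the suffix. */
		if (src_len <= suf_len)
			continue;

		if (strcmp(src + src_len - suf_len,
				ex_suffixes[i].compressed) != 0)
			continue;

		/* Copy the base name, then append the replacement. */
		const int n = snprintf(dest, dest_size, "%.*s%s",
				(int)(src_len - suf_len), src,
				ex_suffixes[i].uncompressed);
		return (n < 0 || (size_t)n >= dest_size) ? -1 : 0;
	}

	return -1;
}
```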
|
||||
|
@ -208,12 +209,15 @@ compressed_name(const char *src_name, size_t src_len)
|
|||
#endif
|
||||
".tlz",
|
||||
NULL
|
||||
/*
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
// This is needed to keep the table indexing in sync with
|
||||
// enum format_type from coder.h.
|
||||
}, {
|
||||
".gz",
|
||||
".tgz",
|
||||
NULL
|
||||
/*
|
||||
".lz",
|
||||
*/
|
||||
NULL
|
||||
#endif
|
||||
}, {
|
||||
// --format=raw requires specifying the suffix
|
||||
// manually or using stdout.
|
||||
|
@ -221,8 +225,11 @@ compressed_name(const char *src_name, size_t src_len)
|
|||
}
|
||||
};
|
||||
|
||||
// args.c ensures this.
|
||||
// args.c ensures these.
|
||||
assert(opt_format != FORMAT_AUTO);
|
||||
#ifdef HAVE_LZIP_DECODER
|
||||
assert(opt_format != FORMAT_LZIP);
|
||||
#endif
|
||||
|
||||
const size_t format = opt_format - 1;
|
||||
const char *const *suffixes = all_suffixes[format];
|
||||
|
@ -299,9 +306,11 @@ compressed_name(const char *src_name, size_t src_len)
|
|||
// xz foo.tar -> foo.txz
|
||||
// xz -F lzma foo.tar -> foo.tlz
|
||||
static const char *const tar_suffixes[] = {
|
||||
".txz",
|
||||
".tlz",
|
||||
// ".tgz",
|
||||
".txz", // .tar.xz
|
||||
".tlz", // .tar.lzma
|
||||
/*
|
||||
".tlz", // .tar.lz
|
||||
*/
|
||||
};
|
||||
suffix = tar_suffixes[format];
|
||||
suffix_len = 4;
|
||||
|
|
|
@ -260,18 +260,6 @@ my_snprintf(char **pos, size_t *left, const char *fmt, ...)
|
|||
}
|
||||
|
||||
|
||||
extern bool
|
||||
is_empty_filename(const char *filename)
|
||||
{
|
||||
if (filename[0] == '\0') {
|
||||
message_error(_("Empty filename, skipping"));
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
|
||||
extern bool
|
||||
is_tty_stdin(void)
|
||||
{
|
||||
|
|
|
@ -105,10 +105,6 @@ extern void my_snprintf(char **pos, size_t *left, const char *fmt, ...)
|
|||
lzma_attribute((__format__(__printf__, 3, 4)));
|
||||
|
||||
|
||||
/// \brief Check if filename is empty and print an error message
|
||||
extern bool is_empty_filename(const char *filename);
|
||||
|
||||
|
||||
/// \brief Test if stdin is a terminal
|
||||
///
|
||||
/// If stdin is a terminal, an error message is printed and exit status set
|
||||
|
|
327
src/xz/xz.1
|
@ -5,7 +5,7 @@
|
|||
.\" This file has been put into the public domain.
|
||||
.\" You can do whatever you want with this file.
|
||||
.\"
|
||||
.TH XZ 1 "2022-10-25" "Tukaani" "XZ Utils"
|
||||
.TH XZ 1 "2022-12-01" "Tukaani" "XZ Utils"
|
||||
.
|
||||
.SH NAME
|
||||
xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files
|
||||
|
@ -62,6 +62,11 @@ format, but the legacy
|
|||
format used by LZMA Utils and
|
||||
raw compressed streams with no container format headers
|
||||
are also supported.
|
||||
In addition, decompression of the
|
||||
.B .lz
|
||||
format used by
|
||||
.B lzip
|
||||
is supported.
|
||||
.PP
|
||||
.B xz
|
||||
compresses or decompresses each
|
||||
|
@ -102,9 +107,10 @@ or
|
|||
is appended to the source filename to get the target filename.
|
||||
.IP \(bu 3
|
||||
When decompressing, the
|
||||
.B .xz
|
||||
.BR .xz ,
|
||||
.BR .lzma ,
|
||||
or
|
||||
.B .lzma
|
||||
.B .lz
|
||||
suffix is removed from the filename to get the target filename.
|
||||
.B xz
|
||||
also recognizes the suffixes
|
||||
|
@ -158,8 +164,9 @@ doesn't have a suffix of any of the supported file formats
|
|||
.RB ( .xz ,
|
||||
.BR .txz ,
|
||||
.BR .lzma ,
|
||||
.BR .tlz ,
|
||||
or
|
||||
.BR .tlz ).
|
||||
.BR .lz ).
|
||||
.PP
|
||||
After successfully compressing or decompressing the
|
||||
.IR file ,
|
||||
|
@ -507,8 +514,9 @@ in addition to files with the
|
|||
.BR .xz ,
|
||||
.BR .txz ,
|
||||
.BR .lzma ,
|
||||
.BR .tlz ,
|
||||
or
|
||||
.B .tlz
|
||||
.B .lz
|
||||
suffix.
|
||||
If the source file has the suffix
|
||||
.IR .suf ,
|
||||
|
@ -575,6 +583,34 @@ The alternative name
|
|||
.B alone
|
||||
is provided for backwards compatibility with LZMA Utils.
|
||||
.TP
|
||||
.B lzip
|
||||
Accept only
|
||||
.B .lz
|
||||
files when decompressing.
|
||||
Compression is not supported.
|
||||
.IP ""
|
||||
The
|
||||
.B .lz
|
||||
format version 0 and the unextended version 1 are supported.
|
||||
Version 0 files were produced by
|
||||
.B lzip
|
||||
1.3 and older.
|
||||
Such files aren't common but may be found from file archives
|
||||
as a few source packages were released in this format.
|
||||
People might have old personal files in this format too.
|
||||
Decompression support for the format version 0 was removed in
|
||||
.B lzip
|
||||
1.18.
|
||||
.IP ""
|
||||
.B lzip
|
||||
1.4 and later create files in the format version 1.
|
||||
The sync flush marker extension to the format version 1 was added in
|
||||
.B lzip
|
||||
1.6.
|
||||
This extension is rarely used and isn't supported by
|
||||
.B xz
|
||||
(diagnosed as corrupt input).
|
||||
.TP
|
||||
.B raw
|
||||
Compress or uncompress a raw stream (no headers).
|
||||
This is meant for advanced users only.
|
||||
|
@ -965,15 +1001,28 @@ the last one takes effect.
|
|||
If the compression settings exceed the
|
||||
.IR limit ,
|
||||
.B xz
|
||||
will adjust the settings downwards so that
|
||||
will attempt to adjust the settings downwards so that
|
||||
the limit is no longer exceeded and display a notice that
|
||||
automatic adjustment was done.
|
||||
Such adjustments are not made when compressing with
|
||||
The adjustments are done in this order:
|
||||
reducing the number of threads,
|
||||
switching to single-threaded mode
|
||||
if even one thread in multi-threaded mode exceeds the
|
||||
.IR limit ,
|
||||
and finally reducing the LZMA2 dictionary size.
|
||||
.IP ""
|
||||
When compressing with
|
||||
.B \-\-format=raw
|
||||
or if
|
||||
.B \-\-no\-adjust
|
||||
has been specified.
|
||||
In those cases, an error is displayed and
|
||||
has been specified,
|
||||
only the number of threads may be reduced
|
||||
since it can be done without affecting the compressed output.
|
||||
.IP ""
|
||||
If the
|
||||
.I limit
|
||||
cannot be met even with the adjustments described above,
|
||||
an error is displayed and
|
||||
.B xz
|
||||
will exit with exit status 1.
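
The adjustment order described above (fewer threads, then single-threaded mode, then a smaller LZMA2 dictionary) can be sketched as a small standalone model. Everything here is an assumption for illustration: the `ex_` names are hypothetical, the cost model in `ex_memusage()` is invented and is not liblzma's real memory usage formula, and the 1 MiB dictionary floor is arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

struct ex_settings {
	uint32_t threads;	/* worker threads requested */
	bool multi_threaded;	/* true if the multi-threaded encoder is used */
	uint64_t dict_size;	/* LZMA2 dictionary size in bytes */
};

/* Invented cost model, NOT liblzma's real formula. */
static uint64_t
ex_memusage(const struct ex_settings *s)
{
	const uint64_t single = s->dict_size * 10;
	return s->multi_threaded ? single * 3 * s->threads : single;
}

/* Adjust *s to fit under limit. Returns true if the limit is met.
 * no_adjust models --no-adjust (and --format=raw): only the thread
 * count may be reduced. */
static bool
ex_adjust(struct ex_settings *s, uint64_t limit, bool no_adjust)
{
	/* Step 1: fewer threads. This doesn't affect the output. */
	while (s->multi_threaded && s->threads > 1
			&& ex_memusage(s) > limit)
		--s->threads;

	if (ex_memusage(s) <= limit)
		return true;

	/* The remaining steps would change the compressed output. */
	if (no_adjust)
		return false;

	/* Step 2: switch to single-threaded mode. */
	if (s->multi_threaded) {
		s->multi_threaded = false;
		s->threads = 1;
	}

	/* Step 3: halve the dictionary down to an assumed floor. */
	const uint64_t dict_min = UINT64_C(1) << 20;
	while (ex_memusage(s) > limit && s->dict_size > dict_min)
		s->dict_size /= 2;

	return ex_memusage(s) <= limit;
}
```

When `ex_adjust()` returns false, the caller would report an error and exit, matching the last paragraph above.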
|
||||
.IP ""
|
||||
|
@ -1012,16 +1061,6 @@ This is currently equivalent to setting the
|
|||
to
|
||||
.B max
|
||||
(no memory usage limit).
|
||||
Once multithreading support has been implemented,
|
||||
there may be a difference between
|
||||
.B 0
|
||||
and
|
||||
.B max
|
||||
for the multithreaded case, so it is recommended to use
|
||||
.B 0
|
||||
instead of
|
||||
.B max
|
||||
until the details have been decided.
|
||||
.RE
|
||||
.IP ""
|
||||
For 32-bit
|
||||
|
@ -1064,16 +1103,80 @@ See
|
|||
for possible ways to specify the
|
||||
.IR limit .
|
||||
.TP
|
||||
.BI \-\-memlimit\-mt\-decompress= limit
|
||||
Set a memory usage limit for multi-threaded decompression.
|
||||
This can only affect the number of threads;
|
||||
this will never make
|
||||
.B xz
|
||||
refuse to decompress a file.
|
||||
If
|
||||
.I limit
|
||||
is too low to allow any multi-threading, the
|
||||
.I limit
|
||||
is ignored and
|
||||
.B xz
|
||||
will continue in single-threaded mode.
|
||||
Note that if also
|
||||
.B \-\-memlimit\-decompress
|
||||
is used,
|
||||
it will always apply to both single-threaded and multi-threaded modes,
|
||||
and so the effective
|
||||
.I limit
|
||||
for multi-threading will never be higher than the limit set with
|
||||
.BR \-\-memlimit\-decompress .
|
||||
.IP ""
|
||||
In contrast to the other memory usage limit options,
|
||||
.BI \-\-memlimit\-mt\-decompress= limit
|
||||
has a system-specific default
|
||||
.IR limit .
|
||||
.B "xz \-\-info\-memory"
|
||||
can be used to see the current value.
|
||||
.IP ""
|
||||
This option and its default value exist
|
||||
because without any limit the threaded decompressor
|
||||
could end up allocating an insane amount of memory with some input files.
|
||||
If the default
|
||||
.I limit
|
||||
is too low on your system,
|
||||
feel free to increase the
|
||||
.I limit
|
||||
but never set it to a value larger than the amount of usable RAM
|
||||
as with appropriate input files
|
||||
.B xz
|
||||
will attempt to use that amount of memory
|
||||
even with a low number of threads.
|
||||
Running out of memory or swapping
|
||||
will not improve decompression performance.
|
||||
.IP ""
|
||||
See
|
||||
.BI \-\-memlimit\-compress= limit
|
||||
for possible ways to specify the
|
||||
.IR limit .
|
||||
Setting
|
||||
.I limit
|
||||
to
|
||||
.B 0
|
||||
resets the
|
||||
.I limit
|
||||
to the default system-specific value.
|
||||
.IP ""
|
||||
.TP
|
||||
\fB\-M\fR \fIlimit\fR, \fB\-\-memlimit=\fIlimit\fR, \fB\-\-memory=\fIlimit
|
||||
This is equivalent to specifying
|
||||
.BI \-\-memlimit\-compress= limit
|
||||
\fB\-\-memlimit\-decompress=\fIlimit\fR.
|
||||
.BI \-\-memlimit\-decompress= limit
|
||||
\fB\-\-memlimit\-mt\-decompress=\fIlimit\fR.
|
||||
.TP
|
||||
.B \-\-no\-adjust
|
||||
Display an error and exit if the compression settings exceed
|
||||
the memory usage limit.
|
||||
The default is to adjust the settings downwards so
|
||||
that the memory usage limit is not exceeded.
|
||||
Display an error and exit if the memory usage limit cannot be
|
||||
met without adjusting settings that affect the compressed output.
|
||||
That is, this prevents
|
||||
.B xz
|
||||
from switching the encoder from multi-threaded mode to single-threaded mode
|
||||
and from reducing the LZMA2 dictionary size.
|
||||
Even when this option is used the number of threads may be reduced
|
||||
to meet the memory usage limit as that won't affect the compressed output.
|
||||
.IP ""
|
||||
Automatic adjusting is always disabled when creating raw streams
|
||||
.RB ( \-\-format=raw ).
|
||||
.TP
|
||||
|
@ -1085,13 +1188,66 @@ to a special value
|
|||
.B 0
|
||||
makes
|
||||
.B xz
|
||||
use as many threads as there are CPU cores on the system.
|
||||
The actual number of threads can be less than
|
||||
use up to as many threads as the processor(s) on the system support.
|
||||
The actual number of threads can be fewer than
|
||||
.I threads
|
||||
if the input file is not big enough
|
||||
for threading with the given settings or
|
||||
if using more threads would exceed the memory usage limit.
|
||||
.IP ""
|
||||
The single-threaded and multi-threaded compressors produce different output.
|
||||
The single-threaded compressor will give the smallest file size but
|
||||
only the output from the multi-threaded compressor can be decompressed
|
||||
using multiple threads.
|
||||
Setting
|
||||
.I threads
|
||||
to
|
||||
.B 1
|
||||
will use the single-threaded mode.
|
||||
Setting
|
||||
.I threads
|
||||
to any other value, including
|
||||
.BR 0 ,
|
||||
will use the multi-threaded compressor
|
||||
even if the system supports only one hardware thread.
|
||||
.RB ( xz
|
||||
5.2.x
|
||||
used single-threaded mode in this situation.)
|
||||
.IP ""
|
||||
To use multi-threaded mode with only one thread, set
|
||||
.I threads
|
||||
to
|
||||
.BR +1 .
|
||||
The
|
||||
.B +
|
||||
prefix has no effect with values other than
|
||||
.BR 1 .
|
||||
A memory usage limit can still make
|
||||
.B xz
|
||||
switch to single-threaded mode unless
|
||||
.B \-\-no\-adjust
|
||||
is used.
|
||||
Support for the
|
||||
.B +
|
||||
prefix was added in
|
||||
.B xz
|
||||
5.4.0.
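
The rules above for choosing between the single-threaded and multi-threaded compressor can be sketched as a small parser. This is an illustration under assumptions, not the option parsing from the xz sources: the `ex_` names are hypothetical, and the later effect of memory usage limits on the mode is not modeled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct ex_threads {
	uint32_t count;		/* 0 means "choose automatically" */
	bool multi_threaded;	/* use the multi-threaded compressor? */
};

/* Parse a --threads argument such as "0", "1", "4", or "+1".
 * Returns false if the argument isn't a valid number. */
static bool
ex_parse_threads(const char *arg, struct ex_threads *out)
{
	const bool plus = (arg[0] == '+');
	if (plus)
		++arg;

	/* strtoul() would accept whitespace and signs; require a digit. */
	if (arg[0] < '0' || arg[0] > '9')
		return false;

	errno = 0;
	char *end;
	const unsigned long n = strtoul(arg, &end, 10);
	if (errno != 0 || *end != '\0' || n > UINT32_MAX)
		return false;

	out->count = (uint32_t)n;

	/* Only a plain "1" selects single-threaded mode. The "+" prefix
	 * matters only together with the value 1. */
	out->multi_threaded = (n != 1) || plus;
	return true;
}
```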
|
||||
.IP ""
|
||||
If an automatic number of threads has been requested and
|
||||
no memory usage limit has been specified,
|
||||
then a system-specific default soft limit will be used to possibly
|
||||
limit the number of threads.
|
||||
It is a soft limit in the sense that it is ignored
|
||||
if the number of threads becomes one,
|
||||
thus a soft limit will never stop
|
||||
.B xz
|
||||
from compressing or decompressing.
|
||||
This default soft limit will not make
|
||||
.B xz
|
||||
switch from multi-threaded mode to single-threaded mode.
|
||||
The active limits can be seen with
|
||||
.BR "xz \-\-info\-memory" .
|
||||
.IP ""
|
||||
Currently the only threading method is to split the input into
|
||||
blocks and compress them independently from each other.
|
||||
The default block size depends on the compression level and
|
||||
|
@ -1099,13 +1255,13 @@ can be overridden with the
|
|||
.BI \-\-block\-size= size
|
||||
option.
|
||||
.IP ""
|
||||
Threaded decompression hasn't been implemented yet.
|
||||
It will only work on files that contain multiple blocks
|
||||
with size information in block headers.
|
||||
All files compressed in multi-threaded mode meet this condition,
|
||||
Threaded decompression only works on files that contain
|
||||
multiple blocks with size information in block headers.
|
||||
All large enough files compressed in multi-threaded mode
|
||||
meet this condition,
|
||||
but files compressed in single-threaded mode don't even if
|
||||
.BI \-\-block\-size= size
|
||||
is used.
|
||||
has been used.
|
||||
.
|
||||
.SS "Custom compressor filter chains"
|
||||
A custom filter chain allows specifying
|
||||
|
@ -1537,14 +1693,16 @@ and
|
|||
\fB\-\-x86\fR[\fB=\fIoptions\fR]
|
||||
.PD 0
|
||||
.TP
|
||||
\fB\-\-powerpc\fR[\fB=\fIoptions\fR]
|
||||
.TP
|
||||
\fB\-\-ia64\fR[\fB=\fIoptions\fR]
|
||||
.TP
|
||||
\fB\-\-arm\fR[\fB=\fIoptions\fR]
|
||||
.TP
|
||||
\fB\-\-armthumb\fR[\fB=\fIoptions\fR]
|
||||
.TP
|
||||
\fB\-\-arm64\fR[\fB=\fIoptions\fR]
|
||||
.TP
|
||||
\fB\-\-powerpc\fR[\fB=\fIoptions\fR]
|
||||
.TP
|
||||
\fB\-\-ia64\fR[\fB=\fIoptions\fR]
|
||||
.TP
|
||||
\fB\-\-sparc\fR[\fB=\fIoptions\fR]
|
||||
.PD
|
||||
Add a branch/call/jump (BCJ) filter to the filter chain.
|
||||
|
@ -1553,7 +1711,7 @@ in the filter chain.
|
|||
.IP ""
|
||||
A BCJ filter converts relative addresses in
|
||||
the machine code to their absolute counterparts.
|
||||
This doesn't change the size of the data,
|
||||
This doesn't change the size of the data
|
||||
but it increases redundancy,
|
||||
which can help LZMA2 to produce 0\(en15\ % smaller
|
||||
.B .xz
|
||||
|
@ -1562,21 +1720,8 @@ The BCJ filters are always reversible,
|
|||
so using a BCJ filter for the wrong type of data
|
||||
doesn't cause any data loss, although it may make
|
||||
the compression ratio slightly worse.
|
||||
.IP ""
|
||||
It is fine to apply a BCJ filter on a whole executable;
|
||||
there's no need to apply it only on the executable section.
|
||||
Applying a BCJ filter on an archive that contains both executable
|
||||
and non-executable files may or may not give good results,
|
||||
so it generally isn't good to blindly apply a BCJ filter when
|
||||
compressing binary packages for distribution.
|
||||
.IP ""
|
||||
These BCJ filters are very fast and
|
||||
use insignificant amount of memory.
|
||||
If a BCJ filter improves compression ratio of a file,
|
||||
it can improve decompression speed at the same time.
|
||||
This is because, on the same hardware,
|
||||
the decompression speed of LZMA2 is roughly
|
||||
a fixed number of bytes of compressed data per second.
|
||||
The BCJ filters are very fast and
|
||||
use an insignificant amount of memory.
|
||||
.IP ""
|
||||
These BCJ filters have known problems related to
|
||||
the compression ratio:
|
||||
|
@ -1588,21 +1733,20 @@ have the addresses in the instructions filled with filler values.
|
|||
These BCJ filters will still do the address conversion,
|
||||
which will make the compression worse with these files.
|
||||
.IP \(bu 3
|
||||
Applying a BCJ filter on an archive containing multiple similar
|
||||
executables can make the compression ratio worse than not using
|
||||
a BCJ filter.
|
||||
This is because the BCJ filter doesn't detect the boundaries
|
||||
of the executable files, and doesn't reset
|
||||
the address conversion counter for each executable.
|
||||
If a BCJ filter is applied on an archive,
|
||||
it is possible that it makes the compression ratio
|
||||
worse than not using a BCJ filter.
|
||||
For example, if there are similar or even identical executables
|
||||
then filtering will likely make the files less similar
|
||||
and thus compression is worse.
|
||||
The contents of non-executable files in the same archive can matter too.
|
||||
In practice one has to try with and without a BCJ filter to see
|
||||
which is better in each situation.
|
||||
.RE
|
||||
.IP ""
|
||||
Both of the above problems will be fixed
|
||||
in the future in a new filter.
|
||||
The old BCJ filters will still be useful in embedded systems,
|
||||
because the decoder of the new filter will be bigger
|
||||
and use more memory.
|
||||
.IP ""
|
||||
Different instruction sets have different alignment:
|
||||
the executable file must be aligned to a multiple of
|
||||
this value in the input data to make the filter work.
|
||||
.RS
|
||||
.RS
|
||||
.PP
|
||||
|
@ -1612,11 +1756,12 @@ l n l
|
|||
l n l.
|
||||
Filter;Alignment;Notes
|
||||
x86;1;32-bit or 64-bit x86
|
||||
ARM;4;
|
||||
ARM-Thumb;2;
|
||||
ARM64;4;4096-byte alignment is best
|
||||
PowerPC;4;Big endian only
|
||||
ARM;4;Little endian only
|
||||
ARM-Thumb;2;Little endian only
|
||||
IA-64;16;Big or little endian
|
||||
SPARC;4;Big or little endian
|
||||
IA-64;16;Itanium
|
||||
SPARC;4;
|
||||
.TE
|
||||
.RE
|
||||
.RE
|
||||
|
@ -1627,6 +1772,8 @@ the LZMA2 options are set to match the
|
|||
alignment of the selected BCJ filter.
|
||||
For example, with the IA-64 filter, it's good to set
|
||||
.B pb=4
|
||||
or even
|
||||
.B pb=4,lp=4,lc=0
|
||||
with LZMA2 (2^4=16).
|
||||
The x86 filter is an exception;
|
||||
it's usually good to stick to LZMA2's default
|
||||
|
@ -1774,6 +1921,7 @@ for details.
|
|||
.TP
|
||||
.B \-\-info\-memory
|
||||
Display, in human-readable format, how much physical memory (RAM)
|
||||
and how many processor threads
|
||||
.B xz
|
||||
thinks the system has and the memory usage limits for compression
|
||||
and decompression, and exit successfully.
|
||||
|
@ -1858,15 +2006,50 @@ and
|
|||
.B "xz \-\-robot \-\-info\-memory"
|
||||
prints a single line with multiple tab-separated columns:
|
||||
.IP 1. 4
|
||||
Total amount of physical memory (RAM) in bytes
|
||||
Total amount of physical memory (RAM) in bytes.
|
||||
.IP 2. 4
|
||||
Memory usage limit for compression in bytes.
|
||||
A special value of zero indicates the default setting,
|
||||
Memory usage limit for compression in bytes
|
||||
.RB ( \-\-memlimit\-compress ).
|
||||
A special value of
|
||||
.B 0
|
||||
indicates the default setting
|
||||
which for single-threaded mode is the same as no limit.
|
||||
.IP 3. 4
|
||||
Memory usage limit for decompression in bytes.
|
||||
A special value of zero indicates the default setting,
|
||||
Memory usage limit for decompression in bytes
|
||||
.RB ( \-\-memlimit\-decompress ).
|
||||
A special value of
|
||||
.B 0
|
||||
indicates the default setting
|
||||
which for single-threaded mode is the same as no limit.
|
||||
.IP 4. 4
|
||||
Since
|
||||
.B xz
|
||||
5.3.4alpha:
|
||||
Memory usage limit for multi-threaded decompression in bytes
|
||||
.RB ( \-\-memlimit\-mt\-decompress ).
|
||||
This is never zero because a system-specific default value
|
||||
shown in column 5
|
||||
is used if no limit has been specified explicitly.
|
||||
This is also never greater than the value in column 3
|
||||
even if a larger value has been specified with
|
||||
.BR \-\-memlimit\-mt\-decompress .
|
||||
.IP 5. 4
|
||||
Since
|
||||
.B xz
|
||||
5.3.4alpha:
|
||||
A system-specific default memory usage limit
|
||||
that is used to limit the number of threads
|
||||
when compressing with an automatic number of threads
|
||||
.RB ( \-\-threads=0 )
|
||||
and no memory usage limit has been specified
|
||||
.RB ( \-\-memlimit\-compress ).
|
||||
This is also used as the default value for
|
||||
.BR \-\-memlimit\-mt\-decompress .
|
||||
.IP 6. 4
|
||||
Since
|
||||
.B xz
|
||||
5.3.4alpha:
|
||||
Number of available processor threads.
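
A script or program reading this output should cope with both the older three-column form and the six-column form listed above. The following is a minimal sketch under assumptions: the `ex_` names are hypothetical, and the line is taken to contain only unsigned decimal integers separated by tabs. Columns beyond the sixth are ignored.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct ex_info_memory {
	uint64_t total_ram;		/* column 1 */
	uint64_t limit_compress;	/* column 2, 0 = default */
	uint64_t limit_decompress;	/* column 3, 0 = default */
	uint64_t limit_mt_decompress;	/* column 4 */
	uint64_t default_mt_limit;	/* column 5 */
	uint64_t cpu_threads;		/* column 6 */
	int columns;			/* how many columns were parsed */
};

/* Parse one line of "xz --robot --info-memory" output. Returns true
 * if at least the first three columns were present. Missing columns
 * are left as zero. */
static bool
ex_parse_info_memory(const char *line, struct ex_info_memory *out)
{
	*out = (struct ex_info_memory){ 0 };

	const int n = sscanf(line,
			"%" SCNu64 "\t%" SCNu64 "\t%" SCNu64
			"\t%" SCNu64 "\t%" SCNu64 "\t%" SCNu64,
			&out->total_ram, &out->limit_compress,
			&out->limit_decompress, &out->limit_mt_decompress,
			&out->default_mt_limit, &out->cpu_threads);

	/* sscanf() returns EOF (negative) on empty input. */
	out->columns = n < 0 ? 0 : n;
	return n >= 3;
}
```

Checking `columns` before using the last three fields lets the same code work with output from versions older than 5.3.4alpha.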
|
||||
.PP
|
||||
In the future, the output of
|
||||
.B "xz \-\-robot \-\-info\-memory"
|
||||
|
|