Vendor import of xz 5.4.0 (trimmed)

2024-10-15 04:43:53 +00:00 · 2022-12-31 20:23:22 -08:00 · 2022-12-31 20:23:22 -08:00 · f6a891c2b4
parent 46780ea2dc
commit f6a891c2b4
86 changed files with 13192 additions and 6616 deletions
--- a/12
+++ b/12
@ -19,6 +19,18 @@ Authors of XZ Utils
    Andrew Dudman helped adapting the scripts and their man pages for
    XZ Utils.

+    The initial version of the threaded .xz decompressor was written
+    by Sebastian Andrzej Siewior.
+
+    The initial version of the .lz (lzip) decoder was written
+    by Michał Górny.
+
+    CLMUL-accelerated CRC code was contributed by Ilya Kurdyukov.
+
+    Other authors:
+      - Jonathan Nieder
+      - Joachim Henke
+
    The GNU Autotools-based build system contains files from many authors,
    which I'm not trying to list here.

--- a/9834
+++ b/9834
--- a/74
+++ b/74
@ -202,9 +202,77 @@ XZ Utils

        https://translationproject.org/html/translators.html

-    Several strings will change in a future version of xz so if you
-    wish to start a new translation, look at the code in the xz git
-    repository instead of a 5.2.x release.
+    Below are notes and testing instructions specific to xz
+    translations.
+
+    Testing can be done by installing xz into a temporary directory:
+
+        ./configure --disable-shared --prefix=/tmp/xz-test
+        # <Edit the .po file in the po directory.>
+        make -C po update-po
+        make install
+        bash debug/translation.bash | less
+        bash debug/translation.bash | less -S  # For --list outputs
+
+    Repeat the above as needed (no need to re-run configure though).
+
+    Note especially the following:
+
+      - The output of --help and --long-help must look nice on
+        an 80-column terminal. It's OK to add extra lines if needed.
+
+      - In contrast, don't add extra lines to error messages and such.
+        They are often preceded with e.g. a filename on the same line,
+        so you have no way to predict where to put a \n. Let the terminal
+        do the wrapping even if it looks ugly. Adding new lines will be
+        even uglier in the generic case even if it looks nice in a few
+        limited examples.
+
+      - Be careful with column alignment in tables and table-like output
+        (--list, --list --verbose --verbose, --info-memory, --help, and
+        --long-help):
+
+          * All descriptions of options in --help should start in the
+            same column (but it doesn't need to be the same column as
+            in the English messages; just be consistent if you change it).
+            Check that both --help and --long-help look OK, since they
+            share several strings.
+
+          * --list --verbose and --info-memory print lines that have
+            the format "Description:   %s". If you need a longer
+            description, you can put extra space between the colon
+            and %s. Then you may need to add extra space to other
+            strings too so that the result as a whole looks good (all
+            values start at the same column).
+
+          * The columns of the actual tables in --list --verbose --verbose
+            should be aligned properly. Abbreviate if necessary. It might
+            be good to keep at least 2 or 3 spaces between column headings
+            and avoid spaces in the headings so that the columns stand out
+            better, but this is a matter of opinion. Do what you think
+            looks best.
+
+      - Be careful to put a period at the end of a sentence when the
+        original version has it, and don't put it when the original
+        doesn't have it. Similarly, be careful with \n characters
+        at the beginning and end of the strings.
+
+      - Read the TRANSLATORS comments that have been extracted from the
+        source code and included in xz.pot. Some comments suggest
+        testing with a specific command which needs an .xz file. You
+        may use e.g. any tests/files/good-*.xz. However, these test
+        commands are included in translations.bash output, so reading
+        translations.bash output carefully can be enough.
+
+      - If you find language problems in the original English strings,
+        feel free to suggest improvements. Ask if something is unclear.
+
+      - The translated messages should be understandable (sometimes this
+        may be a problem with the original English messages too). Don't
+        make a direct word-by-word translation from English especially if
+        the result doesn't sound good in your language.
+
+    Thanks for your help!


 5. Other implementations of the .xz format
--- a/9
+++ b/9
@ -42,8 +42,11 @@ has been important. :-) In alphabetical order:
  - Michael Fox
  - Mike Frysinger
  - Daniel Richard G.
+  - Tomasz Gajc
  - Bjarni Ingi Gislason
+  - John Paul Adrian Glaubitz
  - Bill Glessner
+  - Michał Górny
  - Jason Gorski
  - Juan Manuel Guerrero
  - Diederik de Haas
@ -51,6 +54,8 @@ has been important. :-) In alphabetical order:
  - Christian Hesse
  - Vincenzo Innocente
  - Peter Ivanov
+  - Nicholas Jackson
+  - Sam James
  - Jouk Jansen
  - Jun I Jin
  - Kiyoshi Kanazawa
@ -61,8 +66,10 @@ has been important. :-) In alphabetical order:
  - Jan Kratochvil
  - Christian Kujau
  - Stephan Kulow
+  - Ilya Kurdyukov
  - Peter Lawler
  - James M Leddy
+  - Vincent Lefevre
  - Hin-Tak Leung
  - Andraž 'ruskie' Levstik
  - Cary Lewis
@ -79,6 +86,8 @@ has been important. :-) In alphabetical order:
  - Ivan A. Melnikov
  - Jim Meyering
  - Arkadiusz Miskiewicz
+  - Nathan Moinvaziri
+  - Étienne Mollier
  - Conley Moorhous
  - Rafał Mużyło
  - Adrien Nader
--- a/2
+++ b/2
@ -59,8 +59,6 @@ Missing features
      - Implement threaded match finders.
      - Implement pigz-style threading in LZMA2.

-    Multithreaded decompression
-
    Buffer-to-buffer coding could use less RAM (especially when
    decompressing LZMA1 or LZMA2).

--- a/src/common/tuklib_common.h
+++ b/src/common/tuklib_common.h
@ -14,7 +14,7 @@
 #define TUKLIB_COMMON_H

 // The config file may be replaced by a package-specific file.
-// It should include at least stddef.h, inttypes.h, and limits.h.
+// It should include at least stddef.h, stdbool.h, inttypes.h, and limits.h.
 #include "tuklib_config.h"

 // TUKLIB_SYMBOL_PREFIX is prefixed to all symbols exported by
--- a/src/common/tuklib_config.h
+++ b/src/common/tuklib_config.h
@ -1,7 +1,10 @@
+// If config.h isn't available, assume that the headers required by
+// tuklib_common.h are available. This is required by crc32_tablegen.c.
 #ifdef HAVE_CONFIG_H
 #	include "sysdefs.h"
 #else
 #	include <stddef.h>
+#	include <stdbool.h>
 #	include <inttypes.h>
 #	include <limits.h>
 #endif
--- a/src/common/tuklib_integer.h
+++ b/src/common/tuklib_integer.h
@ -17,8 +17,8 @@
 ///   - Byte swapping: bswapXX(num)
 ///   - Byte order conversions to/from native (byteswaps if Y isn't
 ///     the native endianness): convXXYe(num)
-///   - Unaligned reads (16/32-bit only): readXXYe(ptr)
-///   - Unaligned writes (16/32-bit only): writeXXYe(ptr, num)
+///   - Unaligned reads: readXXYe(ptr)
+///   - Unaligned writes: writeXXYe(ptr, num)
 ///   - Aligned reads: aligned_readXXYe(ptr)
 ///   - Aligned writes: aligned_writeXXYe(ptr, num)
 ///
@ -343,6 +343,46 @@ read32le(const uint8_t *buf)
 }


+static inline uint64_t
+read64be(const uint8_t *buf)
+{
+#if defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
+	uint64_t num = read64ne(buf);
+	return conv64be(num);
+#else
+	uint64_t num = (uint64_t)buf[0] << 56;
+	num |= (uint64_t)buf[1] << 48;
+	num |= (uint64_t)buf[2] << 40;
+	num |= (uint64_t)buf[3] << 32;
+	num |= (uint64_t)buf[4] << 24;
+	num |= (uint64_t)buf[5] << 16;
+	num |= (uint64_t)buf[6] << 8;
+	num |= (uint64_t)buf[7];
+	return num;
+#endif
+}
+
+
+static inline uint64_t
+read64le(const uint8_t *buf)
+{
+#if !defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
+	uint64_t num = read64ne(buf);
+	return conv64le(num);
+#else
+	uint64_t num = (uint64_t)buf[0];
+	num |= (uint64_t)buf[1] << 8;
+	num |= (uint64_t)buf[2] << 16;
+	num |= (uint64_t)buf[3] << 24;
+	num |= (uint64_t)buf[4] << 32;
+	num |= (uint64_t)buf[5] << 40;
+	num |= (uint64_t)buf[6] << 48;
+	num |= (uint64_t)buf[7] << 56;
+	return num;
+#endif
+}
+
+
 // NOTE: Possible byte swapping must be done in a macro to allow the compiler
 // to optimize byte swapping of constants when using glibc's or *BSD's
 // byte swapping macros. The actual write is done in an inline function
@ -350,11 +390,13 @@ read32le(const uint8_t *buf)
 #if defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
 #	define write16be(buf, num) write16ne(buf, conv16be(num))
 #	define write32be(buf, num) write32ne(buf, conv32be(num))
+#	define write64be(buf, num) write64ne(buf, conv64be(num))
 #endif

 #if !defined(WORDS_BIGENDIAN) || defined(TUKLIB_FAST_UNALIGNED_ACCESS)
 #	define write16le(buf, num) write16ne(buf, conv16le(num))
 #	define write32le(buf, num) write32ne(buf, conv32le(num))
+#	define write64le(buf, num) write64ne(buf, conv64le(num))
 #endif


--- a/src/liblzma/api/lzma/base.h
+++ b/src/liblzma/api/lzma/base.h
@ -240,6 +240,36 @@ typedef enum {
 		 * can be a sign of a bug in liblzma. See the documentation
 		 * how to report bugs.
 		 */
+
+	LZMA_SEEK_NEEDED        = 12,
+		/**<
+		 * \brief       Request to change the input file position
+		 *
+		 * Some coders can do random access in the input file. The
+		 * initialization functions of these coders take the file size
+		 * as an argument. No other coders can return LZMA_SEEK_NEEDED.
+		 *
+		 * When this value is returned, the application must seek to
+		 * the file position given in lzma_stream.seek_pos. This value
+		 * is guaranteed to never exceed the file size that was
+		 * specified at the coder initialization.
+		 *
+		 * After seeking the application should read new input and
+		 * pass it normally via lzma_stream.next_in and .avail_in.
+		 */
+
+	/*
+	 * These eumerations may be used internally by liblzma
+	 * but they will never be returned to applications.
+	 */
+	LZMA_RET_INTERNAL1      = 101,
+	LZMA_RET_INTERNAL2      = 102,
+	LZMA_RET_INTERNAL3      = 103,
+	LZMA_RET_INTERNAL4      = 104,
+	LZMA_RET_INTERNAL5      = 105,
+	LZMA_RET_INTERNAL6      = 106,
+	LZMA_RET_INTERNAL7      = 107,
+	LZMA_RET_INTERNAL8      = 108
 } lzma_ret;


@ -520,7 +550,19 @@ typedef struct {
 	void *reserved_ptr2;
 	void *reserved_ptr3;
 	void *reserved_ptr4;
-	uint64_t reserved_int1;
+
+	/**
+	 * \brief       New seek input position for LZMA_SEEK_NEEDED
+	 *
+	 * When lzma_code() returns LZMA_SEEK_NEEDED, the new input position
+	 * needed by liblzma will be available seek_pos. The value is
+	 * guaranteed to not exceed the file size that was specified when
+	 * this lzma_stream was initialized.
+	 *
+	 * In all other situations the value of this variable is undefined.
+	 */
+	uint64_t seek_pos;
+
 	uint64_t reserved_int2;
 	size_t reserved_int3;
 	size_t reserved_int4;
--- a/src/liblzma/api/lzma/bcj.h
+++ b/src/liblzma/api/lzma/bcj.h
@ -49,9 +49,13 @@
 	 * Filter for SPARC binaries.
 	 */

+#define LZMA_FILTER_ARM64       LZMA_VLI_C(0x0A)
+	/**<
+	 * Filter for ARM64 binaries.
+	 */

 /**
- * \brief       Options for BCJ filters
+ * \brief       Options for BCJ filters (except ARM64)
 *
 * The BCJ filters never change the size of the data. Specifying options
 * for them is optional: if pointer to options is NULL, default value is
--- a/src/liblzma/api/lzma/container.h
+++ b/src/liblzma/api/lzma/container.h
@ -69,7 +69,12 @@ typedef struct {
 	 *
 	 * Set this to zero if no flags are wanted.
 	 *
-	 * No flags are currently supported.
+	 * Encoder: No flags are currently supported.
+	 *
+	 * Decoder: Bitwise-or of zero or more of the decoder flags:
+	 * LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
+	 * LZMA_TELL_ANY_CHECK, LZMA_IGNORE_CHECK,
+	 * LZMA_CONCATENATED, LZMA_FAIL_FAST
 	 */
 	uint32_t flags;

@ -79,7 +84,7 @@ typedef struct {
 	uint32_t threads;

 	/**
-	 * \brief       Maximum uncompressed size of a Block
+	 * \brief       Encoder only: Maximum uncompressed size of a Block
 	 *
 	 * The encoder will start a new .xz Block every block_size bytes.
 	 * Using LZMA_FULL_FLUSH or LZMA_FULL_BARRIER with lzma_code()
@ -135,7 +140,7 @@ typedef struct {
 	uint32_t timeout;

 	/**
-	 * \brief       Compression preset (level and possible flags)
+	 * \brief       Encoder only: Compression preset
 	 *
 	 * The preset is set just like with lzma_easy_encoder().
 	 * The preset is ignored if filters below is non-NULL.
@ -143,7 +148,7 @@ typedef struct {
 	uint32_t preset;

 	/**
-	 * \brief       Filter chain (alternative to a preset)
+	 * \brief       Encoder only: Filter chain (alternative to a preset)
 	 *
 	 * If this is NULL, the preset above is used. Otherwise the preset
 	 * is ignored and the filter chain specified here is used.
@ -151,7 +156,7 @@ typedef struct {
 	const lzma_filter *filters;

 	/**
-	 * \brief       Integrity check type
+	 * \brief       Encoder only: Integrity check type
 	 *
 	 * See check.h for available checks. The xz command line tool
 	 * defaults to LZMA_CHECK_CRC64, which is a good choice if you
@ -173,8 +178,50 @@ typedef struct {
 	uint32_t reserved_int2;
 	uint32_t reserved_int3;
 	uint32_t reserved_int4;
-	uint64_t reserved_int5;
-	uint64_t reserved_int6;
+
+	/**
+	 * \brief       Memory usage limit to reduce the number of threads
+	 *
+	 * Encoder: Ignored.
+	 *
+	 * Decoder:
+	 *
+	 * If the number of threads has been set so high that more than
+	 * memlimit_threading bytes of memory would be needed, the number
+	 * of threads will be reduced so that the memory usage will not exceed
+	 * memlimit_threading bytes. However, if memlimit_threading cannot
+	 * be met even in single-threaded mode, then decoding will continue
+	 * in single-threaded mode and memlimit_threading may be exceeded
+	 * even by a large amount. That is, memlimit_threading will never make
+	 * lzma_code() return LZMA_MEMLIMIT_ERROR. To truly cap the memory
+	 * usage, see memlimit_stop below.
+	 *
+	 * Setting memlimit_threading to UINT64_MAX or a similar huge value
+	 * means that liblzma is allowed to keep the whole compressed file
+	 * and the whole uncompressed file in memory in addition to the memory
+	 * needed by the decompressor data structures used by each thread!
+	 * In other words, a reasonable value limit must be set here or it
+	 * will cause problems sooner or later. If you have no idea what
+	 * a reasonable value could be, try lzma_physmem() / 4 as a starting
+	 * point. Setting this limit will never prevent decompression of
+	 * a file; this will only reduce the number of threads.
+	 *
+	 * If memlimit_threading is greater than memlimit_stop, then the value
+	 * of memlimit_stop will be used for both.
+	 */
+	uint64_t memlimit_threading;
+
+	/**
+	 * \brief       Memory usage limit that should never be exceeded
+	 *
+	 * Encoder: Ignored.
+	 *
+	 * Decoder: If decompressing will need more than this amount of
+	 * memory even in the single-threaded mode, then lzma_code() will
+	 * return LZMA_MEMLIMIT_ERROR.
+	 */
+	uint64_t memlimit_stop;
+
 	uint64_t reserved_int7;
 	uint64_t reserved_int8;
 	void *reserved_ptr1;
@ -444,6 +491,60 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode(
 		lzma_nothrow lzma_attr_warn_unused_result;


+/**
+ * \brief       MicroLZMA encoder
+ *
+ * The MicroLZMA format is a raw LZMA stream whose first byte (always 0x00)
+ * has been replaced with bitwise-negation of the LZMA properties (lc/lp/pb).
+ * This encoding ensures that the first byte of MicroLZMA stream is never
+ * 0x00. There is no end of payload marker and thus the uncompressed size
+ * must be stored separately. For the best error detection the dictionary
+ * size should be stored separately as well but alternatively one may use
+ * the uncompressed size as the dictionary size when decoding.
+ *
+ * With the MicroLZMA encoder, lzma_code() behaves slightly unusually.
+ * The action argument must be LZMA_FINISH and the return value will never be
+ * LZMA_OK. Thus the encoding is always done with a single lzma_code() after
+ * the initialization. The benefit of the combination of initialization
+ * function and lzma_code() is that memory allocations can be re-used for
+ * better performance.
+ *
+ * lzma_code() will try to encode as much input as is possible to fit into
+ * the given output buffer. If not all input can be encoded, the stream will
+ * be finished without encoding all the input. The caller must check both
+ * input and output buffer usage after lzma_code() (total_in and total_out
+ * in lzma_stream can be convenient). Often lzma_code() can fill the output
+ * buffer completely if there is a lot of input, but sometimes a few bytes
+ * may remain unused because the next LZMA symbol would require more space.
+ *
+ * lzma_stream.avail_out must be at least 6. Otherwise LZMA_PROG_ERROR
+ * will be returned.
+ *
+ * The LZMA dictionary should be reasonably low to speed up the encoder
+ * re-initialization. A good value is bigger than the resulting
+ * uncompressed size of most of the output chunks. For example, if output
+ * size is 4 KiB, dictionary size of 32 KiB or 64 KiB is good. If the
+ * data compresses extremely well, even 128 KiB may be useful.
+ *
+ * The MicroLZMA format and this encoder variant were made with the EROFS
+ * file system in mind. This format may be convenient in other embedded
+ * uses too where many small streams are needed. XZ Embedded includes a
+ * decoder for this format.
+ *
+ * \return      - LZMA_STREAM_END: All good. Check the amounts of input used
+ *                and output produced. Store the amount of input used
+ *                (uncompressed size) as it needs to be known to decompress
+ *                the data.
+ *              - LZMA_OPTIONS_ERROR
+ *              - LZMA_MEM_ERROR
+ *              - LZMA_PROG_ERROR: In addition to the generic reasons for this
+ *                error code, this may also be returned if there isn't enough
+ *                output space (6 bytes) to create a valid MicroLZMA stream.
+ */
+extern LZMA_API(lzma_ret) lzma_microlzma_encoder(
+		lzma_stream *strm, const lzma_options_lzma *options);
+
+
 /************
 * Decoding *
 ************/
@ -501,8 +602,8 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode(
 /**
 * This flag enables decoding of concatenated files with file formats that
 * allow concatenating compressed files as is. From the formats currently
- * supported by liblzma, only the .xz format allows concatenated files.
- * Concatenated files are not allowed with the legacy .lzma format.
+ * supported by liblzma, only the .xz and .lz formats allow concatenated
+ * files. Concatenated files are not allowed with the legacy .lzma format.
 *
 * This flag also affects the usage of the `action' argument for lzma_code().
 * When LZMA_CONCATENATED is used, lzma_code() won't return LZMA_STREAM_END
@ -515,6 +616,35 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode(
 #define LZMA_CONCATENATED               UINT32_C(0x08)


+/**
+ * This flag makes the threaded decoder report errors (like LZMA_DATA_ERROR)
+ * as soon as they are detected. This saves time when the application has no
+ * interest in a partially decompressed truncated or corrupt file. Note that
+ * due to timing randomness, if the same truncated or corrupt input is
+ * decompressed multiple times with this flag, a different amount of output
+ * may be produced by different runs, and even the error code might vary.
+ *
+ * When using LZMA_FAIL_FAST, it is recommended to use LZMA_FINISH to tell
+ * the decoder when no more input will be coming because it can help fast
+ * detection and reporting of truncated files. Note that in this situation
+ * truncated files might be diagnosed with LZMA_DATA_ERROR instead of
+ * LZMA_OK or LZMA_BUF_ERROR!
+ *
+ * Without this flag the threaded decoder will provide as much output as
+ * possible at first and then report the pending error. This default behavior
+ * matches the single-threaded decoder and provides repeatable behavior
+ * with truncated or corrupt input. There are a few special cases where the
+ * behavior can still differ like memory allocation failures (LZMA_MEM_ERROR).
+ *
+ * Single-threaded decoders currently ignore this flag.
+ *
+ * Support for this flag was added in liblzma 5.3.3alpha. Note that in older
+ * versions this flag isn't supported (LZMA_OPTIONS_ERROR) even by functions
+ * that ignore this flag in newer liblzma versions.
+ */
+#define LZMA_FAIL_FAST                  UINT32_C(0x20)
+
+
 /**
 * \brief       Initialize .xz Stream decoder
 *
@ -527,7 +657,7 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode(
 * \param       flags       Bitwise-or of zero or more of the decoder flags:
 *                          LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
 *                          LZMA_TELL_ANY_CHECK, LZMA_IGNORE_CHECK,
- *                          LZMA_CONCATENATED
+ *                          LZMA_CONCATENATED, LZMA_FAIL_FAST
 *
 * \return      - LZMA_OK: Initialization was successful.
 *              - LZMA_MEM_ERROR: Cannot allocate memory.
@ -540,11 +670,43 @@ extern LZMA_API(lzma_ret) lzma_stream_decoder(


 /**
- * \brief       Decode .xz Streams and .lzma files with autodetection
+ * \brief       Initialize multithreaded .xz Stream decoder
 *
- * This decoder autodetects between the .xz and .lzma file formats, and
- * calls lzma_stream_decoder() or lzma_alone_decoder() once the type
- * of the input file has been detected.
+ * \param       strm        Pointer to properly prepared lzma_stream
+ * \param       options     Pointer to multithreaded compression options
+ *
+ * The decoder can decode multiple Blocks in parallel. This requires that each
+ * Block Header contains the Compressed Size and Uncompressed size fields
+ * which are added by the multi-threaded encoder, see lzma_stream_encoder_mt().
+ *
+ * A Stream with one Block will only utilize one thread. A Stream with multiple
+ * Blocks but without size information in Block Headers will be processed in
+ * single-threaded mode in the same way as done by lzma_stream_decoder().
+ * Concatenated Streams are processed one Stream at a time; no inter-Stream
+ * parallelization is done.
+ *
+ * This function behaves like lzma_stream_decoder() when options->threads == 1
+ * and options->memlimit_threading <= 1.
+ *
+ * \return      - LZMA_OK: Initialization was successful.
+ *              - LZMA_MEM_ERROR: Cannot allocate memory.
+ *              - LZMA_MEMLIMIT_ERROR: Memory usage limit was reached.
+ *              - LZMA_OPTIONS_ERROR: Unsupported flags.
+ *              - LZMA_PROG_ERROR
+ */
+extern LZMA_API(lzma_ret) lzma_stream_decoder_mt(
+		lzma_stream *strm, const lzma_mt *options)
+		lzma_nothrow lzma_attr_warn_unused_result;
+
+
+/**
+ * \brief       Decode .xz, .lzma, and .lz (lzip) files with autodetection
+ *
+ * This decoder autodetects between the .xz, .lzma, and .lz file formats,
+ * and calls lzma_stream_decoder(), lzma_alone_decoder(), or
+ * lzma_lzip_decoder() once the type of the input file has been detected.
+ *
+ * Support for .lz was added in 5.4.0.
 *
 * If the flag LZMA_CONCATENATED is used and the input is a .lzma file:
 * For historical reasons concatenated .lzma files aren't supported.
@ -562,7 +724,7 @@ extern LZMA_API(lzma_ret) lzma_stream_decoder(
 * \param       flags       Bitwise-or of zero or more of the decoder flags:
 *                          LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
 *                          LZMA_TELL_ANY_CHECK, LZMA_IGNORE_CHECK,
- *                          LZMA_CONCATENATED
+ *                          LZMA_CONCATENATED, LZMA_FAIL_FAST
 *
 * \return      - LZMA_OK: Initialization was successful.
 *              - LZMA_MEM_ERROR: Cannot allocate memory.
@ -597,6 +759,64 @@ extern LZMA_API(lzma_ret) lzma_alone_decoder(
 		lzma_nothrow lzma_attr_warn_unused_result;


+/**
+ * \brief       Initialize .lz (lzip) decoder (a foreign file format)
+ *
+ * \param       strm        Pointer to properly prepared lzma_stream
+ * \param       memlimit    Memory usage limit as bytes. Use UINT64_MAX
+ *                          to effectively disable the limiter.
+ * \param       flags       Bitwise-or of flags, or zero for no flags.
+ *                          All decoder flags listed above are supported
+ *                          although only LZMA_CONCATENATED and (in very rare
+ *                          cases) LZMA_IGNORE_CHECK are actually useful.
+ *                          LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
+ *                          and LZMA_FAIL_FAST do nothing. LZMA_TELL_ANY_CHECK
+ *                          is supported for consistency only as CRC32 is
+ *                          always used in the .lz format.
+ *
+ * This decoder supports the .lz format version 0 and the unextended .lz
+ * format version 1:
+ *
+ *   - Files in the format version 0 were produced by lzip 1.3 and older.
+ *     Such files aren't common but may be found from file archives
+ *     as a few source packages were released in this format. People
+ *     might have old personal files in this format too. Decompression
+ *     support for the format version 0 was removed in lzip 1.18.
+ *
+ *   - lzip 1.3 added decompression support for .lz format version 1 files.
+ *     Compression support was added in lzip 1.4. In lzip 1.6 the .lz format
+ *     version 1 was extended to support the Sync Flush marker. This extension
+ *     is not supported by liblzma. lzma_code() will return LZMA_DATA_ERROR
+ *     at the location of the Sync Flush marker. In practice files with
+ *     the Sync Flush marker are very rare and thus liblzma can decompress
+ *     almost all .lz files.
+ *
+ * Just like with lzma_stream_decoder() for .xz files, LZMA_CONCATENATED
+ * should be used when decompressing normal standalone .lz files.
+ *
+ * The .lz format allows putting non-.lz data at the end of a file after at
+ * least one valid .lz member. That is, one can append custom data at the end
+ * of a .lz file and the decoder is required to ignore it. In liblzma this
+ * is relevant only when LZMA_CONCATENATED is used. In that case lzma_code()
+ * will return LZMA_STREAM_END and leave lzma_stream.next_in pointing to
+ * the first byte of the non-.lz data. An exception to this is if the first
+ * 1-3 bytes of the non-.lz data are identical to the .lz magic bytes
+ * (0x4C, 0x5A, 0x49, 0x50; "LZIP" in US-ASCII). In such a case the 1-3 bytes
+ * will have been ignored by lzma_code(). If one wishes to locate the non-.lz
+ * data reliably, one must ensure that the first byte isn't 0x4C. Actually
+ * one should ensure that none of the first four bytes of trailing data are
+ * equal to the magic bytes because lzip >= 1.20 requires it by default.
+ *
+ * \return      - LZMA_OK: Initialization was successful.
+ *              - LZMA_MEM_ERROR: Cannot allocate memory.
+ *              - LZMA_OPTIONS_ERROR: Unsupported flags
+ *              - LZMA_PROG_ERROR
+ */
+extern LZMA_API(lzma_ret) lzma_lzip_decoder(
+		lzma_stream *strm, uint64_t memlimit, uint32_t flags)
+		lzma_nothrow lzma_attr_warn_unused_result;
+
+
 /**
 * \brief       Single-call .xz Stream decoder
 *
@ -606,9 +826,9 @@ extern LZMA_API(lzma_ret) lzma_alone_decoder(
 *                          returned.
 * \param       flags       Bitwise-or of zero or more of the decoder flags:
 *                          LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK,
- *                          LZMA_IGNORE_CHECK, LZMA_CONCATENATED. Note that
- *                          LZMA_TELL_ANY_CHECK is not allowed and will
- *                          return LZMA_PROG_ERROR.
+ *                          LZMA_IGNORE_CHECK, LZMA_CONCATENATED,
+ *                          LZMA_FAIL_FAST. Note that LZMA_TELL_ANY_CHECK
+ *                          is not allowed and will return LZMA_PROG_ERROR.
 * \param       allocator   lzma_allocator for custom allocator functions.
 *                          Set to NULL to use malloc() and free().
 * \param       in          Beginning of the input buffer
@ -642,3 +862,43 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_decode(
 		const uint8_t *in, size_t *in_pos, size_t in_size,
 		uint8_t *out, size_t *out_pos, size_t out_size)
 		lzma_nothrow lzma_attr_warn_unused_result;
+
+
+/**
+ * \brief       MicroLZMA decoder
+ *
+ * See lzma_microlzma_decoder() for more information.
+ *
+ * The lzma_code() usage with this decoder is completely normal. The
+ * special behavior of lzma_code() applies to lzma_microlzma_encoder() only.
+ *
+ * \param       strm        Pointer to properly prepared lzma_stream
+ * \param       comp_size   Compressed size of the MicroLZMA stream.
+ *                          The caller must somehow know this exactly.
+ * \param       uncomp_size Uncompressed size of the MicroLZMA stream.
+ *                          If the exact uncompressed size isn't known, this
+ *                          can be set to a value that is at most as big as
+ *                          the exact uncompressed size would be, but then the
+ *                          next argument uncomp_size_is_exact must be false.
+ * \param       uncomp_size_is_exact
+ *                          If true, uncomp_size must be exactly correct.
+ *                          This will improve error detection at the end of
+ *                          the stream. If the exact uncompressed size isn't
+ *                          known, this must be false. uncomp_size must still
+ *                          be at most as big as the exact uncompressed size
+ *                          is. Setting this to false when the exact size is
+ *                          known will work but error detection at the end of
+ *                          the stream will be weaker.
+ * \param       dict_size   LZMA dictionary size that was used when
+ *                          compressing the data. It is OK to use a bigger
+ *                          value too but liblzma will then allocate more
+ *                          memory than would actually be required and error
+ *                          detection will be slightly worse. (Note that with
+ *                          the implementation in XZ Embedded it doesn't
+ *                          affect the memory usage if one specifies bigger
+ *                          dictionary than actually required.)
+ */
+extern LZMA_API(lzma_ret) lzma_microlzma_decoder(
+		lzma_stream *strm, uint64_t comp_size,
+		uint64_t uncomp_size, lzma_bool uncomp_size_is_exact,
+		uint32_t dict_size);
--- a/src/liblzma/api/lzma/filter.h
+++ b/src/liblzma/api/lzma/filter.h
@ -124,6 +124,27 @@ extern LZMA_API(lzma_ret) lzma_filters_copy(
 		lzma_nothrow lzma_attr_warn_unused_result;


+/**
+ * \brief       Free the options in the array of lzma_filter structures
+ *
+ * This frees the filter chain options. The filters array itself is not freed.
+ *
+ * The filters array must have at most LZMA_FILTERS_MAX + 1 elements
+ * including the terminating element which must have .id = LZMA_VLI_UNKNOWN.
+ * For all elements before the terminating element:
+ *   - options will be freed using the given lzma_allocator or,
+ *     if allocator is NULL, using free().
+ *   - options will be set to NULL.
+ *   - id will be set to LZMA_VLI_UNKNOWN.
+ *
+ * If filters is NULL, this does nothing but remember that this never frees
+ * the filters array itself.
+ */
+extern LZMA_API(void) lzma_filters_free(
+		lzma_filter *filters, const lzma_allocator *allocator)
+		lzma_nothrow;
+
+
 /**
 * \brief       Calculate approximate memory requirements for raw encoder
 *
@ -205,21 +226,27 @@ extern LZMA_API(lzma_ret) lzma_raw_decoder(
 /**
 * \brief       Update the filter chain in the encoder
 *
- * This function is for advanced users only. This function has two slightly
- * different purposes:
+ * This function may be called after lzma_code() has returned LZMA_STREAM_END
+ * when LZMA_FULL_BARRIER, LZMA_FULL_FLUSH, or LZMA_SYNC_FLUSH was used:
 *
- *  - After LZMA_FULL_FLUSH when using Stream encoder: Set a new filter
- *    chain, which will be used starting from the next Block.
+ *  - After LZMA_FULL_BARRIER or LZMA_FULL_FLUSH: Single-threaded .xz Stream
+ *    encoder (lzma_stream_encoder()) and (since liblzma 5.4.0) multi-threaded
+ *    Stream encoder (lzma_stream_encoder_mt()) allow setting a new filter
+ *    chain to be used for the next Block(s).
 *
- *  - After LZMA_SYNC_FLUSH using Raw, Block, or Stream encoder: Change
- *    the filter-specific options in the middle of encoding. The actual
- *    filters in the chain (Filter IDs) cannot be changed. In the future,
- *    it might become possible to change the filter options without
- *    using LZMA_SYNC_FLUSH.
+ *  - After LZMA_SYNC_FLUSH: Raw encoder (lzma_raw_encoder()),
+ *    Block encocder (lzma_block_encoder()), and single-threaded .xz Stream
+ *    encoder (lzma_stream_encoder()) allow changing certain filter-specific
+ *    options in the middle of encoding. The actual filters in the chain
+ *    (Filter IDs) must not be changed! Currently only the lc, lp, and pb
+ *    options of LZMA2 (not LZMA1) can be changed this way.
 *
- * While rarely useful, this function may be called also when no data has
- * been compressed yet. In that case, this function will behave as if
- * LZMA_FULL_FLUSH (Stream encoder) or LZMA_SYNC_FLUSH (Raw or Block
+ *  - In the future some filters might allow changing some of their options
+ *    without any barrier or flushing but currently such filters don't exist.
+ *
+ * This function may also be called when no data has been compressed yet
+ * although this is rarely useful. In that case, this function will behave
+ * as if LZMA_FULL_FLUSH (Stream encoders) or LZMA_SYNC_FLUSH (Raw or Block
 * encoder) had been used right before calling this function.
 *
 * \return      - LZMA_OK
@ -427,3 +454,261 @@ extern LZMA_API(lzma_ret) lzma_filter_flags_decode(
 		lzma_filter *filter, const lzma_allocator *allocator,
 		const uint8_t *in, size_t *in_pos, size_t in_size)
 		lzma_nothrow lzma_attr_warn_unused_result;
+
+
+/***********
+ * Strings *
+ ***********/
+
+/**
+ * \brief       Allow or show all filters
+ *
+ * By default only the filters supported in the .xz format are accept by
+ * lzma_str_to_filters() or shown by lzma_str_list_filters().
+ */
+#define LZMA_STR_ALL_FILTERS    UINT32_C(0x01)
+
+
+/**
+ * \brief       Do not validate the filter chain in lzma_str_to_filters()
+ *
+ * By default lzma_str_to_filters() can return an error if the filter chain
+ * as a whole isn't usable in the .xz format or in the raw encoder or decoder.
+ * With this flag the validation is skipped (this doesn't affect the handling
+ * of the individual filter options).
+ */
+#define LZMA_STR_NO_VALIDATION  UINT32_C(0x02)
+
+
+/**
+ * \brief       Stringify encoder options
+ *
+ * Show the filter-specific options that the encoder will use.
+ * This may be useful for verbose diagnostic messages.
+ *
+ * Note that if options were decoded from .xz headers then the encoder options
+ * may be undefined. This flag shouldn't be used in such a situation.
+ */
+#define LZMA_STR_ENCODER        UINT32_C(0x10)
+
+
+/**
+ * \brief       Stringify decoder options
+ *
+ * Show the filter-specific options that the decoder will use.
+ * This may be useful for showing what filter options were decoded
+ * from file headers.
+ */
+#define LZMA_STR_DECODER        UINT32_C(0x20)
+
+
+/**
+ * \brief       Produce xz-compatible getopt_long() syntax
+ *
+ * That is, "delta:dist=2 lzma2:dict=4MiB,pb=1,lp=1" becomes
+ * "--delta=dist=2 --lzma2=dict=4MiB,pb=1,lp=1".
+ *
+ * This syntax is compatible with xz 5.0.0 as long as the filters and
+ * their options are supported too.
+ */
+#define LZMA_STR_GETOPT_LONG    UINT32_C(0x40)
+
+
+/**
+ * \brief       Use two dashes "--" instead of a space to separate filters
+ *
+ * That is, "delta:dist=2 lzma2:pb=1,lp=1" becomes
+ * "delta:dist=2--lzma2:pb=1,lp=1". This looks slightly odd but this
+ * kind of strings should be usable on the command line without quoting.
+ * However, it is possible that future versions with new filter options
+ * might produce strings that require shell quoting anyway as the exact
+ * set of possible characters isn't frozen for now.
+ *
+ * It is guaranteed that the single quote (') will never be used in
+ * filter chain strings (even if LZMA_STR_NO_SPACES isn't used).
+ */
+#define LZMA_STR_NO_SPACES      UINT32_C(0x80)
+
+
+/**
+ * \brief       Convert a string to a filter chain
+ *
+ * This tries to make it easier to write applications that allow users
+ * to set custom compression options. This only handles the filter
+ * configuration (including presets) but not the number of threads,
+ * block size, check type, or memory limits.
+ *
+ * The input string can be either a preset or a filter chain. Presets
+ * begin with a digit 0-9 and may be followed by zero or more flags
+ * which are lower-case letters. Currently only "e" is supported, matching
+ * LZMA_PRESET_EXTREME. For partial xz command line syntax compatibility,
+ * a preset string may start with a single dash "-".
+ *
+ * A filter chain consists of one or more "filtername:opt1=value1,opt2=value2"
+ * strings separated by one or more spaces. Leading and trailing spaces are
+ * ignored. All names and values must be lower-case. Extra commas in the
+ * option list are ignored. The order of filters is significant: when
+ * encoding, the uncompressed input data goes to the leftmost filter first.
+ * Normally "lzma2" is the last filter in the chain.
+ *
+ * If one wishes to avoid spaces, for example, to avoid shell quoting,
+ * it is possible to use two dashes "--" instead of spaces to separate
+ * the filters.
+ *
+ * For xz command line compatibility, each filter may be prefixed with
+ * two dashes "--" and the colon ":" separating the filter name from
+ * the options may be replaced with an equals sign "=".
+ *
+ * By default, only filters that can be used in the .xz format are accepted.
+ * To allow all filters (LZMA1) use the flag LZMA_STR_ALL_FILTERS.
+ *
+ * By default, very basic validation is done for the filter chain as a whole,
+ * for example, that LZMA2 is only used as the last filter in the chain.
+ * The validation isn't perfect though and it's possible that this function
+ * succeeds but using the filter chain for encoding or decoding will still
+ * result in LZMA_OPTIONS_ERROR. To disable this validation, use the flag
+ * LZMA_STR_NO_VALIDATION.
+ *
+ * The available filter names and their options are available via
+ * lzma_str_list_filters(). See the xz man page for the description
+ * of filter names and options.
+ *
+ * \param       str         User-supplied string describing a preset or
+ *                          a filter chain. If a default value is needed and
+ *                          you don't know what would be good, use "6" since
+ *                          that is the default preset in xz too.
+ * \param       error_pos   If this isn't NULL, this value will be set on
+ *                          both success and on all errors. This tells the
+ *                          location of the error in the string. This is
+ *                          an int to make it straightforward to use this
+ *                          as printf() field width. The value is guaranteed
+ *                          to be in the range [0, INT_MAX] even if strlen(str)
+ *                          somehow was greater than INT_MAX.
+ * \param       filters     An array of lzma_filter structures. There must
+ *                          be LZMA_FILTERS_MAX + 1 (that is, five) elements
+ *                          in the array. The old contents are ignored so it
+ *                          doesn't need to be initialized. This array is
+ *                          modified only if this function returns LZMA_OK.
+ *                          Once the allocated filter options are no longer
+ *                          needed, lzma_filters_free() can be used to free the
+ *                          options (it doesn't free the filters array itself).
+ * \param       flags       Bitwise-or of zero or more of the flags
+ *                          LZMA_STR_ALL_FILTERS and LZMA_STR_NO_VALIDATION.
+ * \param       allocator   lzma_allocator for custom allocator functions.
+ *                          Set to NULL to use malloc() and free().
+ *
+ * \return      On success, NULL is returned. On error, a statically-allocated
+ *              error message is returned which together with the error_pos
+ *              should give some idea what is wrong.
+ *
+ * For command line applications, below is an example how an error message
+ * can be displayed. Note the use of an empty string for the field width.
+ * If "^" was used there it would create an off-by-one error except at
+ * the very beginning of the line.
+ *
+ * \code{.c}
+ * const char *str = ...; // From user
+ * lzma_filter filters[LZMA_FILTERS_MAX + 1];
+ * int pos;
+ * const char *msg = lzma_str_to_filters(str, &pos, filters, 0, NULL);
+ * if (msg != NULL) {
+ *     printf("%s: Error in XZ compression options:\n", argv[0]);
+ *     printf("%s: %s\n", argv[0], str);
+ *     printf("%s: %*s^\n", argv[0], errpos, "");
+ *     printf("%s: %s\n", argv[0], msg);
+ * }
+ * \endcode
+ */
+extern LZMA_API(const char *) lzma_str_to_filters(
+		const char *str, int *error_pos, lzma_filter *filters,
+		uint32_t flags, const lzma_allocator *allocator)
+		lzma_nothrow lzma_attr_warn_unused_result;
+
+
+/**
+ * \brief       Convert a filter chain to a string
+ *
+ * Use cases:
+ *
+ *   - Verbose output showing the full encoder options to the user
+ *     (use LZMA_STR_ENCODER in flags)
+ *
+ *   - Showing the filters and options that are required to decode a file
+ *     (use LZMA_STR_DECODER in flags)
+ *
+ *   - Showing the filter names without any options in informational messages
+ *     where the technical details aren't important (no flags). In this case
+ *     the .options in the filters array are ignored and may be NULL even if
+ *     a filter has a mandatory options structure.
+ *
+ * Note that even if the filter chain was specified using a preset,
+ * the resulting filter chain isn't reversed to a preset. So if you
+ * specify "6" to lzma_str_to_filters() then lzma_str_from_filters()
+ * will produce a string containing "lzma2".
+ *
+ * \param       str         On success *str will be set to point to an
+ *                          allocated string describing the given filter
+ *                          chain. Old value is ignored. On error *str is
+ *                          always set to NULL.
+ * \param       filters     Array of 1-4 filters and a terminating element
+ *                          with .id = LZMA_VLI_UNKNOWN.
+ * \param       flags       Bitwise-or of zero or more of the flags
+ *                          LZMA_STR_ENCODER, LZMA_STR_DECODER,
+ *                          LZMA_STR_GETOPT_LONG, and LZMA_STR_NO_SPACES.
+ * \param       allocator   lzma_allocator for custom allocator functions.
+ *                          Set to NULL to use malloc() and free().
+ *
+ * \return      - LZMA_OK
+ *              - LZMA_OPTIONS_ERROR: Empty filter chain
+ *                (filters[0].id == LZMA_VLI_UNKNOWN) or the filter chain
+ *                includes a Filter ID that is not supported by this function.
+ *              - LZMA_MEM_ERROR
+ *              - LZMA_PROG_ERROR
+ */
+extern LZMA_API(lzma_ret) lzma_str_from_filters(
+		char **str, const lzma_filter *filters, uint32_t flags,
+		const lzma_allocator *allocator)
+		lzma_nothrow lzma_attr_warn_unused_result;
+
+
+/**
+ * \brief       List available filters and/or their options (for help message)
+ *
+ * If a filter_id is given then only one line is created which contains the
+ * filter name. If LZMA_STR_ENCODER or LZMA_STR_DECODER is used then the
+ * options required for encoding or decoding are listed on the same line too.
+ *
+ * If filter_id is LZMA_VLI_UNKNOWN then all supported .xz-compatible filters
+ * are listed:
+ *
+ *   - If neither LZMA_STR_ENCODER nor LZMA_STR_DECODER is used then
+ *     the supported filter names are listed on a single line separated
+ *     by spaces.
+ *
+ *   - If LZMA_STR_ENCODER or LZMA_STR_DECODER is used then filters and
+ *     the supported options are listed one filter per line. There won't
+ *     be a '\n' after the last filter.
+ *
+ *   - If LZMA_STR_ALL_FILTERS is used then the list will include also
+ *     those filters that cannot be used in the .xz format (LZMA1).
+ *
+ * \param       str         On success *str will be set to point to an
+ *                          allocated string listing the filters and options.
+ *                          Old value is ignored. On error *str is always set
+ *                          to NULL.
+ * \param       filter_id   Filter ID or LZMA_VLI_UNKNOWN.
+ * \param       flags       Bitwise-or of zero or more of the flags
+ *                          LZMA_STR_ALL_FILTERS, LZMA_STR_ENCODER,
+ *                          LZMA_STR_DECODER, and LZMA_STR_GETOPT_LONG.
+ * \param       allocator   lzma_allocator for custom allocator functions.
+ *                          Set to NULL to use malloc() and free().
+ *
+ * \return      - LZMA_OK
+ *              - LZMA_OPTIONS_ERROR: Unsupported filter_id or flags
+ *              - LZMA_MEM_ERROR
+ *              - LZMA_PROG_ERROR
+ */
+extern LZMA_API(lzma_ret) lzma_str_list_filters(
+		char **str, lzma_vli filter_id, uint32_t flags,
+		const lzma_allocator *allocator)
+		lzma_nothrow lzma_attr_warn_unused_result;
--- a/src/liblzma/api/lzma/index.h
+++ b/src/liblzma/api/lzma/index.h
@ -684,3 +684,69 @@ extern LZMA_API(lzma_ret) lzma_index_buffer_decode(lzma_index **i,
 		uint64_t *memlimit, const lzma_allocator *allocator,
 		const uint8_t *in, size_t *in_pos, size_t in_size)
 		lzma_nothrow;
+
+
+/**
+ * \brief       Initialize a .xz file information decoder
+ *
+ * \param       strm        Pointer to a properly prepared lzma_stream
+ * \param       dest_index  Pointer to a pointer where the decoder will put
+ *                          the decoded lzma_index. The old value
+ *                          of *dest_index is ignored (not freed).
+ * \param       memlimit    How much memory the resulting lzma_index is
+ *                          allowed to require. Use UINT64_MAX to
+ *                          effectively disable the limiter.
+ * \param       file_size   Size of the input .xz file
+ *
+ * This decoder decodes the Stream Header, Stream Footer, Index, and
+ * Stream Padding field(s) from the input .xz file and stores the resulting
+ * combined index in *dest_index. This information can be used to get the
+ * uncompressed file size with lzma_index_uncompressed_size(*dest_index) or,
+ * for example, to implement random access reading by locating the Blocks
+ * in the Streams.
+ *
+ * To get the required information from the .xz file, lzma_code() may ask
+ * the application to seek in the input file by returning LZMA_SEEK_NEEDED
+ * and having the target file position specified in lzma_stream.seek_pos.
+ * The number of seeks required depends on the input file and how big buffers
+ * the application provides. When possible, the decoder will seek backward
+ * and forward in the given buffer to avoid useless seek requests. Thus, if
+ * the application provides the whole file at once, no external seeking will
+ * be required (that is, lzma_code() won't return LZMA_SEEK_NEEDED).
+ *
+ * The value in lzma_stream.total_in can be used to estimate how much data
+ * liblzma had to read to get the file information. However, due to seeking
+ * and the way total_in is updated, the value of total_in will be somewhat
+ * inaccurate (a little too big). Thus, total_in is a good estimate but don't
+ * expect to see the same exact value for the same file if you change the
+ * input buffer size or switch to a different liblzma version.
+ *
+ * Valid `action' arguments to lzma_code() are LZMA_RUN and LZMA_FINISH.
+ * You only need to use LZMA_RUN; LZMA_FINISH is only supported because it
+ * might be convenient for some applications. If you use LZMA_FINISH and if
+ * lzma_code() asks the application to seek, remember to reset `action' back
+ * to LZMA_RUN unless you hit the end of the file again.
+ *
+ * Possible return values from lzma_code():
+ *   - LZMA_OK: All OK so far, more input needed
+ *   - LZMA_SEEK_NEEDED: Provide more input starting from the absolute
+ *     file position strm->seek_pos
+ *   - LZMA_STREAM_END: Decoding was successful, *dest_index has been set
+ *   - LZMA_FORMAT_ERROR: The input file is not in the .xz format (the
+ *     expected magic bytes were not found from the beginning of the file)
+ *   - LZMA_OPTIONS_ERROR: File looks valid but contains headers that aren't
+ *     supported by this version of liblzma
+ *   - LZMA_DATA_ERROR: File is corrupt
+ *   - LZMA_BUF_ERROR
+ *   - LZMA_MEM_ERROR
+ *   - LZMA_MEMLIMIT_ERROR
+ *   - LZMA_PROG_ERROR
+ *
+ * \return      - LZMA_OK
+ *              - LZMA_MEM_ERROR
+ *              - LZMA_PROG_ERROR
+ */
+extern LZMA_API(lzma_ret) lzma_file_info_decoder(
+		lzma_stream *strm, lzma_index **dest_index,
+		uint64_t memlimit, uint64_t file_size)
+		lzma_nothrow;
--- a/src/liblzma/api/lzma/lzma12.h
+++ b/src/liblzma/api/lzma/lzma12.h
@ -18,17 +18,40 @@


 /**
- * \brief       LZMA1 Filter ID
+ * \brief       LZMA1 Filter ID (for raw encoder/decoder only, not in .xz)
 *
 * LZMA1 is the very same thing as what was called just LZMA in LZMA Utils,
 * 7-Zip, and LZMA SDK. It's called LZMA1 here to prevent developers from
 * accidentally using LZMA when they actually want LZMA2.
- *
- * LZMA1 shouldn't be used for new applications unless you _really_ know
- * what you are doing. LZMA2 is almost always a better choice.
 */
 #define LZMA_FILTER_LZMA1       LZMA_VLI_C(0x4000000000000001)

+/**
+ * \brief       LZMA1 Filter ID with extended options (for raw encoder/decoder)
+ *
+ * This is like LZMA_FILTER_LZMA1 but with this ID a few extra options
+ * are supported in the lzma_options_lzma structure:
+ *
+ *   - A flag to tell the encoder if the end of payload marker (EOPM) alias
+ *     end of stream (EOS) marker must be written at the end of the stream.
+ *     In contrast, LZMA_FILTER_LZMA1 always writes the end marker.
+ *
+ *   - Decoder needs to be told the uncompressed size of the stream
+ *     or that it is unknown (using the special value UINT64_MAX).
+ *     If the size is known, a flag can be set to allow the presence of
+ *     the end marker anyway. In contrast, LZMA_FILTER_LZMA1 always
+ *     behaves as if the uncompressed size was unknown.
+ *
+ * This allows handling file formats where LZMA1 streams are used but where
+ * the end marker isn't allowed or where it might not (always) be present.
+ * This extended LZMA1 functionality is provided as a Filter ID for raw
+ * encoder and decoder instead of adding new encoder and decoder initialization
+ * functions because this way it is possible to also use extra filters,
+ * for example, LZMA_FILTER_X86 in a filter chain with LZMA_FILTER_LZMA1EXT,
+ * which might be needed to handle some file formats.
+ */
+#define LZMA_FILTER_LZMA1EXT    LZMA_VLI_C(0x4000000000000002)
+
 /**
 * \brief       LZMA2 Filter ID
 *
@ -374,6 +397,82 @@ typedef struct {
 	 */
 	uint32_t depth;

+	/**
+	 * \brief       For LZMA_FILTER_LZMA1EXT: Extended flags
+	 *
+	 * This is used only with LZMA_FILTER_LZMA1EXT.
+	 *
+	 * Currently only one flag is supported, LZMA_LZMA1EXT_ALLOW_EOPM:
+	 *
+	 *   - Encoder: If the flag is set, then end marker is written just
+	 *     like it is with LZMA_FILTER_LZMA1. Without this flag the
+	 *     end marker isn't written and the application has to store
+	 *     the uncompressed size somewhere outside the compressed stream.
+	 *     To decompress streams without the end marker, the appliation
+	 *     has to set the correct uncompressed size in ext_size_low and
+	 *     ext_size_high.
+	 *
+	 *   - Decoder: If the uncompressed size in ext_size_low and
+	 *     ext_size_high is set to the special value UINT64_MAX
+	 *     (indicating unknown uncompressed size) then this flag is
+	 *     ignored and the end marker must always be present, that is,
+	 *     the behavior is identical to LZMA_FILTER_LZMA1.
+	 *
+	 *     Otherwise, if this flag isn't set, then the input stream
+	 *     must not have the end marker; if the end marker is detected
+	 *     then it will result in LZMA_DATA_ERROR. This is useful when
+	 *     it is known that the stream must not have the end marker and
+	 *     strict validation is wanted.
+	 *
+	 *     If this flag is set, then it is autodetected if the end marker
+	 *     is present after the specified number of uncompressed bytes
+	 *     has been decompressed (ext_size_low and ext_size_high). The
+	 *     end marker isn't allowed in any other position. This behavior
+	 *     is useful when uncompressed size is known but the end marker
+	 *     may or may not be present. This is the case, for example,
+	 *     in .7z files (valid .7z files that have the end marker in
+	 *     LZMA1 streams are rare but they do exist).
+	 */
+	uint32_t ext_flags;
+#	define LZMA_LZMA1EXT_ALLOW_EOPM   UINT32_C(0x01)
+
+	/**
+	 * \brief       For LZMA_FILTER_LZMA1EXT: Uncompressed size (low bits)
+	 *
+	 * The 64-bit uncompressed size is needed for decompression with
+	 * LZMA_FILTER_LZMA1EXT. The size is ignored by the encoder.
+	 *
+	 * The special value UINT64_MAX indicates that the uncompressed size
+	 * is unknown and that the end of payload marker (also known as
+	 * end of stream marker) must be present to indicate the end of
+	 * the LZMA1 stream. Any other value indicates the expected
+	 * uncompressed size of the LZMA1 stream. (If LZMA1 was used together
+	 * with filters that change the size of the data then the uncompressed
+	 * size of the LZMA1 stream could be different than the final
+	 * uncompressed size of the filtered stream.)
+	 *
+	 * ext_size_low holds the least significant 32 bits of the
+	 * uncompressed size. The most significant 32 bits must be set
+	 * in ext_size_high. The macro lzma_ext_size_set(opt_lzma, u64size)
+	 * can be used to set these members.
+	 *
+	 * The 64-bit uncompressed size is split into two uint32_t variables
+	 * because there were no reserved uint64_t members and using the
+	 * same options structure for LZMA_FILTER_LZMA1, LZMA_FILTER_LZMA1EXT,
+	 * and LZMA_FILTER_LZMA2 was otherwise more convenient than having
+	 * a new options structure for LZMA_FILTER_LZMA1EXT. (Replacing two
+	 * uint32_t members with one uint64_t changes the ABI on some systems
+	 * as the alignment of this struct can increase from 4 bytes to 8.)
+	 */
+	uint32_t ext_size_low;
+
+	/**
+	 * \brief       For LZMA_FILTER_LZMA1EXT: Uncompressed size (high bits)
+	 *
+	 * This holds the most significant 32 bits of the uncompressed size.
+	 */
+	uint32_t ext_size_high;
+
 	/*
 	 * Reserved space to allow possible future extensions without
 	 * breaking the ABI. You should not touch these, because the names
@ -381,9 +480,6 @@ typedef struct {
 	 * with the currently supported options, so it is safe to leave these
 	 * uninitialized.
 	 */
-	uint32_t reserved_int1;
-	uint32_t reserved_int2;
-	uint32_t reserved_int3;
 	uint32_t reserved_int4;
 	uint32_t reserved_int5;
 	uint32_t reserved_int6;
@ -399,6 +495,19 @@ typedef struct {
 } lzma_options_lzma;


+/**
+ * \brief       Macro to set the 64-bit uncompressed size in ext_size_*
+ *
+ * This might be convenient when decoding using LZMA_FILTER_LZMA1EXT.
+ * This isn't used with LZMA_FILTER_LZMA1 or LZMA_FILTER_LZMA2.
+ */
+#define lzma_set_ext_size(opt_lzma2, u64size) \
+do { \
+	(opt_lzma2).ext_size_low = (uint32_t)(u64size); \
+	(opt_lzma2).ext_size_high = (uint32_t)((uint64_t)(u64size) >> 32); \
+} while (0)
+
+
 /**
 * \brief       Set a compression preset to lzma_options_lzma structure
 *
--- a/src/liblzma/api/lzma/version.h
+++ b/src/liblzma/api/lzma/version.h
@ -21,8 +21,8 @@
 * Version number split into components
 */
 #define LZMA_VERSION_MAJOR 5
-#define LZMA_VERSION_MINOR 2
-#define LZMA_VERSION_PATCH 9
+#define LZMA_VERSION_MINOR 4
+#define LZMA_VERSION_PATCH 0
 #define LZMA_VERSION_STABILITY LZMA_VERSION_STABILITY_STABLE

 #ifndef LZMA_VERSION_COMMIT
--- a/src/liblzma/check/crc32_small.c
+++ b/src/liblzma/check/crc32_small.c
@ -16,6 +16,9 @@
 uint32_t lzma_crc32_table[1][256];


+#ifdef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
+__attribute__((__constructor__))
+#endif
 static void
 crc32_init(void)
 {
@ -37,18 +40,22 @@ crc32_init(void)
 }


+#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
 extern void
 lzma_crc32_init(void)
 {
 	mythread_once(crc32_init);
 	return;
 }
+#endif


 extern LZMA_API(uint32_t)
 lzma_crc32(const uint8_t *buf, size_t size, uint32_t crc)
 {
+#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
 	lzma_crc32_init();
+#endif

 	crc = ~crc;

--- a/src/liblzma/check/crc64_fast.c
+++ b/src/liblzma/check/crc64_fast.c
@ -3,11 +3,25 @@
 /// \file       crc64.c
 /// \brief      CRC64 calculation
 ///
-/// Calculate the CRC64 using the slice-by-four algorithm. This is the same
-/// idea that is used in crc32_fast.c, but for CRC64 we use only four tables
+/// There are two methods in this file. crc64_generic uses the
+/// the slice-by-four algorithm. This is the same idea that is
+/// used in crc32_fast.c, but for CRC64 we use only four tables
 /// instead of eight to avoid increasing CPU cache usage.
+///
+/// crc64_clmul uses 32/64-bit x86 SSSE3, SSE4.1, and CLMUL instructions.
+/// It was derived from
+/// https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
+/// and the public domain code from https://github.com/rawrunprotected/crc
+/// (URLs were checked on 2022-11-07).
+///
+/// FIXME: Builds for 32-bit x86 use crc64_x86.S by default instead
+/// of this file and thus CLMUL version isn't available on 32-bit x86
+/// unless configured with --disable-assembler. Even then the lookup table
+/// isn't omitted in crc64_table.c since it doesn't know that assembly
+/// code has been disabled.
 //
-//  Author:     Lasse Collin
+//  Authors:    Lasse Collin
+//              Ilya Kurdyukov
 //
 //  This file has been put into the public domain.
 //  You can do whatever you want with this file.
@ -15,6 +29,54 @@
 ///////////////////////////////////////////////////////////////////////////////

 #include "check.h"
+
+#undef CRC_GENERIC
+#undef CRC_CLMUL
+#undef CRC_USE_GENERIC_FOR_SMALL_INPUTS
+
+// If CLMUL cannot be used then only the generic slice-by-four is built.
+#if !defined(HAVE_USABLE_CLMUL)
+#	define CRC_GENERIC 1
+
+// If CLMUL is allowed unconditionally in the compiler options then the
+// generic version can be omitted. Note that this doesn't work with MSVC
+// as I don't know how to detect the features here.
+//
+// NOTE: Keep this this in sync with crc64_table.c.
+#elif (defined(__SSSE3__) && defined(__SSE4_1__) && defined(__PCLMUL__)) \
+		|| (defined(__e2k__) && __iset__ >= 6)
+#	define CRC_CLMUL 1
+
+// Otherwise build both and detect at runtime which version to use.
+#else
+#	define CRC_GENERIC 1
+#	define CRC_CLMUL 1
+
+/*
+	// The generic code is much faster with 1-8-byte inputs and has
+	// similar performance up to 16 bytes  at least in microbenchmarks
+	// (it depends on input buffer alignment too). If both versions are
+	// built, this #define will use the generic version for inputs up to
+	// 16 bytes and CLMUL for bigger inputs. It saves a little in code
+	// size since the special cases for 0-16-byte inputs will be omitted
+	// from the CLMUL code.
+#	define CRC_USE_GENERIC_FOR_SMALL_INPUTS 1
+*/
+
+#	if defined(_MSC_VER)
+#		include <intrin.h>
+#	elif defined(HAVE_CPUID_H)
+#		include <cpuid.h>
+#	endif
+#endif
+
+
+/////////////////////////////////
+// Generic slice-by-four CRC64 //
+/////////////////////////////////
+
+#ifdef CRC_GENERIC
+
 #include "crc_macros.h"


@ -26,8 +88,8 @@


 // See the comments in crc32_fast.c. They aren't duplicated here.
-extern LZMA_API(uint64_t)
-lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
+static uint64_t
+crc64_generic(const uint8_t *buf, size_t size, uint64_t crc)
 {
 	crc = ~crc;

@ -46,10 +108,11 @@ lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)

 		while (buf < limit) {
 #ifdef WORDS_BIGENDIAN
-			const uint32_t tmp = (crc >> 32)
+			const uint32_t tmp = (uint32_t)(crc >> 32)
 					^ aligned_read32ne(buf);
 #else
-			const uint32_t tmp = crc ^ aligned_read32ne(buf);
+			const uint32_t tmp = (uint32_t)crc
+					^ aligned_read32ne(buf);
 #endif
 			buf += 4;

@ -70,3 +133,380 @@ lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)

 	return ~crc;
 }
+#endif
+
+
+/////////////////////
+// x86 CLMUL CRC64 //
+/////////////////////
+
+#ifdef CRC_CLMUL
+
+#include <immintrin.h>
+
+
+/*
+// These functions were used to generate the constants
+// at the top of crc64_clmul().
+static uint64_t
+calc_lo(uint64_t poly)
+{
+	uint64_t a = poly;
+	uint64_t b = 0;
+
+	for (unsigned i = 0; i < 64; ++i) {
+		b = (b >> 1) | (a << 63);
+		a = (a >> 1) ^ (a & 1 ? poly : 0);
+	}
+
+	return b;
+}
+
+static uint64_t
+calc_hi(uint64_t poly, uint64_t a)
+{
+	for (unsigned i = 0; i < 64; ++i)
+		a = (a >> 1) ^ (a & 1 ? poly : 0);
+
+	return a;
+}
+*/
+
+
+#define MASK_L(in, mask, r) \
+	r = _mm_shuffle_epi8(in, mask)
+
+#define MASK_H(in, mask, r) \
+	r = _mm_shuffle_epi8(in, _mm_xor_si128(mask, vsign))
+
+#define MASK_LH(in, mask, low, high) \
+	MASK_L(in, mask, low); \
+	MASK_H(in, mask, high)
+
+
+// EDG-based compilers (Intel's classic compiler and compiler for E2K) can
+// define __GNUC__ but the attribute must not be used with them.
+// The new Clang-based ICX needs the attribute.
+//
+// NOTE: Build systems check for this too, keep them in sync with this.
+#if (defined(__GNUC__) || defined(__clang__)) && !defined(__EDG__)
+__attribute__((__target__("ssse3,sse4.1,pclmul")))
+#endif
+static uint64_t
+crc64_clmul(const uint8_t *buf, size_t size, uint64_t crc)
+{
+	// The prototypes of the intrinsics use signed types while most of
+	// the values are treated as unsigned here. These warnings in this
+	// function have been checked and found to be harmless so silence them.
+#if TUKLIB_GNUC_REQ(4, 6) || defined(__clang__)
+#	pragma GCC diagnostic push
+#	pragma GCC diagnostic ignored "-Wsign-conversion"
+#	pragma GCC diagnostic ignored "-Wconversion"
+#endif
+
+#ifndef CRC_USE_GENERIC_FOR_SMALL_INPUTS
+	// The code assumes that there is at least one byte of input.
+	if (size == 0)
+		return crc;
+#endif
+
+	// const uint64_t poly = 0xc96c5795d7870f42; // CRC polynomial
+	const uint64_t p  = 0x92d8af2baf0e1e85; // (poly << 1) | 1
+	const uint64_t mu = 0x9c3e466c172963d5; // (calc_lo(poly) << 1) | 1
+	const uint64_t k2 = 0xdabe95afc7875f40; // calc_hi(poly, 1)
+	const uint64_t k1 = 0xe05dd497ca393ae4; // calc_hi(poly, k2)
+	const __m128i vfold0 = _mm_set_epi64x(p, mu);
+	const __m128i vfold1 = _mm_set_epi64x(k2, k1);
+
+	// Create a vector with 8-bit values 0 to 15. This is used to
+	// construct control masks for _mm_blendv_epi8 and _mm_shuffle_epi8.
+	const __m128i vramp = _mm_setr_epi32(
+			0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c);
+
+	// This is used to inverse the control mask of _mm_shuffle_epi8
+	// so that bytes that wouldn't be picked with the original mask
+	// will be picked and vice versa.
+	const __m128i vsign = _mm_set1_epi8(0x80);
+
+	// Memory addresses A to D and the distances between them:
+	//
+	//     A           B     C         D
+	//     [skip_start][size][skip_end]
+	//     [     size2      ]
+	//
+	// A and D are 16-byte aligned. B and C are 1-byte aligned.
+	// skip_start and skip_end are 0-15 bytes. size is at least 1 byte.
+	//
+	// A = aligned_buf will initially point to this address.
+	// B = The address pointed by the caller-supplied buf.
+	// C = buf + size == aligned_buf + size2
+	// D = buf + size + skip_end == aligned_buf + size2 + skip_end
+	const size_t skip_start = (size_t)((uintptr_t)buf & 15);
+	const size_t skip_end = (size_t)(-(uintptr_t)(buf + size) & 15);
+	const __m128i *aligned_buf = (const __m128i *)(
+			(uintptr_t)buf & ~(uintptr_t)15);
+
+	// If size2 <= 16 then the whole input fits into a single 16-byte
+	// vector. If size2 > 16 then at least two 16-byte vectors must
+	// be processed. If size2 > 16 && size <= 16 then there is only
+	// one 16-byte vector's worth of input but it is unaligned in memory.
+	//
+	// NOTE: There is no integer overflow here if the arguments are valid.
+	// If this overflowed, buf + size would too.
+	size_t size2 = skip_start + size;
+
+	// Masks to be used with _mm_blendv_epi8 and _mm_shuffle_epi8:
+	// The first skip_start or skip_end bytes in the vectors will have
+	// the high bit (0x80) set. _mm_blendv_epi8 and _mm_shuffle_epi8
+	// will produce zeros for these positions. (Bitwise-xor of these
+	// masks with vsign will produce the opposite behavior.)
+	const __m128i mask_start
+			= _mm_sub_epi8(vramp, _mm_set1_epi8(skip_start));
+	const __m128i mask_end = _mm_sub_epi8(vramp, _mm_set1_epi8(skip_end));
+
+	// Get the first 1-16 bytes into data0. If loading less than 16 bytes,
+	// the bytes are loaded to the high bits of the vector and the least
+	// significant positions are filled with zeros.
+	const __m128i data0 = _mm_blendv_epi8(_mm_load_si128(aligned_buf),
+			_mm_setzero_si128(), mask_start);
+	++aligned_buf;
+
+#if defined(__i386__) || defined(_M_IX86)
+	const __m128i initial_crc = _mm_set_epi64x(0, ~crc);
+#else
+	// GCC and Clang would produce good code with _mm_set_epi64x
+	// but MSVC needs _mm_cvtsi64_si128 on x86-64.
+	const __m128i initial_crc = _mm_cvtsi64_si128(~crc);
+#endif
+
+	__m128i v0, v1, v2, v3;
+
+#ifndef CRC_USE_GENERIC_FOR_SMALL_INPUTS
+	if (size <= 16) {
+		// Right-shift initial_crc by 1-16 bytes based on "size"
+		// and store the result in v1 (high bytes) and v0 (low bytes).
+		//
+		// NOTE: The highest 8 bytes of initial_crc are zeros so
+		// v1 will be filled with zeros if size >= 8. The highest 8
+		// bytes of v1 will always become zeros.
+		//
+		// [      v1      ][      v0      ]
+		//  [ initial_crc  ]                  size == 1
+		//   [ initial_crc  ]                 size == 2
+		//                [ initial_crc  ]    size == 15
+		//                 [ initial_crc  ]   size == 16 (all in v0)
+		const __m128i mask_low = _mm_add_epi8(
+				vramp, _mm_set1_epi8(size - 16));
+		MASK_LH(initial_crc, mask_low, v0, v1);
+
+		if (size2 <= 16) {
+			// There are 1-16 bytes of input and it is all
+			// in data0. Copy the input bytes to v3. If there
+			// are fewer than 16 bytes, the low bytes in v3
+			// will be filled with zeros. That is, the input
+			// bytes are stored to the same position as
+			// (part of) initial_crc is in v0.
+			MASK_L(data0, mask_end, v3);
+		} else {
+			// There are 2-16 bytes of input but not all bytes
+			// are in data0.
+			const __m128i data1 = _mm_load_si128(aligned_buf);
+
+			// Collect the 2-16 input bytes from data0 and data1
+			// to v2 and v3, and bitwise-xor them with the
+			// low bits of initial_crc in v0. Note that the
+			// the second xor is below this else-block as it
+			// is shared with the other branch.
+			MASK_H(data0, mask_end, v2);
+			MASK_L(data1, mask_end, v3);
+			v0 = _mm_xor_si128(v0, v2);
+		}
+
+		v0 = _mm_xor_si128(v0, v3);
+		v1 = _mm_alignr_epi8(v1, v0, 8);
+	} else
+#endif
+	{
+		const __m128i data1 = _mm_load_si128(aligned_buf);
+		MASK_LH(initial_crc, mask_start, v0, v1);
+		v0 = _mm_xor_si128(v0, data0);
+		v1 = _mm_xor_si128(v1, data1);
+
+#define FOLD \
+	v1 = _mm_xor_si128(v1, _mm_clmulepi64_si128(v0, vfold1, 0x00)); \
+	v0 = _mm_xor_si128(v1, _mm_clmulepi64_si128(v0, vfold1, 0x11));
+
+		while (size2 > 32) {
+			++aligned_buf;
+			size2 -= 16;
+			FOLD
+			v1 = _mm_load_si128(aligned_buf);
+		}
+
+		if (size2 < 32) {
+			MASK_H(v0, mask_end, v2);
+			MASK_L(v0, mask_end, v0);
+			MASK_L(v1, mask_end, v3);
+			v1 = _mm_or_si128(v2, v3);
+		}
+
+		FOLD
+		v1 = _mm_srli_si128(v0, 8);
+#undef FOLD
+	}
+
+	v1 = _mm_xor_si128(_mm_clmulepi64_si128(v0, vfold1, 0x10), v1);
+	v0 = _mm_clmulepi64_si128(v1, vfold0, 0x00);
+	v2 = _mm_clmulepi64_si128(v0, vfold0, 0x10);
+	v0 = _mm_xor_si128(_mm_xor_si128(v2, _mm_slli_si128(v0, 8)), v1);
+
+#if defined(__i386__) || defined(_M_IX86)
+	return ~(((uint64_t)(uint32_t)_mm_extract_epi32(v0, 3) << 32) |
+			(uint64_t)(uint32_t)_mm_extract_epi32(v0, 2));
+#else
+	return ~(uint64_t)_mm_extract_epi64(v0, 1);
+#endif
+
+#if TUKLIB_GNUC_REQ(4, 6) || defined(__clang__)
+#	pragma GCC diagnostic pop
+#endif
+}
+#endif
+
+
+////////////////////////
+// Detect CPU support //
+////////////////////////
+
+#if defined(CRC_GENERIC) && defined(CRC_CLMUL)
+static inline bool
+is_clmul_supported(void)
+{
+	int success = 1;
+	uint32_t r[4]; // eax, ebx, ecx, edx
+
+#if defined(_MSC_VER)
+	// This needs <intrin.h> with MSVC. ICC has it as a built-in
+	// on all platforms.
+	__cpuid(r, 1);
+#elif defined(HAVE_CPUID_H)
+	// Compared to just using __asm__ to run CPUID, this also checks
+	// that CPUID is supported and saves and restores ebx as that is
+	// needed with GCC < 5 with position-independent code (PIC).
+	success = __get_cpuid(1, &r[0], &r[1], &r[2], &r[3]);
+#else
+	// Just a fallback that shouldn't be needed.
+	__asm__("cpuid\n\t"
+			: "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3])
+			: "a"(1), "c"(0));
+#endif
+
+	// Returns true if these are supported:
+	// CLMUL (bit 1 in ecx)
+	// SSSE3 (bit 9 in ecx)
+	// SSE4.1 (bit 19 in ecx)
+	const uint32_t ecx_mask = (1 << 1) | (1 << 9) | (1 << 19);
+	return success && (r[2] & ecx_mask) == ecx_mask;
+
+	// Alternative methods that weren't used:
+	//   - ICC's _may_i_use_cpu_feature: the other methods should work too.
+	//   - GCC >= 6 / Clang / ICX __builtin_cpu_supports("pclmul")
+	//
+	// CPUID decding is needed with MSVC anyway and older GCC. This keeps
+	// the feature checks in the build system simpler too. The nice thing
+	// about __builtin_cpu_supports would be that it generates very short
+	// code as is it only reads a variable set at startup but a few bytes
+	// doesn't matter here.
+}
+
+
+#ifdef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
+#	define CRC64_FUNC_INIT
+#	define CRC64_SET_FUNC_ATTR __attribute__((__constructor__))
+#else
+#	define CRC64_FUNC_INIT = &crc64_dispatch
+#	define CRC64_SET_FUNC_ATTR
+static uint64_t crc64_dispatch(const uint8_t *buf, size_t size, uint64_t crc);
+#endif
+
+
+// Pointer to the the selected CRC64 method.
+static uint64_t (*crc64_func)(const uint8_t *buf, size_t size, uint64_t crc)
+		CRC64_FUNC_INIT;
+
+
+CRC64_SET_FUNC_ATTR
+static void
+crc64_set_func(void)
+{
+	crc64_func = is_clmul_supported() ? &crc64_clmul : &crc64_generic;
+	return;
+}
+
+
+#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
+static uint64_t
+crc64_dispatch(const uint8_t *buf, size_t size, uint64_t crc)
+{
+	// When __attribute__((__constructor__)) isn't supported, set the
+	// function pointer without any locking. If multiple threads run
+	// the detection code in parallel, they will all end up setting
+	// the pointer to the same value. This avoids the use of
+	// mythread_once() on every call to lzma_crc64() but this likely
+	// isn't strictly standards compliant. Let's change it if it breaks.
+	crc64_set_func();
+	return crc64_func(buf, size, crc);
+}
+#endif
+#endif
+
+
+extern LZMA_API(uint64_t)
+lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
+{
+#if defined(CRC_GENERIC) && defined(CRC_CLMUL)
+	// If CLMUL is available, it is the best for non-tiny inputs,
+	// being over twice as fast as the generic slice-by-four version.
+	// However, for size <= 16 it's different. In the extreme case
+	// of size == 1 the generic version can be five times faster.
+	// At size >= 8 the CLMUL starts to become reasonable. It
+	// varies depending on the alignment of buf too.
+	//
+	// The above doesn't include the overhead of mythread_once().
+	// At least on x86-64 GNU/Linux, pthread_once() is very fast but
+	// it still makes lzma_crc64(buf, 1, crc) 50-100 % slower. When
+	// size reaches 12-16 bytes the overhead becomes negligible.
+	//
+	// So using the generic version for size <= 16 may give better
+	// performance with tiny inputs but if such inputs happen rarely
+	// it's not so obvious because then the lookup table of the
+	// generic version may not be in the processor cache.
+#ifdef CRC_USE_GENERIC_FOR_SMALL_INPUTS
+	if (size <= 16)
+		return crc64_generic(buf, size, crc);
+#endif
+
+/*
+#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
+	// See crc64_dispatch(). This would be the alternative which uses
+	// locking and doesn't use crc64_dispatch(). Note that on Windows
+	// this method needs Vista threads.
+	mythread_once(crc64_set_func);
+#endif
+*/
+
+	return crc64_func(buf, size, crc);
+
+#elif defined(CRC_CLMUL)
+	// If CLMUL is used unconditionally without runtime CPU detection
+	// then omitting the generic version and its 8 KiB lookup table
+	// makes the library smaller.
+	//
+	// FIXME: Lookup table isn't currently omitted on 32-bit x86,
+	// see crc64_table.c.
+	return crc64_clmul(buf, size, crc);
+
+#else
+	return crc64_generic(buf, size, crc);
+#endif
+}
--- a/src/liblzma/check/crc64_small.c
+++ b/src/liblzma/check/crc64_small.c
@ -16,6 +16,9 @@
 static uint64_t crc64_table[256];


+#ifdef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
+__attribute__((__constructor__))
+#endif
 static void
 crc64_init(void)
 {
@ -40,7 +43,9 @@ crc64_init(void)
 extern LZMA_API(uint64_t)
 lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
 {
+#ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR
 	mythread_once(crc64_init);
+#endif

 	crc = ~crc;

--- a/src/liblzma/check/crc64_table.c
+++ b/src/liblzma/check/crc64_table.c
@ -12,11 +12,24 @@

 #include "common.h"

+
+// FIXME: Compared to crc64_fast.c this has to check for __x86_64__ too
+// so that in 32-bit builds crc64_x86.S won't break due to a missing table.
+#if (defined(__x86_64__) && defined(__SSSE3__) \
+			&& defined(__SSE4_1__) && defined(__PCLMUL__)) \
+		|| (defined(__e2k__) && __iset__ >= 6)
+// No table needed but something has to be exported to keep some toolchains
+// happy. Also use a declaration to silence compiler warnings.
+extern const char lzma_crc64_dummy;
+const char lzma_crc64_dummy;
+
+#else
 // Having the declaration here silences clang -Wmissing-variable-declarations.
 extern const uint64_t lzma_crc64_table[4][256];

-#ifdef WORDS_BIGENDIAN
-#	include "crc64_table_be.h"
-#else
-#	include "crc64_table_le.h"
+#	if defined(WORDS_BIGENDIAN)
+#		include "crc64_table_be.h"
+#	else
+#		include "crc64_table_le.h"
+#	endif
 #endif
--- a/src/liblzma/common/alone_decoder.c
+++ b/src/liblzma/common/alone_decoder.c
@ -110,12 +110,24 @@ alone_decode(void *coder_ptr, const lzma_allocator *allocator,
 		// Another hack to ditch false positives: Assume that
 		// if the uncompressed size is known, it must be less
 		// than 256 GiB.
+		//
+		// FIXME? Without picky we allow > LZMA_VLI_MAX which doesn't
+		// really matter in this specific situation (> LZMA_VLI_MAX is
+		// safe in the LZMA decoder) but it's somewhat weird still.
 		if (coder->picky
 				&& coder->uncompressed_size != LZMA_VLI_UNKNOWN
 				&& coder->uncompressed_size
 					>= (LZMA_VLI_C(1) << 38))
 			return LZMA_FORMAT_ERROR;

+		// Use LZMA_FILTER_LZMA1EXT features to specify the
+		// uncompressed size and that the end marker is allowed
+		// even when the uncompressed size is known. Both .lzma
+		// header and LZMA1EXT use UINT64_MAX indicate that size
+		// is unknown.
+		coder->options.ext_flags = LZMA_LZMA1EXT_ALLOW_EOPM;
+		lzma_set_ext_size(coder->options, coder->uncompressed_size);
+
 		// Calculate the memory usage so that it is ready
 		// for SEQ_CODER_INIT.
 		coder->memusage = lzma_lzma_decoder_memusage(&coder->options)
@ -132,6 +144,7 @@ alone_decode(void *coder_ptr, const lzma_allocator *allocator,

 		lzma_filter_info filters[2] = {
 			{
+				.id = LZMA_FILTER_LZMA1EXT,
 				.init = &lzma_lzma_decoder_init,
 				.options = &coder->options,
 			}, {
@ -139,14 +152,8 @@ alone_decode(void *coder_ptr, const lzma_allocator *allocator,
 			}
 		};

-		const lzma_ret ret = lzma_next_filter_init(&coder->next,
-				allocator, filters);
-		if (ret != LZMA_OK)
-			return ret;
-
-		// Use a hack to set the uncompressed size.
-		lzma_lz_decoder_uncompressed(coder->next.coder,
-				coder->uncompressed_size, true);
+		return_if_error(lzma_next_filter_init(&coder->next,
+				allocator, filters));

 		coder->sequence = SEQ_CODE;
 		break;
--- a/src/liblzma/common/alone_encoder.c
+++ b/src/liblzma/common/alone_encoder.c
@ -129,6 +129,7 @@ alone_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 	// Initialize the LZMA encoder.
 	const lzma_filter_info filters[2] = {
 		{
+			.id = LZMA_FILTER_LZMA1,
 			.init = &lzma_lzma_encoder_init,
 			.options = (void *)(options),
 		}, {
--- a/src/liblzma/common/auto_decoder.c
+++ b/src/liblzma/common/auto_decoder.c
@ -1,7 +1,7 @@
 ///////////////////////////////////////////////////////////////////////////////
 //
 /// \file       auto_decoder.c
-/// \brief      Autodetect between .xz Stream and .lzma (LZMA_Alone) formats
+/// \brief      Autodetect between .xz, .lzma (LZMA_Alone), and .lz (lzip)
 //
 //  Author:     Lasse Collin
 //
@ -12,10 +12,13 @@

 #include "stream_decoder.h"
 #include "alone_decoder.h"
+#ifdef HAVE_LZIP_DECODER
+#	include "lzip_decoder.h"
+#endif


 typedef struct {
-	/// Stream decoder or LZMA_Alone decoder
+	/// .xz Stream decoder, LZMA_Alone decoder, or lzip decoder
 	lzma_next_coder next;

 	uint64_t memlimit;
@ -46,14 +49,22 @@ auto_decode(void *coder_ptr, const lzma_allocator *allocator,
 		// SEQ_CODE even if we return some LZMA_*_CHECK.
 		coder->sequence = SEQ_CODE;

-		// Detect the file format. For now this is simple, since if
-		// it doesn't start with 0xFD (the first magic byte of the
-		// new format), it has to be LZMA_Alone, or something that
-		// we don't support at all.
+		// Detect the file format. .xz files start with 0xFD which
+		// cannot be the first byte of .lzma (LZMA_Alone) format.
+		// The .lz format starts with 0x4C which could be the
+		// first byte of a .lzma file but luckily it would mean
+		// lc/lp/pb being 4/3/1 which liblzma doesn't support because
+		// lc + lp > 4. So using just 0x4C to detect .lz is OK here.
 		if (in[*in_pos] == 0xFD) {
 			return_if_error(lzma_stream_decoder_init(
 					&coder->next, allocator,
 					coder->memlimit, coder->flags));
+#ifdef HAVE_LZIP_DECODER
+		} else if (in[*in_pos] == 0x4C) {
+			return_if_error(lzma_lzip_decoder_init(
+					&coder->next, allocator,
+					coder->memlimit, coder->flags));
+#endif
 		} else {
 			return_if_error(lzma_alone_decoder_init(&coder->next,
 					allocator, coder->memlimit, true));
--- a/src/liblzma/common/block_header_decoder.c
+++ b/src/liblzma/common/block_header_decoder.c
@ -14,22 +14,6 @@
 #include "check.h"


-static void
-free_properties(lzma_block *block, const lzma_allocator *allocator)
-{
-	// Free allocated filter options. The last array member is not
-	// touched after the initialization in the beginning of
-	// lzma_block_header_decode(), so we don't need to touch that here.
-	for (size_t i = 0; i < LZMA_FILTERS_MAX; ++i) {
-		lzma_free(block->filters[i].options, allocator);
-		block->filters[i].id = LZMA_VLI_UNKNOWN;
-		block->filters[i].options = NULL;
-	}
-
-	return;
-}
-
-
 extern LZMA_API(lzma_ret)
 lzma_block_header_decode(lzma_block *block,
 		const lzma_allocator *allocator, const uint8_t *in)
@ -39,6 +23,10 @@ lzma_block_header_decode(lzma_block *block,
 	// are invalid or over 63 bits, or if the header is too small
 	// to contain the claimed information.

+	// Catch unexpected NULL pointers.
+	if (block == NULL || block->filters == NULL || in == NULL)
+		return LZMA_PROG_ERROR;
+
 	// Initialize the filter options array. This way the caller can
 	// safely free() the options even if an error occurs in this function.
 	for (size_t i = 0; i <= LZMA_FILTERS_MAX; ++i) {
@ -67,8 +55,11 @@ lzma_block_header_decode(lzma_block *block,
 	const size_t in_size = block->header_size - 4;

 	// Verify CRC32
-	if (lzma_crc32(in, in_size, 0) != read32le(in + in_size))
+	if (lzma_crc32(in, in_size, 0) != read32le(in + in_size)) {
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
 		return LZMA_DATA_ERROR;
+#endif
+	}

 	// Check for unsupported flags.
 	if (in[1] & 0x3C)
@ -104,7 +95,7 @@ lzma_block_header_decode(lzma_block *block,
 				&block->filters[i], allocator,
 				in, &in_pos, in_size);
 		if (ret != LZMA_OK) {
-			free_properties(block, allocator);
+			lzma_filters_free(block->filters, allocator);
 			return ret;
 		}
 	}
@ -112,7 +103,7 @@ lzma_block_header_decode(lzma_block *block,
 	// Padding
 	while (in_pos < in_size) {
 		if (in[in_pos++] != 0x00) {
-			free_properties(block, allocator);
+			lzma_filters_free(block->filters, allocator);

 			// Possibly some new field present so use
 			// LZMA_OPTIONS_ERROR instead of LZMA_DATA_ERROR.
--- a/src/liblzma/common/common.c
+++ b/src/liblzma/common/common.c
@ -211,7 +211,6 @@ lzma_code(lzma_stream *strm, lzma_action action)
 			|| strm->reserved_ptr2 != NULL
 			|| strm->reserved_ptr3 != NULL
 			|| strm->reserved_ptr4 != NULL
-			|| strm->reserved_int1 != 0
 			|| strm->reserved_int2 != 0
 			|| strm->reserved_int3 != 0
 			|| strm->reserved_int4 != 0
@ -299,9 +298,7 @@ lzma_code(lzma_stream *strm, lzma_action action)

 	strm->internal->avail_in = strm->avail_in;

-	// Cast is needed to silence a warning about LZMA_TIMED_OUT, which
-	// isn't part of lzma_ret enumeration.
-	switch ((unsigned int)(ret)) {
+	switch (ret) {
 	case LZMA_OK:
 		// Don't return LZMA_BUF_ERROR when it happens the first time.
 		// This is to avoid returning LZMA_BUF_ERROR when avail_out
@ -322,6 +319,17 @@ lzma_code(lzma_stream *strm, lzma_action action)
 		ret = LZMA_OK;
 		break;

+	case LZMA_SEEK_NEEDED:
+		strm->internal->allow_buf_error = false;
+
+		// If LZMA_FINISH was used, reset it back to the
+		// LZMA_RUN-based state so that new input can be supplied
+		// by the application.
+		if (strm->internal->sequence == ISEQ_FINISH)
+			strm->internal->sequence = ISEQ_RUN;
+
+		break;
+
 	case LZMA_STREAM_END:
 		if (strm->internal->sequence == ISEQ_SYNC_FLUSH
 				|| strm->internal->sequence == ISEQ_FULL_FLUSH
--- a/src/liblzma/common/common.h
+++ b/src/liblzma/common/common.h
@ -34,6 +34,14 @@

 #include "lzma.h"

+// This is for detecting modern GCC and Clang attributes
+// like __symver__ in GCC >= 10.
+#ifdef __has_attribute
+#	define lzma_has_attribute(attr) __has_attribute(attr)
+#else
+#	define lzma_has_attribute(attr) 0
+#endif
+
 // The extra symbol versioning in the C files may only be used when
 // building a shared library. If HAVE_SYMBOL_VERSIONS_LINUX is defined
 // to 2 then symbol versioning is done only if also PIC is defined.
@ -63,7 +71,12 @@
 // since 2000). When using @@ instead of @@@, the internal name must not be
 // the same as the external name to avoid problems in some situations. This
 // is why "#define foo_52 foo" is needed for the default symbol versions.
-#	if TUKLIB_GNUC_REQ(10, 0) && !defined(__INTEL_COMPILER)
+//
+// __has_attribute is supported before GCC 10 and it is supported in Clang 14
+// too (which doesn't support __symver__) so use it to detect if __symver__
+// is available. This should be far more reliable than looking at compiler
+// version macros as nowadays especially __GNUC__ is defined by many compilers.
+#	if lzma_has_attribute(__symver__)
 #		define LZMA_SYMVER_API(extnamever, type, intname) \
 			extern __attribute__((__symver__(extnamever))) \
 					LZMA_API(type) intname
@ -107,14 +120,15 @@
 #define LZMA_FILTER_RESERVED_START (LZMA_VLI_C(1) << 62)


-/// Supported flags that can be passed to lzma_stream_decoder()
-/// or lzma_auto_decoder().
+/// Supported flags that can be passed to lzma_stream_decoder(),
+/// lzma_auto_decoder(), or lzma_stream_decoder_mt().
 #define LZMA_SUPPORTED_FLAGS \
 	( LZMA_TELL_NO_CHECK \
 	| LZMA_TELL_UNSUPPORTED_CHECK \
 	| LZMA_TELL_ANY_CHECK \
 	| LZMA_IGNORE_CHECK \
-	| LZMA_CONCATENATED )
+	| LZMA_CONCATENATED \
+	| LZMA_FAIL_FAST )


 /// Largest valid lzma_action value as unsigned integer.
@ -123,9 +137,12 @@

 /// Special return value (lzma_ret) to indicate that a timeout was reached
 /// and lzma_code() must not return LZMA_BUF_ERROR. This is converted to
-/// LZMA_OK in lzma_code(). This is not in the lzma_ret enumeration because
-/// there's no need to have it in the public API.
-#define LZMA_TIMED_OUT 32
+/// LZMA_OK in lzma_code().
+#define LZMA_TIMED_OUT LZMA_RET_INTERNAL1
+
+/// Special return value (lzma_ret) for use in stream_decoder_mt.c to
+/// indicate Index was detected instead of a Block Header.
+#define LZMA_INDEX_DETECTED LZMA_RET_INTERNAL2


 typedef struct lzma_next_coder_s lzma_next_coder;
@ -158,8 +175,11 @@ typedef void (*lzma_end_function)(
 /// an array of lzma_filter_info structures. This array is used with
 /// lzma_next_filter_init to initialize the filter chain.
 struct lzma_filter_info_s {
-	/// Filter ID. This is used only by the encoder
-	/// with lzma_filters_update().
+	/// Filter ID. This can be used to share the same initiazation
+	/// function *and* data structures with different Filter IDs
+	/// (LZMA_FILTER_LZMA1EXT does it), and also by the encoder
+	/// with lzma_filters_update() if filter chain is updated
+	/// in the middle of a raw stream or Block (LZMA_SYNC_FLUSH).
 	lzma_vli id;

 	/// Pointer to function used to initialize the filter.
@ -213,6 +233,16 @@ struct lzma_next_coder_s {
 	lzma_ret (*update)(void *coder, const lzma_allocator *allocator,
 			const lzma_filter *filters,
 			const lzma_filter *reversed_filters);
+
+	/// Set how many bytes of output this coder may produce at maximum.
+	/// On success LZMA_OK must be returned.
+	/// If the filter chain as a whole cannot support this feature,
+	/// this must return LZMA_OPTIONS_ERROR.
+	/// If no input has been given to the coder and the requested limit
+	/// is too small, this must return LZMA_BUF_ERROR. If input has been
+	/// seen, LZMA_OK is allowed too.
+	lzma_ret (*set_out_limit)(void *coder, uint64_t *uncomp_size,
+			uint64_t out_limit);
 };


@ -228,6 +258,7 @@ struct lzma_next_coder_s {
 		.get_check = NULL, \
 		.memconfig = NULL, \
 		.update = NULL, \
+		.set_out_limit = NULL, \
 	}


--- a/src/liblzma/common/file_info.c
+++ b/src/liblzma/common/file_info.c
@ -0,0 +1,855 @@
+///////////////////////////////////////////////////////////////////////////////
+//
+/// \file       file_info.c
+/// \brief      Decode .xz file information into a lzma_index structure
+//
+//  Author:     Lasse Collin
+//
+//  This file has been put into the public domain.
+//  You can do whatever you want with this file.
+//
+///////////////////////////////////////////////////////////////////////////////
+
+#include "index_decoder.h"
+
+
+typedef struct {
+	enum {
+		SEQ_MAGIC_BYTES,
+		SEQ_PADDING_SEEK,
+		SEQ_PADDING_DECODE,
+		SEQ_FOOTER,
+		SEQ_INDEX_INIT,
+		SEQ_INDEX_DECODE,
+		SEQ_HEADER_DECODE,
+		SEQ_HEADER_COMPARE,
+	} sequence;
+
+	/// Absolute position of in[*in_pos] in the file. All code that
+	/// modifies *in_pos also updates this. seek_to_pos() needs this
+	/// to determine if we need to request the application to seek for
+	/// us or if we can do the seeking internally by adjusting *in_pos.
+	uint64_t file_cur_pos;
+
+	/// This refers to absolute positions of interesting parts of the
+	/// input file. Sometimes it points to the *beginning* of a specific
+	/// field and sometimes to the *end* of a field. The current target
+	/// position at each moment is explained in the comments.
+	uint64_t file_target_pos;
+
+	/// Size of the .xz file (from the application).
+	uint64_t file_size;
+
+	/// Index decoder
+	lzma_next_coder index_decoder;
+
+	/// Number of bytes remaining in the Index field that is currently
+	/// being decoded.
+	lzma_vli index_remaining;
+
+	/// The Index decoder will store the decoded Index in this pointer.
+	lzma_index *this_index;
+
+	/// Amount of Stream Padding in the current Stream.
+	lzma_vli stream_padding;
+
+	/// The final combined index is collected here.
+	lzma_index *combined_index;
+
+	/// Pointer from the application where to store the index information
+	/// after successful decoding.
+	lzma_index **dest_index;
+
+	/// Pointer to lzma_stream.seek_pos to be used when returning
+	/// LZMA_SEEK_NEEDED. This is set by seek_to_pos() when needed.
+	uint64_t *external_seek_pos;
+
+	/// Memory usage limit
+	uint64_t memlimit;
+
+	/// Stream Flags from the very beginning of the file.
+	lzma_stream_flags first_header_flags;
+
+	/// Stream Flags from Stream Header of the current Stream.
+	lzma_stream_flags header_flags;
+
+	/// Stream Flags from Stream Footer of the current Stream.
+	lzma_stream_flags footer_flags;
+
+	size_t temp_pos;
+	size_t temp_size;
+	uint8_t temp[8192];
+
+} lzma_file_info_coder;
+
+
+/// Copies data from in[*in_pos] into coder->temp until
+/// coder->temp_pos == coder->temp_size. This also keeps coder->file_cur_pos
+/// in sync with *in_pos. Returns true if more input is needed.
+static bool
+fill_temp(lzma_file_info_coder *coder, const uint8_t *restrict in,
+		size_t *restrict in_pos, size_t in_size)
+{
+	coder->file_cur_pos += lzma_bufcpy(in, in_pos, in_size,
+			coder->temp, &coder->temp_pos, coder->temp_size);
+	return coder->temp_pos < coder->temp_size;
+}
+
+
+/// Seeks to the absolute file position specified by target_pos.
+/// This tries to do the seeking by only modifying *in_pos, if possible.
+/// The main benefit of this is that if one passes the whole file at once
+/// to lzma_code(), the decoder will never need to return LZMA_SEEK_NEEDED
+/// as all the seeking can be done by adjusting *in_pos in this function.
+///
+/// Returns true if an external seek is needed and the caller must return
+/// LZMA_SEEK_NEEDED.
+static bool
+seek_to_pos(lzma_file_info_coder *coder, uint64_t target_pos,
+		size_t in_start, size_t *in_pos, size_t in_size)
+{
+	// The input buffer doesn't extend beyond the end of the file.
+	// This has been checked by file_info_decode() already.
+	assert(coder->file_size - coder->file_cur_pos >= in_size - *in_pos);
+
+	const uint64_t pos_min = coder->file_cur_pos - (*in_pos - in_start);
+	const uint64_t pos_max = coder->file_cur_pos + (in_size - *in_pos);
+
+	bool external_seek_needed;
+
+	if (target_pos >= pos_min && target_pos <= pos_max) {
+		// The requested position is available in the current input
+		// buffer or right after it. That is, in a corner case we
+		// end up setting *in_pos == in_size and thus will immediately
+		// need new input bytes from the application.
+		*in_pos += (size_t)(target_pos - coder->file_cur_pos);
+		external_seek_needed = false;
+	} else {
+		// Ask the application to seek the input file.
+		*coder->external_seek_pos = target_pos;
+		external_seek_needed = true;
+
+		// Mark the whole input buffer as used. This way
+		// lzma_stream.total_in will have a better estimate
+		// of the amount of data read. It still won't be perfect
+		// as the value will depend on the input buffer size that
+		// the application uses, but it should be good enough for
+		// those few who want an estimate.
+		*in_pos = in_size;
+	}
+
+	// After seeking (internal or external) the current position
+	// will match the requested target position.
+	coder->file_cur_pos = target_pos;
+
+	return external_seek_needed;
+}
+
+
+/// The caller sets coder->file_target_pos so that it points to the *end*
+/// of the desired file position. This function then determines how far
+/// backwards from that position we can seek. After seeking fill_temp()
+/// can be used to read data into coder->temp. When fill_temp() has finished,
+/// coder->temp[coder->temp_size] will match coder->file_target_pos.
+///
+/// This also validates that coder->target_file_pos is sane in sense that
+/// we aren't trying to seek too far backwards (too close or beyond the
+/// beginning of the file).
+static lzma_ret
+reverse_seek(lzma_file_info_coder *coder,
+		size_t in_start, size_t *in_pos, size_t in_size)
+{
+	// Check that there is enough data before the target position
+	// to contain at least Stream Header and Stream Footer. If there
+	// isn't, the file cannot be valid.
+	if (coder->file_target_pos < 2 * LZMA_STREAM_HEADER_SIZE)
+		return LZMA_DATA_ERROR;
+
+	coder->temp_pos = 0;
+
+	// The Stream Header at the very beginning of the file gets handled
+	// specially in SEQ_MAGIC_BYTES and thus we will never need to seek
+	// there. By not seeking to the first LZMA_STREAM_HEADER_SIZE bytes
+	// we avoid a useless external seek after SEQ_MAGIC_BYTES if the
+	// application uses an extremely small input buffer and the input
+	// file is very small.
+	if (coder->file_target_pos - LZMA_STREAM_HEADER_SIZE
+			< sizeof(coder->temp))
+		coder->temp_size = (size_t)(coder->file_target_pos
+				- LZMA_STREAM_HEADER_SIZE);
+	else
+		coder->temp_size = sizeof(coder->temp);
+
+	// The above if-statements guarantee this. This is important because
+	// the Stream Header/Footer decoders assume that there's at least
+	// LZMA_STREAM_HEADER_SIZE bytes in coder->temp.
+	assert(coder->temp_size >= LZMA_STREAM_HEADER_SIZE);
+
+	if (seek_to_pos(coder, coder->file_target_pos - coder->temp_size,
+			in_start, in_pos, in_size))
+		return LZMA_SEEK_NEEDED;
+
+	return LZMA_OK;
+}
+
+
+/// Gets the number of zero-bytes at the end of the buffer.
+static size_t
+get_padding_size(const uint8_t *buf, size_t buf_size)
+{
+	size_t padding = 0;
+	while (buf_size > 0 && buf[--buf_size] == 0x00)
+		++padding;
+
+	return padding;
+}
+
+
+/// With the Stream Header at the very beginning of the file, LZMA_FORMAT_ERROR
+/// is used to tell the application that Magic Bytes didn't match. In other
+/// Stream Header/Footer fields (in the middle/end of the file) it could be
+/// a bit confusing to return LZMA_FORMAT_ERROR as we already know that there
+/// is a valid Stream Header at the beginning of the file. For those cases
+/// this function is used to convert LZMA_FORMAT_ERROR to LZMA_DATA_ERROR.
+static lzma_ret
+hide_format_error(lzma_ret ret)
+{
+	if (ret == LZMA_FORMAT_ERROR)
+		ret = LZMA_DATA_ERROR;
+
+	return ret;
+}
+
+
+/// Calls the Index decoder and updates coder->index_remaining.
+/// This is a separate function because the input can be either directly
+/// from the application or from coder->temp.
+static lzma_ret
+decode_index(lzma_file_info_coder *coder, const lzma_allocator *allocator,
+		const uint8_t *restrict in, size_t *restrict in_pos,
+		size_t in_size, bool update_file_cur_pos)
+{
+	const size_t in_start = *in_pos;
+
+	const lzma_ret ret = coder->index_decoder.code(
+			coder->index_decoder.coder,
+			allocator, in, in_pos, in_size,
+			NULL, NULL, 0, LZMA_RUN);
+
+	coder->index_remaining -= *in_pos - in_start;
+
+	if (update_file_cur_pos)
+		coder->file_cur_pos += *in_pos - in_start;
+
+	return ret;
+}
+
+
+static lzma_ret
+file_info_decode(void *coder_ptr, const lzma_allocator *allocator,
+		const uint8_t *restrict in, size_t *restrict in_pos,
+		size_t in_size,
+		uint8_t *restrict out lzma_attribute((__unused__)),
+		size_t *restrict out_pos lzma_attribute((__unused__)),
+		size_t out_size lzma_attribute((__unused__)),
+		lzma_action action lzma_attribute((__unused__)))
+{
+	lzma_file_info_coder *coder = coder_ptr;
+	const size_t in_start = *in_pos;
+
+	// If the caller provides input past the end of the file, trim
+	// the extra bytes from the buffer so that we won't read too far.
+	assert(coder->file_size >= coder->file_cur_pos);
+	if (coder->file_size - coder->file_cur_pos < in_size - in_start)
+		in_size = in_start
+			+ (size_t)(coder->file_size - coder->file_cur_pos);
+
+	while (true)
+	switch (coder->sequence) {
+	case SEQ_MAGIC_BYTES:
+		// Decode the Stream Header at the beginning of the file
+		// first to check if the Magic Bytes match. The flags
+		// are stored in coder->first_header_flags so that we
+		// don't need to seek to it again.
+		//
+		// Check that the file is big enough to contain at least
+		// Stream Header.
+		if (coder->file_size < LZMA_STREAM_HEADER_SIZE)
+			return LZMA_FORMAT_ERROR;
+
+		// Read the Stream Header field into coder->temp.
+		if (fill_temp(coder, in, in_pos, in_size))
+			return LZMA_OK;
+
+		// This is the only Stream Header/Footer decoding where we
+		// want to return LZMA_FORMAT_ERROR if the Magic Bytes don't
+		// match. Elsewhere it will be converted to LZMA_DATA_ERROR.
+		return_if_error(lzma_stream_header_decode(
+				&coder->first_header_flags, coder->temp));
+
+		// Now that we know that the Magic Bytes match, check the
+		// file size. It's better to do this here after checking the
+		// Magic Bytes since this way we can give LZMA_FORMAT_ERROR
+		// instead of LZMA_DATA_ERROR when the Magic Bytes don't
+		// match in a file that is too big or isn't a multiple of
+		// four bytes.
+		if (coder->file_size > LZMA_VLI_MAX || (coder->file_size & 3))
+			return LZMA_DATA_ERROR;
+
+		// Start looking for Stream Padding and Stream Footer
+		// at the end of the file.
+		coder->file_target_pos = coder->file_size;
+
+	// Fall through
+
+	case SEQ_PADDING_SEEK:
+		coder->sequence = SEQ_PADDING_DECODE;
+		return_if_error(reverse_seek(
+				coder, in_start, in_pos, in_size));
+
+	// Fall through
+
+	case SEQ_PADDING_DECODE: {
+		// Copy to coder->temp first. This keeps the code simpler if
+		// the application only provides input a few bytes at a time.
+		if (fill_temp(coder, in, in_pos, in_size))
+			return LZMA_OK;
+
+		// Scan the buffer backwards to get the size of the
+		// Stream Padding field (if any).
+		const size_t new_padding = get_padding_size(
+				coder->temp, coder->temp_size);
+		coder->stream_padding += new_padding;
+
+		// Set the target position to the beginning of Stream Padding
+		// that has been observed so far. If all Stream Padding has
+		// been seen, then the target position will be at the end
+		// of the Stream Footer field.
+		coder->file_target_pos -= new_padding;
+
+		if (new_padding == coder->temp_size) {
+			// The whole buffer was padding. Seek backwards in
+			// the file to get more input.
+			coder->sequence = SEQ_PADDING_SEEK;
+			break;
+		}
+
+		// Size of Stream Padding must be a multiple of 4 bytes.
+		if (coder->stream_padding & 3)
+			return LZMA_DATA_ERROR;
+
+		coder->sequence = SEQ_FOOTER;
+
+		// Calculate the amount of non-padding data in coder->temp.
+		coder->temp_size -= new_padding;
+		coder->temp_pos = coder->temp_size;
+
+		// We can avoid an external seek if the whole Stream Footer
+		// is already in coder->temp. In that case SEQ_FOOTER won't
+		// read more input and will find the Stream Footer from
+		// coder->temp[coder->temp_size - LZMA_STREAM_HEADER_SIZE].
+		//
+		// Otherwise we will need to seek. The seeking is done so
+		// that Stream Footer wil be at the end of coder->temp.
+		// This way it's likely that we also get a complete Index
+		// field into coder->temp without needing a separate seek
+		// for that (unless the Index field is big).
+		if (coder->temp_size < LZMA_STREAM_HEADER_SIZE)
+			return_if_error(reverse_seek(
+					coder, in_start, in_pos, in_size));
+	}
+
+	// Fall through
+
+	case SEQ_FOOTER:
+		// Copy the Stream Footer field into coder->temp.
+		// If Stream Footer was already available in coder->temp
+		// in SEQ_PADDING_DECODE, then this does nothing.
+		if (fill_temp(coder, in, in_pos, in_size))
+			return LZMA_OK;
+
+		// Make coder->file_target_pos and coder->temp_size point
+		// to the beginning of Stream Footer and thus to the end
+		// of the Index field. coder->temp_pos will be updated
+		// a bit later.
+		coder->file_target_pos -= LZMA_STREAM_HEADER_SIZE;
+		coder->temp_size -= LZMA_STREAM_HEADER_SIZE;
+
+		// Decode Stream Footer.
+		return_if_error(hide_format_error(lzma_stream_footer_decode(
+				&coder->footer_flags,
+				coder->temp + coder->temp_size)));
+
+		// Check that we won't seek past the beginning of the file.
+		//
+		// LZMA_STREAM_HEADER_SIZE is added because there must be
+		// space for Stream Header too even though we won't seek
+		// there before decoding the Index field.
+		//
+		// There's no risk of integer overflow here because
+		// Backward Size cannot be greater than 2^34.
+		if (coder->file_target_pos < coder->footer_flags.backward_size
+				+ LZMA_STREAM_HEADER_SIZE)
+			return LZMA_DATA_ERROR;
+
+		// Set the target position to the beginning of the Index field.
+		coder->file_target_pos -= coder->footer_flags.backward_size;
+		coder->sequence = SEQ_INDEX_INIT;
+
+		// We can avoid an external seek if the whole Index field is
+		// already available in coder->temp.
+		if (coder->temp_size >= coder->footer_flags.backward_size) {
+			// Set coder->temp_pos to point to the beginning
+			// of the Index.
+			coder->temp_pos = coder->temp_size
+					- coder->footer_flags.backward_size;
+		} else {
+			// These are set to zero to indicate that there's no
+			// useful data (Index or anything else) in coder->temp.
+			coder->temp_pos = 0;
+			coder->temp_size = 0;
+
+			// Seek to the beginning of the Index field.
+			if (seek_to_pos(coder, coder->file_target_pos,
+					in_start, in_pos, in_size))
+				return LZMA_SEEK_NEEDED;
+		}
+
+	// Fall through
+
+	case SEQ_INDEX_INIT: {
+		// Calculate the amount of memory already used by the earlier
+		// Indexes so that we know how big memory limit to pass to
+		// the Index decoder.
+		//
+		// NOTE: When there are multiple Streams, the separate
+		// lzma_index structures can use more RAM (as measured by
+		// lzma_index_memused()) than the final combined lzma_index.
+		// Thus memlimit may need to be slightly higher than the final
+		// calculated memory usage will be. This is perhaps a bit
+		// confusing to the application, but I think it shouldn't
+		// cause problems in practice.
+		uint64_t memused = 0;
+		if (coder->combined_index != NULL) {
+			memused = lzma_index_memused(coder->combined_index);
+			assert(memused <= coder->memlimit);
+			if (memused > coder->memlimit) // Extra sanity check
+				return LZMA_PROG_ERROR;
+		}
+
+		// Initialize the Index decoder.
+		return_if_error(lzma_index_decoder_init(
+				&coder->index_decoder, allocator,
+				&coder->this_index,
+				coder->memlimit - memused));
+
+		coder->index_remaining = coder->footer_flags.backward_size;
+		coder->sequence = SEQ_INDEX_DECODE;
+	}
+
+	// Fall through
+
+	case SEQ_INDEX_DECODE: {
+		// Decode (a part of) the Index. If the whole Index is already
+		// in coder->temp, read it from there. Otherwise read from
+		// in[*in_pos] onwards. Note that index_decode() updates
+		// coder->index_remaining and optionally coder->file_cur_pos.
+		lzma_ret ret;
+		if (coder->temp_size != 0) {
+			assert(coder->temp_size - coder->temp_pos
+					== coder->index_remaining);
+			ret = decode_index(coder, allocator, coder->temp,
+					&coder->temp_pos, coder->temp_size,
+					false);
+		} else {
+			// Don't give the decoder more input than the known
+			// remaining size of the Index field.
+			size_t in_stop = in_size;
+			if (in_size - *in_pos > coder->index_remaining)
+				in_stop = *in_pos
+					+ (size_t)(coder->index_remaining);
+
+			ret = decode_index(coder, allocator,
+					in, in_pos, in_stop, true);
+		}
+
+		switch (ret) {
+		case LZMA_OK:
+			// If the Index docoder asks for more input when we
+			// have already given it as much input as Backward Size
+			// indicated, the file is invalid.
+			if (coder->index_remaining == 0)
+				return LZMA_DATA_ERROR;
+
+			// We cannot get here if we were reading Index from
+			// coder->temp because when reading from coder->temp
+			// we give the Index decoder exactly
+			// coder->index_remaining bytes of input.
+			assert(coder->temp_size == 0);
+
+			return LZMA_OK;
+
+		case LZMA_STREAM_END:
+			// If the decoding seems to be successful, check also
+			// that the Index decoder consumed as much input as
+			// indicated by the Backward Size field.
+			if (coder->index_remaining != 0)
+				return LZMA_DATA_ERROR;
+
+			break;
+
+		default:
+			return ret;
+		}
+
+		// Calculate how much the Index tells us to seek backwards
+		// (relative to the beginning of the Index): Total size of
+		// all Blocks plus the size of the Stream Header field.
+		// No integer overflow here because lzma_index_total_size()
+		// cannot return a value greater than LZMA_VLI_MAX.
+		const uint64_t seek_amount
+				= lzma_index_total_size(coder->this_index)
+					+ LZMA_STREAM_HEADER_SIZE;
+
+		// Check that Index is sane in sense that seek_amount won't
+		// make us seek past the beginning of the file when locating
+		// the Stream Header.
+		//
+		// coder->file_target_pos still points to the beginning of
+		// the Index field.
+		if (coder->file_target_pos < seek_amount)
+			return LZMA_DATA_ERROR;
+
+		// Set the target to the beginning of Stream Header.
+		coder->file_target_pos -= seek_amount;
+
+		if (coder->file_target_pos == 0) {
+			// We would seek to the beginning of the file, but
+			// since we already decoded that Stream Header in
+			// SEQ_MAGIC_BYTES, we can use the cached value from
+			// coder->first_header_flags to avoid the seek.
+			coder->header_flags = coder->first_header_flags;
+			coder->sequence = SEQ_HEADER_COMPARE;
+			break;
+		}
+
+		coder->sequence = SEQ_HEADER_DECODE;
+
+		// Make coder->file_target_pos point to the end of
+		// the Stream Header field.
+		coder->file_target_pos += LZMA_STREAM_HEADER_SIZE;
+
+		// If coder->temp_size is non-zero, it points to the end
+		// of the Index field. Then the beginning of the Index
+		// field is at coder->temp[coder->temp_size
+		// - coder->footer_flags.backward_size].
+		assert(coder->temp_size == 0 || coder->temp_size
+				>= coder->footer_flags.backward_size);
+
+		// If coder->temp contained the whole Index, see if it has
+		// enough data to contain also the Stream Header. If so,
+		// we avoid an external seek.
+		//
+		// NOTE: This can happen only with small .xz files and only
+		// for the non-first Stream as the Stream Flags of the first
+		// Stream are cached and already handled a few lines above.
+		// So this isn't as useful as the other seek-avoidance cases.
+		if (coder->temp_size != 0 && coder->temp_size
+				- coder->footer_flags.backward_size
+				>= seek_amount) {
+			// Make temp_pos and temp_size point to the *end* of
+			// Stream Header so that SEQ_HEADER_DECODE will find
+			// the start of Stream Header from coder->temp[
+			// coder->temp_size - LZMA_STREAM_HEADER_SIZE].
+			coder->temp_pos = coder->temp_size
+					- coder->footer_flags.backward_size
+					- seek_amount
+					+ LZMA_STREAM_HEADER_SIZE;
+			coder->temp_size = coder->temp_pos;
+		} else {
+			// Seek so that Stream Header will be at the end of
+			// coder->temp. With typical multi-Stream files we
+			// will usually also get the Stream Footer and Index
+			// of the *previous* Stream in coder->temp and thus
+			// won't need a separate seek for them.
+			return_if_error(reverse_seek(coder,
+					in_start, in_pos, in_size));
+		}
+	}
+
+	// Fall through
+
+	case SEQ_HEADER_DECODE:
+		// Copy the Stream Header field into coder->temp.
+		// If Stream Header was already available in coder->temp
+		// in SEQ_INDEX_DECODE, then this does nothing.
+		if (fill_temp(coder, in, in_pos, in_size))
+			return LZMA_OK;
+
+		// Make all these point to the beginning of Stream Header.
+		coder->file_target_pos -= LZMA_STREAM_HEADER_SIZE;
+		coder->temp_size -= LZMA_STREAM_HEADER_SIZE;
+		coder->temp_pos = coder->temp_size;
+
+		// Decode the Stream Header.
+		return_if_error(hide_format_error(lzma_stream_header_decode(
+				&coder->header_flags,
+				coder->temp + coder->temp_size)));
+
+		coder->sequence = SEQ_HEADER_COMPARE;
+
+	// Fall through
+
+	case SEQ_HEADER_COMPARE:
+		// Compare Stream Header against Stream Footer. They must
+		// match.
+		return_if_error(lzma_stream_flags_compare(
+				&coder->header_flags, &coder->footer_flags));
+
+		// Store the decoded Stream Flags into the Index. Use the
+		// Footer Flags because it contains Backward Size, although
+		// it shouldn't matter in practice.
+		if (lzma_index_stream_flags(coder->this_index,
+				&coder->footer_flags) != LZMA_OK)
+			return LZMA_PROG_ERROR;
+
+		// Store also the size of the Stream Padding field. It is
+		// needed to calculate the offsets of the Streams correctly.
+		if (lzma_index_stream_padding(coder->this_index,
+				coder->stream_padding) != LZMA_OK)
+			return LZMA_PROG_ERROR;
+
+		// Reset it so that it's ready for the next Stream.
+		coder->stream_padding = 0;
+
+		// Append the earlier decoded Indexes after this_index.
+		if (coder->combined_index != NULL)
+			return_if_error(lzma_index_cat(coder->this_index,
+					coder->combined_index, allocator));
+
+		coder->combined_index = coder->this_index;
+		coder->this_index = NULL;
+
+		// If the whole file was decoded, tell the caller that we
+		// are finished.
+		if (coder->file_target_pos == 0) {
+			// The combined index must indicate the same file
+			// size as was told to us at initialization.
+			assert(lzma_index_file_size(coder->combined_index)
+					== coder->file_size);
+
+			// Make the combined index available to
+			// the application.
+			*coder->dest_index = coder->combined_index;
+			coder->combined_index = NULL;
+
+			// Mark the input buffer as used since we may have
+			// done internal seeking and thus don't know how
+			// many input bytes were actually used. This way
+			// lzma_stream.total_in gets a slightly better
+			// estimate of the amount of input used.
+			*in_pos = in_size;
+			return LZMA_STREAM_END;
+		}
+
+		// We didn't hit the beginning of the file yet, so continue
+		// reading backwards in the file. If we have unprocessed
+		// data in coder->temp, use it before requesting more data
+		// from the application.
+		//
+		// coder->file_target_pos, coder->temp_size, and
+		// coder->temp_pos all point to the beginning of Stream Header
+		// and thus the end of the previous Stream in the file.
+		coder->sequence = coder->temp_size > 0
+				? SEQ_PADDING_DECODE : SEQ_PADDING_SEEK;
+		break;
+
+	default:
+		assert(0);
+		return LZMA_PROG_ERROR;
+	}
+}
+
+
+static lzma_ret
+file_info_decoder_memconfig(void *coder_ptr, uint64_t *memusage,
+		uint64_t *old_memlimit, uint64_t new_memlimit)
+{
+	lzma_file_info_coder *coder = coder_ptr;
+
+	// The memory usage calculation comes from three things:
+	//
+	// (1) The Indexes that have already been decoded and processed into
+	//     coder->combined_index.
+	//
+	// (2) The latest Index in coder->this_index that has been decoded but
+	//     not yet put into coder->combined_index.
+	//
+	// (3) The latest Index that we have started decoding but haven't
+	//     finished and thus isn't available in coder->this_index yet.
+	//     Memory usage and limit information needs to be communicated
+	//     from/to coder->index_decoder.
+	//
+	// Care has to be taken to not do both (2) and (3) when calculating
+	// the memory usage.
+	uint64_t combined_index_memusage = 0;
+	uint64_t this_index_memusage = 0;
+
+	// (1) If we have already successfully decoded one or more Indexes,
+	// get their memory usage.
+	if (coder->combined_index != NULL)
+		combined_index_memusage = lzma_index_memused(
+				coder->combined_index);
+
+	// Choose between (2), (3), or neither.
+	if (coder->this_index != NULL) {
+		// (2) The latest Index is available. Use its memory usage.
+		this_index_memusage = lzma_index_memused(coder->this_index);
+
+	} else if (coder->sequence == SEQ_INDEX_DECODE) {
+		// (3) The Index decoder is activate and hasn't yet stored
+		// the new index in coder->this_index. Get the memory usage
+		// information from the Index decoder.
+		//
+		// NOTE: If the Index decoder doesn't yet know how much memory
+		// it will eventually need, it will return a tiny value here.
+		uint64_t dummy;
+		if (coder->index_decoder.memconfig(coder->index_decoder.coder,
+					&this_index_memusage, &dummy, 0)
+				!= LZMA_OK) {
+			assert(0);
+			return LZMA_PROG_ERROR;
+		}
+	}
+
+	// Now we know the total memory usage/requirement. If we had neither
+	// old Indexes nor a new Index, this will be zero which isn't
+	// acceptable as lzma_memusage() has to return non-zero on success
+	// and even with an empty .xz file we will end up with a lzma_index
+	// that takes some memory.
+	*memusage = combined_index_memusage + this_index_memusage;
+	if (*memusage == 0)
+		*memusage = lzma_index_memusage(1, 0);
+
+	*old_memlimit = coder->memlimit;
+
+	// If requested, set a new memory usage limit.
+	if (new_memlimit != 0) {
+		if (new_memlimit < *memusage)
+			return LZMA_MEMLIMIT_ERROR;
+
+		// In the condition (3) we need to tell the Index decoder
+		// its new memory usage limit.
+		if (coder->this_index == NULL
+				&& coder->sequence == SEQ_INDEX_DECODE) {
+			const uint64_t idec_new_memlimit = new_memlimit
+					- combined_index_memusage;
+
+			assert(this_index_memusage > 0);
+			assert(idec_new_memlimit > 0);
+
+			uint64_t dummy1;
+			uint64_t dummy2;
+
+			if (coder->index_decoder.memconfig(
+					coder->index_decoder.coder,
+					&dummy1, &dummy2, idec_new_memlimit)
+					!= LZMA_OK) {
+				assert(0);
+				return LZMA_PROG_ERROR;
+			}
+		}
+
+		coder->memlimit = new_memlimit;
+	}
+
+	return LZMA_OK;
+}
+
+
+static void
+file_info_decoder_end(void *coder_ptr, const lzma_allocator *allocator)
+{
+	lzma_file_info_coder *coder = coder_ptr;
+
+	lzma_next_end(&coder->index_decoder, allocator);
+	lzma_index_end(coder->this_index, allocator);
+	lzma_index_end(coder->combined_index, allocator);
+
+	lzma_free(coder, allocator);
+	return;
+}
+
+
+static lzma_ret
+lzma_file_info_decoder_init(lzma_next_coder *next,
+		const lzma_allocator *allocator, uint64_t *seek_pos,
+		lzma_index **dest_index,
+		uint64_t memlimit, uint64_t file_size)
+{
+	lzma_next_coder_init(&lzma_file_info_decoder_init, next, allocator);
+
+	if (dest_index == NULL)
+		return LZMA_PROG_ERROR;
+
+	lzma_file_info_coder *coder = next->coder;
+	if (coder == NULL) {
+		coder = lzma_alloc(sizeof(lzma_file_info_coder), allocator);
+		if (coder == NULL)
+			return LZMA_MEM_ERROR;
+
+		next->coder = coder;
+		next->code = &file_info_decode;
+		next->end = &file_info_decoder_end;
+		next->memconfig = &file_info_decoder_memconfig;
+
+		coder->index_decoder = LZMA_NEXT_CODER_INIT;
+		coder->this_index = NULL;
+		coder->combined_index = NULL;
+	}
+
+	coder->sequence = SEQ_MAGIC_BYTES;
+	coder->file_cur_pos = 0;
+	coder->file_target_pos = 0;
+	coder->file_size = file_size;
+
+	lzma_index_end(coder->this_index, allocator);
+	coder->this_index = NULL;
+
+	lzma_index_end(coder->combined_index, allocator);
+	coder->combined_index = NULL;
+
+	coder->stream_padding = 0;
+
+	coder->dest_index = dest_index;
+	coder->external_seek_pos = seek_pos;
+
+	// If memlimit is 0, make it 1 to ensure that lzma_memlimit_get()
+	// won't return 0 (which would indicate an error).
+	coder->memlimit = my_max(1, memlimit);
+
+	// Prepare these for reading the first Stream Header into coder->temp.
+	coder->temp_pos = 0;
+	coder->temp_size = LZMA_STREAM_HEADER_SIZE;
+
+	return LZMA_OK;
+}
+
+
+extern LZMA_API(lzma_ret)
+lzma_file_info_decoder(lzma_stream *strm, lzma_index **dest_index,
+		uint64_t memlimit, uint64_t file_size)
+{
+	lzma_next_strm_init(lzma_file_info_decoder_init, strm, &strm->seek_pos,
+			dest_index, memlimit, file_size);
+
+	// We allow LZMA_FINISH in addition to LZMA_RUN for convenience.
+	// lzma_code() is able to handle the LZMA_FINISH + LZMA_SEEK_NEEDED
+	// combination in a sane way. Applications still need to be careful
+	// if they use LZMA_FINISH so that they remember to reset it back
+	// to LZMA_RUN after seeking if needed.
+	strm->internal->supported_actions[LZMA_RUN] = true;
+	strm->internal->supported_actions[LZMA_FINISH] = true;
+
+	return LZMA_OK;
+}
--- a/src/liblzma/common/filter_common.c
+++ b/src/liblzma/common/filter_common.c
@ -42,6 +42,13 @@ static const struct {
 		.last_ok = true,
 		.changes_size = true,
 	},
+	{
+		.id = LZMA_FILTER_LZMA1EXT,
+		.options_size = sizeof(lzma_options_lzma),
+		.non_last_ok = false,
+		.last_ok = true,
+		.changes_size = true,
+	},
 #endif
 #if defined(HAVE_ENCODER_LZMA2) || defined(HAVE_DECODER_LZMA2)
 	{
@ -97,6 +104,15 @@ static const struct {
 		.changes_size = false,
 	},
 #endif
+#if defined(HAVE_ENCODER_ARM64) || defined(HAVE_DECODER_ARM64)
+	{
+		.id = LZMA_FILTER_ARM64,
+		.options_size = sizeof(lzma_options_bcj),
+		.non_last_ok = true,
+		.last_ok = false,
+		.changes_size = false,
+	},
+#endif
 #if defined(HAVE_ENCODER_SPARC) || defined(HAVE_DECODER_SPARC)
 	{
 		.id = LZMA_FILTER_SPARC,
@ -196,8 +212,34 @@ lzma_filters_copy(const lzma_filter *src, lzma_filter *real_dest,
 }


-static lzma_ret
-validate_chain(const lzma_filter *filters, size_t *count)
+extern LZMA_API(void)
+lzma_filters_free(lzma_filter *filters, const lzma_allocator *allocator)
+{
+	if (filters == NULL)
+		return;
+
+	for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) {
+		if (i == LZMA_FILTERS_MAX) {
+			// The API says that LZMA_FILTERS_MAX + 1 is the
+			// maximum allowed size including the terminating
+			// element. Thus, we should never get here but in
+			// case there is a bug and we do anyway, don't go
+			// past the (probable) end of the array.
+			assert(0);
+			break;
+		}
+
+		lzma_free(filters[i].options, allocator);
+		filters[i].options = NULL;
+		filters[i].id = LZMA_VLI_UNKNOWN;
+	}
+
+	return;
+}
+
+
+extern lzma_ret
+lzma_validate_chain(const lzma_filter *filters, size_t *count)
 {
 	// There must be at least one filter.
 	if (filters == NULL || filters[0].id == LZMA_VLI_UNKNOWN)
@ -251,7 +293,7 @@ lzma_raw_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 {
 	// Do some basic validation and get the number of filters.
 	size_t count;
-	return_if_error(validate_chain(options, &count));
+	return_if_error(lzma_validate_chain(options, &count));

 	// Set the filter functions and copy the options pointer.
 	lzma_filter_info filters[LZMA_FILTERS_MAX + 1];
@ -304,7 +346,7 @@ lzma_raw_coder_memusage(lzma_filter_find coder_find,
 	// The chain has to have at least one filter.
 	{
 		size_t tmp;
-		if (validate_chain(filters, &tmp) != LZMA_OK)
+		if (lzma_validate_chain(filters, &tmp) != LZMA_OK)
 			return UINT64_MAX;
 	}

--- a/src/liblzma/common/filter_common.h
+++ b/src/liblzma/common/filter_common.h
@ -35,6 +35,9 @@ typedef struct {
 typedef const lzma_filter_coder *(*lzma_filter_find)(lzma_vli id);


+extern lzma_ret lzma_validate_chain(const lzma_filter *filters, size_t *count);
+
+
 extern lzma_ret lzma_raw_coder_init(
 		lzma_next_coder *next, const lzma_allocator *allocator,
 		const lzma_filter *filters,
--- a/src/liblzma/common/filter_decoder.c
+++ b/src/liblzma/common/filter_decoder.c
@ -50,6 +50,12 @@ static const lzma_filter_decoder decoders[] = {
 		.memusage = &lzma_lzma_decoder_memusage,
 		.props_decode = &lzma_lzma_props_decode,
 	},
+	{
+		.id = LZMA_FILTER_LZMA1EXT,
+		.init = &lzma_lzma_decoder_init,
+		.memusage = &lzma_lzma_decoder_memusage,
+		.props_decode = &lzma_lzma_props_decode,
+	},
 #endif
 #ifdef HAVE_DECODER_LZMA2
 	{
@ -99,6 +105,14 @@ static const lzma_filter_decoder decoders[] = {
 		.props_decode = &lzma_simple_props_decode,
 	},
 #endif
+#ifdef HAVE_DECODER_ARM64
+	{
+		.id = LZMA_FILTER_ARM64,
+		.init = &lzma_simple_arm64_decoder_init,
+		.memusage = NULL,
+		.props_decode = &lzma_simple_props_decode,
+	},
+#endif
 #ifdef HAVE_DECODER_SPARC
 	{
 		.id = LZMA_FILTER_SPARC,
--- a/src/liblzma/common/filter_encoder.c
+++ b/src/liblzma/common/filter_encoder.c
@ -64,6 +64,15 @@ static const lzma_filter_encoder encoders[] = {
 		.props_size_fixed = 5,
 		.props_encode = &lzma_lzma_props_encode,
 	},
+	{
+		.id = LZMA_FILTER_LZMA1EXT,
+		.init = &lzma_lzma_encoder_init,
+		.memusage = &lzma_lzma_encoder_memusage,
+		.block_size = NULL, // Not needed for LZMA1
+		.props_size_get = NULL,
+		.props_size_fixed = 5,
+		.props_encode = &lzma_lzma_props_encode,
+	},
 #endif
 #ifdef HAVE_ENCODER_LZMA2
 	{
@ -126,6 +135,16 @@ static const lzma_filter_encoder encoders[] = {
 		.props_encode = &lzma_simple_props_encode,
 	},
 #endif
+#ifdef HAVE_ENCODER_ARM64
+	{
+		.id = LZMA_FILTER_ARM64,
+		.init = &lzma_simple_arm64_encoder_init,
+		.memusage = NULL,
+		.block_size = NULL,
+		.props_size_get = &lzma_simple_props_size,
+		.props_encode = &lzma_simple_props_encode,
+	},
+#endif
 #ifdef HAVE_ENCODER_SPARC
 	{
 		.id = LZMA_FILTER_SPARC,
--- a/src/liblzma/common/index_decoder.c
+++ b/src/liblzma/common/index_decoder.c
@ -10,7 +10,7 @@
 //
 ///////////////////////////////////////////////////////////////////////////////

-#include "index.h"
+#include "index_decoder.h"
 #include "check.h"


@ -180,8 +180,11 @@ index_decode(void *coder_ptr, const lzma_allocator *allocator,
 				return LZMA_OK;

 			if (((coder->crc32 >> (coder->pos * 8)) & 0xFF)
-					!= in[(*in_pos)++])
+					!= in[(*in_pos)++]) {
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
 				return LZMA_DATA_ERROR;
+#endif
+			}

 		} while (++coder->pos < 4);

@ -265,11 +268,11 @@ index_decoder_reset(lzma_index_coder *coder, const lzma_allocator *allocator,
 }


-static lzma_ret
-index_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
+extern lzma_ret
+lzma_index_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 		lzma_index **i, uint64_t memlimit)
 {
-	lzma_next_coder_init(&index_decoder_init, next, allocator);
+	lzma_next_coder_init(&lzma_index_decoder_init, next, allocator);

 	if (i == NULL)
 		return LZMA_PROG_ERROR;
@ -296,7 +299,7 @@ index_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 extern LZMA_API(lzma_ret)
 lzma_index_decoder(lzma_stream *strm, lzma_index **i, uint64_t memlimit)
 {
-	lzma_next_strm_init(index_decoder_init, strm, i, memlimit);
+	lzma_next_strm_init(lzma_index_decoder_init, strm, i, memlimit);

 	strm->internal->supported_actions[LZMA_RUN] = true;
 	strm->internal->supported_actions[LZMA_FINISH] = true;
--- a/src/liblzma/common/index_decoder.h
+++ b/src/liblzma/common/index_decoder.h
@ -0,0 +1,24 @@
+///////////////////////////////////////////////////////////////////////////////
+//
+/// \file       index_decoder.h
+/// \brief      Decodes the Index field
+//
+//  Author:     Lasse Collin
+//
+//  This file has been put into the public domain.
+//  You can do whatever you want with this file.
+//
+///////////////////////////////////////////////////////////////////////////////
+
+#ifndef LZMA_INDEX_DECODER_H
+#define LZMA_INDEX_DECODER_H
+
+#include "index.h"
+
+
+extern lzma_ret lzma_index_decoder_init(lzma_next_coder *next,
+		const lzma_allocator *allocator,
+		lzma_index **i, uint64_t memlimit);
+
+
+#endif
--- a/src/liblzma/common/index_hash.c
+++ b/src/liblzma/common/index_hash.c
@ -312,8 +312,11 @@ lzma_index_hash_decode(lzma_index_hash *index_hash, const uint8_t *in,
 				return LZMA_OK;

 			if (((index_hash->crc32 >> (index_hash->pos * 8))
-					& 0xFF) != in[(*in_pos)++])
+					& 0xFF) != in[(*in_pos)++]) {
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
 				return LZMA_DATA_ERROR;
+#endif
+			}

 		} while (++index_hash->pos < 4);

--- a/src/liblzma/common/lzip_decoder.c
+++ b/src/liblzma/common/lzip_decoder.c
@ -0,0 +1,414 @@
+///////////////////////////////////////////////////////////////////////////////
+//
+/// \file       lzip_decoder.c
+/// \brief      Decodes .lz (lzip) files
+//
+//  Author:     Michał Górny
+//              Lasse Collin
+//
+//  This file has been put into the public domain.
+//  You can do whatever you want with this file.
+//
+///////////////////////////////////////////////////////////////////////////////
+
+#include "lzip_decoder.h"
+#include "lzma_decoder.h"
+#include "check.h"
+
+
+// .lz format version 0 lacks the 64-bit Member size field in the footer.
+#define LZIP_V0_FOOTER_SIZE 12
+#define LZIP_V1_FOOTER_SIZE 20
+#define LZIP_FOOTER_SIZE_MAX LZIP_V1_FOOTER_SIZE
+
+// lc/lp/pb are hardcoded in the .lz format.
+#define LZIP_LC 3
+#define LZIP_LP 0
+#define LZIP_PB 2
+
+
+typedef struct {
+	enum {
+		SEQ_ID_STRING,
+		SEQ_VERSION,
+		SEQ_DICT_SIZE,
+		SEQ_CODER_INIT,
+		SEQ_LZMA_STREAM,
+		SEQ_MEMBER_FOOTER,
+	} sequence;
+
+	/// .lz member format version
+	uint32_t version;
+
+	/// CRC32 of the uncompressed data in the .lz member
+	uint32_t crc32;
+
+	/// Uncompressed size of the .lz member
+	uint64_t uncompressed_size;
+
+	/// Compressed size of the .lz member
+	uint64_t member_size;
+
+	/// Memory usage limit
+	uint64_t memlimit;
+
+	/// Amount of memory actually needed
+	uint64_t memusage;
+
+	/// If true, LZMA_GET_CHECK is returned after decoding the header
+	/// fields. As all files use CRC32 this is redundant but it's
+	/// implemented anyway since the initialization functions supports
+	/// all other flags in addition to LZMA_TELL_ANY_CHECK.
+	bool tell_any_check;
+
+	/// If true, we won't calculate or verify the CRC32 of
+	/// the uncompressed data.
+	bool ignore_check;
+
+	/// If true, we will decode concatenated .lz members and stop if
+	/// non-.lz data is seen after at least one member has been
+	/// successfully decoded.
+	bool concatenated;
+
+	/// When decoding concatenated .lz members, this is true as long as
+	/// we are decoding the first .lz member. This is needed to avoid
+	/// incorrect LZMA_FORMAT_ERROR in case there is non-.lz data at
+	/// the end of the file.
+	bool first_member;
+
+	/// Reading position in the header and footer fields
+	size_t pos;
+
+	/// Buffer to hold the .lz footer fields
+	uint8_t buffer[LZIP_FOOTER_SIZE_MAX];
+
+	/// Options decoded from the .lz header that needed to initialize
+	/// the LZMA1 decoder.
+	lzma_options_lzma options;
+
+	/// LZMA1 decoder
+	lzma_next_coder lzma_decoder;
+
+} lzma_lzip_coder;
+
+
+static lzma_ret
+lzip_decode(void *coder_ptr, const lzma_allocator *allocator,
+		const uint8_t *restrict in, size_t *restrict in_pos,
+		size_t in_size, uint8_t *restrict out,
+		size_t *restrict out_pos, size_t out_size, lzma_action action)
+{
+	lzma_lzip_coder *coder = coder_ptr;
+
+	while (true)
+	switch (coder->sequence) {
+	case SEQ_ID_STRING: {
+		// The "ID string" or magic bytes are "LZIP" in US-ASCII.
+		const uint8_t lzip_id_string[4] = { 0x4C, 0x5A, 0x49, 0x50 };
+
+		while (coder->pos < sizeof(lzip_id_string)) {
+			if (*in_pos >= in_size) {
+				// If we are on the 2nd+ concatenated member
+				// and the input ends before we can read
+				// the magic bytes, we discard the bytes that
+				// were already read (up to 3) and finish.
+				// See the reasoning below.
+				return !coder->first_member
+						&& action == LZMA_FINISH
+					? LZMA_STREAM_END : LZMA_OK;
+			}
+
+			if (in[*in_pos] != lzip_id_string[coder->pos]) {
+				// The .lz format allows putting non-.lz data
+				// at the end of the file. If we have seen
+				// at least one valid .lz member already,
+				// then we won't consume the byte at *in_pos
+				// and will return LZMA_STREAM_END. This way
+				// apps can easily locate and read the non-.lz
+				// data after the .lz member(s).
+				//
+				// NOTE: If the first 1-3 bytes of the non-.lz
+				// data match the .lz ID string then the first
+				// 1-3 bytes of the junk will get ignored by
+				// us. If apps want to properly locate the
+				// trailing data they must ensure that the
+				// first byte of their custom data isn't the
+				// same as the first byte of .lz ID string.
+				// With the liblzma API we cannot rewind the
+				// input position across calls to lzma_code().
+				return !coder->first_member
+					? LZMA_STREAM_END : LZMA_FORMAT_ERROR;
+			}
+
+			++*in_pos;
+			++coder->pos;
+		}
+
+		coder->pos = 0;
+
+		coder->crc32 = 0;
+		coder->uncompressed_size = 0;
+		coder->member_size = sizeof(lzip_id_string);
+
+		coder->sequence = SEQ_VERSION;
+	}
+
+	// Fall through
+
+	case SEQ_VERSION:
+		if (*in_pos >= in_size)
+			return LZMA_OK;
+
+		coder->version = in[(*in_pos)++];
+
+		// We support version 0 and unextended version 1.
+		if (coder->version > 1)
+			return LZMA_OPTIONS_ERROR;
+
+		++coder->member_size;
+		coder->sequence = SEQ_DICT_SIZE;
+
+		// .lz versions 0 and 1 use CRC32 as the integrity check
+		// so if the application wanted to know that
+		// (LZMA_TELL_ANY_CHECK) we can tell it now.
+		if (coder->tell_any_check)
+			return LZMA_GET_CHECK;
+
+	// Fall through
+
+	case SEQ_DICT_SIZE: {
+		if (*in_pos >= in_size)
+			return LZMA_OK;
+
+		const uint32_t ds = in[(*in_pos)++];
+		++coder->member_size;
+
+		// The five lowest bits are for the base-2 logarithm of
+		// the dictionary size and the highest three bits are
+		// the fractional part (0/16 to 7/16) that will be
+		// substracted to get the final value.
+		//
+		// For example, with 0xB5:
+		//     b2log = 21
+		//     fracnum = 5
+		//     dict_size = 2^21 - 2^21 * 5 / 16 = 1408 KiB
+		const uint32_t b2log = ds & 0x1F;
+		const uint32_t fracnum = ds >> 5;
+
+		// The format versions 0 and 1 allow dictionary size in the
+		// range [4 KiB, 512 MiB].
+		if (b2log < 12 || b2log > 29 || (b2log == 12 && fracnum > 0))
+			return LZMA_DATA_ERROR;
+
+		//   2^[b2log] - 2^[b2log] * [fracnum] / 16
+		// = 2^[b2log] - [fracnum] * 2^([b2log] - 4)
+		coder->options.dict_size = (UINT32_C(1) << b2log)
+				- (fracnum << (b2log - 4));
+
+		assert(coder->options.dict_size >= 4096);
+		assert(coder->options.dict_size <= (UINT32_C(512) << 20));
+
+		coder->options.preset_dict = NULL;
+		coder->options.lc = LZIP_LC;
+		coder->options.lp = LZIP_LP;
+		coder->options.pb = LZIP_PB;
+
+		// Calculate the memory usage.
+		coder->memusage = lzma_lzma_decoder_memusage(&coder->options)
+				+ LZMA_MEMUSAGE_BASE;
+
+		// Initialization is a separate step because if we return
+		// LZMA_MEMLIMIT_ERROR we need to be able to restart after
+		// the memlimit has been increased.
+		coder->sequence = SEQ_CODER_INIT;
+	}
+
+	// Fall through
+
+	case SEQ_CODER_INIT: {
+		if (coder->memusage > coder->memlimit)
+			return LZMA_MEMLIMIT_ERROR;
+
+		const lzma_filter_info filters[2] = {
+			{
+				.id = LZMA_FILTER_LZMA1,
+				.init = &lzma_lzma_decoder_init,
+				.options = &coder->options,
+			}, {
+				.init = NULL,
+			}
+		};
+
+		return_if_error(lzma_next_filter_init(&coder->lzma_decoder,
+				allocator, filters));
+
+		coder->crc32 = 0;
+		coder->sequence = SEQ_LZMA_STREAM;
+	}
+
+	// Fall through
+
+	case SEQ_LZMA_STREAM: {
+		const size_t in_start = *in_pos;
+		const size_t out_start = *out_pos;
+
+		const lzma_ret ret = coder->lzma_decoder.code(
+				coder->lzma_decoder.coder, allocator,
+				in, in_pos, in_size, out, out_pos, out_size,
+				action);
+
+		const size_t out_used = *out_pos - out_start;
+
+		coder->member_size += *in_pos - in_start;
+		coder->uncompressed_size += out_used;
+
+		if (!coder->ignore_check)
+			coder->crc32 = lzma_crc32(out + out_start, out_used,
+					coder->crc32);
+
+		if (ret != LZMA_STREAM_END)
+			return ret;
+
+		coder->sequence = SEQ_MEMBER_FOOTER;
+	}
+
+	// Fall through
+
+	case SEQ_MEMBER_FOOTER: {
+		// The footer of .lz version 0 lacks the Member size field.
+		// This is the only difference between version 0 and
+		// unextended version 1 formats.
+		const size_t footer_size = coder->version == 0
+				? LZIP_V0_FOOTER_SIZE
+				: LZIP_V1_FOOTER_SIZE;
+
+		// Copy the CRC32, Data size, and Member size fields to
+		// the internal buffer.
+		lzma_bufcpy(in, in_pos, in_size, coder->buffer, &coder->pos,
+				footer_size);
+
+		// Return if we didn't get the whole footer yet.
+		if (coder->pos < footer_size)
+			return LZMA_OK;
+
+		coder->pos = 0;
+		coder->member_size += footer_size;
+
+		// Check that the footer fields match the observed data.
+		if (!coder->ignore_check
+				&& coder->crc32 != read32le(&coder->buffer[0]))
+			return LZMA_DATA_ERROR;
+
+		if (coder->uncompressed_size != read64le(&coder->buffer[4]))
+			return LZMA_DATA_ERROR;
+
+		if (coder->version > 0) {
+			// .lz version 0 has no Member size field.
+			if (coder->member_size != read64le(&coder->buffer[12]))
+				return LZMA_DATA_ERROR;
+		}
+
+		// Decoding is finished if we weren't requested to decode
+		// more than one .lz member.
+		if (!coder->concatenated)
+			return LZMA_STREAM_END;
+
+		coder->first_member = false;
+		coder->sequence = SEQ_ID_STRING;
+		break;
+	}
+
+	default:
+		assert(0);
+		return LZMA_PROG_ERROR;
+	}
+
+	// Never reached
+}
+
+
+static void
+lzip_decoder_end(void *coder_ptr, const lzma_allocator *allocator)
+{
+	lzma_lzip_coder *coder = coder_ptr;
+	lzma_next_end(&coder->lzma_decoder, allocator);
+	lzma_free(coder, allocator);
+	return;
+}
+
+
+static lzma_check
+lzip_decoder_get_check(const void *coder_ptr lzma_attribute((__unused__)))
+{
+	return LZMA_CHECK_CRC32;
+}
+
+
+static lzma_ret
+lzip_decoder_memconfig(void *coder_ptr, uint64_t *memusage,
+		uint64_t *old_memlimit, uint64_t new_memlimit)
+{
+	lzma_lzip_coder *coder = coder_ptr;
+
+	*memusage = coder->memusage;
+	*old_memlimit = coder->memlimit;
+
+	if (new_memlimit != 0) {
+		if (new_memlimit < coder->memusage)
+			return LZMA_MEMLIMIT_ERROR;
+
+		coder->memlimit = new_memlimit;
+	}
+
+	return LZMA_OK;
+}
+
+
+extern lzma_ret
+lzma_lzip_decoder_init(
+		lzma_next_coder *next, const lzma_allocator *allocator,
+		uint64_t memlimit, uint32_t flags)
+{
+	lzma_next_coder_init(&lzma_lzip_decoder_init, next, allocator);
+
+	if (flags & ~LZMA_SUPPORTED_FLAGS)
+		return LZMA_OPTIONS_ERROR;
+
+	lzma_lzip_coder *coder = next->coder;
+	if (coder == NULL) {
+		coder = lzma_alloc(sizeof(lzma_lzip_coder), allocator);
+		if (coder == NULL)
+			return LZMA_MEM_ERROR;
+
+		next->coder = coder;
+		next->code = &lzip_decode;
+		next->end = &lzip_decoder_end;
+		next->get_check = &lzip_decoder_get_check;
+		next->memconfig = &lzip_decoder_memconfig;
+
+		coder->lzma_decoder = LZMA_NEXT_CODER_INIT;
+	}
+
+	coder->sequence = SEQ_ID_STRING;
+	coder->memlimit = my_max(1, memlimit);
+	coder->memusage = LZMA_MEMUSAGE_BASE;
+	coder->tell_any_check = (flags & LZMA_TELL_ANY_CHECK) != 0;
+	coder->ignore_check = (flags & LZMA_IGNORE_CHECK) != 0;
+	coder->concatenated = (flags & LZMA_CONCATENATED) != 0;
+	coder->first_member = true;
+	coder->pos = 0;
+
+	return LZMA_OK;
+}
+
+
+extern LZMA_API(lzma_ret)
+lzma_lzip_decoder(lzma_stream *strm, uint64_t memlimit, uint32_t flags)
+{
+	lzma_next_strm_init(lzma_lzip_decoder_init, strm, memlimit, flags);
+
+	strm->internal->supported_actions[LZMA_RUN] = true;
+	strm->internal->supported_actions[LZMA_FINISH] = true;
+
+	return LZMA_OK;
+}
--- a/src/liblzma/common/lzip_decoder.h
+++ b/src/liblzma/common/lzip_decoder.h
@ -0,0 +1,22 @@
+///////////////////////////////////////////////////////////////////////////////
+//
+/// \file       lzip_decoder.h
+/// \brief      Decodes .lz (lzip) files
+//
+//  Author:     Michał Górny
+//
+//  This file has been put into the public domain.
+//  You can do whatever you want with this file.
+//
+///////////////////////////////////////////////////////////////////////////////
+
+#ifndef LZMA_LZIP_DECODER_H
+#define LZMA_LZIP_DECODER_H
+
+#include "common.h"
+
+extern lzma_ret lzma_lzip_decoder_init(
+		lzma_next_coder *next, const lzma_allocator *allocator,
+		uint64_t memlimit, uint32_t flags);
+
+#endif
--- a/src/liblzma/common/memcmplen.h
+++ b/src/liblzma/common/memcmplen.h
@ -51,10 +51,6 @@ lzma_memcmplen(const uint8_t *buf1, const uint8_t *buf2,
 			|| (defined(__INTEL_COMPILER) && defined(__x86_64__)) \
 			|| (defined(__INTEL_COMPILER) && defined(_M_X64)) \
 			|| (defined(_MSC_VER) && defined(_M_X64)))
-	// NOTE: This will use 64-bit unaligned access which
-	// TUKLIB_FAST_UNALIGNED_ACCESS wasn't meant to permit, but
-	// it's convenient here at least as long as it's x86-64 only.
-	//
 	// I keep this x86-64 only for now since that's where I know this
 	// to be a good method. This may be fine on other 64-bit CPUs too.
 	// On big endian one should use xor instead of subtraction and switch
@ -83,8 +79,9 @@ lzma_memcmplen(const uint8_t *buf1, const uint8_t *buf2,
 		&& (defined(__SSE2__) \
 			|| (defined(_MSC_VER) && defined(_M_IX86_FP) \
 				&& _M_IX86_FP >= 2))
-	// NOTE: Like above, this will use 128-bit unaligned access which
-	// TUKLIB_FAST_UNALIGNED_ACCESS wasn't meant to permit.
+	// NOTE: This will use 128-bit unaligned access which
+	// TUKLIB_FAST_UNALIGNED_ACCESS wasn't meant to permit,
+	// but it's convenient here since this is x86-only.
 	//
 	// SSE2 version for 32-bit and 64-bit x86. On x86-64 the above
 	// version is sometimes significantly faster and sometimes
--- a/src/liblzma/common/microlzma_decoder.c
+++ b/src/liblzma/common/microlzma_decoder.c
@ -0,0 +1,221 @@
+///////////////////////////////////////////////////////////////////////////////
+//
+/// \file       microlzma_decoder.c
+/// \brief      Decode MicroLZMA format
+//
+//  Author:     Lasse Collin
+//
+//  This file has been put into the public domain.
+//  You can do whatever you want with this file.
+//
+///////////////////////////////////////////////////////////////////////////////
+
+#include "lzma_decoder.h"
+#include "lz_decoder.h"
+
+
+typedef struct {
+	/// LZMA1 decoder
+	lzma_next_coder lzma;
+
+	/// Compressed size of the stream as given by the application.
+	/// This must be exactly correct.
+	///
+	/// This will be decremented when input is read.
+	uint64_t comp_size;
+
+	/// Uncompressed size of the stream as given by the application.
+	/// This may be less than the actual uncompressed size if
+	/// uncomp_size_is_exact is false.
+	///
+	/// This will be decremented when output is produced.
+	lzma_vli uncomp_size;
+
+	/// LZMA dictionary size as given by the application
+	uint32_t dict_size;
+
+	/// If true, the exact uncompressed size is known. If false,
+	/// uncomp_size may be smaller than the real uncompressed size;
+	/// uncomp_size may never be bigger than the real uncompressed size.
+	bool uncomp_size_is_exact;
+
+	/// True once the first byte of the MicroLZMA stream
+	/// has been processed.
+	bool props_decoded;
+} lzma_microlzma_coder;
+
+
+static lzma_ret
+microlzma_decode(void *coder_ptr, const lzma_allocator *allocator,
+		const uint8_t *restrict in, size_t *restrict in_pos,
+		size_t in_size, uint8_t *restrict out,
+		size_t *restrict out_pos, size_t out_size, lzma_action action)
+{
+	lzma_microlzma_coder *coder = coder_ptr;
+
+	// Remember the in start position so that we can update comp_size.
+	const size_t in_start = *in_pos;
+
+	// Remember the out start position so that we can update uncomp_size.
+	const size_t out_start = *out_pos;
+
+	// Limit the amount of input so that the decoder won't read more than
+	// comp_size. This is required when uncomp_size isn't exact because
+	// in that case the LZMA decoder will try to decode more input even
+	// when it has no output space (it can be looking for EOPM).
+	if (in_size - *in_pos > coder->comp_size)
+		in_size = *in_pos + (size_t)(coder->comp_size);
+
+	// When the exact uncompressed size isn't known, we must limit
+	// the available output space to prevent the LZMA decoder from
+	// trying to decode too much.
+	if (!coder->uncomp_size_is_exact
+			&& out_size - *out_pos > coder->uncomp_size)
+		out_size = *out_pos + (size_t)(coder->uncomp_size);
+
+	if (!coder->props_decoded) {
+		// There must be at least one byte of input to decode
+		// the properties byte.
+		if (*in_pos >= in_size)
+			return LZMA_OK;
+
+		lzma_options_lzma options = {
+			.dict_size = coder->dict_size,
+			.preset_dict = NULL,
+			.preset_dict_size = 0,
+			.ext_flags = 0, // EOPM not allowed when size is known
+			.ext_size_low = UINT32_MAX, // Unknown size by default
+			.ext_size_high = UINT32_MAX,
+		};
+
+		if (coder->uncomp_size_is_exact)
+			lzma_set_ext_size(options, coder->uncomp_size);
+
+		// The properties are stored as bitwise-negation
+		// of the typical encoding.
+		if (lzma_lzma_lclppb_decode(&options, ~in[*in_pos]))
+			return LZMA_OPTIONS_ERROR;
+
+		++*in_pos;
+
+		// Initialize the decoder.
+		lzma_filter_info filters[2] = {
+			{
+				.id = LZMA_FILTER_LZMA1EXT,
+				.init = &lzma_lzma_decoder_init,
+				.options = &options,
+			}, {
+				.init = NULL,
+			}
+		};
+
+		return_if_error(lzma_next_filter_init(&coder->lzma,
+				allocator, filters));
+
+		// Pass one dummy 0x00 byte to the LZMA decoder since that
+		// is what it expects the first byte to be.
+		const uint8_t dummy_in = 0;
+		size_t dummy_in_pos = 0;
+		if (coder->lzma.code(coder->lzma.coder, allocator,
+				&dummy_in, &dummy_in_pos, 1,
+				out, out_pos, out_size, LZMA_RUN) != LZMA_OK)
+			return LZMA_PROG_ERROR;
+
+		assert(dummy_in_pos == 1);
+		coder->props_decoded = true;
+	}
+
+	// The rest is normal LZMA decoding.
+	lzma_ret ret = coder->lzma.code(coder->lzma.coder, allocator,
+				in, in_pos, in_size,
+				out, out_pos, out_size, action);
+
+	// Update the remaining compressed size.
+	assert(coder->comp_size >= *in_pos - in_start);
+	coder->comp_size -= *in_pos - in_start;
+
+	if (coder->uncomp_size_is_exact) {
+		// After successful decompression of the complete stream
+		// the compressed size must match.
+		if (ret == LZMA_STREAM_END && coder->comp_size != 0)
+			ret = LZMA_DATA_ERROR;
+	} else {
+		// Update the amount of output remaining.
+		assert(coder->uncomp_size >= *out_pos - out_start);
+		coder->uncomp_size -= *out_pos - out_start;
+
+		// - We must not get LZMA_STREAM_END because the stream
+		//   shouldn't have EOPM.
+		// - We must use uncomp_size to determine when to
+		//   return LZMA_STREAM_END.
+		if (ret == LZMA_STREAM_END)
+			ret = LZMA_DATA_ERROR;
+		else if (coder->uncomp_size == 0)
+			ret = LZMA_STREAM_END;
+	}
+
+	return ret;
+}
+
+
+static void
+microlzma_decoder_end(void *coder_ptr, const lzma_allocator *allocator)
+{
+	lzma_microlzma_coder *coder = coder_ptr;
+	lzma_next_end(&coder->lzma, allocator);
+	lzma_free(coder, allocator);
+	return;
+}
+
+
+static lzma_ret
+microlzma_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
+		uint64_t comp_size,
+		uint64_t uncomp_size, bool uncomp_size_is_exact,
+		uint32_t dict_size)
+{
+	lzma_next_coder_init(&microlzma_decoder_init, next, allocator);
+
+	lzma_microlzma_coder *coder = next->coder;
+
+	if (coder == NULL) {
+		coder = lzma_alloc(sizeof(lzma_microlzma_coder), allocator);
+		if (coder == NULL)
+			return LZMA_MEM_ERROR;
+
+		next->coder = coder;
+		next->code = &microlzma_decode;
+		next->end = &microlzma_decoder_end;
+
+		coder->lzma = LZMA_NEXT_CODER_INIT;
+	}
+
+	// The public API is uint64_t but the internal LZ decoder API uses
+	// lzma_vli.
+	if (uncomp_size > LZMA_VLI_MAX)
+		return LZMA_OPTIONS_ERROR;
+
+	coder->comp_size = comp_size;
+	coder->uncomp_size = uncomp_size;
+	coder->uncomp_size_is_exact = uncomp_size_is_exact;
+	coder->dict_size = dict_size;
+
+	coder->props_decoded = false;
+
+	return LZMA_OK;
+}
+
+
+extern LZMA_API(lzma_ret)
+lzma_microlzma_decoder(lzma_stream *strm, uint64_t comp_size,
+		uint64_t uncomp_size, lzma_bool uncomp_size_is_exact,
+		uint32_t dict_size)
+{
+	lzma_next_strm_init(microlzma_decoder_init, strm, comp_size,
+			uncomp_size, uncomp_size_is_exact, dict_size);
+
+	strm->internal->supported_actions[LZMA_RUN] = true;
+	strm->internal->supported_actions[LZMA_FINISH] = true;
+
+	return LZMA_OK;
+}
--- a/src/liblzma/common/microlzma_encoder.c
+++ b/src/liblzma/common/microlzma_encoder.c
@ -0,0 +1,140 @@
+///////////////////////////////////////////////////////////////////////////////
+//
+/// \file       microlzma_encoder.c
+/// \brief      Encode into MicroLZMA format
+//
+//  Author:     Lasse Collin
+//
+//  This file has been put into the public domain.
+//  You can do whatever you want with this file.
+//
+///////////////////////////////////////////////////////////////////////////////
+
+#include "lzma_encoder.h"
+
+
+typedef struct {
+	/// LZMA1 encoder
+	lzma_next_coder lzma;
+
+	/// LZMA properties byte (lc/lp/pb)
+	uint8_t props;
+} lzma_microlzma_coder;
+
+
+static lzma_ret
+microlzma_encode(void *coder_ptr, const lzma_allocator *allocator,
+		const uint8_t *restrict in, size_t *restrict in_pos,
+		size_t in_size, uint8_t *restrict out,
+		size_t *restrict out_pos, size_t out_size, lzma_action action)
+{
+	lzma_microlzma_coder *coder = coder_ptr;
+
+	// Remember *out_pos so that we can overwrite the first byte with
+	// the LZMA properties byte.
+	const size_t out_start = *out_pos;
+
+	// Remember *in_pos so that we can set it based on how many
+	// uncompressed bytes were actually encoded.
+	const size_t in_start = *in_pos;
+
+	// Set the output size limit based on the available output space.
+	// We know that the encoder supports set_out_limit() so
+	// LZMA_OPTIONS_ERROR isn't possible. LZMA_BUF_ERROR is possible
+	// but lzma_code() has an assertion to not allow it to be returned
+	// from here and I don't want to change that for now, so
+	// LZMA_BUF_ERROR becomes LZMA_PROG_ERROR.
+	uint64_t uncomp_size;
+	if (coder->lzma.set_out_limit(coder->lzma.coder,
+			&uncomp_size, out_size - *out_pos) != LZMA_OK)
+		return LZMA_PROG_ERROR;
+
+	// set_out_limit fails if this isn't true.
+	assert(out_size - *out_pos >= 6);
+
+	// Encode as much as possible.
+	const lzma_ret ret = coder->lzma.code(coder->lzma.coder, allocator,
+			in, in_pos, in_size, out, out_pos, out_size, action);
+
+	if (ret != LZMA_STREAM_END) {
+		if (ret == LZMA_OK) {
+			assert(0);
+			return LZMA_PROG_ERROR;
+		}
+
+		return ret;
+	}
+
+	// The first output byte is bitwise-negation of the properties byte.
+	// We know that there is space for this byte because set_out_limit
+	// and the actual encoding succeeded.
+	out[out_start] = (uint8_t)(~coder->props);
+
+	// The LZMA encoder likely read more input than it was able to encode.
+	// Set *in_pos based on uncomp_size.
+	assert(uncomp_size <= in_size - in_start);
+	*in_pos = in_start + (size_t)(uncomp_size);
+
+	return ret;
+}
+
+
+static void
+microlzma_encoder_end(void *coder_ptr, const lzma_allocator *allocator)
+{
+	lzma_microlzma_coder *coder = coder_ptr;
+	lzma_next_end(&coder->lzma, allocator);
+	lzma_free(coder, allocator);
+	return;
+}
+
+
+static lzma_ret
+microlzma_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
+		const lzma_options_lzma *options)
+{
+	lzma_next_coder_init(&microlzma_encoder_init, next, allocator);
+
+	lzma_microlzma_coder *coder = next->coder;
+
+	if (coder == NULL) {
+		coder = lzma_alloc(sizeof(lzma_microlzma_coder), allocator);
+		if (coder == NULL)
+			return LZMA_MEM_ERROR;
+
+		next->coder = coder;
+		next->code = &microlzma_encode;
+		next->end = &microlzma_encoder_end;
+
+		coder->lzma = LZMA_NEXT_CODER_INIT;
+	}
+
+	// Encode the properties byte. Bitwise-negation of it will be the
+	// first output byte.
+	return_if_error(lzma_lzma_lclppb_encode(options, &coder->props));
+
+	// Initialize the LZMA encoder.
+	const lzma_filter_info filters[2] = {
+		{
+			.id = LZMA_FILTER_LZMA1,
+			.init = &lzma_lzma_encoder_init,
+			.options = (void *)(options),
+		}, {
+			.init = NULL,
+		}
+	};
+
+	return lzma_next_filter_init(&coder->lzma, allocator, filters);
+}
+
+
+extern LZMA_API(lzma_ret)
+lzma_microlzma_encoder(lzma_stream *strm, const lzma_options_lzma *options)
+{
+	lzma_next_strm_init(microlzma_encoder_init, strm, options);
+
+	strm->internal->supported_actions[LZMA_FINISH] = true;
+
+	return LZMA_OK;
+
+}
--- a/src/liblzma/common/outqueue.c
+++ b/src/liblzma/common/outqueue.c
@ -13,84 +13,121 @@
 #include "outqueue.h"


-/// This is to ease integer overflow checking: We may allocate up to
-/// 2 * LZMA_THREADS_MAX buffers and we need some extra memory for other
-/// data structures (that's the second /2).
-#define BUF_SIZE_MAX (UINT64_MAX / LZMA_THREADS_MAX / 2 / 2)
-
-
-static lzma_ret
-get_options(uint64_t *bufs_alloc_size, uint32_t *bufs_count,
-		uint64_t buf_size_max, uint32_t threads)
-{
-	if (threads > LZMA_THREADS_MAX || buf_size_max > BUF_SIZE_MAX)
-		return LZMA_OPTIONS_ERROR;
-
-	// The number of buffers is twice the number of threads.
-	// This wastes RAM but keeps the threads busy when buffers
-	// finish out of order.
-	//
-	// NOTE: If this is changed, update BUF_SIZE_MAX too.
-	*bufs_count = threads * 2;
-	*bufs_alloc_size = *bufs_count * buf_size_max;
-
-	return LZMA_OK;
-}
+/// Get the maximum number of buffers that may be allocated based
+/// on the number of threads. For now this is twice the number of threads.
+/// It's a compromise between RAM usage and keeping the worker threads busy
+/// when buffers finish out of order.
+#define GET_BUFS_LIMIT(threads) (2 * (threads))


 extern uint64_t
 lzma_outq_memusage(uint64_t buf_size_max, uint32_t threads)
 {
-	uint64_t bufs_alloc_size;
-	uint32_t bufs_count;
+	// This is to ease integer overflow checking: We may allocate up to
+	// GET_BUFS_LIMIT(LZMA_THREADS_MAX) buffers and we need some extra
+	// memory for other data structures too (that's the /2).
+	//
+	// lzma_outq_prealloc_buf() will still accept bigger buffers than this.
+	const uint64_t limit
+			= UINT64_MAX / GET_BUFS_LIMIT(LZMA_THREADS_MAX) / 2;

-	if (get_options(&bufs_alloc_size, &bufs_count, buf_size_max, threads)
-			!= LZMA_OK)
+	if (threads > LZMA_THREADS_MAX || buf_size_max > limit)
 		return UINT64_MAX;

-	return sizeof(lzma_outq) + bufs_count * sizeof(lzma_outbuf)
-			+ bufs_alloc_size;
+	return GET_BUFS_LIMIT(threads)
+			* lzma_outq_outbuf_memusage(buf_size_max);
+}
+
+
+static void
+move_head_to_cache(lzma_outq *outq, const lzma_allocator *allocator)
+{
+	assert(outq->head != NULL);
+	assert(outq->tail != NULL);
+	assert(outq->bufs_in_use > 0);
+
+	lzma_outbuf *buf = outq->head;
+	outq->head = buf->next;
+	if (outq->head == NULL)
+		outq->tail = NULL;
+
+	if (outq->cache != NULL && outq->cache->allocated != buf->allocated)
+		lzma_outq_clear_cache(outq, allocator);
+
+	buf->next = outq->cache;
+	outq->cache = buf;
+
+	--outq->bufs_in_use;
+	outq->mem_in_use -= lzma_outq_outbuf_memusage(buf->allocated);
+
+	return;
+}
+
+
+static void
+free_one_cached_buffer(lzma_outq *outq, const lzma_allocator *allocator)
+{
+	assert(outq->cache != NULL);
+
+	lzma_outbuf *buf = outq->cache;
+	outq->cache = buf->next;
+
+	--outq->bufs_allocated;
+	outq->mem_allocated -= lzma_outq_outbuf_memusage(buf->allocated);
+
+	lzma_free(buf, allocator);
+	return;
+}
+
+
+extern void
+lzma_outq_clear_cache(lzma_outq *outq, const lzma_allocator *allocator)
+{
+	while (outq->cache != NULL)
+		free_one_cached_buffer(outq, allocator);
+
+	return;
+}
+
+
+extern void
+lzma_outq_clear_cache2(lzma_outq *outq, const lzma_allocator *allocator,
+		size_t keep_size)
+{
+	if (outq->cache == NULL)
+		return;
+
+	// Free all but one.
+	while (outq->cache->next != NULL)
+		free_one_cached_buffer(outq, allocator);
+
+	// Free the last one only if its size doesn't equal to keep_size.
+	if (outq->cache->allocated != keep_size)
+		free_one_cached_buffer(outq, allocator);
+
+	return;
 }


 extern lzma_ret
 lzma_outq_init(lzma_outq *outq, const lzma_allocator *allocator,
-		uint64_t buf_size_max, uint32_t threads)
+		uint32_t threads)
 {
-	uint64_t bufs_alloc_size;
-	uint32_t bufs_count;
+	if (threads > LZMA_THREADS_MAX)
+		return LZMA_OPTIONS_ERROR;

-	// Set bufs_count and bufs_alloc_size.
-	return_if_error(get_options(&bufs_alloc_size, &bufs_count,
-			buf_size_max, threads));
+	const uint32_t bufs_limit = GET_BUFS_LIMIT(threads);

-	// Allocate memory if needed.
-	if (outq->buf_size_max != buf_size_max
-			|| outq->bufs_allocated != bufs_count) {
-		lzma_outq_end(outq, allocator);
+	// Clear head/tail.
+	while (outq->head != NULL)
+		move_head_to_cache(outq, allocator);

-#if SIZE_MAX < UINT64_MAX
-		if (bufs_alloc_size > SIZE_MAX)
-			return LZMA_MEM_ERROR;
-#endif
+	// If new buf_limit is lower than the old one, we may need to free
+	// a few cached buffers.
+	while (bufs_limit < outq->bufs_allocated)
+		free_one_cached_buffer(outq, allocator);

-		outq->bufs = lzma_alloc(bufs_count * sizeof(lzma_outbuf),
-				allocator);
-		outq->bufs_mem = lzma_alloc((size_t)(bufs_alloc_size),
-				allocator);
-
-		if (outq->bufs == NULL || outq->bufs_mem == NULL) {
-			lzma_outq_end(outq, allocator);
-			return LZMA_MEM_ERROR;
-		}
-	}
-
-	// Initialize the rest of the main structure. Initialization of
-	// outq->bufs[] is done when they are actually needed.
-	outq->buf_size_max = (size_t)(buf_size_max);
-	outq->bufs_allocated = bufs_count;
-	outq->bufs_pos = 0;
-	outq->bufs_used = 0;
+	outq->bufs_limit = bufs_limit;
 	outq->read_pos = 0;

 	return LZMA_OK;
@ -100,33 +137,81 @@ lzma_outq_init(lzma_outq *outq, const lzma_allocator *allocator,
 extern void
 lzma_outq_end(lzma_outq *outq, const lzma_allocator *allocator)
 {
-	lzma_free(outq->bufs, allocator);
-	outq->bufs = NULL;
-
-	lzma_free(outq->bufs_mem, allocator);
-	outq->bufs_mem = NULL;
+	while (outq->head != NULL)
+		move_head_to_cache(outq, allocator);

+	lzma_outq_clear_cache(outq, allocator);
 	return;
 }


-extern lzma_outbuf *
-lzma_outq_get_buf(lzma_outq *outq)
+extern lzma_ret
+lzma_outq_prealloc_buf(lzma_outq *outq, const lzma_allocator *allocator,
+		size_t size)
 {
 	// Caller must have checked it with lzma_outq_has_buf().
-	assert(outq->bufs_used < outq->bufs_allocated);
+	assert(outq->bufs_in_use < outq->bufs_limit);

-	// Initialize the new buffer.
-	lzma_outbuf *buf = &outq->bufs[outq->bufs_pos];
-	buf->buf = outq->bufs_mem + outq->bufs_pos * outq->buf_size_max;
-	buf->size = 0;
+	// If there already is appropriately-sized buffer in the cache,
+	// we need to do nothing.
+	if (outq->cache != NULL && outq->cache->allocated == size)
+		return LZMA_OK;
+
+	if (size > SIZE_MAX - sizeof(lzma_outbuf))
+		return LZMA_MEM_ERROR;
+
+	const size_t alloc_size = lzma_outq_outbuf_memusage(size);
+
+	// The cache may have buffers but their size is wrong.
+	lzma_outq_clear_cache(outq, allocator);
+
+	outq->cache = lzma_alloc(alloc_size, allocator);
+	if (outq->cache == NULL)
+		return LZMA_MEM_ERROR;
+
+	outq->cache->next = NULL;
+	outq->cache->allocated = size;
+
+	++outq->bufs_allocated;
+	outq->mem_allocated += alloc_size;
+
+	return LZMA_OK;
+}
+
+
+extern lzma_outbuf *
+lzma_outq_get_buf(lzma_outq *outq, void *worker)
+{
+	// Caller must have used lzma_outq_prealloc_buf() to ensure these.
+	assert(outq->bufs_in_use < outq->bufs_limit);
+	assert(outq->bufs_in_use < outq->bufs_allocated);
+	assert(outq->cache != NULL);
+
+	lzma_outbuf *buf = outq->cache;
+	outq->cache = buf->next;
+	buf->next = NULL;
+
+	if (outq->tail != NULL) {
+		assert(outq->head != NULL);
+		outq->tail->next = buf;
+	} else {
+		assert(outq->head == NULL);
+		outq->head = buf;
+	}
+
+	outq->tail = buf;
+
+	buf->worker = worker;
 	buf->finished = false;
+	buf->finish_ret = LZMA_STREAM_END;
+	buf->pos = 0;
+	buf->decoder_in_pos = 0;

-	// Update the queue state.
-	if (++outq->bufs_pos == outq->bufs_allocated)
-		outq->bufs_pos = 0;
+	buf->unpadded_size = 0;
+	buf->uncompressed_size = 0;

-	++outq->bufs_used;
+	++outq->bufs_in_use;
+	outq->mem_in_use += lzma_outq_outbuf_memusage(buf->allocated);

 	return buf;
 }
@ -135,50 +220,68 @@ lzma_outq_get_buf(lzma_outq *outq)
 extern bool
 lzma_outq_is_readable(const lzma_outq *outq)
 {
-	uint32_t i = outq->bufs_pos - outq->bufs_used;
-	if (outq->bufs_pos < outq->bufs_used)
-		i += outq->bufs_allocated;
+	if (outq->head == NULL)
+		return false;

-	return outq->bufs[i].finished;
+	return outq->read_pos < outq->head->pos || outq->head->finished;
 }


 extern lzma_ret
-lzma_outq_read(lzma_outq *restrict outq, uint8_t *restrict out,
-		size_t *restrict out_pos, size_t out_size,
+lzma_outq_read(lzma_outq *restrict outq,
+		const lzma_allocator *restrict allocator,
+		uint8_t *restrict out, size_t *restrict out_pos,
+		size_t out_size,
 		lzma_vli *restrict unpadded_size,
 		lzma_vli *restrict uncompressed_size)
 {
 	// There must be at least one buffer from which to read.
-	if (outq->bufs_used == 0)
+	if (outq->bufs_in_use == 0)
 		return LZMA_OK;

 	// Get the buffer.
-	uint32_t i = outq->bufs_pos - outq->bufs_used;
-	if (outq->bufs_pos < outq->bufs_used)
-		i += outq->bufs_allocated;
-
-	lzma_outbuf *buf = &outq->bufs[i];
-
-	// If it isn't finished yet, we cannot read from it.
-	if (!buf->finished)
-		return LZMA_OK;
+	lzma_outbuf *buf = outq->head;

 	// Copy from the buffer to output.
-	lzma_bufcpy(buf->buf, &outq->read_pos, buf->size,
+	//
+	// FIXME? In threaded decoder it may be bad to do this copy while
+	// the mutex is being held.
+	lzma_bufcpy(buf->buf, &outq->read_pos, buf->pos,
 			out, out_pos, out_size);

 	// Return if we didn't get all the data from the buffer.
-	if (outq->read_pos < buf->size)
+	if (!buf->finished || outq->read_pos < buf->pos)
 		return LZMA_OK;

 	// The buffer was finished. Tell the caller its size information.
-	*unpadded_size = buf->unpadded_size;
-	*uncompressed_size = buf->uncompressed_size;
+	if (unpadded_size != NULL)
+		*unpadded_size = buf->unpadded_size;
+
+	if (uncompressed_size != NULL)
+		*uncompressed_size = buf->uncompressed_size;
+
+	// Remember the return value.
+	const lzma_ret finish_ret = buf->finish_ret;

 	// Free this buffer for further use.
-	--outq->bufs_used;
+	move_head_to_cache(outq, allocator);
 	outq->read_pos = 0;

-	return LZMA_STREAM_END;
+	return finish_ret;
+}
+
+
+extern void
+lzma_outq_enable_partial_output(lzma_outq *outq,
+		void (*enable_partial_output)(void *worker))
+{
+	if (outq->head != NULL && !outq->head->finished
+			&& outq->head->worker != NULL) {
+		enable_partial_output(outq->head->worker);
+
+		// Set it to NULL since calling it twice is pointless.
+		outq->head->worker = NULL;
+	}
+
+	return;
 }
--- a/src/liblzma/common/outqueue.h
+++ b/src/liblzma/common/outqueue.h
@ -14,16 +14,36 @@


 /// Output buffer for a single thread
-typedef struct {
-	/// Pointer to the output buffer of lzma_outq.buf_size_max bytes
-	uint8_t *buf;
+typedef struct lzma_outbuf_s lzma_outbuf;
+struct lzma_outbuf_s {
+	/// Pointer to the next buffer. This is used for the cached buffers.
+	/// The worker thread must not modify this.
+	lzma_outbuf *next;

-	/// Amount of data written to buf
-	size_t size;
+	/// This initialized by lzma_outq_get_buf() and
+	/// is used by lzma_outq_enable_partial_output().
+	/// The worker thread must not modify this.
+	void *worker;

-	/// Additional size information
-	lzma_vli unpadded_size;
-	lzma_vli uncompressed_size;
+	/// Amount of memory allocated for buf[].
+	/// The worker thread must not modify this.
+	size_t allocated;
+
+	/// Writing position in the worker thread or, in other words, the
+	/// amount of finished data written to buf[] which can be copied out
+	///
+	/// \note       This is read by another thread and thus access
+	///             to this variable needs a mutex.
+	size_t pos;
+
+	/// Decompression: Position in the input buffer in the worker thread
+	/// that matches the output "pos" above. This is used to detect if
+	/// more output might be possible from the worker thread: if it has
+	/// consumed all its input, then more output isn't possible.
+	///
+	/// \note       This is read by another thread and thus access
+	///             to this variable needs a mutex.
+	size_t decoder_in_pos;

 	/// True when no more data will be written into this buffer.
 	///
@ -31,32 +51,55 @@ typedef struct {
 	///             to this variable needs a mutex.
 	bool finished;

-} lzma_outbuf;
+	/// Return value for lzma_outq_read() when the last byte from
+	/// a finished buffer has been read. Defaults to LZMA_STREAM_END.
+	/// This must *not* be LZMA_OK. The idea is to allow a decoder to
+	/// pass an error code to the main thread, setting the code here
+	/// together with finished = true.
+	lzma_ret finish_ret;
+
+	/// Additional size information. lzma_outq_read() may read these
+	/// when "finished" is true.
+	lzma_vli unpadded_size;
+	lzma_vli uncompressed_size;
+
+	/// Buffer of "allocated" bytes
+	uint8_t buf[];
+};


 typedef struct {
-	/// Array of buffers that are used cyclically.
-	lzma_outbuf *bufs;
+	/// Linked list of buffers in use. The next output byte will be
+	/// read from the head and buffers for the next thread will be
+	/// appended to the tail. tail->next is always NULL.
+	lzma_outbuf *head;
+	lzma_outbuf *tail;

-	/// Memory allocated for all the buffers
-	uint8_t *bufs_mem;
-
-	/// Amount of buffer space available in each buffer
-	size_t buf_size_max;
-
-	/// Number of buffers allocated
-	uint32_t bufs_allocated;
-
-	/// Position in the bufs array. The next buffer to be taken
-	/// into use is bufs[bufs_pos].
-	uint32_t bufs_pos;
-
-	/// Number of buffers in use
-	uint32_t bufs_used;
-
-	/// Position in the buffer in lzma_outq_read()
+	/// Number of bytes read from head->buf[] in lzma_outq_read()
 	size_t read_pos;

+	/// Linked list of allocated buffers that aren't currently used.
+	/// This way buffers of similar size can be reused and don't
+	/// need to be reallocated every time. For simplicity, all
+	/// cached buffers in the list have the same allocated size.
+	lzma_outbuf *cache;
+
+	/// Total amount of memory allocated for buffers
+	uint64_t mem_allocated;
+
+	/// Amount of memory used by the buffers that are in use in
+	/// the head...tail linked list.
+	uint64_t mem_in_use;
+
+	/// Number of buffers in use in the head...tail list. If and only if
+	/// this is zero, the pointers head and tail above are NULL.
+	uint32_t bufs_in_use;
+
+	/// Number of buffers allocated (in use + cached)
+	uint32_t bufs_allocated;
+
+	/// Maximum allowed number of allocated buffers
+	uint32_t bufs_limit;
 } lzma_outq;


@ -76,32 +119,60 @@ extern uint64_t lzma_outq_memusage(uint64_t buf_size_max, uint32_t threads);
 ///                             function knows that there are no previous
 ///                             allocations to free.
 /// \param      allocator       Pointer to allocator or NULL
-/// \param      buf_size_max    Maximum amount of data that a single buffer
-///                             in the queue may need to store.
 /// \param      threads         Number of buffers that may be in use
 ///                             concurrently. Note that more than this number
-///                             of buffers will actually get allocated to
+///                             of buffers may actually get allocated to
 ///                             improve performance when buffers finish
-///                             out of order.
+///                             out of order. The actual maximum number of
+///                             allocated buffers is derived from the number
+///                             of threads.
 ///
 /// \return     - LZMA_OK
 ///             - LZMA_MEM_ERROR
 ///
-extern lzma_ret lzma_outq_init(
-		lzma_outq *outq, const lzma_allocator *allocator,
-		uint64_t buf_size_max, uint32_t threads);
+extern lzma_ret lzma_outq_init(lzma_outq *outq,
+		const lzma_allocator *allocator, uint32_t threads);


 /// \brief      Free the memory associated with the output queue
 extern void lzma_outq_end(lzma_outq *outq, const lzma_allocator *allocator);


+/// \brief      Free all cached buffers that consume memory but aren't in use
+extern void lzma_outq_clear_cache(
+		lzma_outq *outq, const lzma_allocator *allocator);
+
+
+/// \brief      Like lzma_outq_clear_cache() but might keep one buffer
+///
+/// One buffer is not freed if its size is equal to keep_size.
+/// This is useful if the caller knows that it will soon need a buffer of
+/// keep_size bytes. This way it won't be freed and immediately reallocated.
+extern void lzma_outq_clear_cache2(
+		lzma_outq *outq, const lzma_allocator *allocator,
+		size_t keep_size);
+
+
+/// \brief      Preallocate a new buffer into cache
+///
+/// Splitting the buffer allocation into a separate function makes it
+/// possible to ensure that way lzma_outq_get_buf() cannot fail.
+/// If the preallocated buffer isn't actually used (for example, some
+/// other error occurs), the caller has to do nothing as the buffer will
+/// be used later or cleared from the cache when not needed.
+///
+/// \return     LZMA_OK on success, LZMA_MEM_ERROR if allocation fails
+///
+extern lzma_ret lzma_outq_prealloc_buf(
+		lzma_outq *outq, const lzma_allocator *allocator, size_t size);
+
+
 /// \brief      Get a new buffer
 ///
-/// lzma_outq_has_buf() must be used to check that there is a buffer
+/// lzma_outq_prealloc_buf() must be used to ensure that there is a buffer
 /// available before calling lzma_outq_get_buf().
 ///
-extern lzma_outbuf *lzma_outq_get_buf(lzma_outq *outq);
+extern lzma_outbuf *lzma_outq_get_buf(lzma_outq *outq, void *worker);


 /// \brief      Test if there is data ready to be read
@ -126,17 +197,32 @@ extern bool lzma_outq_is_readable(const lzma_outq *outq);
 /// \return     - LZMA: All OK. Either no data was available or the buffer
 ///               being read didn't become empty yet.
 ///             - LZMA_STREAM_END: The buffer being read was finished.
-///               *unpadded_size and *uncompressed_size were set.
+///               *unpadded_size and *uncompressed_size were set if they
+///               were not NULL.
 ///
-/// \note       This reads lzma_outbuf.finished variables and thus call
-///             to this function needs to be protected with a mutex.
+/// \note       This reads lzma_outbuf.finished and .pos variables and thus
+///             calls to this function need to be protected with a mutex.
 ///
 extern lzma_ret lzma_outq_read(lzma_outq *restrict outq,
+		const lzma_allocator *restrict allocator,
 		uint8_t *restrict out, size_t *restrict out_pos,
 		size_t out_size, lzma_vli *restrict unpadded_size,
 		lzma_vli *restrict uncompressed_size);


+/// \brief      Enable partial output from a worker thread
+///
+/// If the buffer at the head of the output queue isn't finished,
+/// this will call enable_partial_output on the worker associated with
+/// that output buffer.
+///
+/// \note       This reads a lzma_outbuf.finished variable and thus
+///             calls to this function need to be protected with a mutex.
+///
+extern void lzma_outq_enable_partial_output(lzma_outq *outq,
+		void (*enable_partial_output)(void *worker));
+
+
 /// \brief      Test if there is at least one buffer free
 ///
 /// This must be used before getting a new buffer with lzma_outq_get_buf().
@ -144,7 +230,7 @@ extern lzma_ret lzma_outq_read(lzma_outq *restrict outq,
 static inline bool
 lzma_outq_has_buf(const lzma_outq *outq)
 {
-	return outq->bufs_used < outq->bufs_allocated;
+	return outq->bufs_in_use < outq->bufs_limit;
 }


@ -152,5 +238,17 @@ lzma_outq_has_buf(const lzma_outq *outq)
 static inline bool
 lzma_outq_is_empty(const lzma_outq *outq)
 {
-	return outq->bufs_used == 0;
+	return outq->bufs_in_use == 0;
+}
+
+
+/// \brief      Get the amount of memory needed for a single lzma_outbuf
+///
+/// \note       Caller must check that the argument is significantly less
+///             than SIZE_MAX to avoid an integer overflow!
+static inline uint64_t
+lzma_outq_outbuf_memusage(size_t buf_size)
+{
+	assert(buf_size <= SIZE_MAX - sizeof(lzma_outbuf));
+	return sizeof(lzma_outbuf) + buf_size;
 }
--- a/src/liblzma/common/stream_decoder.c
+++ b/src/liblzma/common/stream_decoder.c
@ -243,9 +243,7 @@ stream_decode(void *coder_ptr, const lzma_allocator *allocator,

 		// Free the allocated filter options since they are needed
 		// only to initialize the Block decoder.
-		for (size_t i = 0; i < LZMA_FILTERS_MAX; ++i)
-			lzma_free(filters[i].options, allocator);
-
+		lzma_filters_free(filters, allocator);
 		coder->block_options.filters = NULL;

 		// Check if memory usage calculation and Block decoder
--- a/src/liblzma/common/stream_decoder_mt.c
+++ b/src/liblzma/common/stream_decoder_mt.c
--- a/src/liblzma/common/stream_encoder.c
+++ b/src/liblzma/common/stream_encoder.c
@ -219,8 +219,7 @@ stream_encoder_end(void *coder_ptr, const lzma_allocator *allocator)
 	lzma_next_end(&coder->index_encoder, allocator);
 	lzma_index_end(coder->index, allocator);

-	for (size_t i = 0; coder->filters[i].id != LZMA_VLI_UNKNOWN; ++i)
-		lzma_free(coder->filters[i].options, allocator);
+	lzma_filters_free(coder->filters, allocator);

 	lzma_free(coder, allocator);
 	return;
@ -271,22 +270,15 @@ stream_encoder_update(void *coder_ptr, const lzma_allocator *allocator,
 	}

 	// Free the options of the old chain.
-	for (size_t i = 0; coder->filters[i].id != LZMA_VLI_UNKNOWN; ++i)
-		lzma_free(coder->filters[i].options, allocator);
+	lzma_filters_free(coder->filters, allocator);

 	// Copy the new filter chain in place.
-	size_t j = 0;
-	do {
-		coder->filters[j].id = temp[j].id;
-		coder->filters[j].options = temp[j].options;
-	} while (temp[j++].id != LZMA_VLI_UNKNOWN);
+	memcpy(coder->filters, temp, sizeof(temp));

 	return LZMA_OK;

 error:
-	for (size_t i = 0; temp[i].id != LZMA_VLI_UNKNOWN; ++i)
-		lzma_free(temp[i].options, allocator);
-
+	lzma_filters_free(temp, allocator);
 	return ret;
 }

--- a/src/liblzma/common/stream_encoder_mt.c
+++ b/src/liblzma/common/stream_encoder_mt.c
@ -85,6 +85,11 @@ struct worker_thread_s {
 	/// Compression options for this Block
 	lzma_block block_options;

+	/// Filter chain for this thread. By copying the filters array
+	/// to each thread it is possible to change the filter chain
+	/// between Blocks using lzma_filters_update().
+	lzma_filter filters[LZMA_FILTERS_MAX + 1];
+
 	/// Next structure in the stack of free worker threads.
 	worker_thread *next;

@ -109,9 +114,22 @@ struct lzma_stream_coder_s {
 	/// LZMA_FULL_FLUSH or LZMA_FULL_BARRIER is used earlier.
 	size_t block_size;

-	/// The filter chain currently in use
+	/// The filter chain to use for the next Block.
+	/// This can be updated using lzma_filters_update()
+	/// after LZMA_FULL_BARRIER or LZMA_FULL_FLUSH.
 	lzma_filter filters[LZMA_FILTERS_MAX + 1];

+	/// A copy of filters[] will be put here when attempting to get
+	/// a new worker thread. This will be copied to a worker thread
+	/// when a thread becomes free and then this cache is marked as
+	/// empty by setting [0].id = LZMA_VLI_UNKNOWN. Without this cache
+	/// the filter options from filters[] would get uselessly copied
+	/// multiple times (allocated and freed) when waiting for a new free
+	/// worker thread.
+	///
+	/// This is freed if filters[] is updated via lzma_filters_update().
+	lzma_filter filters_cache[LZMA_FILTERS_MAX + 1];
+

 	/// Index to hold sizes of the Blocks
 	lzma_index *index;
@ -133,6 +151,9 @@ struct lzma_stream_coder_s {
 	/// Output buffer queue for compressed data
 	lzma_outq outq;

+	/// How much memory to allocate for each lzma_outbuf.buf
+	size_t outbuf_alloc_size;
+

 	/// Maximum wait time if cannot use all the input and cannot
 	/// fill the output buffer. This is in milliseconds.
@ -196,7 +217,7 @@ worker_error(worker_thread *thr, lzma_ret ret)


 static worker_state
-worker_encode(worker_thread *thr, worker_state state)
+worker_encode(worker_thread *thr, size_t *out_pos, worker_state state)
 {
 	assert(thr->progress_in == 0);
 	assert(thr->progress_out == 0);
@ -205,12 +226,9 @@ worker_encode(worker_thread *thr, worker_state state)
 	thr->block_options = (lzma_block){
 		.version = 0,
 		.check = thr->coder->stream_flags.check,
-		.compressed_size = thr->coder->outq.buf_size_max,
+		.compressed_size = thr->outbuf->allocated,
 		.uncompressed_size = thr->coder->block_size,
-
-		// TODO: To allow changing the filter chain, the filters
-		// array must be copied to each worker_thread.
-		.filters = thr->coder->filters,
+		.filters = thr->filters,
 	};

 	// Calculate maximum size of the Block Header. This amount is
@ -234,12 +252,12 @@ worker_encode(worker_thread *thr, worker_state state)
 	size_t in_pos = 0;
 	size_t in_size = 0;

-	thr->outbuf->size = thr->block_options.header_size;
-	const size_t out_size = thr->coder->outq.buf_size_max;
+	*out_pos = thr->block_options.header_size;
+	const size_t out_size = thr->outbuf->allocated;

 	do {
 		mythread_sync(thr->mutex) {
-			// Store in_pos and out_pos into *thr so that
+			// Store in_pos and *out_pos into *thr so that
 			// an application may read them via
 			// lzma_get_progress() to get progress information.
 			//
@ -247,7 +265,7 @@ worker_encode(worker_thread *thr, worker_state state)
 			// finishes. Instead, the final values are taken
 			// later from thr->outbuf.
 			thr->progress_in = in_pos;
-			thr->progress_out = thr->outbuf->size;
+			thr->progress_out = *out_pos;

 			while (in_size == thr->in_size
 					&& thr->state == THR_RUN)
@ -277,8 +295,8 @@ worker_encode(worker_thread *thr, worker_state state)
 		ret = thr->block_encoder.code(
 				thr->block_encoder.coder, thr->allocator,
 				thr->in, &in_pos, in_limit, thr->outbuf->buf,
-				&thr->outbuf->size, out_size, action);
-	} while (ret == LZMA_OK && thr->outbuf->size < out_size);
+				out_pos, out_size, action);
+	} while (ret == LZMA_OK && *out_pos < out_size);

 	switch (ret) {
 	case LZMA_STREAM_END:
@ -313,10 +331,10 @@ worker_encode(worker_thread *thr, worker_state state)
 			return state;

 		// Do the encoding. This takes care of the Block Header too.
-		thr->outbuf->size = 0;
+		*out_pos = 0;
 		ret = lzma_block_uncomp_encode(&thr->block_options,
 				thr->in, in_size, thr->outbuf->buf,
-				&thr->outbuf->size, out_size);
+				out_pos, out_size);

 		// It shouldn't fail.
 		if (ret != LZMA_OK) {
@ -367,11 +385,13 @@ worker_start(void *thr_ptr)
 			}
 		}

+		size_t out_pos = 0;
+
 		assert(state != THR_IDLE);
 		assert(state != THR_STOP);

 		if (state <= THR_FINISH)
-			state = worker_encode(thr, state);
+			state = worker_encode(thr, &out_pos, state);

 		if (state == THR_EXIT)
 			break;
@ -387,14 +407,17 @@ worker_start(void *thr_ptr)
 		}

 		mythread_sync(thr->coder->mutex) {
-			// Mark the output buffer as finished if
-			// no errors occurred.
-			thr->outbuf->finished = state == THR_FINISH;
+			// If no errors occurred, make the encoded data
+			// available to be copied out.
+			if (state == THR_FINISH) {
+				thr->outbuf->pos = out_pos;
+				thr->outbuf->finished = true;
+			}

 			// Update the main progress info.
 			thr->coder->progress_in
 					+= thr->outbuf->uncompressed_size;
-			thr->coder->progress_out += thr->outbuf->size;
+			thr->coder->progress_out += out_pos;
 			thr->progress_in = 0;
 			thr->progress_out = 0;

@ -407,6 +430,8 @@ worker_start(void *thr_ptr)
 	}

 	// Exiting, free the resources.
+	lzma_filters_free(thr->filters, thr->allocator);
+
 	mythread_mutex_destroy(&thr->mutex);
 	mythread_cond_destroy(&thr->cond);

@ -490,6 +515,7 @@ initialize_new_thread(lzma_stream_coder *coder,
 	thr->progress_in = 0;
 	thr->progress_out = 0;
 	thr->block_encoder = LZMA_NEXT_CODER_INIT;
+	thr->filters[0].id = LZMA_VLI_UNKNOWN;

 	if (mythread_create(&thr->thread_id, &worker_start, thr))
 		goto error_thread;
@ -519,6 +545,18 @@ get_thread(lzma_stream_coder *coder, const lzma_allocator *allocator)
 	if (!lzma_outq_has_buf(&coder->outq))
 		return LZMA_OK;

+	// That's also true if we cannot allocate memory for the output
+	// buffer in the output queue.
+	return_if_error(lzma_outq_prealloc_buf(&coder->outq, allocator,
+			coder->outbuf_alloc_size));
+
+	// Make a thread-specific copy of the filter chain. Put it in
+	// the cache array first so that if we cannot get a new thread yet,
+	// the allocation is ready when we try again.
+	if (coder->filters_cache[0].id == LZMA_VLI_UNKNOWN)
+		return_if_error(lzma_filters_copy(
+			coder->filters, coder->filters_cache, allocator));
+
 	// If there is a free structure on the stack, use it.
 	mythread_sync(coder->mutex) {
 		if (coder->threads_free != NULL) {
@ -541,7 +579,16 @@ get_thread(lzma_stream_coder *coder, const lzma_allocator *allocator)
 	mythread_sync(coder->thr->mutex) {
 		coder->thr->state = THR_RUN;
 		coder->thr->in_size = 0;
-		coder->thr->outbuf = lzma_outq_get_buf(&coder->outq);
+		coder->thr->outbuf = lzma_outq_get_buf(&coder->outq, NULL);
+
+		// Free the old thread-specific filter options and replace
+		// them with the already-allocated new options from
+		// coder->filters_cache[]. Then mark the cache as empty.
+		lzma_filters_free(coder->thr->filters, allocator);
+		memcpy(coder->thr->filters, coder->filters_cache,
+				sizeof(coder->filters_cache));
+		coder->filters_cache[0].id = LZMA_VLI_UNKNOWN;
+
 		mythread_cond_signal(&coder->thr->cond);
 	}

@ -627,9 +674,13 @@ wait_for_work(lzma_stream_coder *coder, mythread_condtime *wait_abs,
 		// to true here and calculate the absolute time when
 		// we must return if there's nothing to do.
 		//
-		// The idea of *has_blocked is to avoid unneeded calls
-		// to mythread_condtime_set(), which may do a syscall
-		// depending on the operating system.
+		// This way if we block multiple times for short moments
+		// less than "timeout" milliseconds, we will return once
+		// "timeout" amount of time has passed since the *first*
+		// blocking occurred. If the absolute time was calculated
+		// again every time we block, "timeout" would effectively
+		// be meaningless if we never consecutively block longer
+		// than "timeout" ms.
 		*has_blocked = true;
 		mythread_condtime_set(wait_abs, &coder->cond, coder->timeout);
 	}
@ -704,7 +755,7 @@ stream_encode_mt(void *coder_ptr, const lzma_allocator *allocator,
 				}

 				// Try to read compressed data to out[].
-				ret = lzma_outq_read(&coder->outq,
+				ret = lzma_outq_read(&coder->outq, allocator,
 						out, out_pos, out_size,
 						&unpadded_size,
 						&uncompressed_size);
@ -849,8 +900,8 @@ stream_encoder_mt_end(void *coder_ptr, const lzma_allocator *allocator)
 	threads_end(coder, allocator);
 	lzma_outq_end(&coder->outq, allocator);

-	for (size_t i = 0; coder->filters[i].id != LZMA_VLI_UNKNOWN; ++i)
-		lzma_free(coder->filters[i].options, allocator);
+	lzma_filters_free(coder->filters, allocator);
+	lzma_filters_free(coder->filters_cache, allocator);

 	lzma_next_end(&coder->index_encoder, allocator);
 	lzma_index_end(coder->index, allocator);
@ -863,6 +914,45 @@ stream_encoder_mt_end(void *coder_ptr, const lzma_allocator *allocator)
 }


+static lzma_ret
+stream_encoder_mt_update(void *coder_ptr, const lzma_allocator *allocator,
+		const lzma_filter *filters,
+		const lzma_filter *reversed_filters
+			lzma_attribute((__unused__)))
+{
+	lzma_stream_coder *coder = coder_ptr;
+
+	// Applications shouldn't attempt to change the options when
+	// we are already encoding the Index or Stream Footer.
+	if (coder->sequence > SEQ_BLOCK)
+		return LZMA_PROG_ERROR;
+
+	// For now the threaded encoder doesn't support changing
+	// the options in the middle of a Block.
+	if (coder->thr != NULL)
+		return LZMA_PROG_ERROR;
+
+	// Check if the filter chain seems mostly valid. See the comment
+	// in stream_encoder_mt_init().
+	if (lzma_raw_encoder_memusage(filters) == UINT64_MAX)
+		return LZMA_OPTIONS_ERROR;
+
+	// Make a copy to a temporary buffer first. This way the encoder
+	// state stays unchanged if an error occurs in lzma_filters_copy().
+	lzma_filter temp[LZMA_FILTERS_MAX + 1];
+	return_if_error(lzma_filters_copy(filters, temp, allocator));
+
+	// Free the options of the old chain as well as the cache.
+	lzma_filters_free(coder->filters, allocator);
+	lzma_filters_free(coder->filters_cache, allocator);
+
+	// Copy the new filter chain in place.
+	memcpy(coder->filters, temp, sizeof(temp));
+
+	return LZMA_OK;
+}
+
+
 /// Options handling for lzma_stream_encoder_mt_init() and
 /// lzma_stream_encoder_mt_memusage()
 static lzma_ret
@ -954,14 +1044,16 @@ stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator,
 			&block_size, &outbuf_size_max));

 #if SIZE_MAX < UINT64_MAX
-	if (block_size > SIZE_MAX)
+	if (block_size > SIZE_MAX || outbuf_size_max > SIZE_MAX)
 		return LZMA_MEM_ERROR;
 #endif

 	// Validate the filter chain so that we can give an error in this
 	// function instead of delaying it to the first call to lzma_code().
 	// The memory usage calculation verifies the filter chain as
-	// a side effect so we take advantage of that.
+	// a side effect so we take advantage of that. It's not a perfect
+	// check though as raw encoder allows LZMA1 too but such problems
+	// will be caught eventually with Block Header encoder.
 	if (lzma_raw_encoder_memusage(filters) == UINT64_MAX)
 		return LZMA_OPTIONS_ERROR;

@ -1001,9 +1093,10 @@ stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator,
 		next->code = &stream_encode_mt;
 		next->end = &stream_encoder_mt_end;
 		next->get_progress = &get_progress;
-// 		next->update = &stream_encoder_mt_update;
+		next->update = &stream_encoder_mt_update;

 		coder->filters[0].id = LZMA_VLI_UNKNOWN;
+		coder->filters_cache[0].id = LZMA_VLI_UNKNOWN;
 		coder->index_encoder = LZMA_NEXT_CODER_INIT;
 		coder->index = NULL;
 		memzero(&coder->outq, sizeof(coder->outq));
@ -1015,6 +1108,7 @@ stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator,
 	// Basic initializations
 	coder->sequence = SEQ_STREAM_HEADER;
 	coder->block_size = (size_t)(block_size);
+	coder->outbuf_alloc_size = (size_t)(outbuf_size_max);
 	coder->thread_error = LZMA_OK;
 	coder->thr = NULL;

@ -1044,19 +1138,16 @@ stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator,

 	// Output queue
 	return_if_error(lzma_outq_init(&coder->outq, allocator,
-			outbuf_size_max, options->threads));
+			options->threads));

 	// Timeout
 	coder->timeout = options->timeout;

-	// Free the old filter chain and copy the new one.
-	for (size_t i = 0; coder->filters[i].id != LZMA_VLI_UNKNOWN; ++i)
-		lzma_free(coder->filters[i].options, allocator);
-
-	// Mark it as empty so that it is in a safe state in case
-	// lzma_filters_copy() fails.
-	coder->filters[0].id = LZMA_VLI_UNKNOWN;
+	// Free the old filter chain and the cache.
+	lzma_filters_free(coder->filters, allocator);
+	lzma_filters_free(coder->filters_cache, allocator);

+	// Copy the new filter chain.
 	return_if_error(lzma_filters_copy(
 			filters, coder->filters, allocator));

--- a/src/liblzma/common/stream_flags_decoder.c
+++ b/src/liblzma/common/stream_flags_decoder.c
@ -39,8 +39,11 @@ lzma_stream_header_decode(lzma_stream_flags *options, const uint8_t *in)
 	const uint32_t crc = lzma_crc32(in + sizeof(lzma_header_magic),
 			LZMA_STREAM_FLAGS_SIZE, 0);
 	if (crc != read32le(in + sizeof(lzma_header_magic)
-			+ LZMA_STREAM_FLAGS_SIZE))
+			+ LZMA_STREAM_FLAGS_SIZE)) {
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
 		return LZMA_DATA_ERROR;
+#endif
+	}

 	// Stream Flags
 	if (stream_flags_decode(options, in + sizeof(lzma_header_magic)))
@ -67,8 +70,11 @@ lzma_stream_footer_decode(lzma_stream_flags *options, const uint8_t *in)
 	// CRC32
 	const uint32_t crc = lzma_crc32(in + sizeof(uint32_t),
 			sizeof(uint32_t) + LZMA_STREAM_FLAGS_SIZE, 0);
-	if (crc != read32le(in))
+	if (crc != read32le(in)) {
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
 		return LZMA_DATA_ERROR;
+#endif
+	}

 	// Stream Flags
 	if (stream_flags_decode(options, in + sizeof(uint32_t) * 2))
--- a/src/liblzma/common/string_conversion.c
+++ b/src/liblzma/common/string_conversion.c
--- a/src/liblzma/liblzma_generic.map
+++ b/src/liblzma/liblzma_generic.map
@ -106,3 +106,16 @@ global:
 	lzma_stream_encoder_mt;
 	lzma_stream_encoder_mt_memusage;
 } XZ_5.0;
+
+XZ_5.4 {
+global:
+	lzma_file_info_decoder;
+	lzma_filters_free;
+	lzma_lzip_decoder;
+	lzma_microlzma_decoder;
+	lzma_microlzma_encoder;
+	lzma_stream_decoder_mt;
+	lzma_str_from_filters;
+	lzma_str_list_filters;
+	lzma_str_to_filters;
+} XZ_5.2;
--- a/src/liblzma/liblzma_linux.map
+++ b/src/liblzma/liblzma_linux.map
@ -121,3 +121,16 @@ global:
 	lzma_stream_encoder_mt;
 	lzma_stream_encoder_mt_memusage;
 } XZ_5.1.2alpha;
+
+XZ_5.4 {
+global:
+	lzma_file_info_decoder;
+	lzma_filters_free;
+	lzma_lzip_decoder;
+	lzma_microlzma_decoder;
+	lzma_microlzma_encoder;
+	lzma_stream_decoder_mt;
+	lzma_str_from_filters;
+	lzma_str_list_filters;
+	lzma_str_to_filters;
+} XZ_5.2;
--- a/src/liblzma/lz/lz_decoder.c
+++ b/src/liblzma/lz/lz_decoder.c
@ -212,7 +212,8 @@ extern lzma_ret
 lzma_lz_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 		const lzma_filter_info *filters,
 		lzma_ret (*lz_init)(lzma_lz_decoder *lz,
-			const lzma_allocator *allocator, const void *options,
+			const lzma_allocator *allocator,
+			lzma_vli id, const void *options,
 			lzma_lz_options *lz_options))
 {
 	// Allocate the base structure if it isn't already allocated.
@ -236,7 +237,7 @@ lzma_lz_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 	// us the dictionary size.
 	lzma_lz_options lz_options;
 	return_if_error(lz_init(&coder->lz, allocator,
-			filters[0].options, &lz_options));
+			filters[0].id, filters[0].options, &lz_options));

 	// If the dictionary size is very small, increase it to 4096 bytes.
 	// This is to prevent constant wrapping of the dictionary, which
@ -301,17 +302,3 @@ lzma_lz_decoder_memusage(size_t dictionary_size)
 {
 	return sizeof(lzma_coder) + (uint64_t)(dictionary_size);
 }
-
-
-extern void
-lzma_lz_decoder_uncompressed(void *coder_ptr, lzma_vli uncompressed_size,
-		bool allow_eopm)
-{
-	lzma_coder *coder = coder_ptr;
-
-	if (uncompressed_size == LZMA_VLI_UNKNOWN)
-		allow_eopm = true;
-
-	coder->lz.set_uncompressed(coder->lz.coder, uncompressed_size,
-			allow_eopm);
-}
--- a/src/liblzma/lz/lz_decoder.h
+++ b/src/liblzma/lz/lz_decoder.h
@ -87,14 +87,12 @@ extern lzma_ret lzma_lz_decoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
 		const lzma_filter_info *filters,
 		lzma_ret (*lz_init)(lzma_lz_decoder *lz,
-			const lzma_allocator *allocator, const void *options,
+			const lzma_allocator *allocator,
+			lzma_vli id, const void *options,
 			lzma_lz_options *lz_options));

 extern uint64_t lzma_lz_decoder_memusage(size_t dictionary_size);

-extern void lzma_lz_decoder_uncompressed(
-		void *coder, lzma_vli uncompressed_size, bool allow_eopm);
-

 //////////////////////
 // Inline functions //
--- a/src/liblzma/lz/lz_encoder.c
+++ b/src/liblzma/lz/lz_encoder.c
@ -293,11 +293,15 @@ lz_encoder_prepare(lzma_mf *mf, const lzma_allocator *allocator,
 		return true;
 	}

-	// Calculate the sizes of mf->hash and mf->son and check that
-	// nice_len is big enough for the selected match finder.
-	const uint32_t hash_bytes = lz_options->match_finder & 0x0F;
-	if (hash_bytes > mf->nice_len)
-		return true;
+	// Calculate the sizes of mf->hash and mf->son.
+	//
+	// NOTE: Since 5.3.5beta the LZMA encoder ensures that nice_len
+	// is big enough for the selected match finder. This makes it
+	// easier for applications as nice_len = 2 will always be accepted
+	// even though the effective value can be slightly bigger.
+	const uint32_t hash_bytes
+			= mf_get_hash_bytes(lz_options->match_finder);
+	assert(hash_bytes <= mf->nice_len);

 	const bool is_bt = (lz_options->match_finder & 0x10) != 0;
 	uint32_t hs;
@ -521,14 +525,30 @@ lz_encoder_update(void *coder_ptr, const lzma_allocator *allocator,
 }


+static lzma_ret
+lz_encoder_set_out_limit(void *coder_ptr, uint64_t *uncomp_size,
+		uint64_t out_limit)
+{
+	lzma_coder *coder = coder_ptr;
+
+	// This is supported only if there are no other filters chained.
+	if (coder->next.code == NULL && coder->lz.set_out_limit != NULL)
+		return coder->lz.set_out_limit(
+				coder->lz.coder, uncomp_size, out_limit);
+
+	return LZMA_OPTIONS_ERROR;
+}
+
+
 extern lzma_ret
 lzma_lz_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 		const lzma_filter_info *filters,
 		lzma_ret (*lz_init)(lzma_lz_encoder *lz,
-			const lzma_allocator *allocator, const void *options,
+			const lzma_allocator *allocator,
+			lzma_vli id, const void *options,
 			lzma_lz_options *lz_options))
 {
-#ifdef HAVE_SMALL
+#if defined(HAVE_SMALL) && !defined(HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR)
 	// We need that the CRC32 table has been initialized.
 	lzma_crc32_init();
 #endif
@ -544,6 +564,7 @@ lzma_lz_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 		next->code = &lz_encode;
 		next->end = &lz_encoder_end;
 		next->update = &lz_encoder_update;
+		next->set_out_limit = &lz_encoder_set_out_limit;

 		coder->lz.coder = NULL;
 		coder->lz.code = NULL;
@ -565,7 +586,7 @@ lzma_lz_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 	// Initialize the LZ-based encoder.
 	lzma_lz_options lz_options;
 	return_if_error(lz_init(&coder->lz, allocator,
-			filters[0].options, &lz_options));
+			filters[0].id, filters[0].options, &lz_options));

 	// Setup the size information into coder->mf and deallocate
 	// old buffers if they have wrong size.
--- a/src/liblzma/lz/lz_encoder.h
+++ b/src/liblzma/lz/lz_encoder.h
@ -204,6 +204,10 @@ typedef struct {
 	/// Update the options in the middle of the encoding.
 	lzma_ret (*options_update)(void *coder, const lzma_filter *filter);

+	/// Set maximum allowed output size
+	lzma_ret (*set_out_limit)(void *coder, uint64_t *uncomp_size,
+			uint64_t out_limit);
+
 } lzma_lz_encoder;


@ -216,6 +220,15 @@ typedef struct {
 // are called `read ahead'.


+/// Get how many bytes the match finder hashes in its initial step.
+/// This is also the minimum nice_len value with the match finder.
+static inline uint32_t
+mf_get_hash_bytes(lzma_match_finder match_finder)
+{
+	return (uint32_t)match_finder & 0x0F;
+}
+
+
 /// Get pointer to the first byte not ran through the match finder
 static inline const uint8_t *
 mf_ptr(const lzma_mf *mf)
@ -298,7 +311,8 @@ extern lzma_ret lzma_lz_encoder_init(
 		lzma_next_coder *next, const lzma_allocator *allocator,
 		const lzma_filter_info *filters,
 		lzma_ret (*lz_init)(lzma_lz_encoder *lz,
-			const lzma_allocator *allocator, const void *options,
+			const lzma_allocator *allocator,
+			lzma_vli id, const void *options,
 			lzma_lz_options *lz_options));


--- a/src/liblzma/lzma/lzma2_decoder.c
+++ b/src/liblzma/lzma/lzma2_decoder.c
@ -226,7 +226,8 @@ lzma2_decoder_end(void *coder_ptr, const lzma_allocator *allocator)

 static lzma_ret
 lzma2_decoder_init(lzma_lz_decoder *lz, const lzma_allocator *allocator,
-		const void *opt, lzma_lz_options *lz_options)
+		lzma_vli id lzma_attribute((__unused__)), const void *opt,
+		lzma_lz_options *lz_options)
 {
 	lzma_lzma2_coder *coder = lz->coder;
 	if (coder == NULL) {
--- a/src/liblzma/lzma/lzma2_encoder.c
+++ b/src/liblzma/lzma/lzma2_encoder.c
@ -310,7 +310,8 @@ lzma2_encoder_options_update(void *coder_ptr, const lzma_filter *filter)

 static lzma_ret
 lzma2_encoder_init(lzma_lz_encoder *lz, const lzma_allocator *allocator,
-		const void *options, lzma_lz_options *lz_options)
+		lzma_vli id lzma_attribute((__unused__)), const void *options,
+		lzma_lz_options *lz_options)
 {
 	if (options == NULL)
 		return LZMA_PROG_ERROR;
@ -340,7 +341,7 @@ lzma2_encoder_init(lzma_lz_encoder *lz, const lzma_allocator *allocator,

 	// Initialize LZMA encoder
 	return_if_error(lzma_lzma_encoder_create(&coder->lzma, allocator,
-			&coder->opt_cur, lz_options));
+			LZMA_FILTER_LZMA2, &coder->opt_cur, lz_options));

 	// Make sure that we will always have enough history available in
 	// case we need to use uncompressed chunks. They are used when the
--- a/src/liblzma/lzma/lzma_decoder.c
+++ b/src/liblzma/lzma/lzma_decoder.c
@ -986,7 +986,7 @@ lzma_decoder_reset(void *coder_ptr, const void *opt)

 extern lzma_ret
 lzma_lzma_decoder_create(lzma_lz_decoder *lz, const lzma_allocator *allocator,
-		const void *opt, lzma_lz_options *lz_options)
+		const lzma_options_lzma *options, lzma_lz_options *lz_options)
 {
 	if (lz->coder == NULL) {
 		lz->coder = lzma_alloc(sizeof(lzma_lzma1_decoder), allocator);
@ -1000,7 +1000,6 @@ lzma_lzma_decoder_create(lzma_lz_decoder *lz, const lzma_allocator *allocator,

 	// All dictionary sizes are OK here. LZ decoder will take care of
 	// the special cases.
-	const lzma_options_lzma *options = opt;
 	lz_options->dict_size = options->dict_size;
 	lz_options->preset_dict = options->preset_dict;
 	lz_options->preset_dict_size = options->preset_dict_size;
@ -1014,16 +1013,40 @@ lzma_lzma_decoder_create(lzma_lz_decoder *lz, const lzma_allocator *allocator,
 /// the LZ initialization).
 static lzma_ret
 lzma_decoder_init(lzma_lz_decoder *lz, const lzma_allocator *allocator,
-		const void *options, lzma_lz_options *lz_options)
+		lzma_vli id, const void *options, lzma_lz_options *lz_options)
 {
 	if (!is_lclppb_valid(options))
 		return LZMA_PROG_ERROR;

+	lzma_vli uncomp_size = LZMA_VLI_UNKNOWN;
+	bool allow_eopm = true;
+
+	if (id == LZMA_FILTER_LZMA1EXT) {
+		const lzma_options_lzma *opt = options;
+
+		// Only one flag is supported.
+		if (opt->ext_flags & ~LZMA_LZMA1EXT_ALLOW_EOPM)
+			return LZMA_OPTIONS_ERROR;
+
+		// FIXME? Using lzma_vli instead of uint64_t is weird because
+		// this has nothing to do with .xz headers and variable-length
+		// integer encoding. On the other hand, using LZMA_VLI_UNKNOWN
+		// instead of UINT64_MAX is clearer when unknown size is
+		// meant. A problem with using lzma_vli is that now we
+		// allow > LZMA_VLI_MAX which is fine in this file but
+		// it's still confusing. Note that alone_decoder.c also
+		// allows > LZMA_VLI_MAX when setting uncompressed size.
+		uncomp_size = opt->ext_size_low
+				+ ((uint64_t)(opt->ext_size_high) << 32);
+		allow_eopm = (opt->ext_flags & LZMA_LZMA1EXT_ALLOW_EOPM) != 0
+				|| uncomp_size == LZMA_VLI_UNKNOWN;
+	}
+
 	return_if_error(lzma_lzma_decoder_create(
 			lz, allocator, options, lz_options));

 	lzma_decoder_reset(lz->coder, options);
-	lzma_decoder_uncompressed(lz->coder, LZMA_VLI_UNKNOWN, true);
+	lzma_decoder_uncompressed(lz->coder, uncomp_size, allow_eopm);

 	return LZMA_OK;
 }
--- a/src/liblzma/lzma/lzma_decoder.h
+++ b/src/liblzma/lzma/lzma_decoder.h
@ -42,7 +42,7 @@ extern bool lzma_lzma_lclppb_decode(
 /// LZMA2 decoders.
 extern lzma_ret lzma_lzma_decoder_create(
 		lzma_lz_decoder *lz, const lzma_allocator *allocator,
-		const void *opt, lzma_lz_options *lz_options);
+		const lzma_options_lzma *opt, lzma_lz_options *lz_options);

 /// Gets memory usage without validating lc/lp/pb. This is used by LZMA2
 /// decoder, because raw LZMA2 decoding doesn't need lc/lp/pb.
--- a/src/liblzma/lzma/lzma_encoder.c
+++ b/src/liblzma/lzma/lzma_encoder.c
@ -268,6 +268,7 @@ static bool
 encode_init(lzma_lzma1_encoder *coder, lzma_mf *mf)
 {
 	assert(mf_position(mf) == 0);
+	assert(coder->uncomp_size == 0);

 	if (mf->read_pos == mf->read_limit) {
 		if (mf->action == LZMA_RUN)
@ -283,6 +284,7 @@ encode_init(lzma_lzma1_encoder *coder, lzma_mf *mf)
 		mf->read_ahead = 0;
 		rc_bit(&coder->rc, &coder->is_match[0][0], 0);
 		rc_bittree(&coder->rc, coder->literal[0], 8, mf->buffer[0]);
+		++coder->uncomp_size;
 	}

 	// Initialization is done (except if empty file).
@ -317,21 +319,28 @@ lzma_lzma_encode(lzma_lzma1_encoder *restrict coder, lzma_mf *restrict mf,
 	if (!coder->is_initialized && !encode_init(coder, mf))
 		return LZMA_OK;

-	// Get the lowest bits of the uncompressed offset from the LZ layer.
-	uint32_t position = mf_position(mf);
+	// Encode pending output bytes from the range encoder.
+	// At the start of the stream, encode_init() encodes one literal.
+	// Later there can be pending output only with LZMA1 because LZMA2
+	// ensures that there is always enough output space. Thus when using
+	// LZMA2, rc_encode() calls in this function will always return false.
+	if (rc_encode(&coder->rc, out, out_pos, out_size)) {
+		// We don't get here with LZMA2.
+		assert(limit == UINT32_MAX);
+		return LZMA_OK;
+	}
+
+	// If the range encoder was flushed in an earlier call to this
+	// function but there wasn't enough output buffer space, those
+	// bytes would have now been encoded by the above rc_encode() call
+	// and the stream has now been finished. This can only happen with
+	// LZMA1 as LZMA2 always provides enough output buffer space.
+	if (coder->is_flushed) {
+		assert(limit == UINT32_MAX);
+		return LZMA_STREAM_END;
+	}

 	while (true) {
-		// Encode pending bits, if any. Calling this before encoding
-		// the next symbol is needed only with plain LZMA, since
-		// LZMA2 always provides big enough buffer to flush
-		// everything out from the range encoder. For the same reason,
-		// rc_encode() never returns true when this function is used
-		// as part of LZMA2 encoder.
-		if (rc_encode(&coder->rc, out, out_pos, out_size)) {
-			assert(limit == UINT32_MAX);
-			return LZMA_OK;
-		}
-
 		// With LZMA2 we need to take care that compressed size of
 		// a chunk doesn't get too big.
 		// FIXME? Check if this could be improved.
@ -365,37 +374,64 @@ lzma_lzma_encode(lzma_lzma1_encoder *restrict coder, lzma_mf *restrict mf,
 		if (coder->fast_mode)
 			lzma_lzma_optimum_fast(coder, mf, &back, &len);
 		else
-			lzma_lzma_optimum_normal(
-					coder, mf, &back, &len, position);
+			lzma_lzma_optimum_normal(coder, mf, &back, &len,
+					(uint32_t)(coder->uncomp_size));

-		encode_symbol(coder, mf, back, len, position);
+		encode_symbol(coder, mf, back, len,
+				(uint32_t)(coder->uncomp_size));

-		position += len;
-	}
+		// If output size limiting is active (out_limit != 0), check
+		// if encoding this LZMA symbol would make the output size
+		// exceed the specified limit.
+		if (coder->out_limit != 0 && rc_encode_dummy(
+				&coder->rc, coder->out_limit)) {
+			// The most recent LZMA symbol would make the output
+			// too big. Throw it away.
+			rc_forget(&coder->rc);

-	if (!coder->is_flushed) {
-		coder->is_flushed = true;
+			// FIXME: Tell the LZ layer to not read more input as
+			// it would be waste of time. This doesn't matter if
+			// output-size-limited encoding is done with a single
+			// call though.

-		// We don't support encoding plain LZMA streams without EOPM,
-		// and LZMA2 doesn't use EOPM at LZMA level.
-		if (limit == UINT32_MAX)
-			encode_eopm(coder, position);
+			break;
+		}

-		// Flush the remaining bytes from the range encoder.
-		rc_flush(&coder->rc);
+		// This symbol will be encoded so update the uncompressed size.
+		coder->uncomp_size += len;

-		// Copy the remaining bytes to the output buffer. If there
-		// isn't enough output space, we will copy out the remaining
-		// bytes on the next call to this function by using
-		// the rc_encode() call in the encoding loop above.
+		// Encode the LZMA symbol.
 		if (rc_encode(&coder->rc, out, out_pos, out_size)) {
+			// Once again, this can only happen with LZMA1.
 			assert(limit == UINT32_MAX);
 			return LZMA_OK;
 		}
 	}

-	// Make it ready for the next LZMA2 chunk.
-	coder->is_flushed = false;
+	// Make the uncompressed size available to the application.
+	if (coder->uncomp_size_ptr != NULL)
+		*coder->uncomp_size_ptr = coder->uncomp_size;
+
+	// LZMA2 doesn't use EOPM at LZMA level.
+	//
+	// Plain LZMA streams without EOPM aren't supported except when
+	// output size limiting is enabled.
+	if (coder->use_eopm)
+		encode_eopm(coder, (uint32_t)(coder->uncomp_size));
+
+	// Flush the remaining bytes from the range encoder.
+	rc_flush(&coder->rc);
+
+	// Copy the remaining bytes to the output buffer. If there
+	// isn't enough output space, we will copy out the remaining
+	// bytes on the next call to this function.
+	if (rc_encode(&coder->rc, out, out_pos, out_size)) {
+		// This cannot happen with LZMA2.
+		assert(limit == UINT32_MAX);
+
+		coder->is_flushed = true;
+		return LZMA_OK;
+	}

 	return LZMA_STREAM_END;
 }
@ -414,6 +450,23 @@ lzma_encode(void *coder, lzma_mf *restrict mf,
 }


+static lzma_ret
+lzma_lzma_set_out_limit(
+		void *coder_ptr, uint64_t *uncomp_size, uint64_t out_limit)
+{
+	// Minimum output size is 5 bytes but that cannot hold any output
+	// so we use 6 bytes.
+	if (out_limit < 6)
+		return LZMA_BUF_ERROR;
+
+	lzma_lzma1_encoder *coder = coder_ptr;
+	coder->out_limit = out_limit;
+	coder->uncomp_size_ptr = uncomp_size;
+	coder->use_eopm = false;
+	return LZMA_OK;
+}
+
+
 ////////////////////
 // Initialization //
 ////////////////////
@ -440,7 +493,8 @@ set_lz_options(lzma_lz_options *lz_options, const lzma_options_lzma *options)
 	lz_options->dict_size = options->dict_size;
 	lz_options->after_size = LOOP_INPUT_MAX;
 	lz_options->match_len_max = MATCH_LEN_MAX;
-	lz_options->nice_len = options->nice_len;
+	lz_options->nice_len = my_max(mf_get_hash_bytes(options->mf),
+				options->nice_len);
 	lz_options->match_finder = options->mf;
 	lz_options->depth = options->depth;
 	lz_options->preset_dict = options->preset_dict;
@ -546,10 +600,13 @@ lzma_lzma_encoder_reset(lzma_lzma1_encoder *coder,


 extern lzma_ret
-lzma_lzma_encoder_create(void **coder_ptr,
-		const lzma_allocator *allocator,
-		const lzma_options_lzma *options, lzma_lz_options *lz_options)
+lzma_lzma_encoder_create(void **coder_ptr, const lzma_allocator *allocator,
+		lzma_vli id, const lzma_options_lzma *options,
+		lzma_lz_options *lz_options)
 {
+	assert(id == LZMA_FILTER_LZMA1 || id == LZMA_FILTER_LZMA1EXT
+			|| id == LZMA_FILTER_LZMA2);
+
 	// Allocate lzma_lzma1_encoder if it wasn't already allocated.
 	if (*coder_ptr == NULL) {
 		*coder_ptr = lzma_alloc(sizeof(lzma_lzma1_encoder), allocator);
@ -591,10 +648,14 @@ lzma_lzma_encoder_create(void **coder_ptr,
 			coder->dist_table_size = log_size * 2;

 			// Length encoders' price table size
+			const uint32_t nice_len = my_max(
+					mf_get_hash_bytes(options->mf),
+					options->nice_len);
+
 			coder->match_len_encoder.table_size
-				= options->nice_len + 1 - MATCH_LEN_MIN;
+					= nice_len + 1 - MATCH_LEN_MIN;
 			coder->rep_len_encoder.table_size
-				= options->nice_len + 1 - MATCH_LEN_MIN;
+					= nice_len + 1 - MATCH_LEN_MIN;
 			break;
 		}

@ -609,6 +670,37 @@ lzma_lzma_encoder_create(void **coder_ptr,
 	coder->is_initialized = options->preset_dict != NULL
 			&& options->preset_dict_size > 0;
 	coder->is_flushed = false;
+	coder->uncomp_size = 0;
+	coder->uncomp_size_ptr = NULL;
+
+	// Output size limitting is disabled by default.
+	coder->out_limit = 0;
+
+	// Determine if end marker is wanted:
+	//   - It is never used with LZMA2.
+	//   - It is always used with LZMA_FILTER_LZMA1 (unless
+	//     lzma_lzma_set_out_limit() is called later).
+	//   - LZMA_FILTER_LZMA1EXT has a flag for it in the options.
+	coder->use_eopm = (id == LZMA_FILTER_LZMA1);
+	if (id == LZMA_FILTER_LZMA1EXT) {
+		// Check if unsupported flags are present.
+		if (options->ext_flags & ~LZMA_LZMA1EXT_ALLOW_EOPM)
+			return LZMA_OPTIONS_ERROR;
+
+		coder->use_eopm = (options->ext_flags
+				& LZMA_LZMA1EXT_ALLOW_EOPM) != 0;
+
+		// TODO? As long as there are no filters that change the size
+		// of the data, it is enough to look at lzma_stream.total_in
+		// after encoding has been finished to know the uncompressed
+		// size of the LZMA1 stream. But in the future there could be
+		// filters that change the size of the data and then total_in
+		// doesn't work as the LZMA1 stream size might be different
+		// due to another filter in the chain. The problem is simple
+		// to solve: Add another flag to ext_flags and then set
+		// coder->uncomp_size_ptr to the address stored in
+		// lzma_options_lzma.reserved_ptr2 (or _ptr1).
+	}

 	set_lz_options(lz_options, options);

@ -618,11 +710,12 @@ lzma_lzma_encoder_create(void **coder_ptr,

 static lzma_ret
 lzma_encoder_init(lzma_lz_encoder *lz, const lzma_allocator *allocator,
-		const void *options, lzma_lz_options *lz_options)
+		lzma_vli id, const void *options, lzma_lz_options *lz_options)
 {
 	lz->code = &lzma_encode;
+	lz->set_out_limit = &lzma_lzma_set_out_limit;
 	return lzma_lzma_encoder_create(
-			&lz->coder, allocator, options, lz_options);
+			&lz->coder, allocator, id, options, lz_options);
 }


--- a/src/liblzma/lzma/lzma_encoder.h
+++ b/src/liblzma/lzma/lzma_encoder.h
@ -40,7 +40,8 @@ extern bool lzma_lzma_lclppb_encode(
 /// Initializes raw LZMA encoder; this is used by LZMA2.
 extern lzma_ret lzma_lzma_encoder_create(
 		void **coder_ptr, const lzma_allocator *allocator,
-		const lzma_options_lzma *options, lzma_lz_options *lz_options);
+		lzma_vli id, const lzma_options_lzma *options,
+		lzma_lz_options *lz_options);


 /// Resets an already initialized LZMA encoder; this is used by LZMA2.
--- a/src/liblzma/lzma/lzma_encoder_private.h
+++ b/src/liblzma/lzma/lzma_encoder_private.h
@ -72,6 +72,18 @@ struct lzma_lzma1_encoder_s {
 	/// Range encoder
 	lzma_range_encoder rc;

+	/// Uncompressed size (doesn't include possible preset dictionary)
+	uint64_t uncomp_size;
+
+	/// If non-zero, produce at most this much output.
+	/// Some input may then be missing from the output.
+	uint64_t out_limit;
+
+	/// If the above out_limit is non-zero, *uncomp_size_ptr is set to
+	/// the amount of uncompressed data that we were able to fit
+	/// in the output buffer.
+	uint64_t *uncomp_size_ptr;
+
 	/// State
 	lzma_lzma_state state;

@ -99,6 +111,9 @@ struct lzma_lzma1_encoder_s {
 	/// have been written to the output buffer yet.
 	bool is_flushed;

+	/// True if end of payload marker will be written.
+	bool use_eopm;
+
 	uint32_t pos_mask;         ///< (1 << pos_bits) - 1
 	uint32_t literal_context_bits;
 	uint32_t literal_pos_mask;
--- a/src/liblzma/rangecoder/range_encoder.h
+++ b/src/liblzma/rangecoder/range_encoder.h
@ -19,9 +19,9 @@


 /// Maximum number of symbols that can be put pending into lzma_range_encoder
-/// structure between calls to lzma_rc_encode(). For LZMA, 52+5 is enough
+/// structure between calls to lzma_rc_encode(). For LZMA, 48+5 is enough
 /// (match with big distance and length followed by range encoder flush).
-#define RC_SYMBOLS_MAX 58
+#define RC_SYMBOLS_MAX 53


 typedef struct {
@ -30,6 +30,9 @@ typedef struct {
 	uint32_t range;
 	uint8_t cache;

+	/// Number of bytes written out by rc_encode() -> rc_shift_low()
+	uint64_t out_total;
+
 	/// Number of symbols in the tables
 	size_t count;

@ -58,11 +61,21 @@ rc_reset(lzma_range_encoder *rc)
 	rc->cache_size = 1;
 	rc->range = UINT32_MAX;
 	rc->cache = 0;
+	rc->out_total = 0;
 	rc->count = 0;
 	rc->pos = 0;
 }


+static inline void
+rc_forget(lzma_range_encoder *rc)
+{
+	// This must not be called when rc_encode() is partially done.
+	assert(rc->pos == 0);
+	rc->count = 0;
+}
+
+
 static inline void
 rc_bit(lzma_range_encoder *rc, probability *prob, uint32_t bit)
 {
@ -132,6 +145,7 @@ rc_shift_low(lzma_range_encoder *rc,

 			out[*out_pos] = rc->cache + (uint8_t)(rc->low >> 32);
 			++*out_pos;
+			++rc->out_total;
 			rc->cache = 0xFF;

 		} while (--rc->cache_size != 0);
@ -146,6 +160,34 @@ rc_shift_low(lzma_range_encoder *rc,
 }


+// NOTE: The last two arguments are uint64_t instead of size_t because in
+// the dummy version these refer to the size of the whole range-encoded
+// output stream, not just to the currently available output buffer space.
+static inline bool
+rc_shift_low_dummy(uint64_t *low, uint64_t *cache_size, uint8_t *cache,
+		uint64_t *out_pos, uint64_t out_size)
+{
+	if ((uint32_t)(*low) < (uint32_t)(0xFF000000)
+			|| (uint32_t)(*low >> 32) != 0) {
+		do {
+			if (*out_pos == out_size)
+				return true;
+
+			++*out_pos;
+			*cache = 0xFF;
+
+		} while (--*cache_size != 0);
+
+		*cache = (*low >> 24) & 0xFF;
+	}
+
+	++*cache_size;
+	*low = (*low & 0x00FFFFFF) << RC_SHIFT_BITS;
+
+	return false;
+}
+
+
 static inline bool
 rc_encode(lzma_range_encoder *rc,
 		uint8_t *out, size_t *out_pos, size_t out_size)
@ -222,6 +264,83 @@ rc_encode(lzma_range_encoder *rc,
 }


+static inline bool
+rc_encode_dummy(const lzma_range_encoder *rc, uint64_t out_limit)
+{
+	assert(rc->count <= RC_SYMBOLS_MAX);
+
+	uint64_t low = rc->low;
+	uint64_t cache_size = rc->cache_size;
+	uint32_t range = rc->range;
+	uint8_t cache = rc->cache;
+	uint64_t out_pos = rc->out_total;
+
+	size_t pos = rc->pos;
+
+	while (true) {
+		// Normalize
+		if (range < RC_TOP_VALUE) {
+			if (rc_shift_low_dummy(&low, &cache_size, &cache,
+					&out_pos, out_limit))
+				return true;
+
+			range <<= RC_SHIFT_BITS;
+		}
+
+		// This check is here because the normalization above must
+		// be done before flushing the last bytes.
+		if (pos == rc->count)
+			break;
+
+		// Encode a bit
+		switch (rc->symbols[pos]) {
+		case RC_BIT_0: {
+			probability prob = *rc->probs[pos];
+			range = (range >> RC_BIT_MODEL_TOTAL_BITS)
+					* prob;
+			break;
+		}
+
+		case RC_BIT_1: {
+			probability prob = *rc->probs[pos];
+			const uint32_t bound = prob * (range
+					>> RC_BIT_MODEL_TOTAL_BITS);
+			low += bound;
+			range -= bound;
+			break;
+		}
+
+		case RC_DIRECT_0:
+			range >>= 1;
+			break;
+
+		case RC_DIRECT_1:
+			range >>= 1;
+			low += range;
+			break;
+
+		case RC_FLUSH:
+		default:
+			assert(0);
+			break;
+		}
+
+		++pos;
+	}
+
+	// Flush the last bytes. This isn't in rc->symbols[] so we do
+	// it after the above loop to take into account the size of
+	// the flushing that will be done at the end of the stream.
+	for (pos = 0; pos < 5; ++pos) {
+		if (rc_shift_low_dummy(&low, &cache_size,
+				&cache, &out_pos, out_limit))
+			return true;
+	}
+
+	return false;
+}
+
+
 static inline uint64_t
 rc_pending(const lzma_range_encoder *rc)
 {
--- a/src/liblzma/simple/arm.c
+++ b/src/liblzma/simple/arm.c
@ -53,6 +53,7 @@ arm_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 }


+#ifdef HAVE_ENCODER_ARM
 extern lzma_ret
 lzma_simple_arm_encoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -60,8 +61,10 @@ lzma_simple_arm_encoder_init(lzma_next_coder *next,
 {
 	return arm_coder_init(next, allocator, filters, true);
 }
+#endif


+#ifdef HAVE_DECODER_ARM
 extern lzma_ret
 lzma_simple_arm_decoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -69,3 +72,4 @@ lzma_simple_arm_decoder_init(lzma_next_coder *next,
 {
 	return arm_coder_init(next, allocator, filters, false);
 }
+#endif
--- a/src/liblzma/simple/arm64.c
+++ b/src/liblzma/simple/arm64.c
@ -0,0 +1,136 @@
+///////////////////////////////////////////////////////////////////////////////
+//
+/// \file       arm64.c
+/// \brief      Filter for ARM64 binaries
+///
+/// This converts ARM64 relative addresses in the BL and ADRP immediates
+/// to absolute values to increase redundancy of ARM64 code.
+///
+/// Converting B or ADR instructions was also tested but it's not useful.
+/// A majority of the jumps for the B instruction are very small (+/- 0xFF).
+/// These are typical for loops and if-statements. Encoding them to their
+/// absolute address reduces redundancy since many of the small relative
+/// jump values are repeated, but very few of the absolute addresses are.
+//
+//  Authors:    Lasse Collin
+//              Jia Tan
+//
+//  This file has been put into the public domain.
+//  You can do whatever you want with this file.
+//
+///////////////////////////////////////////////////////////////////////////////
+
+#include "simple_private.h"
+
+
+static size_t
+arm64_code(void *simple lzma_attribute((__unused__)),
+		uint32_t now_pos, bool is_encoder,
+		uint8_t *buffer, size_t size)
+{
+	size_t i;
+
+	// Clang 14.0.6 on x86-64 makes this four times bigger and 40 % slower
+	// with auto-vectorization that is enabled by default with -O2.
+	// Such vectorization bloat happens with -O2 when targeting ARM64 too
+	// but performance hasn't been tested.
+#ifdef __clang__
+#	pragma clang loop vectorize(disable)
+#endif
+	for (i = 0; i + 4 <= size; i += 4) {
+		uint32_t pc = (uint32_t)(now_pos + i);
+		uint32_t instr = read32le(buffer + i);
+
+		if ((instr >> 26) == 0x25) {
+			// BL instruction:
+			// The full 26-bit immediate is converted.
+			// The range is +/-128 MiB.
+			//
+			// Using the full range is helps quite a lot with
+			// big executables. Smaller range would reduce false
+			// positives in non-code sections of the input though
+			// so this is a compromise that slightly favors big
+			// files. With the full range only six bits of the 32
+			// need to match to trigger a conversion.
+			const uint32_t src = instr;
+			instr = 0x94000000;
+
+			pc >>= 2;
+			if (!is_encoder)
+				pc = 0U - pc;
+
+			instr |= (src + pc) & 0x03FFFFFF;
+			write32le(buffer + i, instr);
+
+		} else if ((instr & 0x9F000000) == 0x90000000) {
+			// ADRP instruction:
+			// Only values in the range +/-512 MiB are converted.
+			//
+			// Using less than the full +/-4 GiB range reduces
+			// false positives on non-code sections of the input
+			// while being excellent for executables up to 512 MiB.
+			// The positive effect of ADRP conversion is smaller
+			// than that of BL but it also doesn't hurt so much in
+			// non-code sections of input because, with +/-512 MiB
+			// range, nine bits of 32 need to match to trigger a
+			// conversion (two 10-bit match choices = 9 bits).
+			const uint32_t src = ((instr >> 29) & 3)
+					| ((instr >> 3) & 0x001FFFFC);
+
+			// With the addition only one branch is needed to
+			// check the +/- range. This is usually false when
+			// processing ARM64 code so branch prediction will
+			// handle it well in terms of performance.
+			//
+			//if ((src & 0x001E0000) != 0
+			// && (src & 0x001E0000) != 0x001E0000)
+			if ((src + 0x00020000) & 0x001C0000)
+				continue;
+
+			instr &= 0x9000001F;
+
+			pc >>= 12;
+			if (!is_encoder)
+				pc = 0U - pc;
+
+			const uint32_t dest = src + pc;
+			instr |= (dest & 3) << 29;
+			instr |= (dest & 0x0003FFFC) << 3;
+			instr |= (0U - (dest & 0x00020000)) & 0x00E00000;
+			write32le(buffer + i, instr);
+		}
+	}
+
+	return i;
+}
+
+
+static lzma_ret
+arm64_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
+		const lzma_filter_info *filters, bool is_encoder)
+{
+	return lzma_simple_coder_init(next, allocator, filters,
+			&arm64_code, 0, 4, 4, is_encoder);
+}
+
+
+#ifdef HAVE_ENCODER_ARM64
+extern lzma_ret
+lzma_simple_arm64_encoder_init(lzma_next_coder *next,
+		const lzma_allocator *allocator,
+		const lzma_filter_info *filters)
+{
+	return arm64_coder_init(next, allocator, filters, true);
+}
+#endif
+
+
+#ifdef HAVE_DECODER_ARM64
+extern lzma_ret
+lzma_simple_arm64_decoder_init(lzma_next_coder *next,
+		const lzma_allocator *allocator,
+		const lzma_filter_info *filters)
+{
+	return arm64_coder_init(next, allocator, filters, false);
+}
+#endif
--- a/src/liblzma/simple/armthumb.c
+++ b/src/liblzma/simple/armthumb.c
@ -58,6 +58,7 @@ armthumb_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 }


+#ifdef HAVE_ENCODER_ARMTHUMB
 extern lzma_ret
 lzma_simple_armthumb_encoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -65,8 +66,10 @@ lzma_simple_armthumb_encoder_init(lzma_next_coder *next,
 {
 	return armthumb_coder_init(next, allocator, filters, true);
 }
+#endif


+#ifdef HAVE_DECODER_ARMTHUMB
 extern lzma_ret
 lzma_simple_armthumb_decoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -74,3 +77,4 @@ lzma_simple_armthumb_decoder_init(lzma_next_coder *next,
 {
 	return armthumb_coder_init(next, allocator, filters, false);
 }
+#endif
--- a/src/liblzma/simple/ia64.c
+++ b/src/liblzma/simple/ia64.c
@ -94,6 +94,7 @@ ia64_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 }


+#ifdef HAVE_ENCODER_IA64
 extern lzma_ret
 lzma_simple_ia64_encoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -101,8 +102,10 @@ lzma_simple_ia64_encoder_init(lzma_next_coder *next,
 {
 	return ia64_coder_init(next, allocator, filters, true);
 }
+#endif


+#ifdef HAVE_DECODER_IA64
 extern lzma_ret
 lzma_simple_ia64_decoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -110,3 +113,4 @@ lzma_simple_ia64_decoder_init(lzma_next_coder *next,
 {
 	return ia64_coder_init(next, allocator, filters, false);
 }
+#endif
--- a/src/liblzma/simple/powerpc.c
+++ b/src/liblzma/simple/powerpc.c
@ -58,6 +58,7 @@ powerpc_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 }


+#ifdef HAVE_ENCODER_POWERPC
 extern lzma_ret
 lzma_simple_powerpc_encoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -65,8 +66,10 @@ lzma_simple_powerpc_encoder_init(lzma_next_coder *next,
 {
 	return powerpc_coder_init(next, allocator, filters, true);
 }
+#endif


+#ifdef HAVE_DECODER_POWERPC
 extern lzma_ret
 lzma_simple_powerpc_decoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -74,3 +77,4 @@ lzma_simple_powerpc_decoder_init(lzma_next_coder *next,
 {
 	return powerpc_coder_init(next, allocator, filters, false);
 }
+#endif
--- a/src/liblzma/simple/simple_coder.h
+++ b/src/liblzma/simple/simple_coder.h
@ -61,6 +61,15 @@ extern lzma_ret lzma_simple_armthumb_decoder_init(lzma_next_coder *next,
 		const lzma_filter_info *filters);


+extern lzma_ret lzma_simple_arm64_encoder_init(lzma_next_coder *next,
+               const lzma_allocator *allocator,
+               const lzma_filter_info *filters);
+
+extern lzma_ret lzma_simple_arm64_decoder_init(lzma_next_coder *next,
+               const lzma_allocator *allocator,
+               const lzma_filter_info *filters);
+
+
 extern lzma_ret lzma_simple_sparc_encoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
 		const lzma_filter_info *filters);
--- a/src/liblzma/simple/sparc.c
+++ b/src/liblzma/simple/sparc.c
@ -65,6 +65,7 @@ sparc_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 }


+#ifdef HAVE_ENCODER_SPARC
 extern lzma_ret
 lzma_simple_sparc_encoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -72,8 +73,10 @@ lzma_simple_sparc_encoder_init(lzma_next_coder *next,
 {
 	return sparc_coder_init(next, allocator, filters, true);
 }
+#endif


+#ifdef HAVE_DECODER_SPARC
 extern lzma_ret
 lzma_simple_sparc_decoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -81,3 +84,4 @@ lzma_simple_sparc_decoder_init(lzma_next_coder *next,
 {
 	return sparc_coder_init(next, allocator, filters, false);
 }
+#endif
--- a/src/liblzma/simple/x86.c
+++ b/src/liblzma/simple/x86.c
@ -141,6 +141,7 @@ x86_coder_init(lzma_next_coder *next, const lzma_allocator *allocator,
 }


+#ifdef HAVE_ENCODER_X86
 extern lzma_ret
 lzma_simple_x86_encoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -148,8 +149,10 @@ lzma_simple_x86_encoder_init(lzma_next_coder *next,
 {
 	return x86_coder_init(next, allocator, filters, true);
 }
+#endif


+#ifdef HAVE_DECODER_X86
 extern lzma_ret
 lzma_simple_x86_decoder_init(lzma_next_coder *next,
 		const lzma_allocator *allocator,
@ -157,3 +160,4 @@ lzma_simple_x86_decoder_init(lzma_next_coder *next,
 {
 	return x86_coder_init(next, allocator, filters, false);
 }
+#endif
--- a/src/xz/args.c
+++ b/src/xz/args.c
@ -29,19 +29,29 @@ bool opt_ignore_check = false;
 const char stdin_filename[] = "(stdin)";


-/// Parse and set the memory usage limit for compression and/or decompression.
+/// Parse and set the memory usage limit for compression, decompression,
+/// and/or multithreaded decompression.
 static void
-parse_memlimit(const char *name, const char *name_percentage, char *str,
-		bool set_compress, bool set_decompress)
+parse_memlimit(const char *name, const char *name_percentage, const char *str,
+		bool set_compress, bool set_decompress, bool set_mtdec)
 {
 	bool is_percentage = false;
 	uint64_t value;

 	const size_t len = strlen(str);
 	if (len > 0 && str[len - 1] == '%') {
-		str[len - 1] = '\0';
+		// Make a copy so that we can get rid of %.
+		//
+		// In the past str wasn't const and we modified it directly
+		// but that modified argv[] and thus affected what was visible
+		// in "ps auxf" or similar tools which was confusing. For
+		// example, --memlimit=50% would show up as --memlimit=50
+		// since the percent sign was overwritten here.
+		char *s = xstrdup(str);
+		s[len - 1] = '\0';
 		is_percentage = true;
-		value = str_to_uint64(name_percentage, str, 1, 100);
+		value = str_to_uint64(name_percentage, s, 1, 100);
+		free(s);
 	} else {
 		// On 32-bit systems, SIZE_MAX would make more sense than
 		// UINT64_MAX. But use UINT64_MAX still so that scripts
@ -49,15 +59,19 @@ parse_memlimit(const char *name, const char *name_percentage, char *str,
 		value = str_to_uint64(name, str, 0, UINT64_MAX);
 	}

-	hardware_memlimit_set(
-			value, set_compress, set_decompress, is_percentage);
+	hardware_memlimit_set(value, set_compress, set_decompress, set_mtdec,
+			is_percentage);
 	return;
 }


 static void
-parse_block_list(char *str)
+parse_block_list(const char *str_const)
 {
+	// We need a modifiable string in the for-loop.
+	char *str_start = xstrdup(str_const);
+	char *str = str_start;
+
 	// It must be non-empty and not begin with a comma.
 	if (str[0] == '\0' || str[0] == ',')
 		message_fatal(_("%s: Invalid argument to --block-list"), str);
@ -112,6 +126,8 @@ parse_block_list(char *str)

 	// Terminate the array.
 	opt_block_list[count] = 0;
+
+	free(str_start);
 	return;
 }

@ -125,6 +141,7 @@ parse_real(args_info *args, int argc, char **argv)
 		OPT_IA64,
 		OPT_ARM,
 		OPT_ARMTHUMB,
+		OPT_ARM64,
 		OPT_SPARC,
 		OPT_DELTA,
 		OPT_LZMA1,
@ -138,6 +155,7 @@ parse_real(args_info *args, int argc, char **argv)
 		OPT_BLOCK_LIST,
 		OPT_MEM_COMPRESS,
 		OPT_MEM_DECOMPRESS,
+		OPT_MEM_MT_DECOMPRESS,
 		OPT_NO_ADJUST,
 		OPT_INFO_MEMORY,
 		OPT_ROBOT,
@ -176,6 +194,7 @@ parse_real(args_info *args, int argc, char **argv)
 		{ "block-list",  required_argument, NULL,  OPT_BLOCK_LIST },
 		{ "memlimit-compress",   required_argument, NULL, OPT_MEM_COMPRESS },
 		{ "memlimit-decompress", required_argument, NULL, OPT_MEM_DECOMPRESS },
+		{ "memlimit-mt-decompress", required_argument, NULL, OPT_MEM_MT_DECOMPRESS },
 		{ "memlimit",     required_argument, NULL,  'M' },
 		{ "memory",       required_argument, NULL,  'M' }, // Old alias
 		{ "no-adjust",    no_argument,       NULL,  OPT_NO_ADJUST },
@ -194,6 +213,7 @@ parse_real(args_info *args, int argc, char **argv)
 		{ "ia64",         optional_argument, NULL,  OPT_IA64 },
 		{ "arm",          optional_argument, NULL,  OPT_ARM },
 		{ "armthumb",     optional_argument, NULL,  OPT_ARMTHUMB },
+		{ "arm64",        optional_argument, NULL,  OPT_ARM64 },
 		{ "sparc",        optional_argument, NULL,  OPT_SPARC },
 		{ "delta",        optional_argument, NULL,  OPT_DELTA },

@ -225,20 +245,27 @@ parse_real(args_info *args, int argc, char **argv)
 		case OPT_MEM_COMPRESS:
 			parse_memlimit("memlimit-compress",
 					"memlimit-compress%", optarg,
-					true, false);
+					true, false, false);
 			break;

 		// --memlimit-decompress
 		case OPT_MEM_DECOMPRESS:
 			parse_memlimit("memlimit-decompress",
 					"memlimit-decompress%", optarg,
-					false, true);
+					false, true, false);
+			break;
+
+		// --memlimit-mt-decompress
+		case OPT_MEM_MT_DECOMPRESS:
+			parse_memlimit("memlimit-mt-decompress",
+					"memlimit-mt-decompress%", optarg,
+					false, false, true);
 			break;

 		// --memlimit
 		case 'M':
 			parse_memlimit("memlimit", "memlimit%", optarg,
-					true, true);
+					true, true, true);
 			break;

 		// --suffix
@ -246,11 +273,23 @@ parse_real(args_info *args, int argc, char **argv)
 			suffix_set(optarg);
 			break;

-		case 'T':
+		case 'T': {
+			// Since xz 5.4.0: Ignore leading '+' first.
+			const char *s = optarg;
+			if (optarg[0] == '+')
+				++s;
+
 			// The max is from src/liblzma/common/common.h.
-			hardware_threads_set(str_to_uint64("threads",
-					optarg, 0, 16384));
+			uint32_t t = str_to_uint64("threads", s, 0, 16384);
+
+			// If leading '+' was used then use multi-threaded
+			// mode even if exactly one thread was specified.
+			if (t == 1 && optarg[0] == '+')
+				t = UINT32_MAX;
+
+			hardware_threads_set(t);
 			break;
+		}

 		// --version
 		case 'V':
@ -360,6 +399,11 @@ parse_real(args_info *args, int argc, char **argv)
 					options_bcj(optarg));
 			break;

+		case OPT_ARM64:
+			coder_add_filter(LZMA_FILTER_ARM64,
+					options_bcj(optarg));
+			break;
+
 		case OPT_SPARC:
 			coder_add_filter(LZMA_FILTER_SPARC,
 					options_bcj(optarg));
@ -395,8 +439,9 @@ parse_real(args_info *args, int argc, char **argv)
 				{ "xz",     FORMAT_XZ },
 				{ "lzma",   FORMAT_LZMA },
 				{ "alone",  FORMAT_LZMA },
-				// { "gzip",   FORMAT_GZIP },
-				// { "gz",     FORMAT_GZIP },
+#ifdef HAVE_LZIP_DECODER
+				{ "lzip",   FORMAT_LZIP },
+#endif
 				{ "raw",    FORMAT_RAW },
 			};

@ -475,7 +520,7 @@ parse_real(args_info *args, int argc, char **argv)
 						"or `--files0'."));

 			if (optarg == NULL) {
-				args->files_name = (char *)stdin_filename;
+				args->files_name = stdin_filename;
 				args->files_file = stdin;
 			} else {
 				args->files_name = optarg;
@ -651,6 +696,12 @@ args_parse(args_info *args, int argc, char **argv)
 				"at build time"));
 #endif

+#ifdef HAVE_LZIP_DECODER
+	if (opt_mode == MODE_COMPRESS && opt_format == FORMAT_LZIP)
+		message_fatal(_("Compression of lzip files (.lz) "
+				"is not supported"));
+#endif
+
 	// Never remove the source file when the destination is not on disk.
 	// In test mode the data is written nowhere, but setting opt_stdout
 	// will make the rest of the code behave well.
--- a/src/xz/args.h
+++ b/src/xz/args.h
@ -19,7 +19,7 @@ typedef struct {

 	/// Name of the file from which to read filenames. This is NULL
 	/// if --files or --files0 was not used.
-	char *files_name;
+	const char *files_name;

 	/// File opened for reading from which filenames are read. This is
 	/// non-NULL only if files_name is non-NULL.
--- a/src/xz/coder.c
+++ b/src/xz/coder.c
@ -51,7 +51,12 @@ static lzma_check check;
 /// This becomes false if the --check=CHECK option is used.
 static bool check_default = true;

-#if defined(HAVE_ENCODERS) && defined(MYTHREAD_ENABLED)
+/// Indicates if unconsumed input is allowed to remain after
+/// decoding has successfully finished. This is set for each file
+/// in coder_init().
+static bool allow_trailing_input;
+
+#ifdef MYTHREAD_ENABLED
 static lzma_mt mt_options = {
 	.flags = 0,
 	.timeout = 300,
@ -136,6 +141,11 @@ memlimit_too_small(uint64_t memory_usage)
 extern void
 coder_set_compression_settings(void)
 {
+#ifdef HAVE_LZIP_DECODER
+	// .lz compression isn't supported.
+	assert(opt_format != FORMAT_LZIP);
+#endif
+
 	// The default check type is CRC64, but fallback to CRC32
 	// if CRC64 isn't supported by the copy of liblzma we are
 	// using. CRC32 is always supported.
@ -211,7 +221,7 @@ coder_set_compression_settings(void)
 			}
 		}

-		if (hardware_threads_get() > 1) {
+		if (hardware_threads_is_mt()) {
 			message(V_WARNING, _("Switching to single-threaded "
 					"mode due to --flush-timeout"));
 			hardware_threads_set(1);
@ -220,12 +230,16 @@ coder_set_compression_settings(void)

 	// Get the memory usage. Note that if --format=raw was used,
 	// we can be decompressing.
-	const uint64_t memory_limit = hardware_memlimit_get(opt_mode);
+	//
+	// If multithreaded .xz compression is done, this value will be
+	// replaced.
+	uint64_t memory_limit = hardware_memlimit_get(opt_mode);
 	uint64_t memory_usage = UINT64_MAX;
 	if (opt_mode == MODE_COMPRESS) {
 #ifdef HAVE_ENCODERS
 #	ifdef MYTHREAD_ENABLED
-		if (opt_format == FORMAT_XZ && hardware_threads_get() > 1) {
+		if (opt_format == FORMAT_XZ && hardware_threads_is_mt()) {
+			memory_limit = hardware_memlimit_mtenc_get();
 			mt_options.threads = hardware_threads_get();
 			mt_options.block_size = opt_block_size;
 			mt_options.check = check;
@ -269,47 +283,90 @@ coder_set_compression_settings(void)
 	if (memory_usage <= memory_limit)
 		return;

-	// If --no-adjust was used or we didn't find LZMA1 or
-	// LZMA2 as the last filter, give an error immediately.
-	// --format=raw implies --no-adjust.
-	if (!opt_auto_adjust || opt_format == FORMAT_RAW)
+	// With --format=raw settings are never adjusted to meet
+	// the memory usage limit.
+	if (opt_format == FORMAT_RAW)
 		memlimit_too_small(memory_usage);

 	assert(opt_mode == MODE_COMPRESS);

 #ifdef HAVE_ENCODERS
 #	ifdef MYTHREAD_ENABLED
-	if (opt_format == FORMAT_XZ && mt_options.threads > 1) {
+	if (opt_format == FORMAT_XZ && hardware_threads_is_mt()) {
 		// Try to reduce the number of threads before
 		// adjusting the compression settings down.
-		do {
-			// FIXME? The real single-threaded mode has
-			// lower memory usage, but it's not comparable
-			// because it doesn't write the size info
-			// into Block Headers.
-			if (--mt_options.threads == 0)
-				memlimit_too_small(memory_usage);
-
+		while (mt_options.threads > 1) {
+			// Reduce the number of threads by one and check
+			// the memory usage.
+			--mt_options.threads;
 			memory_usage = lzma_stream_encoder_mt_memusage(
 					&mt_options);
 			if (memory_usage == UINT64_MAX)
 				message_bug();

-		} while (memory_usage > memory_limit);
+			if (memory_usage <= memory_limit) {
+				// The memory usage is now low enough.
+				message(V_WARNING, _("Reduced the number of "
+					"threads from %s to %s to not exceed "
+					"the memory usage limit of %s MiB"),
+					uint64_to_str(
+						hardware_threads_get(), 0),
+					uint64_to_str(mt_options.threads, 1),
+					uint64_to_str(round_up_to_mib(
+						memory_limit), 2));
+				return;
+			}
+		}

-		message(V_WARNING, _("Adjusted the number of threads "
-			"from %s to %s to not exceed "
-			"the memory usage limit of %s MiB"),
-			uint64_to_str(hardware_threads_get(), 0),
-			uint64_to_str(mt_options.threads, 1),
-			uint64_to_str(round_up_to_mib(
-				memory_limit), 2));
+		// If the memory usage limit is only a soft limit (automatic
+		// number of threads and no --memlimit-compress), the limit
+		// is only used to reduce the number of threads and once at
+		// just one thread, the limit is completely ignored. This
+		// way -T0 won't use insane amount of memory but at the same
+		// time the soft limit will never make xz fail and never make
+		// xz change settings that would affect the compressed output.
+		if (hardware_memlimit_mtenc_is_default()) {
+			message(V_WARNING, _("Reduced the number of threads "
+				"from %s to one. The automatic memory usage "
+				"limit of %s MiB is still being exceeded. "
+				"%s MiB of memory is required. "
+				"Continuing anyway."),
+				uint64_to_str(hardware_threads_get(), 0),
+				uint64_to_str(
+					round_up_to_mib(memory_limit), 1),
+				uint64_to_str(
+					round_up_to_mib(memory_usage), 2));
+			return;
+		}
+
+		// If --no-adjust was used, we cannot drop to single-threaded
+		// mode since it produces different compressed output.
+		//
+		// NOTE: In xz 5.2.x, --no-adjust also prevented reducing
+		// the number of threads. This changed in 5.3.3alpha.
+		if (!opt_auto_adjust)
+			memlimit_too_small(memory_usage);
+
+		// Switch to single-threaded mode. It uses
+		// less memory than using one thread in
+		// the multithreaded mode but the output
+		// is also different.
+		hardware_threads_set(1);
+		memory_usage = lzma_raw_encoder_memusage(filters);
+		message(V_WARNING, _("Switching to single-threaded mode "
+			"to not exceed the memory usage limit of %s MiB"),
+			uint64_to_str(round_up_to_mib(memory_limit), 0));
 	}
 #	endif

 	if (memory_usage <= memory_limit)
 		return;

+	// Don't adjust LZMA2 or LZMA1 dictionary size if --no-adjust
+	// was specified as that would change the compressed output.
+	if (!opt_auto_adjust)
+		memlimit_too_small(memory_usage);
+
 	// Look for the last filter if it is LZMA2 or LZMA1, so we can make
 	// it use less RAM. With other filters we don't know what to do.
 	size_t i = 0;
@ -423,6 +480,18 @@ is_format_lzma(void)

 	return true;
 }
+
+
+#ifdef HAVE_LZIP_DECODER
+/// Return true if the data in in_buf seems to be in the .lz format.
+static bool
+is_format_lzip(void)
+{
+	static const uint8_t magic[4] = { 0x4C, 0x5A, 0x49, 0x50 };
+	return strm.avail_in >= sizeof(magic)
+			&& memcmp(in_buf.u8, magic, sizeof(magic)) == 0;
+}
+#endif
 #endif


@ -436,6 +505,12 @@ coder_init(file_pair *pair)
 {
 	lzma_ret ret = LZMA_PROG_ERROR;

+	// In most cases if there is input left when coding finishes,
+	// something has gone wrong. Exceptions are --single-stream
+	// and decoding .lz files which can contain trailing non-.lz data.
+	// These will be handled later in this function.
+	allow_trailing_input = false;
+
 	if (opt_mode == MODE_COMPRESS) {
 #ifdef HAVE_ENCODERS
 		switch (opt_format) {
@ -446,7 +521,7 @@ coder_init(file_pair *pair)

 		case FORMAT_XZ:
 #	ifdef MYTHREAD_ENABLED
-			if (hardware_threads_get() > 1)
+			if (hardware_threads_is_mt())
 				ret = lzma_stream_encoder_mt(
 						&strm, &mt_options);
 			else
@ -459,6 +534,14 @@ coder_init(file_pair *pair)
 			ret = lzma_alone_encoder(&strm, filters[0].options);
 			break;

+#	ifdef HAVE_LZIP_DECODER
+		case FORMAT_LZIP:
+			// args.c should disallow this.
+			assert(0);
+			ret = LZMA_PROG_ERROR;
+			break;
+#	endif
+
 		case FORMAT_RAW:
 			ret = lzma_raw_encoder(&strm, filters);
 			break;
@ -475,7 +558,9 @@ coder_init(file_pair *pair)
 		else
 			flags |= LZMA_TELL_UNSUPPORTED_CHECK;

-		if (!opt_single_stream)
+		if (opt_single_stream)
+			allow_trailing_input = true;
+		else
 			flags |= LZMA_CONCATENATED;

 		// We abuse FORMAT_AUTO to indicate unknown file format,
@ -484,8 +569,14 @@ coder_init(file_pair *pair)

 		switch (opt_format) {
 		case FORMAT_AUTO:
+			// .lz is checked before .lzma since .lzma detection
+			// is more complicated (no magic bytes).
 			if (is_format_xz())
 				init_format = FORMAT_XZ;
+#	ifdef HAVE_LZIP_DECODER
+			else if (is_format_lzip())
+				init_format = FORMAT_LZIP;
+#	endif
 			else if (is_format_lzma())
 				init_format = FORMAT_LZMA;
 			break;
@ -500,6 +591,13 @@ coder_init(file_pair *pair)
 				init_format = FORMAT_LZMA;
 			break;

+#	ifdef HAVE_LZIP_DECODER
+		case FORMAT_LZIP:
+			if (is_format_lzip())
+				init_format = FORMAT_LZIP;
+			break;
+#	endif
+
 		case FORMAT_RAW:
 			init_format = FORMAT_RAW;
 			break;
@ -524,9 +622,31 @@ coder_init(file_pair *pair)
 			break;

 		case FORMAT_XZ:
+#	ifdef MYTHREAD_ENABLED
+			mt_options.flags = flags;
+
+			mt_options.threads = hardware_threads_get();
+			mt_options.memlimit_stop
+				= hardware_memlimit_get(MODE_DECOMPRESS);
+
+			// If single-threaded mode was requested, set the
+			// memlimit for threading to zero. This forces the
+			// decoder to use single-threaded mode which matches
+			// the behavior of lzma_stream_decoder().
+			//
+			// Otherwise use the limit for threaded decompression
+			// which has a sane default (users are still free to
+			// make it insanely high though).
+			mt_options.memlimit_threading
+					= mt_options.threads == 1
+					? 0 : hardware_memlimit_mtdec_get();
+
+			ret = lzma_stream_decoder_mt(&strm, &mt_options);
+#	else
 			ret = lzma_stream_decoder(&strm,
 					hardware_memlimit_get(
 						MODE_DECOMPRESS), flags);
+#	endif
 			break;

 		case FORMAT_LZMA:
@ -535,6 +655,15 @@ coder_init(file_pair *pair)
 						MODE_DECOMPRESS));
 			break;

+#	ifdef HAVE_LZIP_DECODER
+		case FORMAT_LZIP:
+			allow_trailing_input = true;
+			ret = lzma_lzip_decoder(&strm,
+					hardware_memlimit_get(
+						MODE_DECOMPRESS), flags);
+			break;
+#	endif
+
 		case FORMAT_RAW:
 			// Memory usage has already been checked in
 			// coder_set_compression_settings().
@ -598,7 +727,7 @@ split_block(uint64_t *block_remaining,
 {
 	if (*next_block_remaining > 0) {
 		// The Block at *list_pos has previously been split up.
-		assert(hardware_threads_get() == 1);
+		assert(!hardware_threads_is_mt());
 		assert(opt_block_size > 0);
 		assert(opt_block_list != NULL);

@ -626,7 +755,7 @@ split_block(uint64_t *block_remaining,
 		// If in single-threaded mode, split up the Block if needed.
 		// This is not needed in multi-threaded mode because liblzma
 		// will do this due to how threaded encoding works.
-		if (hardware_threads_get() == 1 && opt_block_size > 0
+		if (!hardware_threads_is_mt() && opt_block_size > 0
 				&& *block_remaining > opt_block_size) {
 			*next_block_remaining
 					= *block_remaining - opt_block_size;
@ -686,7 +815,7 @@ coder_normal(file_pair *pair)
 		// --block-size doesn't do anything here in threaded mode,
 		// because the threaded encoder will take care of splitting
 		// to fixed-sized Blocks.
-		if (hardware_threads_get() == 1 && opt_block_size > 0)
+		if (!hardware_threads_is_mt() && opt_block_size > 0)
 			block_remaining = opt_block_size;

 		// If --block-list was used, start with the first size.
@ -700,7 +829,7 @@ coder_normal(file_pair *pair)
 		// mode the size info isn't written into Block Headers.
 		if (opt_block_list != NULL) {
 			if (block_remaining < opt_block_list[list_pos]) {
-				assert(hardware_threads_get() == 1);
+				assert(!hardware_threads_is_mt());
 				next_block_remaining = opt_block_list[list_pos]
 						- block_remaining;
 			} else {
@ -764,7 +893,7 @@ coder_normal(file_pair *pair)
 			} else {
 				// Start a new Block after LZMA_FULL_BARRIER.
 				if (opt_block_list == NULL) {
-					assert(hardware_threads_get() == 1);
+					assert(!hardware_threads_is_mt());
 					assert(opt_block_size > 0);
 					block_remaining = opt_block_size;
 				} else {
@ -795,7 +924,7 @@ coder_normal(file_pair *pair)
 			}

 			if (ret == LZMA_STREAM_END) {
-				if (opt_single_stream) {
+				if (allow_trailing_input) {
 					io_fix_src_pos(pair, strm.avail_in);
 					success = true;
 					break;
@ -803,7 +932,9 @@ coder_normal(file_pair *pair)

 				// Check that there is no trailing garbage.
 				// This is needed for LZMA_Alone and raw
-				// streams.
+				// streams. This is *not* done with .lz files
+				// as that format specifically requires
+				// allowing trailing garbage.
 				if (strm.avail_in == 0 && !pair->src_eof) {
 					// Try reading one more byte.
 					// Hopefully we don't get any more
--- a/src/xz/coder.h
+++ b/src/xz/coder.h
@ -23,7 +23,9 @@ enum format_type {
 	FORMAT_AUTO,
 	FORMAT_XZ,
 	FORMAT_LZMA,
-	// HEADER_GZIP,
+#ifdef HAVE_LZIP_DECODER
+	FORMAT_LZIP,
+#endif
 	FORMAT_RAW,
 };

--- a/src/xz/file_io.c
+++ b/src/xz/file_io.c
@ -212,6 +212,17 @@ io_sandbox_enter(int src_fd)
 	if (cap_enter())
 		goto error;

+#elif defined(HAVE_PLEDGE)
+	// pledge() was introduced in OpenBSD 5.9.
+	//
+	// main() unconditionally calls pledge() with fairly relaxed
+	// promises which work in all situations. Here we make the
+	// sandbox more strict.
+	if (pledge("stdio", ""))
+		goto error;
+
+	(void)src_fd;
+
 #else
 #	error ENABLE_SANDBOX is defined but no sandboxing method was found.
 #endif
@ -221,7 +232,7 @@ io_sandbox_enter(int src_fd)
 	return;

 error:
-	message(V_DEBUG, _("Failed to enable the sandbox"));
+	message_fatal(_("Failed to enable the sandbox"));
 }
 #endif // ENABLE_SANDBOX

@ -748,8 +759,10 @@ io_open_src_real(file_pair *pair)
 extern file_pair *
 io_open_src(const char *src_name)
 {
-	if (is_empty_filename(src_name))
+	if (src_name[0] == '\0') {
+		message_error(_("Empty filename, skipping"));
 		return NULL;
+	}

 	// Since we have only one file open at a time, we can use
 	// a statically allocated structure.
@ -1195,16 +1208,36 @@ io_read(file_pair *pair, io_buf *buf, size_t size)


 extern bool
-io_pread(file_pair *pair, io_buf *buf, size_t size, off_t pos)
+io_seek_src(file_pair *pair, uint64_t pos)
 {
-	// Using lseek() and read() is more portable than pread() and
-	// for us it is as good as real pread().
-	if (lseek(pair->src_fd, pos, SEEK_SET) != pos) {
+	// Caller must not attempt to seek past the end of the input file
+	// (seeking to 100 in a 100-byte file is seeking to the end of
+	// the file, not past the end of the file, and thus that is allowed).
+	//
+	// This also validates that pos can be safely cast to off_t.
+	if (pos > (uint64_t)(pair->src_st.st_size))
+		message_bug();
+
+	if (lseek(pair->src_fd, (off_t)(pos), SEEK_SET) == -1) {
 		message_error(_("%s: Error seeking the file: %s"),
 				pair->src_name, strerror(errno));
 		return true;
 	}

+	pair->src_eof = false;
+
+	return false;
+}
+
+
+extern bool
+io_pread(file_pair *pair, io_buf *buf, size_t size, uint64_t pos)
+{
+	// Using lseek() and read() is more portable than pread() and
+	// for us it is as good as real pread().
+	if (io_seek_src(pair, pos))
+		return true;
+
 	const size_t amount = io_read(pair, buf, size);
 	if (amount == SIZE_MAX)
 		return true;
--- a/src/xz/file_io.h
+++ b/src/xz/file_io.h
@ -139,6 +139,19 @@ extern size_t io_read(file_pair *pair, io_buf *buf, size_t size);
 extern void io_fix_src_pos(file_pair *pair, size_t rewind_size);


+/// \brief      Seek to the given absolute position in the source file
+///
+/// This calls lseek() and also clears pair->src_eof.
+///
+/// \param      pair    Seekable source file
+/// \param      pos     Offset relative to the beginning of the file,
+///                     from which the data should be read.
+///
+/// \return     On success, false is returned. On error, error message
+///             is printed and true is returned.
+extern bool io_seek_src(file_pair *pair, uint64_t pos);
+
+
 /// \brief      Read from source file from given offset to a buffer
 ///
 /// This is remotely similar to standard pread(). This uses lseek() though,
@ -152,7 +165,7 @@ extern void io_fix_src_pos(file_pair *pair, size_t rewind_size);
 ///
 /// \return     On success, false is returned. On error, error message
 ///             is printed and true is returned.
-extern bool io_pread(file_pair *pair, io_buf *buf, size_t size, off_t pos);
+extern bool io_pread(file_pair *pair, io_buf *buf, size_t size, uint64_t pos);


 /// \brief      Writes a buffer to the destination file
--- a/src/xz/hardware.c
+++ b/src/xz/hardware.c
@ -17,11 +17,42 @@
 /// the --threads=NUM command line option.
 static uint32_t threads_max = 1;

+/// True when the number of threads is automatically determined based
+/// on the available hardware threads.
+static bool threads_are_automatic = false;
+
+/// If true, then try to use multi-threaded mode (if memlimit allows)
+/// even if only one thread was requested explicitly (-T+1).
+static bool use_mt_mode_with_one_thread = false;
+
 /// Memory usage limit for compression
-static uint64_t memlimit_compress;
+static uint64_t memlimit_compress = 0;

 /// Memory usage limit for decompression
-static uint64_t memlimit_decompress;
+static uint64_t memlimit_decompress = 0;
+
+/// Default memory usage for multithreaded modes:
+///
+///   - Default value for --memlimit-compress when automatic number of threads
+///     is used. However, if the limit wouldn't allow even one thread then
+///     the limit is ignored in coder.c and one thread will be used anyway.
+///     This mess is a compromise: we wish to prevent -T0 from using too
+///     many threads but we also don't want xz to give an error due to
+///     a memlimit that the user didn't explicitly set.
+///
+///   - Default value for --memlimit-mt-decompress
+///
+/// This value is caluclated in hardware_init() and cannot be changed later.
+static uint64_t memlimit_mt_default;
+
+/// Memory usage limit for multithreaded decompression. This is a soft limit:
+/// if reducing the number of threads to one isn't enough to keep memory
+/// usage below this limit, then one thread is used and this limit is ignored.
+/// memlimit_decompress is still obeyed.
+///
+/// This can be set with --memlimit-mt-decompress. The default value for
+/// this is memlimit_mt_default.
+static uint64_t memlimit_mtdec;

 /// Total amount of physical RAM
 static uint64_t total_ram;
@ -30,8 +61,17 @@ static uint64_t total_ram;
 extern void
 hardware_threads_set(uint32_t n)
 {
+	// Reset these to false first and set them to true when appropriate.
+	threads_are_automatic = false;
+	use_mt_mode_with_one_thread = false;
+
 	if (n == 0) {
 		// Automatic number of threads was requested.
+		// If there is only one hardware thread, multi-threaded
+		// mode will still be used if memory limit allows.
+		threads_are_automatic = true;
+		use_mt_mode_with_one_thread = true;
+
 		// If threading support was enabled at build time,
 		// use the number of available CPU cores. Otherwise
 		// use one thread since disabling threading support
@ -43,6 +83,9 @@ hardware_threads_set(uint32_t n)
 #else
 		threads_max = 1;
 #endif
+	} else if (n == UINT32_MAX) {
+		use_mt_mode_with_one_thread = true;
+		threads_max = 1;
 	} else {
 		threads_max = n;
 	}
@ -58,9 +101,21 @@ hardware_threads_get(void)
 }


+extern bool
+hardware_threads_is_mt(void)
+{
+#ifdef MYTHREAD_ENABLED
+	return threads_max > 1 || use_mt_mode_with_one_thread;
+#else
+	return false;
+#endif
+}
+
+
 extern void
 hardware_memlimit_set(uint64_t new_memlimit,
-		bool set_compress, bool set_decompress, bool is_percentage)
+		bool set_compress, bool set_decompress, bool set_mtdec,
+		bool is_percentage)
 {
 	if (is_percentage) {
 		assert(new_memlimit > 0);
@ -110,6 +165,9 @@ hardware_memlimit_set(uint64_t new_memlimit,
 	if (set_decompress)
 		memlimit_decompress = new_memlimit;

+	if (set_mtdec)
+		memlimit_mtdec = new_memlimit;
+
 	return;
 }

@ -117,32 +175,69 @@ hardware_memlimit_set(uint64_t new_memlimit,
 extern uint64_t
 hardware_memlimit_get(enum operation_mode mode)
 {
-	// Zero is a special value that indicates the default. Currently
-	// the default simply disables the limit. Once there is threading
-	// support, this might be a little more complex, because there will
-	// probably be a special case where a user asks for "optimal" number
-	// of threads instead of a specific number (this might even become
-	// the default mode). Each thread may use a significant amount of
-	// memory. When there are no memory usage limits set, we need some
-	// default soft limit for calculating the "optimal" number of
-	// threads.
+	// 0 is a special value that indicates the default.
+	// It disables the limit in single-threaded mode.
+	//
+	// NOTE: For multithreaded decompression, this is the hard limit
+	// (memlimit_stop). hardware_memlimit_mtdec_get() gives the
+	// soft limit (memlimit_threaded).
 	const uint64_t memlimit = mode == MODE_COMPRESS
 			? memlimit_compress : memlimit_decompress;
 	return memlimit != 0 ? memlimit : UINT64_MAX;
 }


+extern uint64_t
+hardware_memlimit_mtenc_get(void)
+{
+	return hardware_memlimit_mtenc_is_default()
+			? memlimit_mt_default
+			: hardware_memlimit_get(MODE_COMPRESS);
+}
+
+
+extern bool
+hardware_memlimit_mtenc_is_default(void)
+{
+	return memlimit_compress == 0 && threads_are_automatic;
+}
+
+
+extern uint64_t
+hardware_memlimit_mtdec_get(void)
+{
+	uint64_t m = memlimit_mtdec != 0
+			? memlimit_mtdec
+			: memlimit_mt_default;
+
+	// Cap the value to memlimit_decompress if it has been specified.
+	// This is nice for --info-memory. It wouldn't be needed for liblzma
+	// since it does this anyway.
+	if (memlimit_decompress != 0 && m > memlimit_decompress)
+		m = memlimit_decompress;
+
+	return m;
+}
+
+
 /// Helper for hardware_memlimit_show() to print one human-readable info line.
 static void
-memlimit_show(const char *str, uint64_t value)
+memlimit_show(const char *str, size_t str_columns, uint64_t value)
 {
+	// Calculate the field width so that str will be padded to take
+	// str_columns on the terminal.
+	//
+	// NOTE: If the string is invalid, this will be -1. Using -1 as
+	// the field width is fine here so it's not handled specially.
+	const int fw = tuklib_mbstr_fw(str, (int)(str_columns));
+
 	// The memory usage limit is considered to be disabled if value
 	// is 0 or UINT64_MAX. This might get a bit more complex once there
 	// is threading support. See the comment in hardware_memlimit_get().
 	if (value == 0 || value == UINT64_MAX)
-		printf("%s %s\n", str, _("Disabled"));
+		printf("  %-*s  %s\n", fw, str, _("Disabled"));
 	else
-		printf("%s %s MiB (%s B)\n", str,
+		printf("  %-*s  %s MiB (%s B)\n", fw, str,
 				uint64_to_str(round_up_to_mib(value), 0),
 				uint64_to_str(value, 1));

@ -153,18 +248,60 @@ memlimit_show(const char *str, uint64_t value)
 extern void
 hardware_memlimit_show(void)
 {
+	uint32_t cputhreads = 1;
+#ifdef MYTHREAD_ENABLED
+	cputhreads = lzma_cputhreads();
+	if (cputhreads == 0)
+		cputhreads = 1;
+#endif
+
 	if (opt_robot) {
-		printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\n", total_ram,
-				memlimit_compress, memlimit_decompress);
+		printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64
+				"\t%" PRIu64 "\t%" PRIu32 "\n",
+				total_ram,
+				memlimit_compress,
+				memlimit_decompress,
+				hardware_memlimit_mtdec_get(),
+				memlimit_mt_default,
+				cputhreads);
 	} else {
-		// TRANSLATORS: Test with "xz --info-memory" to see if
-		// the alignment looks nice.
-		memlimit_show(_("Total amount of physical memory (RAM): "),
-				total_ram);
-		memlimit_show(_("Memory usage limit for compression:    "),
-				memlimit_compress);
-		memlimit_show(_("Memory usage limit for decompression:  "),
-				memlimit_decompress);
+		const char *msgs[] = {
+			_("Amount of physical memory (RAM):"),
+			_("Number of processor threads:"),
+			_("Compression:"),
+			_("Decompression:"),
+			_("Multi-threaded decompression:"),
+			_("Default for -T0:"),
+		};
+
+		size_t width_max = 1;
+		for (unsigned i = 0; i < ARRAY_SIZE(msgs); ++i) {
+			size_t w = tuklib_mbstr_width(msgs[i], NULL);
+
+			// When debugging, catch invalid strings with
+			// an assertion. Otherwise fallback to 1 so
+			// that the columns just won't be aligned.
+			assert(w != (size_t)-1);
+			if (w == (size_t)-1)
+				w = 1;
+
+			if (width_max < w)
+				width_max = w;
+		}
+
+		puts(_("Hardware information:"));
+		memlimit_show(msgs[0], width_max, total_ram);
+		printf("  %-*s  %" PRIu32 "\n",
+				tuklib_mbstr_fw(msgs[1], (int)(width_max)),
+				msgs[1], cputhreads);
+
+		putchar('\n');
+		puts(_("Memory usage limits:"));
+		memlimit_show(msgs[2], width_max, memlimit_compress);
+		memlimit_show(msgs[3], width_max, memlimit_decompress);
+		memlimit_show(msgs[4], width_max,
+				hardware_memlimit_mtdec_get());
+		memlimit_show(msgs[5], width_max, memlimit_mt_default);
 	}

 	tuklib_exit(E_SUCCESS, E_ERROR, message_verbosity_get() != V_SILENT);
@ -180,7 +317,22 @@ hardware_init(void)
 	if (total_ram == 0)
 		total_ram = (uint64_t)(ASSUME_RAM) * 1024 * 1024;

-	// Set the defaults.
-	hardware_memlimit_set(0, true, true, false);
+	// FIXME? There may be better methods to determine the default value.
+	// One Linux-specific suggestion is to use MemAvailable from
+	// /proc/meminfo as the starting point.
+	memlimit_mt_default = total_ram / 4;
+
+#if SIZE_MAX == UINT32_MAX
+	// A too high value may cause 32-bit xz to run out of address space.
+	// Use a conservative maximum value here. A few typical address space
+	// sizes with Linux:
+	//   - x86-64 with 32-bit xz: 4 GiB
+	//   - x86: 3 GiB
+	//   - MIPS32: 2 GiB
+	const size_t mem_ceiling = 1400U << 20;
+	if (memlimit_mt_default > mem_ceiling)
+		memlimit_mt_default = mem_ceiling;
+#endif
+
 	return;
 }
--- a/src/xz/hardware.h
+++ b/src/xz/hardware.h
@ -16,22 +16,59 @@ extern void hardware_init(void);


 /// Set the maximum number of worker threads.
+/// A special value of UINT32_MAX sets one thread in multi-threaded mode.
 extern void hardware_threads_set(uint32_t threadlimit);

 /// Get the maximum number of worker threads.
 extern uint32_t hardware_threads_get(void);

+/// Returns true if multithreaded mode should be used for .xz compression.
+/// This can be true even if the number of threads is one.
+extern bool hardware_threads_is_mt(void);

-/// Set the memory usage limit. There are separate limits for compression
-/// and decompression (the latter includes also --list), one or both can
-/// be set with a single call to this function. Zero indicates resetting
-/// the limit back to the defaults. The limit can also be set as a percentage
-/// of installed RAM; the percentage must be in the range [1, 100].
+
+/// Set the memory usage limit. There are separate limits for compression,
+/// decompression (also includes --list), and multithreaded decompression.
+/// Any combination of these can be set with a single call to this function.
+/// Zero indicates resetting the limit back to the defaults.
+/// The limit can also be set as a percentage of installed RAM; the
+/// percentage must be in the range [1, 100].
 extern void hardware_memlimit_set(uint64_t new_memlimit,
-		bool set_compress, bool set_decompress, bool is_percentage);
+		bool set_compress, bool set_decompress, bool set_mtdec,
+		bool is_percentage);

 /// Get the current memory usage limit for compression or decompression.
+/// This is a hard limit that will not be exceeded. This is obeyed in
+/// both single-threaded and multithreaded modes.
 extern uint64_t hardware_memlimit_get(enum operation_mode mode);

+/// This returns a system-specific default value if all of the following
+/// conditions are true:
+///
+///   - An automatic number of threads was requested (--threads=0).
+///
+///   - --memlimit-compress wasn't used or it was reset to the default
+///     value by setting it to 0.
+///
+/// Otherwise this is identical to hardware_memlimit_get(MODE_COMPRESS).
+///
+/// The idea is to keep automatic thread count reasonable so that too
+/// high memory usage is avoided and, with 32-bit xz, running out of
+/// address space is avoided.
+extern uint64_t hardware_memlimit_mtenc_get(void);
+
+/// Returns true if the value returned by hardware_memlimit_mtenc_get() is
+/// a system-specific default value. coder.c uses this to ignore the default
+/// memlimit in case it's too small even for a single thread in multithreaded
+/// mode. This way the default limit will never make xz fail or affect the
+/// compressed output; it will only make xz reduce the number of threads.
+extern bool hardware_memlimit_mtenc_is_default(void);
+
+/// Get the current memory usage limit for multithreaded decompression.
+/// This is only used to reduce the number of threads. This limit can be
+/// exceeded if the number of threads are reduce to one. Then the value
+/// from hardware_memlimit_get() will be honored like in single-threaded mode.
+extern uint64_t hardware_memlimit_mtdec_get(void);
+
 /// Display the amount of RAM and memory usage limits and exit.
 extern void hardware_memlimit_show(void) lzma_attribute((__noreturn__));
--- a/src/xz/list.c
+++ b/src/xz/list.c
@ -52,23 +52,126 @@ typedef struct {
 	uint64_t memusage;

 	/// The filter chain of this Block in human-readable form
-	char filter_chain[FILTERS_STR_SIZE];
+	char *filter_chain;

 } block_header_info;

+#define BLOCK_HEADER_INFO_INIT { .filter_chain = NULL }
+#define block_header_info_end(bhi) free((bhi)->filter_chain)
+
+
+/// Strings ending in a colon. These are used for lines like
+/// "  Foo:   123 MiB". These are grouped because translated strings
+/// may have different maximum string length, and we want to pad all
+/// strings so that the values are aligned nicely.
+static const char *colon_strs[] = {
+	N_("Streams:"),
+	N_("Blocks:"),
+	N_("Compressed size:"),
+	N_("Uncompressed size:"),
+	N_("Ratio:"),
+	N_("Check:"),
+	N_("Stream Padding:"),
+	N_("Memory needed:"),
+	N_("Sizes in headers:"),
+	// This won't be aligned because it's so long:
+	//N_("Minimum XZ Utils version:"),
+	N_("Number of files:"),
+};
+
+/// Enum matching the above strings.
+enum {
+	COLON_STR_STREAMS,
+	COLON_STR_BLOCKS,
+	COLON_STR_COMPRESSED_SIZE,
+	COLON_STR_UNCOMPRESSED_SIZE,
+	COLON_STR_RATIO,
+	COLON_STR_CHECK,
+	COLON_STR_STREAM_PADDING,
+	COLON_STR_MEMORY_NEEDED,
+	COLON_STR_SIZES_IN_HEADERS,
+	//COLON_STR_MINIMUM_XZ_VERSION,
+	COLON_STR_NUMBER_OF_FILES,
+};
+
+/// Field widths to use with printf to pad the strings to use the same number
+/// of columns on a terminal.
+static int colon_strs_fw[ARRAY_SIZE(colon_strs)];
+
+/// Convenience macro to get the translated string and its field width
+/// using a COLON_STR_foo enum.
+#define COLON_STR(num) colon_strs_fw[num], _(colon_strs[num])
+
+
+/// Column headings
+static struct {
+	/// Table column heading string
+	const char *str;
+
+	/// Number of terminal-columns to use for this table-column.
+	/// If a translated string is longer than the initial value,
+	/// this value will be increased in init_headings().
+	int columns;
+
+	/// Field width to use for printf() to pad "str" to use "columns"
+	/// number of columns on a terminal. This is calculated in
+	/// init_headings().
+	int fw;
+
+} headings[] = {
+	{ N_("Stream"), 6, 0 },
+	{ N_("Block"), 9, 0 },
+	{ N_("Blocks"), 9, 0 },
+	{ N_("CompOffset"), 15, 0 },
+	{ N_("UncompOffset"), 15, 0 },
+	{ N_("CompSize"), 15, 0 },
+	{ N_("UncompSize"), 15, 0 },
+	{ N_("TotalSize"), 15, 0 },
+	{ N_("Ratio"), 5, 0 },
+	{ N_("Check"), 10, 0 },
+	{ N_("CheckVal"), 1, 0 },
+	{ N_("Padding"), 7, 0 },
+	{ N_("Header"), 5, 0 },
+	{ N_("Flags"), 2, 0 },
+	{ N_("MemUsage"), 7 + 4, 0 }, // +4 is for " MiB"
+	{ N_("Filters"), 1, 0 },
+};
+
+/// Enum matching the above strings.
+enum {
+	HEADING_STREAM,
+	HEADING_BLOCK,
+	HEADING_BLOCKS,
+	HEADING_COMPOFFSET,
+	HEADING_UNCOMPOFFSET,
+	HEADING_COMPSIZE,
+	HEADING_UNCOMPSIZE,
+	HEADING_TOTALSIZE,
+	HEADING_RATIO,
+	HEADING_CHECK,
+	HEADING_CHECKVAL,
+	HEADING_PADDING,
+	HEADING_HEADERSIZE,
+	HEADING_HEADERFLAGS,
+	HEADING_MEMUSAGE,
+	HEADING_FILTERS,
+};
+
+#define HEADING_STR(num) headings[num].fw, _(headings[num].str)
+

 /// Check ID to string mapping
 static const char check_names[LZMA_CHECK_ID_MAX + 1][12] = {
 	// TRANSLATORS: Indicates that there is no integrity check.
-	// This string is used in tables, so the width must not
-	// exceed ten columns with a fixed-width font.
+	// This string is used in tables. In older xz version this
+	// string was limited to ten columns in a fixed-width font, but
+	// nowadays there is no strict length restriction anymore.
 	N_("None"),
 	"CRC32",
 	// TRANSLATORS: Indicates that integrity check name is not known,
-	// but the Check ID is known (here 2). This and other "Unknown-N"
-	// strings are used in tables, so the width must not exceed ten
-	// columns with a fixed-width font. It's OK to omit the dash if
-	// you need space for one extra letter, but don't use spaces.
+	// but the Check ID is known (here 2). In older xz version these
+	// strings were limited to ten columns in a fixed-width font, but
+	// nowadays there is no strict length restriction anymore.
 	N_("Unknown-2"),
 	N_("Unknown-3"),
 	"CRC64",
@ -112,6 +215,104 @@ static struct {
 } totals = { 0, 0, 0, 0, 0, 0, 0, 0, 50000002, true };


+/// Initialize colon_strs_fw[].
+static void
+init_colon_strs(void)
+{
+	// Lengths of translated strings as bytes.
+	size_t lens[ARRAY_SIZE(colon_strs)];
+
+	// Lengths of translated strings as columns.
+	size_t widths[ARRAY_SIZE(colon_strs)];
+
+	// Maximum number of columns needed by a translated string.
+	size_t width_max = 0;
+
+	for (unsigned i = 0; i < ARRAY_SIZE(colon_strs); ++i) {
+		widths[i] = tuklib_mbstr_width(_(colon_strs[i]), &lens[i]);
+
+		// If debugging is enabled, catch invalid strings with
+		// an assertion. However, when not debugging, use the
+		// byte count as the fallback width. This shouldn't
+		// ever happen unless there is a bad string in the
+		// translations, but in such case I guess it's better
+		// to try to print something useful instead of failing
+		// completely.
+		assert(widths[i] != (size_t)-1);
+		if (widths[i] == (size_t)-1)
+			widths[i] = lens[i];
+
+		if (widths[i] > width_max)
+			width_max = widths[i];
+	}
+
+	// Calculate the field width for printf("%*s") so that the strings
+	// will use width_max columns on a terminal.
+	for (unsigned i = 0; i < ARRAY_SIZE(colon_strs); ++i)
+		colon_strs_fw[i] = (int)(lens[i] + width_max - widths[i]);
+
+	return;
+}
+
+
+/// Initialize headings[].
+static void
+init_headings(void)
+{
+	// Before going through the heading strings themselves, treat
+	// the Check heading specially: Look at the widths of the various
+	// check names and increase the width of the Check column if needed.
+	// The width of the heading name "Check" will then be handled normally
+	// with other heading names in the second loop in this function.
+	for (unsigned i = 0; i < ARRAY_SIZE(check_names); ++i) {
+		size_t len;
+		size_t w = tuklib_mbstr_width(_(check_names[i]), &len);
+
+		// Error handling like in init_colon_strs().
+		assert(w != (size_t)-1);
+		if (w == (size_t)-1)
+			w = len;
+
+		// If the translated string is wider than the minimum width
+		// set at compile time, increase the width.
+		if ((size_t)(headings[HEADING_CHECK].columns) < w)
+			headings[HEADING_CHECK].columns = w;
+	}
+
+	for (unsigned i = 0; i < ARRAY_SIZE(headings); ++i) {
+		size_t len;
+		size_t w = tuklib_mbstr_width(_(headings[i].str), &len);
+
+		// Error handling like in init_colon_strs().
+		assert(w != (size_t)-1);
+		if (w == (size_t)-1)
+			w = len;
+
+		// If the translated string is wider than the minimum width
+		// set at compile time, increase the width.
+		if ((size_t)(headings[i].columns) < w)
+			headings[i].columns = w;
+
+		// Calculate the field width for printf("%*s") so that
+		// the string uses .columns number of columns on a terminal.
+		headings[i].fw = (int)(len + (size_t)headings[i].columns - w);
+	}
+
+	return;
+}
+
+
+/// Initialize the printf field widths that are needed to get nicely aligned
+/// output with translated strings.
+static void
+init_field_widths(void)
+{
+	init_colon_strs();
+	init_headings();
+	return;
+}
+
+
 /// Convert XZ Utils version number to a string.
 static const char *
 xz_ver_to_str(uint32_t ver)
@ -143,9 +344,6 @@ xz_ver_to_str(uint32_t ver)
 ///
 /// \return     On success, false is returned. On error, true is returned.
 ///
-// TODO: This function is pretty big. liblzma should have a function that
-// takes a callback function to parse the Index(es) from a .xz file to make
-// it easy for applications.
 static bool
 parse_indexes(xz_file_info *xfi, file_pair *pair)
 {
@ -161,238 +359,74 @@ parse_indexes(xz_file_info *xfi, file_pair *pair)
 	}

 	io_buf buf;
-	lzma_stream_flags header_flags;
-	lzma_stream_flags footer_flags;
-	lzma_ret ret;
-
-	// lzma_stream for the Index decoder
 	lzma_stream strm = LZMA_STREAM_INIT;
+	lzma_index *idx = NULL;

-	// All Indexes decoded so far
-	lzma_index *combined_index = NULL;
-
-	// The Index currently being decoded
-	lzma_index *this_index = NULL;
-
-	// Current position in the file. We parse the file backwards so
-	// initialize it to point to the end of the file.
-	off_t pos = pair->src_st.st_size;
-
-	// Each loop iteration decodes one Index.
-	do {
-		// Check that there is enough data left to contain at least
-		// the Stream Header and Stream Footer. This check cannot
-		// fail in the first pass of this loop.
-		if (pos < 2 * LZMA_STREAM_HEADER_SIZE) {
-			message_error("%s: %s", pair->src_name,
-					message_strm(LZMA_DATA_ERROR));
-			goto error;
-		}
-
-		pos -= LZMA_STREAM_HEADER_SIZE;
-		lzma_vli stream_padding = 0;
-
-		// Locate the Stream Footer. There may be Stream Padding which
-		// we must skip when reading backwards.
-		while (true) {
-			if (pos < LZMA_STREAM_HEADER_SIZE) {
-				message_error("%s: %s", pair->src_name,
-						message_strm(
-							LZMA_DATA_ERROR));
-				goto error;
-			}
-
-			if (io_pread(pair, &buf,
-					LZMA_STREAM_HEADER_SIZE, pos))
-				goto error;
-
-			// Stream Padding is always a multiple of four bytes.
-			int i = 2;
-			if (buf.u32[i] != 0)
-				break;
-
-			// To avoid calling io_pread() for every four bytes
-			// of Stream Padding, take advantage that we read
-			// 12 bytes (LZMA_STREAM_HEADER_SIZE) already and
-			// check them too before calling io_pread() again.
-			do {
-				stream_padding += 4;
-				pos -= 4;
-				--i;
-			} while (i >= 0 && buf.u32[i] == 0);
-		}
-
-		// Decode the Stream Footer.
-		ret = lzma_stream_footer_decode(&footer_flags, buf.u8);
-		if (ret != LZMA_OK) {
-			message_error("%s: %s", pair->src_name,
-					message_strm(ret));
-			goto error;
-		}
-
-		// Check that the Stream Footer doesn't specify something
-		// that we don't support. This can only happen if the xz
-		// version is older than liblzma and liblzma supports
-		// something new.
-		//
-		// It is enough to check Stream Footer. Stream Header must
-		// match when it is compared against Stream Footer with
-		// lzma_stream_flags_compare().
-		if (footer_flags.version != 0) {
-			message_error("%s: %s", pair->src_name,
-					message_strm(LZMA_OPTIONS_ERROR));
-			goto error;
-		}
-
-		// Check that the size of the Index field looks sane.
-		lzma_vli index_size = footer_flags.backward_size;
-		if ((lzma_vli)(pos) < index_size + LZMA_STREAM_HEADER_SIZE) {
-			message_error("%s: %s", pair->src_name,
-					message_strm(LZMA_DATA_ERROR));
-			goto error;
-		}
-
-		// Set pos to the beginning of the Index.
-		pos -= index_size;
-
-		// See how much memory we can use for decoding this Index.
-		uint64_t memlimit = hardware_memlimit_get(MODE_LIST);
-		uint64_t memused = 0;
-		if (combined_index != NULL) {
-			memused = lzma_index_memused(combined_index);
-			if (memused > memlimit)
-				message_bug();
-
-			memlimit -= memused;
-		}
-
-		// Decode the Index.
-		ret = lzma_index_decoder(&strm, &this_index, memlimit);
-		if (ret != LZMA_OK) {
-			message_error("%s: %s", pair->src_name,
-					message_strm(ret));
-			goto error;
-		}
-
-		do {
-			// Don't give the decoder more input than the
-			// Index size.
-			strm.avail_in = my_min(IO_BUFFER_SIZE, index_size);
-			if (io_pread(pair, &buf, strm.avail_in, pos))
-				goto error;
-
-			pos += strm.avail_in;
-			index_size -= strm.avail_in;
+	lzma_ret ret = lzma_file_info_decoder(&strm, &idx,
+			hardware_memlimit_get(MODE_LIST),
+			(uint64_t)(pair->src_st.st_size));
+	if (ret != LZMA_OK) {
+		message_error("%s: %s", pair->src_name, message_strm(ret));
+		return true;
+	}

+	while (true) {
+		if (strm.avail_in == 0) {
 			strm.next_in = buf.u8;
-			ret = lzma_code(&strm, LZMA_RUN);
+			strm.avail_in = io_read(pair, &buf, IO_BUFFER_SIZE);
+			if (strm.avail_in == SIZE_MAX)
+				goto error;
+		}

-		} while (ret == LZMA_OK);
+		ret = lzma_code(&strm, LZMA_RUN);

-		// If the decoding seems to be successful, check also that
-		// the Index decoder consumed as much input as indicated
-		// by the Backward Size field.
-		if (ret == LZMA_STREAM_END)
-			if (index_size != 0 || strm.avail_in != 0)
-				ret = LZMA_DATA_ERROR;
+		switch (ret) {
+		case LZMA_OK:
+			break;

-		if (ret != LZMA_STREAM_END) {
-			// LZMA_BUFFER_ERROR means that the Index decoder
-			// would have liked more input than what the Index
-			// size should be according to Stream Footer.
-			// The message for LZMA_DATA_ERROR makes more
-			// sense in that case.
-			if (ret == LZMA_BUF_ERROR)
-				ret = LZMA_DATA_ERROR;
+		case LZMA_SEEK_NEEDED:
+			// liblzma won't ask us to seek past the known size
+			// of the input file.
+			assert(strm.seek_pos
+					<= (uint64_t)(pair->src_st.st_size));
+			if (io_seek_src(pair, strm.seek_pos))
+				goto error;

+			// avail_in must be zero so that we will read new
+			// input.
+			strm.avail_in = 0;
+			break;
+
+		case LZMA_STREAM_END: {
+			lzma_end(&strm);
+			xfi->idx = idx;
+
+			// Calculate xfi->stream_padding.
+			lzma_index_iter iter;
+			lzma_index_iter_init(&iter, xfi->idx);
+			while (!lzma_index_iter_next(&iter,
+					LZMA_INDEX_ITER_STREAM))
+				xfi->stream_padding += iter.stream.padding;
+
+			return false;
+		}
+
+		default:
 			message_error("%s: %s", pair->src_name,
 					message_strm(ret));

 			// If the error was too low memory usage limit,
 			// show also how much memory would have been needed.
-			if (ret == LZMA_MEMLIMIT_ERROR) {
-				uint64_t needed = lzma_memusage(&strm);
-				if (UINT64_MAX - needed < memused)
-					needed = UINT64_MAX;
-				else
-					needed += memused;
-
-				message_mem_needed(V_ERROR, needed);
-			}
+			if (ret == LZMA_MEMLIMIT_ERROR)
+				message_mem_needed(V_ERROR,
+						lzma_memusage(&strm));

 			goto error;
 		}
-
-		// Decode the Stream Header and check that its Stream Flags
-		// match the Stream Footer.
-		pos -= footer_flags.backward_size + LZMA_STREAM_HEADER_SIZE;
-		if ((lzma_vli)(pos) < lzma_index_total_size(this_index)) {
-			message_error("%s: %s", pair->src_name,
-					message_strm(LZMA_DATA_ERROR));
-			goto error;
-		}
-
-		pos -= lzma_index_total_size(this_index);
-		if (io_pread(pair, &buf, LZMA_STREAM_HEADER_SIZE, pos))
-			goto error;
-
-		ret = lzma_stream_header_decode(&header_flags, buf.u8);
-		if (ret != LZMA_OK) {
-			message_error("%s: %s", pair->src_name,
-					message_strm(ret));
-			goto error;
-		}
-
-		ret = lzma_stream_flags_compare(&header_flags, &footer_flags);
-		if (ret != LZMA_OK) {
-			message_error("%s: %s", pair->src_name,
-					message_strm(ret));
-			goto error;
-		}
-
-		// Store the decoded Stream Flags into this_index. This is
-		// needed so that we can print which Check is used in each
-		// Stream.
-		ret = lzma_index_stream_flags(this_index, &footer_flags);
-		if (ret != LZMA_OK)
-			message_bug();
-
-		// Store also the size of the Stream Padding field. It is
-		// needed to show the offsets of the Streams correctly.
-		ret = lzma_index_stream_padding(this_index, stream_padding);
-		if (ret != LZMA_OK)
-			message_bug();
-
-		if (combined_index != NULL) {
-			// Append the earlier decoded Indexes
-			// after this_index.
-			ret = lzma_index_cat(
-					this_index, combined_index, NULL);
-			if (ret != LZMA_OK) {
-				message_error("%s: %s", pair->src_name,
-						message_strm(ret));
-				goto error;
-			}
-		}
-
-		combined_index = this_index;
-		this_index = NULL;
-
-		xfi->stream_padding += stream_padding;
-
-	} while (pos > 0);
-
-	lzma_end(&strm);
-
-	// All OK. Make combined_index available to the caller.
-	xfi->idx = combined_index;
-	return false;
+	}

 error:
-	// Something went wrong, free the allocated memory.
 	lzma_end(&strm);
-	lzma_index_end(combined_index, NULL);
-	lzma_index_end(this_index, NULL);
 	return true;
 }

@ -454,6 +488,10 @@ parse_block_header(file_pair *pair, const lzma_index_iter *iter,
 	// Check the Block Flags. These must be done before calling
 	// lzma_block_compressed_size(), because it overwrites
 	// block.compressed_size.
+	//
+	// NOTE: If you add new characters here, update the minimum number of
+	// columns in headings[HEADING_HEADERFLAGS] to match the number of
+	// characters used here.
 	bhi->flags[0] = block.compressed_size != LZMA_VLI_UNKNOWN
 			? 'c' : '-';
 	bhi->flags[1] = block.uncompressed_size != LZMA_VLI_UNKNOWN
@ -488,9 +526,7 @@ parse_block_header(file_pair *pair, const lzma_index_iter *iter,

 	case LZMA_DATA_ERROR:
 		// Free the memory allocated by lzma_block_header_decode().
-		for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i)
-			free(filters[i].options);
-
+		lzma_filters_free(filters, NULL);
 		goto data_error;

 	default:
@ -509,25 +545,42 @@ parse_block_header(file_pair *pair, const lzma_index_iter *iter,

 	// Determine the minimum XZ Utils version that supports this Block.
 	//
-	// Currently the only thing that 5.0.0 doesn't support is empty
-	// LZMA2 Block. This decoder bug was fixed in 5.0.2.
-	{
+	//   - ARM64 filter needs 5.4.0.
+	//
+	//   - 5.0.0 doesn't support empty LZMA2 streams and thus empty
+	//     Blocks that use LZMA2. This decoder bug was fixed in 5.0.2.
+	if (xfi->min_version < 50040002U) {
+		for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) {
+			if (filters[i].id == LZMA_FILTER_ARM64) {
+				xfi->min_version = 50040002U;
+				break;
+			}
+		}
+	}
+
+	if (xfi->min_version < 50000022U) {
 		size_t i = 0;
 		while (filters[i + 1].id != LZMA_VLI_UNKNOWN)
 			++i;

 		if (filters[i].id == LZMA_FILTER_LZMA2
-				&& iter->block.uncompressed_size == 0
-				&& xfi->min_version < 50000022U)
+				&& iter->block.uncompressed_size == 0)
 			xfi->min_version = 50000022U;
 	}

 	// Convert the filter chain to human readable form.
-	message_filters_to_str(bhi->filter_chain, filters, false);
+	const lzma_ret str_ret = lzma_str_from_filters(
+			&bhi->filter_chain, filters,
+			LZMA_STR_DECODER | LZMA_STR_GETOPT_LONG, NULL);

 	// Free the memory allocated by lzma_block_header_decode().
-	for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i)
-		free(filters[i].options);
+	lzma_filters_free(filters, NULL);
+
+	// Check if the stringification succeeded.
+	if (str_ret != LZMA_OK) {
+		message_error("%s: %s", pair->src_name, message_strm(str_ret));
+		return true;
+	}

 	return false;

@ -553,7 +606,7 @@ parse_check_value(file_pair *pair, const lzma_index_iter *iter)

 	// Locate and read the Check field.
 	const uint32_t size = lzma_check_size(iter->stream.flags->check);
-	const off_t offset = iter->block.compressed_file_offset
+	const uint64_t offset = iter->block.compressed_file_offset
 			+ iter->block.total_size - size;
 	io_buf buf;
 	if (io_pread(pair, &buf, size, offset))
@ -714,20 +767,20 @@ print_adv_helper(uint64_t stream_count, uint64_t block_count,
 	char checks_str[CHECKS_STR_SIZE];
 	get_check_names(checks_str, checks, true);

-	printf(_("  Streams:            %s\n"),
+	printf("  %-*s %s\n", COLON_STR(COLON_STR_STREAMS),
 			uint64_to_str(stream_count, 0));
-	printf(_("  Blocks:             %s\n"),
+	printf("  %-*s %s\n", COLON_STR(COLON_STR_BLOCKS),
 			uint64_to_str(block_count, 0));
-	printf(_("  Compressed size:    %s\n"),
+	printf("  %-*s %s\n", COLON_STR(COLON_STR_COMPRESSED_SIZE),
 			uint64_to_nicestr(compressed_size,
 				NICESTR_B, NICESTR_TIB, true, 0));
-	printf(_("  Uncompressed size:  %s\n"),
+	printf("  %-*s %s\n", COLON_STR(COLON_STR_UNCOMPRESSED_SIZE),
 			uint64_to_nicestr(uncompressed_size,
 				NICESTR_B, NICESTR_TIB, true, 0));
-	printf(_("  Ratio:              %s\n"),
+	printf("  %-*s %s\n", COLON_STR(COLON_STR_RATIO),
 			get_ratio(compressed_size, uncompressed_size));
-	printf(_("  Check:              %s\n"), checks_str);
-	printf(_("  Stream padding:     %s\n"),
+	printf("  %-*s %s\n", COLON_STR(COLON_STR_CHECK), checks_str);
+	printf("  %-*s %s\n", COLON_STR(COLON_STR_STREAM_PADDING),
 			uint64_to_nicestr(stream_padding,
 				NICESTR_B, NICESTR_TIB, true, 0));
 	return;
@ -752,13 +805,19 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)

 	// Print information about the Streams.
 	//
-	// TRANSLATORS: The second line is column headings. All except
-	// Check are right aligned; Check is left aligned. Test with
-	// "xz -lv foo.xz".
-	puts(_("  Streams:\n    Stream    Blocks"
-			"      CompOffset    UncompOffset"
-			"        CompSize      UncompSize  Ratio"
-			"  Check      Padding"));
+	// All except Check are right aligned; Check is left aligned.
+	// Test with "xz -lv foo.xz".
+	printf("  %s\n    %*s %*s %*s %*s %*s %*s  %*s  %-*s %*s\n",
+			_(colon_strs[COLON_STR_STREAMS]),
+			HEADING_STR(HEADING_STREAM),
+			HEADING_STR(HEADING_BLOCKS),
+			HEADING_STR(HEADING_COMPOFFSET),
+			HEADING_STR(HEADING_UNCOMPOFFSET),
+			HEADING_STR(HEADING_COMPSIZE),
+			HEADING_STR(HEADING_UNCOMPSIZE),
+			HEADING_STR(HEADING_RATIO),
+			HEADING_STR(HEADING_CHECK),
+			HEADING_STR(HEADING_PADDING));

 	lzma_index_iter iter;
 	lzma_index_iter_init(&iter, xfi->idx);
@ -771,10 +830,18 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
 			uint64_to_str(iter.stream.uncompressed_offset, 3),
 		};
 		printf("    %*s %*s %*s %*s ",
-				tuklib_mbstr_fw(cols1[0], 6), cols1[0],
-				tuklib_mbstr_fw(cols1[1], 9), cols1[1],
-				tuklib_mbstr_fw(cols1[2], 15), cols1[2],
-				tuklib_mbstr_fw(cols1[3], 15), cols1[3]);
+			tuklib_mbstr_fw(cols1[0],
+				headings[HEADING_STREAM].columns),
+			cols1[0],
+			tuklib_mbstr_fw(cols1[1],
+				headings[HEADING_BLOCKS].columns),
+			cols1[1],
+			tuklib_mbstr_fw(cols1[2],
+				headings[HEADING_COMPOFFSET].columns),
+			cols1[2],
+			tuklib_mbstr_fw(cols1[3],
+				headings[HEADING_UNCOMPOFFSET].columns),
+			cols1[3]);

 		const char *cols2[5] = {
 			uint64_to_str(iter.stream.compressed_size, 0),
@ -785,11 +852,21 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
 			uint64_to_str(iter.stream.padding, 2),
 		};
 		printf("%*s %*s  %*s  %-*s %*s\n",
-				tuklib_mbstr_fw(cols2[0], 15), cols2[0],
-				tuklib_mbstr_fw(cols2[1], 15), cols2[1],
-				tuklib_mbstr_fw(cols2[2], 5), cols2[2],
-				tuklib_mbstr_fw(cols2[3], 10), cols2[3],
-				tuklib_mbstr_fw(cols2[4], 7), cols2[4]);
+			tuklib_mbstr_fw(cols2[0],
+				headings[HEADING_COMPSIZE].columns),
+			cols2[0],
+			tuklib_mbstr_fw(cols2[1],
+				headings[HEADING_UNCOMPSIZE].columns),
+			cols2[1],
+			tuklib_mbstr_fw(cols2[2],
+				headings[HEADING_RATIO].columns),
+			cols2[2],
+			tuklib_mbstr_fw(cols2[3],
+				headings[HEADING_CHECK].columns),
+			cols2[3],
+			tuklib_mbstr_fw(cols2[4],
+				headings[HEADING_PADDING].columns),
+			cols2[4]);

 		// Update the maximum Check size.
 		if (lzma_check_size(iter.stream.flags->check) > check_max)
@ -799,32 +876,47 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
 	// Cache the verbosity level to a local variable.
 	const bool detailed = message_verbosity_get() >= V_DEBUG;

-	// Information collected from Block Headers
-	block_header_info bhi;
-
 	// Print information about the Blocks but only if there is
 	// at least one Block.
 	if (lzma_index_block_count(xfi->idx) > 0) {
-		// Calculate the width of the CheckVal field.
-		const int checkval_width = my_max(8, 2 * check_max);
+		// Calculate the width of the CheckVal column. This can be
+		// used as is as the field width for printf() when printing
+		// the actual check value as it is hexadecimal. However, to
+		// print the column heading, further calculation is needed
+		// to handle a translated string (it's done a few lines later).
+		assert(check_max <= LZMA_CHECK_SIZE_MAX);
+		const int checkval_width = my_max(
+				headings[HEADING_CHECKVAL].columns,
+				(int)(2 * check_max));

-		// TRANSLATORS: The second line is column headings. All
-		// except Check are right aligned; Check is left aligned.
-		printf(_("  Blocks:\n    Stream     Block"
-			"      CompOffset    UncompOffset"
-			"       TotalSize      UncompSize  Ratio  Check"));
+		// All except Check are right aligned; Check is left aligned.
+		printf("  %s\n    %*s %*s %*s %*s %*s %*s  %*s  %-*s",
+				_(colon_strs[COLON_STR_BLOCKS]),
+				HEADING_STR(HEADING_STREAM),
+				HEADING_STR(HEADING_BLOCK),
+				HEADING_STR(HEADING_COMPOFFSET),
+				HEADING_STR(HEADING_UNCOMPOFFSET),
+				HEADING_STR(HEADING_TOTALSIZE),
+				HEADING_STR(HEADING_UNCOMPSIZE),
+				HEADING_STR(HEADING_RATIO),
+				detailed ? headings[HEADING_CHECK].fw : 1,
+				_(headings[HEADING_CHECK].str));

 		if (detailed) {
-			// TRANSLATORS: These are additional column headings
-			// for the most verbose listing mode. CheckVal
-			// (Check value), Flags, and Filters are left aligned.
-			// Header (Block Header Size), CompSize, and MemUsage
-			// are right aligned. %*s is replaced with 0-120
-			// spaces to make the CheckVal column wide enough.
-			// Test with "xz -lvv foo.xz".
-			printf(_("      CheckVal %*s Header  Flags        "
-					"CompSize    MemUsage  Filters"),
-					checkval_width - 8, "");
+			// CheckVal (Check value), Flags, and Filters are
+			// left aligned. Block Header Size, CompSize, and
+			// MemUsage are right aligned. Test with
+			// "xz -lvv foo.xz".
+			printf(" %-*s  %*s  %-*s %*s %*s  %s",
+				headings[HEADING_CHECKVAL].fw
+					+ checkval_width
+					- headings[HEADING_CHECKVAL].columns,
+				_(headings[HEADING_CHECKVAL].str),
+				HEADING_STR(HEADING_HEADERSIZE),
+				HEADING_STR(HEADING_HEADERFLAGS),
+				HEADING_STR(HEADING_COMPSIZE),
+				HEADING_STR(HEADING_MEMUSAGE),
+				_(headings[HEADING_FILTERS].str));
 		}

 		putchar('\n');
@ -833,8 +925,11 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)

 		// Iterate over the Blocks.
 		while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_BLOCK)) {
+			// If in detailed mode, collect the information from
+			// Block Header before starting to print the next line.
+			block_header_info bhi = BLOCK_HEADER_INFO_INIT;
 			if (detailed && parse_details(pair, &iter, &bhi, xfi))
-					return true;
+				return true;

 			const char *cols1[4] = {
 				uint64_to_str(iter.stream.number, 0),
@ -846,10 +941,18 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
 					iter.block.uncompressed_file_offset, 3)
 			};
 			printf("    %*s %*s %*s %*s ",
-				tuklib_mbstr_fw(cols1[0], 6), cols1[0],
-				tuklib_mbstr_fw(cols1[1], 9), cols1[1],
-				tuklib_mbstr_fw(cols1[2], 15), cols1[2],
-				tuklib_mbstr_fw(cols1[3], 15), cols1[3]);
+				tuklib_mbstr_fw(cols1[0],
+					headings[HEADING_STREAM].columns),
+				cols1[0],
+				tuklib_mbstr_fw(cols1[1],
+					headings[HEADING_BLOCK].columns),
+				cols1[1],
+				tuklib_mbstr_fw(cols1[2],
+					headings[HEADING_COMPOFFSET].columns),
+				cols1[2],
+				tuklib_mbstr_fw(cols1[3], headings[
+					HEADING_UNCOMPOFFSET].columns),
+				cols1[3]);

 			const char *cols2[4] = {
 				uint64_to_str(iter.block.total_size, 0),
@ -860,11 +963,18 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
 				_(check_names[iter.stream.flags->check])
 			};
 			printf("%*s %*s  %*s  %-*s",
-				tuklib_mbstr_fw(cols2[0], 15), cols2[0],
-				tuklib_mbstr_fw(cols2[1], 15), cols2[1],
-				tuklib_mbstr_fw(cols2[2], 5), cols2[2],
-				tuklib_mbstr_fw(cols2[3], detailed ? 11 : 1),
-					cols2[3]);
+				tuklib_mbstr_fw(cols2[0],
+					headings[HEADING_TOTALSIZE].columns),
+				cols2[0],
+				tuklib_mbstr_fw(cols2[1],
+					headings[HEADING_UNCOMPSIZE].columns),
+				cols2[1],
+				tuklib_mbstr_fw(cols2[2],
+					headings[HEADING_RATIO].columns),
+				cols2[2],
+				tuklib_mbstr_fw(cols2[3], detailed
+					? headings[HEADING_CHECK].columns : 1),
+				cols2[3]);

 			if (detailed) {
 				const lzma_vli compressed_size
@ -885,25 +995,35 @@ print_info_adv(xz_file_info *xfi, file_pair *pair)
 				};
 				// Show MiB for memory usage, because it
 				// is the only size which is not in bytes.
-				printf("%-*s  %*s  %-5s %*s %*s MiB  %s",
+				printf(" %-*s  %*s  %-*s %*s %*s MiB  %s",
 					checkval_width, cols3[0],
-					tuklib_mbstr_fw(cols3[1], 6), cols3[1],
+					tuklib_mbstr_fw(cols3[1], headings[
+						HEADING_HEADERSIZE].columns),
+					cols3[1],
+					tuklib_mbstr_fw(cols3[2], headings[
+						HEADING_HEADERFLAGS].columns),
 					cols3[2],
-					tuklib_mbstr_fw(cols3[3], 15),
-						cols3[3],
-					tuklib_mbstr_fw(cols3[4], 7), cols3[4],
+					tuklib_mbstr_fw(cols3[3], headings[
+						HEADING_COMPSIZE].columns),
+					cols3[3],
+					tuklib_mbstr_fw(cols3[4], headings[
+						HEADING_MEMUSAGE].columns - 4),
+					cols3[4],
 					cols3[5]);
 			}

 			putchar('\n');
+			block_header_info_end(&bhi);
 		}
 	}

 	if (detailed) {
-		printf(_("  Memory needed:      %s MiB\n"), uint64_to_str(
+		printf("  %-*s %s MiB\n", COLON_STR(COLON_STR_MEMORY_NEEDED),
+				uint64_to_str(
 				round_up_to_mib(xfi->memusage_max), 0));
-		printf(_("  Sizes in headers:   %s\n"),
+		printf("  %-*s %s\n", COLON_STR(COLON_STR_SIZES_IN_HEADERS),
 				xfi->all_have_sizes ? _("Yes") : _("No"));
+		//printf("  %-*s %s\n", COLON_STR(COLON_STR_MINIMUM_XZ_VERSION),
 		printf(_("  Minimum XZ Utils version: %s\n"),
 				xz_ver_to_str(xfi->min_version));
 	}
@ -951,9 +1071,9 @@ print_info_robot(xz_file_info *xfi, file_pair *pair)
 				iter.stream.padding);

 		lzma_index_iter_rewind(&iter);
-		block_header_info bhi;

 		while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_BLOCK)) {
+			block_header_info bhi = BLOCK_HEADER_INFO_INIT;
 			if (message_verbosity_get() >= V_DEBUG
 					&& parse_details(
 						pair, &iter, &bhi, xfi))
@ -984,6 +1104,7 @@ print_info_robot(xz_file_info *xfi, file_pair *pair)
 						bhi.filter_chain);

 			putchar('\n');
+			block_header_info_end(&bhi);
 		}
 	}

@ -1068,17 +1189,19 @@ print_totals_adv(void)
 {
 	putchar('\n');
 	puts(_("Totals:"));
-	printf(_("  Number of files:    %s\n"),
+	printf("  %-*s %s\n", COLON_STR(COLON_STR_NUMBER_OF_FILES),
 			uint64_to_str(totals.files, 0));
 	print_adv_helper(totals.streams, totals.blocks,
 			totals.compressed_size, totals.uncompressed_size,
 			totals.checks, totals.stream_padding);

 	if (message_verbosity_get() >= V_DEBUG) {
-		printf(_("  Memory needed:      %s MiB\n"), uint64_to_str(
+		printf("  %-*s %s MiB\n", COLON_STR(COLON_STR_MEMORY_NEEDED),
+				uint64_to_str(
 				round_up_to_mib(totals.memusage_max), 0));
-		printf(_("  Sizes in headers:   %s\n"),
+		printf("  %-*s %s\n", COLON_STR(COLON_STR_SIZES_IN_HEADERS),
 				totals.all_have_sizes ? _("Yes") : _("No"));
+		//printf("  %-*s %s\n", COLON_STR(COLON_STR_MINIMUM_XZ_VERSION),
 		printf(_("  Minimum XZ Utils version: %s\n"),
 				xz_ver_to_str(totals.min_version));
 	}
@ -1154,6 +1277,8 @@ list_file(const char *filename)
 		return;
 	}

+	init_field_widths();
+
 	// Unset opt_stdout so that io_open_src() won't accept special files.
 	// Set opt_force so that io_open_src() will follow symlinks.
 	opt_stdout = false;
--- a/src/xz/main.c
+++ b/src/xz/main.c
@ -142,6 +142,20 @@ read_name(const args_info *args)
 int
 main(int argc, char **argv)
 {
+#ifdef HAVE_PLEDGE
+	// OpenBSD's pledge(2) sandbox
+	//
+	// Unconditionally enable sandboxing with fairly relaxed promises.
+	// This is still way better than having no sandbox at all. :-)
+	// More strict promises will be made later in file_io.c if possible.
+	if (pledge("stdio rpath wpath cpath fattr", "")) {
+		// Don't translate the string or use message_fatal() as
+		// those haven't been initialized yet.
+		fprintf(stderr, "%s: Failed to enable the sandbox\n", argv[0]);
+		return E_ERROR;
+	}
+#endif
+
 #if defined(_WIN32) && !defined(__CYGWIN__)
 	InitializeCriticalSection(&exit_status_cs);
 #endif
--- a/src/xz/message.c
+++ b/src/xz/message.c
@ -829,6 +829,15 @@ message_strm(lzma_ret code)
 	case LZMA_STREAM_END:
 	case LZMA_GET_CHECK:
 	case LZMA_PROG_ERROR:
+	case LZMA_SEEK_NEEDED:
+	case LZMA_RET_INTERNAL1:
+	case LZMA_RET_INTERNAL2:
+	case LZMA_RET_INTERNAL3:
+	case LZMA_RET_INTERNAL4:
+	case LZMA_RET_INTERNAL5:
+	case LZMA_RET_INTERNAL6:
+	case LZMA_RET_INTERNAL7:
+	case LZMA_RET_INTERNAL8:
 		// Without "default", compiler will warn if new constants
 		// are added to lzma_ret, it is not too easy to forget to
 		// add the new constants to this function.
@ -891,167 +900,20 @@ message_mem_needed(enum message_verbosity v, uint64_t memusage)
 }


-/// \brief      Convert uint32_t to a nice string for --lzma[12]=dict=SIZE
-///
-/// The idea is to use KiB or MiB suffix when possible.
-static const char *
-uint32_to_optstr(uint32_t num)
-{
-	static char buf[16];
-
-	if ((num & ((UINT32_C(1) << 20) - 1)) == 0)
-		snprintf(buf, sizeof(buf), "%" PRIu32 "MiB", num >> 20);
-	else if ((num & ((UINT32_C(1) << 10) - 1)) == 0)
-		snprintf(buf, sizeof(buf), "%" PRIu32 "KiB", num >> 10);
-	else
-		snprintf(buf, sizeof(buf), "%" PRIu32, num);
-
-	return buf;
-}
-
-
-extern void
-message_filters_to_str(char buf[FILTERS_STR_SIZE],
-		const lzma_filter *filters, bool all_known)
-{
-	char *pos = buf;
-	size_t left = FILTERS_STR_SIZE;
-
-	for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) {
-		// Add the dashes for the filter option. A space is
-		// needed after the first and later filters.
-		my_snprintf(&pos, &left, "%s", i == 0 ? "--" : " --");
-
-		switch (filters[i].id) {
-		case LZMA_FILTER_LZMA1:
-		case LZMA_FILTER_LZMA2: {
-			const lzma_options_lzma *opt = filters[i].options;
-			const char *mode = NULL;
-			const char *mf = NULL;
-
-			if (all_known) {
-				switch (opt->mode) {
-				case LZMA_MODE_FAST:
-					mode = "fast";
-					break;
-
-				case LZMA_MODE_NORMAL:
-					mode = "normal";
-					break;
-
-				default:
-					mode = "UNKNOWN";
-					break;
-				}
-
-				switch (opt->mf) {
-				case LZMA_MF_HC3:
-					mf = "hc3";
-					break;
-
-				case LZMA_MF_HC4:
-					mf = "hc4";
-					break;
-
-				case LZMA_MF_BT2:
-					mf = "bt2";
-					break;
-
-				case LZMA_MF_BT3:
-					mf = "bt3";
-					break;
-
-				case LZMA_MF_BT4:
-					mf = "bt4";
-					break;
-
-				default:
-					mf = "UNKNOWN";
-					break;
-				}
-			}
-
-			// Add the filter name and dictionary size, which
-			// is always known.
-			my_snprintf(&pos, &left, "lzma%c=dict=%s",
-					filters[i].id == LZMA_FILTER_LZMA2
-						? '2' : '1',
-					uint32_to_optstr(opt->dict_size));
-
-			// With LZMA1 also lc/lp/pb are known when
-			// decompressing, but this function is never
-			// used to print information about .lzma headers.
-			assert(filters[i].id == LZMA_FILTER_LZMA2
-					|| all_known);
-
-			// Print the rest of the options, which are known
-			// only when compressing.
-			if (all_known)
-				my_snprintf(&pos, &left,
-					",lc=%" PRIu32 ",lp=%" PRIu32
-					",pb=%" PRIu32
-					",mode=%s,nice=%" PRIu32 ",mf=%s"
-					",depth=%" PRIu32,
-					opt->lc, opt->lp, opt->pb,
-					mode, opt->nice_len, mf, opt->depth);
-			break;
-		}
-
-		case LZMA_FILTER_X86:
-		case LZMA_FILTER_POWERPC:
-		case LZMA_FILTER_IA64:
-		case LZMA_FILTER_ARM:
-		case LZMA_FILTER_ARMTHUMB:
-		case LZMA_FILTER_SPARC: {
-			static const char bcj_names[][9] = {
-				"x86",
-				"powerpc",
-				"ia64",
-				"arm",
-				"armthumb",
-				"sparc",
-			};
-
-			const lzma_options_bcj *opt = filters[i].options;
-			my_snprintf(&pos, &left, "%s", bcj_names[filters[i].id
-					- LZMA_FILTER_X86]);
-
-			// Show the start offset only when really needed.
-			if (opt != NULL && opt->start_offset != 0)
-				my_snprintf(&pos, &left, "=start=%" PRIu32,
-						opt->start_offset);
-
-			break;
-		}
-
-		case LZMA_FILTER_DELTA: {
-			const lzma_options_delta *opt = filters[i].options;
-			my_snprintf(&pos, &left, "delta=dist=%" PRIu32,
-					opt->dist);
-			break;
-		}
-
-		default:
-			// This should be possible only if liblzma is
-			// newer than the xz tool.
-			my_snprintf(&pos, &left, "UNKNOWN");
-			break;
-		}
-	}
-
-	return;
-}
-
-
 extern void
 message_filters_show(enum message_verbosity v, const lzma_filter *filters)
 {
 	if (v > verbosity)
 		return;

-	char buf[FILTERS_STR_SIZE];
-	message_filters_to_str(buf, filters, true);
+	char *buf;
+	const lzma_ret ret = lzma_str_from_filters(&buf, filters,
+			LZMA_STR_ENCODER | LZMA_STR_GETOPT_LONG, NULL);
+	if (ret != LZMA_OK)
+		message_fatal("%s", message_strm(ret));
+
 	fprintf(stderr, _("%s: Filter chain: %s\n"), progname, buf);
+	free(buf);
 	return;
 }

@ -1134,7 +996,7 @@ message_help(bool long_help)
 		puts(_("\n Basic file format and compression options:\n"));
 		puts(_(
 "  -F, --format=FMT    file format to encode or decode; possible values are\n"
-"                      `auto' (default), `xz', `lzma', and `raw'\n"
+"                      `auto' (default), `xz', `lzma', `lzip', and `raw'\n"
 "  -C, --check=CHECK   integrity check type: `none' (use with caution),\n"
 "                      `crc32', `crc64' (default), or `sha256'"));
 		puts(_(
@ -1171,9 +1033,11 @@ message_help(bool long_help)
 		puts(_( // xgettext:no-c-format
 "      --memlimit-compress=LIMIT\n"
 "      --memlimit-decompress=LIMIT\n"
+"      --memlimit-mt-decompress=LIMIT\n"
 "  -M, --memlimit=LIMIT\n"
 "                      set memory usage limit for compression, decompression,\n"
-"                      or both; LIMIT is in bytes, % of RAM, or 0 for defaults"));
+"                      threaded decompression, or all of these; LIMIT is in\n"
+"                      bytes, % of RAM, or 0 for defaults"));

 		puts(_(
 "      --no-adjust     if compression settings exceed the memory usage limit,\n"
@ -1208,10 +1072,11 @@ message_help(bool long_help)
 		puts(_(
 "\n"
 "  --x86[=OPTS]        x86 BCJ filter (32-bit and 64-bit)\n"
+"  --arm[=OPTS]        ARM BCJ filter\n"
+"  --armthumb[=OPTS]   ARM-Thumb BCJ filter\n"
+"  --arm64[=OPTS]      ARM64 BCJ filter\n"
 "  --powerpc[=OPTS]    PowerPC BCJ filter (big endian only)\n"
 "  --ia64[=OPTS]       IA-64 (Itanium) BCJ filter\n"
-"  --arm[=OPTS]        ARM BCJ filter (little endian only)\n"
-"  --armthumb[=OPTS]   ARM-Thumb BCJ filter (little endian only)\n"
 "  --sparc[=OPTS]      SPARC BCJ filter\n"
 "                      Valid OPTS for all BCJ filters:\n"
 "                        start=NUM  start offset for conversions (default=0)"));
--- a/src/xz/message.h
+++ b/src/xz/message.h
@ -90,22 +90,6 @@ extern const char *message_strm(lzma_ret code);
 extern void message_mem_needed(enum message_verbosity v, uint64_t memusage);


-/// Buffer size for message_filters_to_str()
-#define FILTERS_STR_SIZE 512
-
-
-/// \brief      Get the filter chain as a string
-///
-/// \param      buf         Pointer to caller allocated buffer to hold
-///                         the filter chain string
-/// \param      filters     Pointer to the filter chain
-/// \param      all_known   If true, all filter options are printed.
-///                         If false, only the options that get stored
-///                         into .xz headers are printed.
-extern void message_filters_to_str(char buf[FILTERS_STR_SIZE],
-		const lzma_filter *filters, bool all_known);
-
-
 /// Print the filter chain.
 extern void message_filters_show(
 		enum message_verbosity v, const lzma_filter *filters);
--- a/src/xz/options.c
+++ b/src/xz/options.c
@ -354,10 +354,5 @@ options_lzma(const char *str)
 	if (options->lc + options->lp > LZMA_LCLP_MAX)
 		message_fatal(_("The sum of lc and lp must not exceed 4"));

-	const uint32_t nice_len_min = options->mf & 0x0F;
-	if (options->nice_len < nice_len_min)
-		message_fatal(_("The selected match finder requires at "
-				"least nice=%" PRIu32), nice_len_min);
-
 	return options;
 }
--- a/src/xz/private.h
+++ b/src/xz/private.h
@ -45,7 +45,7 @@
 #	define STDERR_FILENO (fileno(stderr))
 #endif

-#ifdef HAVE_CAPSICUM
+#if defined(HAVE_CAPSICUM) || defined(HAVE_PLEDGE)
 #	define ENABLE_SANDBOX 1
 #endif

--- a/src/xz/suffix.c
+++ b/src/xz/suffix.c
@ -119,9 +119,10 @@ uncompressed_name(const char *src_name, const size_t src_len)
 #ifdef __DJGPP__
 		{ ".lzm",   "" },
 #endif
-		{ ".tlz",   ".tar" },
-		// { ".gz",    "" },
-		// { ".tgz",   ".tar" },
+		{ ".tlz",   ".tar" }, // Both .tar.lzma and .tar.lz
+#ifdef HAVE_LZIP_DECODER
+		{ ".lz",    "" },
+#endif
 	};

 	const char *new_suffix = "";
@ -208,12 +209,15 @@ compressed_name(const char *src_name, size_t src_len)
 #endif
 			".tlz",
 			NULL
-/*
+#ifdef HAVE_LZIP_DECODER
+		// This is needed to keep the table indexing in sync with
+		// enum format_type from coder.h.
 		}, {
-			".gz",
-			".tgz",
-			NULL
+/*
+			".lz",
 */
+			NULL
+#endif
 		}, {
 			// --format=raw requires specifying the suffix
 			// manually or using stdout.
@ -221,8 +225,11 @@ compressed_name(const char *src_name, size_t src_len)
 		}
 	};

-	// args.c ensures this.
+	// args.c ensures these.
 	assert(opt_format != FORMAT_AUTO);
+#ifdef HAVE_LZIP_DECODER
+	assert(opt_format != FORMAT_LZIP);
+#endif

 	const size_t format = opt_format - 1;
 	const char *const *suffixes = all_suffixes[format];
@ -299,9 +306,11 @@ compressed_name(const char *src_name, size_t src_len)
 			// xz foo.tar          -> foo.txz
 			// xz -F lzma foo.tar  -> foo.tlz
 			static const char *const tar_suffixes[] = {
-				".txz",
-				".tlz",
-				// ".tgz",
+				".txz", // .tar.xz
+				".tlz", // .tar.lzma
+/*
+				".tlz", // .tar.lz
+*/
 			};
 			suffix = tar_suffixes[format];
 			suffix_len = 4;
--- a/src/xz/util.c
+++ b/src/xz/util.c
@ -260,18 +260,6 @@ my_snprintf(char **pos, size_t *left, const char *fmt, ...)
 }


-extern bool
-is_empty_filename(const char *filename)
-{
-	if (filename[0] == '\0') {
-		message_error(_("Empty filename, skipping"));
-		return true;
-	}
-
-	return false;
-}
-
-
 extern bool
 is_tty_stdin(void)
 {
--- a/src/xz/util.h
+++ b/src/xz/util.h
@ -105,10 +105,6 @@ extern void my_snprintf(char **pos, size_t *left, const char *fmt, ...)
 		lzma_attribute((__format__(__printf__, 3, 4)));


-/// \brief      Check if filename is empty and print an error message
-extern bool is_empty_filename(const char *filename);
-
-
 /// \brief      Test if stdin is a terminal
 ///
 /// If stdin is a terminal, an error message is printed and exit status set
--- a/src/xz/xz.1
+++ b/src/xz/xz.1
@ -5,7 +5,7 @@
 .\" This file has been put into the public domain.
 .\" You can do whatever you want with this file.
 .\"
-.TH XZ 1 "2022-10-25" "Tukaani" "XZ Utils"
+.TH XZ 1 "2022-12-01" "Tukaani" "XZ Utils"
 .
 .SH NAME
 xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files
@ -62,6 +62,11 @@ format, but the legacy
 format used by LZMA Utils and
 raw compressed streams with no container format headers
 are also supported.
+In addition, decompression of the
+.B .lz
+format used by
+.B lzip
+is supported.
 .PP
 .B xz
 compresses or decompresses each
@ -102,9 +107,10 @@ or
 is appended to the source filename to get the target filename.
 .IP \(bu 3
 When decompressing, the
-.B .xz
+.BR .xz ,
+.BR .lzma ,
 or
-.B .lzma
+.B .lz
 suffix is removed from the filename to get the target filename.
 .B xz
 also recognizes the suffixes
@ -158,8 +164,9 @@ doesn't have a suffix of any of the supported file formats
 .RB ( .xz ,
 .BR .txz ,
 .BR .lzma ,
+.BR .tlz ,
 or
-.BR .tlz ).
+.BR .lz ).
 .PP
 After successfully compressing or decompressing the
 .IR file ,
@ -507,8 +514,9 @@ in addition to files with the
 .BR .xz ,
 .BR .txz ,
 .BR .lzma ,
+.BR .tlz ,
 or
-.B .tlz
+.B .lz
 suffix.
 If the source file has the suffix
 .IR .suf ,
@ -575,6 +583,34 @@ The alternative name
 .B alone
 is provided for backwards compatibility with LZMA Utils.
 .TP
+.B lzip
+Accept only
+.B .lz
+files when decompressing.
+Compression is not supported.
+.IP ""
+The
+.B .lz
+format version 0 and the unextended version 1 are supported.
+Version 0 files were produced by
+.B lzip
+1.3 and older.
+Such files aren't common but may be found from file archives
+as a few source packages were released in this format.
+People might have old personal files in this format too.
+Decompression support for the format version 0 was removed in
+.B lzip
+1.18.
+.IP ""
+.B lzip
+1.4 and later create files in the format version 1.
+The sync flush marker extension to the format version 1 was added in
+.B lzip
+1.6.
+This extension is rarely used and isn't supported by
+.B xz
+(diagnosed as corrupt input).
+.TP
 .B raw
 Compress or uncompress a raw stream (no headers).
 This is meant for advanced users only.
@ -965,15 +1001,28 @@ the last one takes effect.
 If the compression settings exceed the
 .IR limit ,
 .B xz
-will adjust the settings downwards so that
+will attempt to adjust the settings downwards so that
 the limit is no longer exceeded and display a notice that
 automatic adjustment was done.
-Such adjustments are not made when compressing with
+The adjustments are done in this order:
+reducing the number of threads,
+switching to single-threaded mode
+if even one thread in multi-threaded mode exceeds the
+.IR limit ,
+and finally reducing the LZMA2 dictionary size.
+.IP ""
+When compressing with
 .B \-\-format=raw
 or if
 .B \-\-no\-adjust
-has been specified.
-In those cases, an error is displayed and
+has been specified,
+only the number of threads may be reduced
+since it can be done without affecting the compressed output.
+.IP ""
+If the
+.I limit
+cannot be met even with the adjustments described above,
+an error is displayed and
 .B xz
 will exit with exit status 1.
 .IP ""
@ -1012,16 +1061,6 @@ This is currently equivalent to setting the
 to
 .B max
 (no memory usage limit).
-Once multithreading support has been implemented,
-there may be a difference between
-.B 0
-and
-.B max
-for the multithreaded case, so it is recommended to use
-.B 0
-instead of
-.B max
-until the details have been decided.
 .RE
 .IP ""
 For 32-bit
@ -1064,16 +1103,80 @@ See
 for possible ways to specify the
 .IR limit .
 .TP
+.BI \-\-memlimit\-mt\-decompress= limit
+Set a memory usage limit for multi-threaded decompression.
+This can only affect the number of threads;
+this will never make
+.B xz
+refuse to decompress a file.
+If
+.I limit
+is too low to allow any multi-threading, the
+.I limit
+is ignored and
+.B xz
+will continue in single-threaded mode.
+Note that if also
+.B \-\-memlimit\-decompress
+is used,
+it will always apply to both single-threaded and multi-threaded modes,
+and so the effective
+.I limit
+for multi-threading will never be higher than the limit set with
+.BR \-\-memlimit\-decompress .
+.IP ""
+In contrast to the other memory usage limit options,
+.BI \-\-memlimit\-mt\-decompress= limit
+has a system-specific default
+.IR limit .
+.B "xz \-\-info\-memory"
+can be used to see the current value.
+.IP ""
+This option and its default value exist
+because without any limit the threaded decompressor
+could end up allocating an insane amount of memory with some input files.
+If the default
+.I limit
+is too low on your system,
+feel free to increase the
+.I limit
+but never set it to a value larger than the amount of usable RAM
+as with appropriate input files
+.B xz
+will attempt to use that amount of memory
+even with a low number of threads.
+Running out of memory or swapping
+will not improve decompression performance.
+.IP ""
+See
+.BI \-\-memlimit\-compress= limit
+for possible ways to specify the
+.IR limit .
+Setting
+.I limit
+to
+.B 0
+resets the
+.I limit
+to the default system-specific value.
+.IP ""
+.TP
 \fB\-M\fR \fIlimit\fR, \fB\-\-memlimit=\fIlimit\fR, \fB\-\-memory=\fIlimit
 This is equivalent to specifying
 .BI \-\-memlimit\-compress= limit
-\fB\-\-memlimit\-decompress=\fIlimit\fR.
+.BI \-\-memlimit-decompress= limit
+\fB\-\-memlimit\-mt\-decompress=\fIlimit\fR.
 .TP
 .B \-\-no\-adjust
-Display an error and exit if the compression settings exceed
-the memory usage limit.
-The default is to adjust the settings downwards so
-that the memory usage limit is not exceeded.
+Display an error and exit if the memory usage limit cannot be
+met without adjusting settings that affect the compressed output.
+That is, this prevents
+.B xz
+from switching the encoder from multi-threaded mode to single-threaded mode
+and from reducing the LZMA2 dictionary size.
+Even when this option is used the number of threads may be reduced
+to meet the memory usage limit as that won't affect the compressed output.
+.IP ""
 Automatic adjusting is always disabled when creating raw streams
 .RB ( \-\-format=raw ).
 .TP
@ -1085,13 +1188,66 @@ to a special value
 .B 0
 makes
 .B xz
-use as many threads as there are CPU cores on the system.
-The actual number of threads can be less than
+use up to as many threads as the processor(s) on the system support.
+The actual number of threads can be fewer than
 .I threads
 if the input file is not big enough
 for threading with the given settings or
 if using more threads would exceed the memory usage limit.
 .IP ""
+The single-threaded and multi-threaded compressors produce different output.
+Single-threaded compressor will give the smallest file size but
+only the output from the multi-threaded compressor can be decompressed
+using multiple threads.
+Setting
+.I threads
+to
+.B 1
+will use the single-threaded mode.
+Setting
+.I threads
+to any other value, including
+.BR 0 ,
+will use the multi-threaded compressor
+even if the system supports only one hardware thread.
+.RB ( xz
+5.2.x
+used single-threaded mode in this situation.)
+.IP ""
+To use multi-threaded mode with only one thread, set
+.I threads
+to
+.BR +1 .
+The
+.B +
+prefix has no effect with values other than
+.BR 1 .
+A memory usage limit can still make
+.B xz
+switch to single-threaded mode unless
+.B \-\-no\-adjust
+is used.
+Support for the
+.B +
+prefix was added in
+.B xz
+5.4.0.
+.IP ""
+If an automatic number of threads has been requested and
+no memory usage limit has been specified,
+then a system-specific default soft limit will be used to possibly
+limit the number of threads.
+It is a soft limit in sense that it is ignored
+if the number of threads becomes one,
+thus a soft limit will never stop
+.B xz
+from compressing or decompressing.
+This default soft limit will not make
+.B xz
+switch from multi-threaded mode to single-threaded mode.
+The active limits can be seen with
+.BR "xz \-\-info\-memory" .
+.IP ""
 Currently the only threading method is to split the input into
 blocks and compress them independently from each other.
 The default block size depends on the compression level and
@ -1099,13 +1255,13 @@ can be overridden with the
 .BI \-\-block\-size= size
 option.
 .IP ""
-Threaded decompression hasn't been implemented yet.
-It will only work on files that contain multiple blocks
-with size information in block headers.
-All files compressed in multi-threaded mode meet this condition,
+Threaded decompression only works on files that contain
+multiple blocks with size information in block headers.
+All large enough files compressed in multi-threaded mode
+meet this condition,
 but files compressed in single-threaded mode don't even if
 .BI \-\-block\-size= size
-is used.
+has been used.
 .
 .SS "Custom compressor filter chains"
 A custom filter chain allows specifying
@ -1537,14 +1693,16 @@ and
 \fB\-\-x86\fR[\fB=\fIoptions\fR]
 .PD 0
 .TP
-\fB\-\-powerpc\fR[\fB=\fIoptions\fR]
-.TP
-\fB\-\-ia64\fR[\fB=\fIoptions\fR]
-.TP
 \fB\-\-arm\fR[\fB=\fIoptions\fR]
 .TP
 \fB\-\-armthumb\fR[\fB=\fIoptions\fR]
 .TP
+\fB\-\-arm64\fR[\fB=\fIoptions\fR]
+.TP
+\fB\-\-powerpc\fR[\fB=\fIoptions\fR]
+.TP
+\fB\-\-ia64\fR[\fB=\fIoptions\fR]
+.TP
 \fB\-\-sparc\fR[\fB=\fIoptions\fR]
 .PD
 Add a branch/call/jump (BCJ) filter to the filter chain.
@ -1553,7 +1711,7 @@ in the filter chain.
 .IP ""
 A BCJ filter converts relative addresses in
 the machine code to their absolute counterparts.
-This doesn't change the size of the data,
+This doesn't change the size of the data
 but it increases redundancy,
 which can help LZMA2 to produce 0\(en15\ % smaller
 .B .xz
@ -1562,21 +1720,8 @@ The BCJ filters are always reversible,
 so using a BCJ filter for wrong type of data
 doesn't cause any data loss, although it may make
 the compression ratio slightly worse.
-.IP ""
-It is fine to apply a BCJ filter on a whole executable;
-there's no need to apply it only on the executable section.
-Applying a BCJ filter on an archive that contains both executable
-and non-executable files may or may not give good results,
-so it generally isn't good to blindly apply a BCJ filter when
-compressing binary packages for distribution.
-.IP ""
-These BCJ filters are very fast and
-use insignificant amount of memory.
-If a BCJ filter improves compression ratio of a file,
-it can improve decompression speed at the same time.
-This is because, on the same hardware,
-the decompression speed of LZMA2 is roughly
-a fixed number of bytes of compressed data per second.
+The BCJ filters are very fast and
+use an insignificant amount of memory.
 .IP ""
 These BCJ filters have known problems related to
 the compression ratio:
@ -1588,21 +1733,20 @@ have the addresses in the instructions filled with filler values.
 These BCJ filters will still do the address conversion,
 which will make the compression worse with these files.
 .IP \(bu 3
-Applying a BCJ filter on an archive containing multiple similar
-executables can make the compression ratio worse than not using
-a BCJ filter.
-This is because the BCJ filter doesn't detect the boundaries
-of the executable files, and doesn't reset
-the address conversion counter for each executable.
+If a BCJ filter is applied on an archive,
+it is possible that it makes the compression ratio
+worse than not using a BCJ filter.
+For example, if there are similar or even identical executables
+then filtering will likely make the files less similar
+and thus compression is worse.
+The contents of non-executable files in the same archive can matter too.
+In practice one has to try with and without a BCJ filter to see
+which is better in each situation.
 .RE
 .IP ""
-Both of the above problems will be fixed
-in the future in a new filter.
-The old BCJ filters will still be useful in embedded systems,
-because the decoder of the new filter will be bigger
-and use more memory.
-.IP ""
 Different instruction sets have different alignment:
+the executable file must be aligned to a multiple of
+this value in the input data to make the filter work.
 .RS
 .RS
 .PP
@ -1612,11 +1756,12 @@ l n l
 l n l.
 Filter;Alignment;Notes
 x86;1;32-bit or 64-bit x86
+ARM;4;
+ARM-Thumb;2;
+ARM64;4;4096-byte alignment is best
 PowerPC;4;Big endian only
-ARM;4;Little endian only
-ARM-Thumb;2;Little endian only
-IA-64;16;Big or little endian
-SPARC;4;Big or little endian
+IA-64;16;Itanium
+SPARC;4;
 .TE
 .RE
 .RE
@ -1627,6 +1772,8 @@ the LZMA2 options are set to match the
 alignment of the selected BCJ filter.
 For example, with the IA-64 filter, it's good to set
 .B pb=4
+or even
+.B pb=4,lp=4,lc=0
 with LZMA2 (2^4=16).
 The x86 filter is an exception;
 it's usually good to stick to LZMA2's default
@ -1774,6 +1921,7 @@ for details.
 .TP
 .B \-\-info\-memory
 Display, in human-readable format, how much physical memory (RAM)
+and how many processor threads
 .B xz
 thinks the system has and the memory usage limits for compression
 and decompression, and exit successfully.
@ -1858,15 +2006,50 @@ and
 .B "xz \-\-robot \-\-info\-memory"
 prints a single line with three tab-separated columns:
 .IP 1. 4
-Total amount of physical memory (RAM) in bytes
+Total amount of physical memory (RAM) in bytes.
 .IP 2. 4
-Memory usage limit for compression in bytes.
-A special value of zero indicates the default setting,
+Memory usage limit for compression in bytes
+.RB ( \-\-memlimit\-compress ).
+A special value of
+.B 0
+indicates the default setting
 which for single-threaded mode is the same as no limit.
 .IP 3. 4
-Memory usage limit for decompression in bytes.
-A special value of zero indicates the default setting,
+Memory usage limit for decompression in bytes
+.RB ( \-\-memlimit\-decompress ).
+A special value of
+.B 0
+indicates the default setting
 which for single-threaded mode is the same as no limit.
+.IP 4. 4
+Since
+.B xz
+5.3.4alpha:
+Memory usage for multi-threaded decompression in bytes
+.RB ( \-\-memlimit\-mt\-decompress ).
+This is never zero because a system-specific default value
+shown in the column 5
+is used if no limit has been specified explicitly.
+This is also never greater than the value in the column 3
+even if a larger value has been specified with
+.BR \-\-memlimit\-mt\-decompress .
+.IP 5. 4
+Since
+.B xz
+5.3.4alpha:
+A system-specific default memory usage limit
+that is used to limit the number of threads
+when compressing with an automatic number of threads
+.RB ( \-\-threads=0 )
+and no memory usage limit has been specified
+.RB ( \-\-memlimit\-compress ).
+This is also used as the default value for
+.BR \-\-memlimit\-mt\-decompress .
+.IP 6. 4
+Since
+.B xz
+5.3.4alpha:
+Number of available processor threads.
 .PP
 In the future, the output of
 .B "xz \-\-robot \-\-info\-memory"