1
0
mirror of https://github.com/git/git synced 2024-07-05 00:58:49 +00:00
Commit Graph

5 Commits

Author SHA1 Message Date
Derrick Stolee
953d2001fa chunk-format: parse trailing table of contents
The new read_trailing_table_of_contents() mimics
read_table_of_contents() except that it reads the table of contents in
reverse from the end of the given hashfile. The file is given as a
memory-mapped section of memory and a size. Automatically calculate the
start of the trailing hash and read the table of contents in revers from
that position.

The errors come along from those in read_table_of_contents(). The one
exception is that the chunk_offset cannot be checked as going into the
table of contents since we do not have that length automatically. That
may have some surprising results for some narrow forms of corruption.
However, we do still limit the size to the size of the file plus the
part of the table of contents read so far. At minimum, the given sizes
can be used to limit parsing within the file itself.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-07 13:53:52 -05:00
Derrick Stolee
7056e7f0c8 chunk-format: allow trailing table of contents
The existing chunk formats use the table of contents at the beginning of
the file. This is intended as a way to speed up the initial loading of
the file, but comes at a cost during writes. Each example needs to fully
compute how big each chunk will be in advance, which usually requires
storing the full file contents in memory.

Future file formats may want to use the chunk format API in cases where
the writing stage is critical to performance, so we may want to stream
updates from an existing file and then only write the table of contents
at the end.

Add a new 'flags' parameter to write_chunkfile() that allows this
behavior. When this is specified, the defensive programming that checks
that the chunks are written with the precomputed sizes is disabled.
Then, the table of contents is written in reverse order at the end of
the hashfile, so a parser can read the chunk list starting from the end
of the file (minus the hash).

The parsing of these table of contents will come in a later change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-07 13:53:52 -05:00
Taylor Blau
d9fef9d90d chunk-format.h: extract oid_version()
There are three definitions of an identical function which converts
`the_hash_algo` into either 1 (for SHA-1) or 2 (for SHA-256). There is a
copy of this function for writing both the commit-graph and
multi-pack-index file, and another inline definition used to write the
.rev header.

Consolidate these into a single definition in chunk-format.h. It's not
clear that this is the best header to define this function in, but it
should do for now.

(Worth noting, the .rev caller expects a 4-byte unsigned, but the other
two callers work with a single unsigned byte. The consolidated version
uses the latter type, and lets the compiler widen it when required).

Another caller will be added in a subsequent patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Derrick Stolee
5f0879f54b chunk-format: create read chunk API
Add the capability to read the table of contents, then pair the chunks
with necessary logic using read_chunk_fn pointers. Callers will be added
in future changes, but the typical outline will be:

 1. initialize a 'struct chunkfile' with init_chunkfile(NULL).
 2. call read_table_of_contents().
 3. for each chunk to parse,
    a. call pair_chunk() to assign a pointer with the chunk position, or
    b. call read_chunk() to run a callback on the chunk start and size.
 4. call free_chunkfile() to clear the 'struct chunkfile' data.

We are re-using the anonymous 'struct chunkfile' data, as it is internal
to the chunk-format API. This gives it essentially two modes: write and
read. If the same struct instance was used for both reads and writes,
then there would be failures.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 13:38:16 -08:00
Derrick Stolee
570df42610 chunk-format: create chunk format write API
In anticipation of combining the logic from the commit-graph and
multi-pack-index file formats, create a new chunk-format API. Use a
'struct chunkfile' pointer to keep track of data that has been
registered for writes. This struct is anonymous outside of
chunk-format.c to ensure no user attempts to interfere with the data.

The next change will use this API in commit-graph.c, but the general
approach is:

 1. initialize the chunkfile with init_chunkfile(f).
 2. add chunks in the intended writing order with add_chunk().
 3. write any header information to the hashfile f.
 4. write the chunkfile data using write_chunkfile().
 5. free the chunkfile struct using free_chunkfile().

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 13:38:16 -08:00