git/reftable/reftable-reader.h
Patrick Steinhardt d857469d85 reftable/reader: introduce refcounting
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while still iterating. If the stack has been rewritten since the
iterator was created, the reload discards a subset of readers that may
still be in use by the iterator. The iterator then reads through
pointers to deallocated memory, which is a use-after-free and typically
segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.
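
To illustrate the idea, here is a minimal sketch of a refcounted reader,
assuming a simplified model. The `toy_reader` and `toy_iter` types and all
`toy_*` functions are hypothetical stand-ins invented for this example and
are not the reftable library's actual implementation. The sketch shows an
iterator pinning its reader so that a simulated reload, which drops the
stack's reference, does not release the reader while it is still in use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct reftable_reader; not the real layout. */
struct toy_reader {
	unsigned refcount;
	int *closed_flag; /* set to 1 when the reader is released */
};

static struct toy_reader *toy_reader_new(int *closed_flag)
{
	struct toy_reader *r = calloc(1, sizeof(*r));
	if (!r)
		return NULL;
	r->refcount = 1; /* the creator owns the first reference */
	r->closed_flag = closed_flag;
	return r;
}

static void toy_reader_incref(struct toy_reader *r)
{
	r->refcount++;
}

static void toy_reader_decref(struct toy_reader *r)
{
	if (!r)
		return;
	if (--r->refcount)
		return; /* somebody else still holds a reference */
	*r->closed_flag = 1;
	free(r);
}

/* Hypothetical iterator that pins the reader it reads from. */
struct toy_iter {
	struct toy_reader *reader;
};

static void toy_iter_init(struct toy_iter *it, struct toy_reader *r)
{
	toy_reader_incref(r);
	it->reader = r;
}

static void toy_iter_close(struct toy_iter *it)
{
	toy_reader_decref(it->reader);
	it->reader = NULL;
}

/*
 * Returns 1 if the reader survives a simulated reload while the iterator
 * is open and is released once the iterator is closed, 0 if not, and -1
 * on allocation failure.
 */
static int toy_demo(void)
{
	int closed = 0;
	struct toy_reader *r = toy_reader_new(&closed);
	struct toy_iter it;
	int survived;

	if (!r)
		return -1;
	toy_iter_init(&it, r);  /* refcount: 2 */
	toy_reader_decref(r);   /* "reload": the stack drops its reference */
	survived = !closed;     /* the iterator still keeps the reader alive */
	toy_iter_close(&it);    /* last reference gone, reader is released */
	return survived && closed;
}
```

Calling `toy_demo()` from a `main()` function exercises the whole
lifecycle. The point of the design is that the stack and each iterator
own separate references, so neither side needs to know whether the other
is still using the reader.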

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:47 -07:00

/*
Copyright 2020 Google LLC
Use of this source code is governed by a BSD-style
license that can be found in the LICENSE file or at
https://developers.google.com/open-source/licenses/bsd
*/
#ifndef REFTABLE_READER_H
#define REFTABLE_READER_H
#include "reftable-iterator.h"
#include "reftable-blocksource.h"
/*
* Reading single tables
*
* The following routines are for reading single files. For an
* application-level interface, skip ahead to struct
* reftable_merged_table and struct reftable_stack.
*/
/* The reader struct is a handle to an open reftable file. */
struct reftable_reader;
/* reftable_reader_new opens a reftable for reading. If successful,
* returns 0 and sets *pp. The name is used for creating a
* stack. Typically, it is the basename of the file. The block source
* `src` is owned by the reader, and is closed on calling
* reftable_reader_destroy(). On error, the block source `src` is
* closed as well.
*/
int reftable_reader_new(struct reftable_reader **pp,
			struct reftable_block_source *src, const char *name);
/*
* Manage the reference count of the reftable reader. A newly initialized
* reader starts with a refcount of 1 and will be deleted once the refcount has
* reached 0.
*
* This is required because readers may have longer lifetimes than the stack
* they belong to. The stack may for example be reloaded while the old tables
* are still being accessed by an iterator.
*/
void reftable_reader_incref(struct reftable_reader *reader);
void reftable_reader_decref(struct reftable_reader *reader);
/* Initialize a reftable iterator for reading refs. */
void reftable_reader_init_ref_iterator(struct reftable_reader *r,
				       struct reftable_iterator *it);
/* Initialize a reftable iterator for reading logs. */
void reftable_reader_init_log_iterator(struct reftable_reader *r,
				       struct reftable_iterator *it);
/* returns the hash ID used in this table. */
uint32_t reftable_reader_hash_id(struct reftable_reader *r);
/* Initialize `it` as an iterator over the refs pointing to `oid`. */
int reftable_reader_refs_for(struct reftable_reader *r,
			     struct reftable_iterator *it, uint8_t *oid);
/* return the max_update_index for a table */
uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
/* return the min_update_index for a table */
uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
/* print blocks onto stdout for debugging. */
int reftable_reader_print_blocks(const char *tablename);
#endif