global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit.
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "builtin.h" to keep the
required changes at least a little bit more contained.
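A minimal sketch of the opt-in idea, not the actual header contents: the
declaration of the global is only visible to code units that define the
macro first. `struct repository` here is a hypothetical stand-in with a
single made-up field, and `repo_gitdir()` is an illustrative helper name.

```c
/* Hypothetical stand-in for the real repository type. */
struct repository {
	const char *gitdir;
};

/*
 * A code unit that still needs the global opts in by defining
 * USE_THE_REPOSITORY_VARIABLE before this declaration is seen.
 * Units that do not define it fail to compile on any direct use,
 * because the identifier is simply not declared for them.
 */
#ifdef USE_THE_REPOSITORY_VARIABLE
extern struct repository *the_repository;
#endif

/* Converted code receives the repository as a parameter instead. */
static const char *repo_gitdir(const struct repository *r)
{
	return r->gitdir;
}
```

With this layout, a unit that omits the `#define` can still call
`repo_gitdir(r)` but cannot name `the_repository` at all, which is what
makes the absence of a dependency checkable by the compiler.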
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 06:50:23 +00:00
|
|
|
#define USE_THE_REPOSITORY_VARIABLE
|
|
|
|
|
2023-04-22 20:17:23 +00:00
|
|
|
#include "git-compat-util.h"
|
2022-08-09 13:11:40 +00:00
|
|
|
#include "bundle-uri.h"
|
|
|
|
#include "bundle.h"
|
2023-04-22 20:17:12 +00:00
|
|
|
#include "copy.h"
|
2023-03-21 06:26:03 +00:00
|
|
|
#include "environment.h"
|
2023-03-21 06:25:54 +00:00
|
|
|
#include "gettext.h"
|
2022-08-09 13:11:40 +00:00
|
|
|
#include "refs.h"
|
|
|
|
#include "run-command.h"
|
bundle-uri: create bundle_list struct and helpers
It will likely be rare for a user to use a single bundle URI and expect
that URI to point to a bundle. Instead, that URI will likely be a list
of bundles provided in some format. Alternatively, the Git server could
advertise a list of bundles.
In anticipation of these two ways of advertising multiple bundles,
create a data structure that represents such a list. This will be
populated using a common API, but for now focus on what data can be
represented.
Each list contains a number of remote_bundle_info structs. These contain
an 'id' that is used to uniquely identify them in the list, and also a
'uri' that contains the location of its data. Finally, there is a strbuf
containing the filename used when Git downloads the contents to disk.
The list itself stores these remote_bundle_info structs in a hashtable
using 'id' as the key. The order of the structs in the input is
considered unimportant, but future modifications to the format and these
data structures will place ordering possibilities on the set. The list
also has a few "global" properties, including the version (used when
parsing the list) and the mode. The mode is one of these two options:
1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined
together. The client should download all of the advertised data to
have a complete copy of the data.
2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete
copy of the data. The client can choose arbitrarily from these
options. In the future, the client may use pings to find the closest
URI among geodistributed replicas, or use some other heuristic
information added to the format.
This API is currently unused, but will soon be expanded with parsing
logic and then be consumed by the bundle URI download logic.
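As a rough illustration of the shape described above, assuming a much
simplified layout: the `toy_` names are hypothetical, a linear scan stands
in for the hashtable keyed by 'id', and the numeric values of the mode
enum are an assumption.

```c
#include <stddef.h>
#include <string.h>

enum bundle_list_mode {
	BUNDLE_MODE_NONE = 0,
	BUNDLE_MODE_ALL,
	BUNDLE_MODE_ANY
};

/* Simplified: no hashmap entry and no on-disk filename. */
struct toy_bundle_info {
	const char *id;
	const char *uri;
};

struct toy_bundle_list {
	int version;
	enum bundle_list_mode mode;
	struct toy_bundle_info *items;
	size_t nr;
};

/* Look up a bundle by its unique 'id'. */
static struct toy_bundle_info *toy_find_bundle(struct toy_bundle_list *list,
					       const char *id)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		if (!strcmp(list->items[i].id, id))
			return &list->items[i];
	return NULL;
}

/* How many bundles must a client download for a complete copy? */
static size_t toy_required_downloads(const struct toy_bundle_list *list)
{
	switch (list->mode) {
	case BUNDLE_MODE_ALL:
		return list->nr;
	case BUNDLE_MODE_ANY:
		return list->nr ? 1 : 0;
	default:
		return 0;
	}
}
```

The second helper spells out the difference between the two modes: "all"
needs every listed bundle, while "any" is satisfied by a single one.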
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:29 +00:00
|
|
|
#include "hashmap.h"
|
|
|
|
#include "pkt-line.h"
|
2022-10-12 12:52:30 +00:00
|
|
|
#include "config.h"
|
2024-06-19 04:07:33 +00:00
|
|
|
#include "fetch-pack.h"
|
2022-12-22 15:14:15 +00:00
|
|
|
#include "remote.h"
|
2022-10-12 12:52:29 +00:00
|
|
|
|
2023-01-31 13:29:12 +00:00
|
|
|
static struct {
|
|
|
|
enum bundle_list_heuristic heuristic;
|
|
|
|
const char *name;
|
|
|
|
} heuristics[BUNDLE_HEURISTIC__COUNT] = {
|
|
|
|
{ BUNDLE_HEURISTIC_NONE, ""},
|
|
|
|
{ BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" },
|
|
|
|
};
|
|
|
|
|
2023-08-29 23:45:39 +00:00
|
|
|
static int compare_bundles(const void *hashmap_cmp_fn_data UNUSED,
|
2022-10-12 12:52:29 +00:00
|
|
|
const struct hashmap_entry *he1,
|
|
|
|
const struct hashmap_entry *he2,
|
|
|
|
const void *id)
|
|
|
|
{
|
|
|
|
const struct remote_bundle_info *e1 =
|
|
|
|
container_of(he1, const struct remote_bundle_info, ent);
|
|
|
|
const struct remote_bundle_info *e2 =
|
|
|
|
container_of(he2, const struct remote_bundle_info, ent);
|
|
|
|
|
|
|
|
return strcmp(e1->id, id ? (const char *)id : e2->id);
|
|
|
|
}
|
|
|
|
|
|
|
|
void init_bundle_list(struct bundle_list *list)
|
|
|
|
{
|
|
|
|
memset(list, 0, sizeof(*list));
|
|
|
|
|
|
|
|
/* Implied defaults. */
|
|
|
|
list->mode = BUNDLE_MODE_ALL;
|
|
|
|
list->version = 1;
|
|
|
|
|
|
|
|
hashmap_init(&list->bundles, compare_bundles, NULL, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
|
2023-08-29 23:45:39 +00:00
|
|
|
void *data UNUSED)
|
2022-10-12 12:52:29 +00:00
|
|
|
{
|
|
|
|
FREE_AND_NULL(bundle->id);
|
|
|
|
FREE_AND_NULL(bundle->uri);
|
2022-10-12 12:52:36 +00:00
|
|
|
FREE_AND_NULL(bundle->file);
|
|
|
|
bundle->unbundled = 0;
|
2022-10-12 12:52:29 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
void clear_bundle_list(struct bundle_list *list)
|
|
|
|
{
|
|
|
|
if (!list)
|
|
|
|
return;
|
|
|
|
|
|
|
|
for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
|
|
|
|
hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
|
2022-12-22 15:14:15 +00:00
|
|
|
free(list->baseURI);
|
2022-10-12 12:52:29 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
int for_all_bundles_in_list(struct bundle_list *list,
|
|
|
|
bundle_iterator iter,
|
|
|
|
void *data)
|
|
|
|
{
|
|
|
|
struct remote_bundle_info *info;
|
|
|
|
struct hashmap_iter i;
|
|
|
|
|
|
|
|
hashmap_for_each_entry(&list->bundles, &i, info, ent) {
|
|
|
|
int result = iter(info, data);
|
|
|
|
|
|
|
|
if (result)
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2022-08-09 13:11:40 +00:00
|
|
|
|
2022-10-12 12:52:32 +00:00
|
|
|
static int summarize_bundle(struct remote_bundle_info *info, void *data)
|
|
|
|
{
|
|
|
|
FILE *fp = data;
|
|
|
|
fprintf(fp, "[bundle \"%s\"]\n", info->id);
|
|
|
|
fprintf(fp, "\turi = %s\n", info->uri);
|
2023-01-31 13:29:13 +00:00
|
|
|
|
|
|
|
if (info->creationToken)
|
|
|
|
fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
|
2022-10-12 12:52:32 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
void print_bundle_list(FILE *fp, struct bundle_list *list)
|
|
|
|
{
|
|
|
|
const char *mode;
|
|
|
|
|
|
|
|
switch (list->mode) {
|
|
|
|
case BUNDLE_MODE_ALL:
|
|
|
|
mode = "all";
|
|
|
|
break;
|
|
|
|
|
|
|
|
case BUNDLE_MODE_ANY:
|
|
|
|
mode = "any";
|
|
|
|
break;
|
|
|
|
|
|
|
|
case BUNDLE_MODE_NONE:
|
|
|
|
default:
|
|
|
|
mode = "<unknown>";
|
|
|
|
}
|
|
|
|
|
|
|
|
fprintf(fp, "[bundle]\n");
|
|
|
|
fprintf(fp, "\tversion = %d\n", list->version);
|
|
|
|
fprintf(fp, "\tmode = %s\n", mode);
|
|
|
|
|
2023-01-31 13:29:12 +00:00
|
|
|
if (list->heuristic) {
|
|
|
|
int i;
|
|
|
|
for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
|
|
|
|
if (heuristics[i].heuristic == list->heuristic) {
|
|
|
|
fprintf(fp, "\theuristic = %s\n",
|
|
|
|
heuristics[list->heuristic].name);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:32 +00:00
|
|
|
for_all_bundles_in_list(list, summarize_bundle, fp);
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:30 +00:00
|
|
|
/**
|
|
|
|
* Given a key-value pair, update the state of the given bundle list.
|
|
|
|
* Returns 0 if the key-value pair is understood. Returns -1 if the key
|
|
|
|
* is not understood or the value is malformed.
|
|
|
|
*/
|
|
|
|
static int bundle_list_update(const char *key, const char *value,
|
|
|
|
struct bundle_list *list)
|
|
|
|
{
|
|
|
|
struct strbuf id = STRBUF_INIT;
|
|
|
|
struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
|
|
|
|
struct remote_bundle_info *bundle;
|
|
|
|
const char *subsection, *subkey;
|
|
|
|
size_t subsection_len;
|
|
|
|
|
|
|
|
if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey))
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
if (!subsection_len) {
|
|
|
|
if (!strcmp(subkey, "version")) {
|
|
|
|
int version;
|
|
|
|
if (!git_parse_int(value, &version))
|
|
|
|
return -1;
|
|
|
|
if (version != 1)
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
list->version = version;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!strcmp(subkey, "mode")) {
|
|
|
|
if (!strcmp(value, "all"))
|
|
|
|
list->mode = BUNDLE_MODE_ALL;
|
|
|
|
else if (!strcmp(value, "any"))
|
|
|
|
list->mode = BUNDLE_MODE_ANY;
|
|
|
|
else
|
|
|
|
return -1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-01-31 13:29:12 +00:00
|
|
|
if (!strcmp(subkey, "heuristic")) {
|
|
|
|
int i;
|
|
|
|
for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
|
|
|
|
if (heuristics[i].heuristic &&
|
|
|
|
heuristics[i].name &&
|
|
|
|
!strcmp(value, heuristics[i].name)) {
|
|
|
|
list->heuristic = heuristics[i].heuristic;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Ignore unknown heuristics. */
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:30 +00:00
|
|
|
/* Ignore other unknown global keys. */
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
strbuf_add(&id, subsection, subsection_len);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check for an existing bundle with this <id>, or create one
|
|
|
|
* if necessary.
|
|
|
|
*/
|
|
|
|
lookup.id = id.buf;
|
|
|
|
hashmap_entry_init(&lookup.ent, strhash(lookup.id));
|
|
|
|
if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
|
|
|
|
CALLOC_ARRAY(bundle, 1);
|
|
|
|
bundle->id = strbuf_detach(&id, NULL);
|
|
|
|
hashmap_entry_init(&bundle->ent, strhash(bundle->id));
|
|
|
|
hashmap_add(&list->bundles, &bundle->ent);
|
|
|
|
}
|
|
|
|
strbuf_release(&id);
|
|
|
|
|
|
|
|
if (!strcmp(subkey, "uri")) {
|
|
|
|
if (bundle->uri)
|
|
|
|
return -1;
|
2022-12-22 15:14:15 +00:00
|
|
|
bundle->uri = relative_url(list->baseURI, value, NULL);
|
2022-10-12 12:52:30 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-01-31 13:29:13 +00:00
|
|
|
if (!strcmp(subkey, "creationtoken")) {
|
|
|
|
if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
|
|
|
|
warning(_("could not parse bundle list key %s with value '%s'"),
|
|
|
|
"creationToken", value);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:30 +00:00
|
|
|
/*
|
|
|
|
* At this point, we ignore any information that we don't
|
|
|
|
* understand, assuming it to be hints for a heuristic the client
|
|
|
|
* does not currently understand.
|
|
|
|
*/
|
|
|
|
return 0;
|
|
|
|
}
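The key handling in the function above depends on splitting a key such as
"bundle.<id>.uri" into a subsection and a subkey. The sketch below shows
that split in isolation. It is a toy stand-in, not `parse_config_key()`
itself: `toy_split_bundle_key()` is a hypothetical name and the section is
hard-coded to "bundle".

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "bundle.<id>.<subkey>" or "bundle.<subkey>".
 * Returns 0 on success and -1 if the key does not start with
 * "bundle.". For a key without an <id>, *subsection is NULL and
 * *subsection_len is 0.
 */
static int toy_split_bundle_key(const char *key,
				const char **subsection,
				size_t *subsection_len,
				const char **subkey)
{
	const char *rest, *last_dot;

	if (strncmp(key, "bundle.", 7))
		return -1;

	rest = key + 7;
	last_dot = strrchr(rest, '.');
	if (!last_dot) {
		/* Global key such as "bundle.mode". */
		*subsection = NULL;
		*subsection_len = 0;
		*subkey = rest;
	} else {
		/* Everything before the last dot is the <id>. */
		*subsection = rest;
		*subsection_len = (size_t)(last_dot - rest);
		*subkey = last_dot + 1;
	}
	return 0;
}
```

A zero `subsection_len` corresponds to the branch that handles "version",
"mode" and "heuristic"; a non-zero length selects the per-bundle keys.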
|
|
|
|
|
config: add ctx arg to config_fn_t
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
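A minimal sketch of the callback shape after this change, under stated
assumptions: all `toy_` names are hypothetical, and the context struct
carries only a filename rather than a full "struct key_value_info".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down context. */
struct toy_config_context {
	const char *filename;
};

typedef int (*toy_config_fn_t)(const char *key, const char *value,
			       const struct toy_config_context *ctx,
			       void *data);

/* A callback that accepts ctx but does not use it yet. */
static int toy_count_bundle_keys(const char *key, const char *value,
				 const struct toy_config_context *ctx,
				 void *data)
{
	size_t *count = data;

	(void)value;
	(void)ctx;
	if (!strncmp(key, "bundle.", 7))
		(*count)++;
	return 0;
}

/* Feed key/value pairs to fn, stopping at the first non-zero result. */
static int toy_for_each_config(const char *const *kv, size_t nr_pairs,
			       const struct toy_config_context *ctx,
			       toy_config_fn_t fn, void *data)
{
	size_t i;

	for (i = 0; i < nr_pairs; i++) {
		int ret = fn(kv[2 * i], kv[2 * i + 1], ctx, data);
		if (ret)
			return ret;
	}
	return 0;
}
```

Passing the context as one struct pointer means new metadata can be added
to the struct later without touching any callback signature.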
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 19:26:22 +00:00
|
|
|
static int config_to_bundle_list(const char *key, const char *value,
|
|
|
|
const struct config_context *ctx UNUSED,
|
|
|
|
void *data)
|
2022-10-12 12:52:33 +00:00
|
|
|
{
|
|
|
|
struct bundle_list *list = data;
|
|
|
|
return bundle_list_update(key, value, list);
|
|
|
|
}
|
|
|
|
|
|
|
|
int bundle_uri_parse_config_format(const char *uri,
|
|
|
|
const char *filename,
|
|
|
|
struct bundle_list *list)
|
|
|
|
{
|
|
|
|
int result;
|
|
|
|
struct config_options opts = {
|
|
|
|
.error_action = CONFIG_ERROR_ERROR,
|
|
|
|
};
|
|
|
|
|
2022-12-22 15:14:15 +00:00
|
|
|
if (!list->baseURI) {
|
|
|
|
struct strbuf baseURI = STRBUF_INIT;
|
|
|
|
strbuf_addstr(&baseURI, uri);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the URI does not end with a trailing slash, then
|
|
|
|
* remove the filename portion of the path. This is
|
|
|
|
* important for relative URIs.
|
|
|
|
*/
|
|
|
|
strbuf_strip_file_from_path(&baseURI);
|
|
|
|
list->baseURI = strbuf_detach(&baseURI, NULL);
|
|
|
|
}
|
2022-10-12 12:52:33 +00:00
|
|
|
result = git_config_from_file_with_options(config_to_bundle_list,
|
|
|
|
filename, list,
|
2023-06-28 19:26:24 +00:00
|
|
|
CONFIG_SCOPE_UNKNOWN,
|
2022-10-12 12:52:33 +00:00
|
|
|
&opts);
|
|
|
|
|
|
|
|
if (!result && list->mode == BUNDLE_MODE_NONE) {
|
|
|
|
warning(_("bundle list at '%s' has no mode"), uri);
|
|
|
|
result = 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:28 +00:00
|
|
|
static char *find_temp_filename(void)
|
2022-08-09 13:11:40 +00:00
|
|
|
{
|
|
|
|
int fd;
|
2022-10-12 12:52:28 +00:00
|
|
|
struct strbuf name = STRBUF_INIT;
|
2022-08-09 13:11:40 +00:00
|
|
|
/*
|
|
|
|
* Find a temporary filename that is available. This is briefly
|
|
|
|
* racy, but unlikely to collide.
|
|
|
|
*/
|
2022-10-12 12:52:28 +00:00
|
|
|
fd = odb_mkstemp(&name, "bundles/tmp_uri_XXXXXX");
|
2022-08-09 13:11:40 +00:00
|
|
|
if (fd < 0) {
|
|
|
|
warning(_("failed to create temporary file"));
|
2022-10-12 12:52:28 +00:00
|
|
|
return NULL;
|
2022-08-09 13:11:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
close(fd);
|
2022-10-12 12:52:28 +00:00
|
|
|
unlink(name.buf);
|
|
|
|
return strbuf_detach(&name, NULL);
|
2022-08-09 13:11:40 +00:00
|
|
|
}
|
|
|
|
|
2022-08-09 13:11:42 +00:00
|
|
|
static int download_https_uri_to_file(const char *file, const char *uri)
|
2022-08-09 13:11:40 +00:00
|
|
|
{
|
2022-08-09 13:11:42 +00:00
|
|
|
int result = 0;
|
|
|
|
struct child_process cp = CHILD_PROCESS_INIT;
|
|
|
|
FILE *child_in = NULL, *child_out = NULL;
|
|
|
|
struct strbuf line = STRBUF_INIT;
|
|
|
|
int found_get = 0;
|
|
|
|
|
|
|
|
strvec_pushl(&cp.args, "git-remote-https", uri, NULL);
|
2022-10-12 12:52:39 +00:00
|
|
|
cp.err = -1;
|
2022-08-09 13:11:42 +00:00
|
|
|
cp.in = -1;
|
|
|
|
cp.out = -1;
|
|
|
|
|
|
|
|
if (start_command(&cp))
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
child_in = fdopen(cp.in, "w");
|
|
|
|
if (!child_in) {
|
|
|
|
result = 1;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
child_out = fdopen(cp.out, "r");
|
|
|
|
if (!child_out) {
|
|
|
|
result = 1;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
fprintf(child_in, "capabilities\n");
|
|
|
|
fflush(child_in);
|
|
|
|
|
|
|
|
while (!strbuf_getline(&line, child_out)) {
|
|
|
|
if (!line.len)
|
|
|
|
break;
|
|
|
|
if (!strcmp(line.buf, "get"))
|
|
|
|
found_get = 1;
|
|
|
|
}
|
|
|
|
strbuf_release(&line);
|
|
|
|
|
|
|
|
if (!found_get) {
|
|
|
|
result = error(_("insufficient capabilities"));
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
fprintf(child_in, "get %s %s\n\n", uri, file);
|
|
|
|
|
|
|
|
cleanup:
|
|
|
|
if (child_in)
|
|
|
|
fclose(child_in);
|
|
|
|
if (finish_command(&cp))
|
|
|
|
return 1;
|
|
|
|
if (child_out)
|
|
|
|
fclose(child_out);
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int copy_uri_to_file(const char *filename, const char *uri)
|
|
|
|
{
|
|
|
|
const char *out;
|
|
|
|
|
|
|
|
if (starts_with(uri, "https:") ||
|
|
|
|
starts_with(uri, "http:"))
|
|
|
|
return download_https_uri_to_file(filename, uri);
|
|
|
|
|
|
|
|
if (skip_prefix(uri, "file://", &out))
|
|
|
|
uri = out;
|
|
|
|
|
|
|
|
/* Copy as a file */
|
|
|
|
return copy_file(filename, uri, 0);
|
2022-08-09 13:11:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int unbundle_from_file(struct repository *r, const char *file)
|
|
|
|
{
|
|
|
|
int result = 0;
|
|
|
|
int bundle_fd;
|
|
|
|
struct bundle_header header = BUNDLE_HEADER_INIT;
|
|
|
|
struct string_list_item *refname;
|
|
|
|
struct strbuf bundle_ref = STRBUF_INIT;
|
|
|
|
size_t bundle_prefix_len;
|
|
|
|
|
|
|
|
if ((bundle_fd = read_bundle_header(file, &header)) < 0)
|
|
|
|
return 1;
|
|
|
|
|
2022-10-12 12:52:37 +00:00
|
|
|
/*
|
|
|
|
* Skip the reachability walk here, since we will be adding
|
|
|
|
* a reachable ref pointing to the new tips, which will reach
|
|
|
|
* the prerequisite commits.
|
|
|
|
*/
|
2022-10-12 12:52:38 +00:00
|
|
|
if ((result = unbundle(r, &header, bundle_fd, NULL,
|
2024-06-19 04:07:33 +00:00
|
|
|
VERIFY_BUNDLE_QUIET | (fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0))))
|
2022-08-09 13:11:40 +00:00
|
|
|
return 1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Convert all refs/heads/ from the bundle into refs/bundles/
|
|
|
|
* in the local repository.
|
|
|
|
*/
|
|
|
|
strbuf_addstr(&bundle_ref, "refs/bundles/");
|
|
|
|
bundle_prefix_len = bundle_ref.len;
|
|
|
|
|
|
|
|
for_each_string_list_item(refname, &header.references) {
|
|
|
|
struct object_id *oid = refname->util;
|
|
|
|
struct object_id old_oid;
|
|
|
|
const char *branch_name;
|
|
|
|
int has_old;
|
|
|
|
|
|
|
|
if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
strbuf_setlen(&bundle_ref, bundle_prefix_len);
|
|
|
|
strbuf_addstr(&bundle_ref, branch_name);
|
|
|
|
|
2024-05-07 07:11:53 +00:00
|
|
|
has_old = !refs_read_ref(get_main_ref_store(the_repository),
|
|
|
|
bundle_ref.buf, &old_oid);
|
|
|
|
refs_update_ref(get_main_ref_store(the_repository),
|
|
|
|
"fetched bundle", bundle_ref.buf, oid,
|
|
|
|
has_old ? &old_oid : NULL,
|
bundle-uri: verify oid before writing refs
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the
"creationToken" heuristic.
To reproduce this issue, consider a repository with a single branch
"main" pointing to commit "A". Firstly, create a base bundle with:

    git bundle create base.bundle main

Then, add a new commit "B" on top of "A", and create an incremental
bundle for "main":

    git bundle create incr.bundle A..main

Now, generate a bundle list with the following content:

    [bundle]
        version = 1
        mode = all
        heuristic = creationToken

    [bundle "base"]
        uri = base.bundle
        creationToken = 1

    [bundle "incr"]
        uri = incr.bundle
        creationToken = 2

A fresh clone with the bundle list above should result in a reference
"refs/bundles/main" pointing to "B" in the new repository. However, git
would still download everything from the server, as if it had fetched
nothing locally.
So why is "refs/bundles/main" not discovered? After some digging, I
found that:
1. Bundles in bundle list are downloaded to local files via
`bundle-uri.c:download_bundle_list` or via
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
is called by `bundle-uri.c:unbundle_all_bundles` or called within
`bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
heuristic.
3. To get all prerequisites of the bundle, the bundle header is read
inside `bundle-uri.c:unbundle_from_file` by calling
`bundle.c:read_bundle_header`.
4. Then it calls `bundle.c:unbundle`, which calls
`bundle.c:verify_bundle` to ensure the repository contains all the
prerequisites.
5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
invokes `packfile.c:prepare_packed_git` or
`packfile.c:reprepare_packed_git`, filling
`raw_object_store->packed_git` and setting `packed_git_initialized`.
6. If `bundle.c:unbundle` succeeds, it writes refs via
`refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
bundle refs which can target arbitrary objects are written to the
repository.
7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
`fetch-pack.c:mark_complete_and_common_ref` and
`fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
prevents `packfile.c:reprepare_packed_git` from being called,
resulting in failures to parse OIDs that reside only in the latest
bundle.
In the example above, when unbundling "incr.bundle", "base.pack" is added
to `packed_git` due to prerequisite verification. However, "B" cannot
be found for negotiation because it exists in "incr.pack", which is not
included in `packed_git`.
Fix the bug by removing the `REF_SKIP_OID_VERIFICATION` flag when writing
bundle refs. When `refs.c:refs_update_ref` is called to write the
corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`.
This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls
`transaction_prepare` of the refs storage backend. For the files backend,
it is `files-backend.c:files_transaction_prepare`, and for the reftable
backend, it is `reftable-backend.c:reftable_be_transaction_prepare`.
Both functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.
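The stale-cache effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_store`, `toy_add_pack`, `toy_reprepare` and `toy_has_object` are hypothetical names, not Git's API, and each "pack" holds a single object name for brevity. The `quick` argument stands in for `OBJECT_INFO_QUICK`.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PACKS 8

/* Hypothetical object store: packs on "disk" versus the cached view. */
struct toy_store {
	const char *on_disk[TOY_MAX_PACKS]; /* one object name per pack */
	size_t nr_on_disk;
	size_t nr_cached; /* how many entries of on_disk[] the cache knows */
};

/* A new pack appears on disk; the cached view is NOT refreshed. */
static void toy_add_pack(struct toy_store *s, const char *object)
{
	if (s->nr_on_disk < TOY_MAX_PACKS)
		s->on_disk[s->nr_on_disk++] = object;
}

/* Rescan the "disk", loosely analogous to reprepare_packed_git(). */
static void toy_reprepare(struct toy_store *s)
{
	s->nr_cached = s->nr_on_disk;
}

static int toy_lookup_cached(const struct toy_store *s, const char *object)
{
	size_t i;

	for (i = 0; i < s->nr_cached; i++)
		if (!strcmp(s->on_disk[i], object))
			return 1;
	return 0;
}

/* quick != 0 mimics OBJECT_INFO_QUICK: never rescan on a cache miss. */
static int toy_has_object(struct toy_store *s, const char *object, int quick)
{
	if (toy_lookup_cached(s, object))
		return 1;
	if (quick)
		return 0;
	toy_reprepare(s);
	return toy_lookup_cached(s, object);
}
```

In this model, if "A" is added and the cache prepared, then "B" is added, a quick lookup of "B" misses. A single non-quick lookup refreshes the cache, after which quick lookups find "B" as well. That is the role the ref transaction's object check is described as playing in the fix.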
A set of negotiation-related tests for cloning with bundle-uri has been
included to demonstrate that downloaded bundles are utilized to
accelerate fetching.
Additionally, another test has been added to show that bundles with
incorrect headers, where refs point to non-existent objects, do not
result in any bundle refs being created in the repository.
Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-19 04:07:31 +00:00
|
|
|
0, UPDATE_REFS_MSG_ON_ERR);
|
2022-08-09 13:11:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
bundle_header_release(&header);
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:36 +00:00
|
|
|
struct bundle_list_context {
|
|
|
|
struct repository *r;
|
|
|
|
struct bundle_list *list;
|
|
|
|
enum bundle_list_mode mode;
|
|
|
|
int count;
|
|
|
|
int depth;
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This early definition is necessary because we use indirect recursion:
|
|
|
|
*
|
|
|
|
* While iterating through a bundle list that was downloaded as part
|
|
|
|
* of fetch_bundle_uri_internal(), iterator methods eventually call it
|
|
|
|
* again, but with depth + 1.
|
|
|
|
*/
|
|
|
|
static int fetch_bundle_uri_internal(struct repository *r,
|
|
|
|
struct remote_bundle_info *bundle,
|
|
|
|
int depth,
|
|
|
|
struct bundle_list *list);
|
|
|
|
|
|
|
|
static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
|
|
|
|
{
|
|
|
|
int res;
|
|
|
|
struct bundle_list_context *ctx = data;
|
|
|
|
|
|
|
|
if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Only increment count if the download succeeded. If our mode is
|
|
|
|
* BUNDLE_MODE_ANY, then we will want to try other URIs in the
|
|
|
|
* list in case they work instead.
|
|
|
|
*/
|
|
|
|
if (!res)
|
|
|
|
ctx->count++;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* To be opportunistic as possible, we continue iterating and
|
|
|
|
* download as many bundles as we can, so we can apply the ones
|
|
|
|
* that work, even in BUNDLE_MODE_ALL mode.
|
|
|
|
*/
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
bundle-uri: download in creationToken order
The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.
The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.
During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. However, the cost of testing an unbundle is quite low, so
the chosen order instead optimizes for a future bundle download during a
'git fetch' operation with a non-empty object store.
Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.
However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:
---------------- bundle-4
4
/ \
----|---|------- bundle-3
| |
| 3
| |
----|---|------- bundle-2
| |
2 |
| |
----|---|------- bundle-1
\ /
1
|
(previous commits)
In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.
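The back-and-forth walk can be simulated on the example repository with a small toy model. This is a sketch under stated assumptions, not the code from this commit: `toy_bundle` and `toy_fetch_by_token` are hypothetical names, commits are modelled as bits in a mask, every download is assumed to succeed, and the direction starts at 1 rather than 0 so that the toy loop always makes progress.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bundle; commits are bits in a mask. */
struct toy_bundle {
	uint64_t token;    /* creationToken */
	unsigned provides; /* commits contained in the bundle */
	unsigned requires; /* prerequisite commits */
	int downloaded;
	int unbundled;
};

static int by_token_decreasing(const void *va, const void *vb)
{
	const struct toy_bundle *a = va, *b = vb;

	if (a->token > b->token)
		return -1;
	if (a->token < b->token)
		return 1;
	return 0;
}

/*
 * Walk the list in decreasing token order. Move deeper (+1) when a
 * bundle cannot be applied yet, and back up (-1) after a success so
 * that newer bundles are retried. Returns the number of downloads and
 * updates *have with every commit that became available.
 */
static int toy_fetch_by_token(struct toy_bundle *b, size_t nr, unsigned *have)
{
	int downloads = 0;
	int move = 1;
	long cur = 0;

	qsort(b, nr, sizeof(*b), by_token_decreasing);

	while (cur >= 0 && (size_t)cur < nr) {
		struct toy_bundle *x = &b[cur];

		if (!x->downloaded) {
			x->downloaded = 1;
			downloads++;
		}
		if (!x->unbundled) {
			if (x->requires & ~*have) {
				move = 1; /* missing prerequisites */
			} else {
				*have |= x->provides;
				x->unbundled = 1;
				move = -1; /* retry newer bundles */
			}
		}
		/* Already applied: keep moving in the previous direction. */
		cur += move;
	}
	return downloads;
}
```

With commits 1 to 4 as bits 0x1 to 0x8 and a client that already has commit 1, the walk in this model downloads bundle-4, bundle-3 and bundle-2, applies them in the order 3, 2, 4, and never downloads bundle-1.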
A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.
Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 13:29:14 +00:00
|
|
|
struct bundles_for_sorting {
|
|
|
|
struct remote_bundle_info **items;
|
|
|
|
size_t alloc;
|
|
|
|
size_t nr;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int append_bundle(struct remote_bundle_info *bundle, void *data)
|
|
|
|
{
|
|
|
|
struct bundles_for_sorting *list = data;
|
|
|
|
list->items[list->nr++] = bundle;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* For use in QSORT() to get a list sorted by creationToken
|
|
|
|
* in decreasing order.
|
|
|
|
*/
|
|
|
|
static int compare_creation_token_decreasing(const void *va, const void *vb)
|
|
|
|
{
|
|
|
|
const struct remote_bundle_info * const *a = va;
|
|
|
|
const struct remote_bundle_info * const *b = vb;
|
|
|
|
|
|
|
|
if ((*a)->creationToken > (*b)->creationToken)
|
|
|
|
return -1;
|
|
|
|
if ((*a)->creationToken < (*b)->creationToken)
|
|
|
|
return 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int fetch_bundles_by_token(struct repository *r,
|
|
|
|
struct bundle_list *list)
|
|
|
|
{
|
|
|
|
int cur;
|
|
|
|
int move_direction = 0;
|
2023-01-31 13:29:18 +00:00
|
|
|
const char *creationTokenStr;
|
|
|
|
uint64_t maxCreationToken = 0, newMaxCreationToken = 0;
|
2023-01-31 13:29:14 +00:00
|
|
|
struct bundle_list_context ctx = {
|
|
|
|
.r = r,
|
|
|
|
.list = list,
|
|
|
|
.mode = list->mode,
|
|
|
|
};
|
|
|
|
struct bundles_for_sorting bundles = {
|
|
|
|
.alloc = hashmap_get_size(&list->bundles),
|
|
|
|
};
|
|
|
|
|
|
|
|
ALLOC_ARRAY(bundles.items, bundles.alloc);
|
|
|
|
|
|
|
|
for_all_bundles_in_list(list, append_bundle, &bundles);
|
|
|
|
|
2023-01-31 13:29:18 +00:00
|
|
|
if (!bundles.nr) {
|
|
|
|
free(bundles.items);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-01-31 13:29:14 +00:00
|
|
|
QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
|
|
|
|
|
2023-01-31 13:29:18 +00:00
|
|
|
/*
|
|
|
|
* If fetch.bundleCreationToken exists, parses to a uint64_t, and
|
|
|
|
* is not strictly smaller than the maximum creation token in the
|
|
|
|
* bundle list, then do not download any bundles.
|
|
|
|
*/
|
|
|
|
if (!repo_config_get_value(r,
|
|
|
|
"fetch.bundlecreationtoken",
|
|
|
|
&creationTokenStr) &&
|
|
|
|
sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
|
|
|
|
bundles.items[0]->creationToken <= maxCreationToken) {
|
|
|
|
free(bundles.items);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-01-31 13:29:14 +00:00
|
|
|
/*
|
|
|
|
* Attempt to download and unbundle the minimum number of bundles by
|
|
|
|
* creationToken in decreasing order. If we fail to unbundle (after
|
|
|
|
* a successful download) then move to the next non-downloaded bundle
|
|
|
|
* and attempt downloading. Once we succeed in applying a bundle,
|
|
|
|
* move to the previous unapplied bundle and attempt to unbundle it
|
|
|
|
* again.
|
|
|
|
*
|
|
|
|
* In the case of a fresh clone, we will likely download all of the
|
|
|
|
* bundles before successfully unbundling the oldest one, then the
|
|
|
|
* rest of the bundles unbundle successfully in increasing order
|
|
|
|
* of creationToken.
|
|
|
|
*
|
|
|
|
* If there are existing objects, then this process may terminate
|
|
|
|
* early when all required commits from "new" bundles exist in the
|
|
|
|
* repo's object store.
|
|
|
|
*/
|
|
|
|
cur = 0;
|
|
|
|
while (cur >= 0 && cur < bundles.nr) {
|
|
|
|
struct remote_bundle_info *bundle = bundles.items[cur];
|
2023-01-31 13:29:18 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we need to dig into bundles below the previous
|
|
|
|
* creation token value, then likely we are in an erroneous
|
|
|
|
* state due to missing or invalid bundles. Halt the process
|
|
|
|
* instead of continuing to download extra data.
|
|
|
|
*/
|
|
|
|
if (bundle->creationToken <= maxCreationToken)
|
|
|
|
break;
|
|
|
|
|
2023-01-31 13:29:14 +00:00
|
|
|
if (!bundle->file) {
|
|
|
|
/*
|
|
|
|
* Not downloaded yet. Try downloading.
|
|
|
|
*
|
|
|
|
* Note that bundle->file is non-NULL if a download
|
|
|
|
* was attempted, even if it failed to download.
|
|
|
|
*/
|
|
|
|
if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
|
|
|
|
/* Mark as unbundled so we do not retry. */
|
|
|
|
bundle->unbundled = 1;
|
|
|
|
|
|
|
|
/* Try looking deeper in the list. */
|
|
|
|
move_direction = 1;
|
|
|
|
goto move;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* We expect bundles when using creationTokens. */
|
|
|
|
if (!is_bundle(bundle->file, 1)) {
|
|
|
|
warning(_("file downloaded from '%s' is not a bundle"),
|
|
|
|
bundle->uri);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (bundle->file && !bundle->unbundled) {
|
|
|
|
/*
|
|
|
|
* This was downloaded, but not successfully
|
|
|
|
* unbundled. Try unbundling again.
|
|
|
|
*/
|
|
|
|
if (unbundle_from_file(ctx.r, bundle->file)) {
|
|
|
|
/* Try looking deeper in the list. */
|
|
|
|
move_direction = 1;
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* Succeeded in unbundle. Retry bundles
|
|
|
|
* that previously failed to unbundle.
|
|
|
|
*/
|
|
|
|
move_direction = -1;
|
|
|
|
bundle->unbundled = 1;
|
2023-01-31 13:29:18 +00:00
|
|
|
|
|
|
|
if (bundle->creationToken > newMaxCreationToken)
|
|
|
|
newMaxCreationToken = bundle->creationToken;
|
2023-01-31 13:29:14 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Else case: downloaded and unbundled successfully.
|
|
|
|
* Skip this by moving in the same direction as the
|
|
|
|
* previous step.
|
|
|
|
*/
|
|
|
|
|
|
|
|
move:
|
|
|
|
/* Move in the specified direction and repeat. */
|
|
|
|
cur += move_direction;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We succeed if the loop terminates because 'cur' drops below
|
|
|
|
* zero. The other case is that we terminate because 'cur'
|
|
|
|
* reaches the end of the list, so we have a failure no matter
|
|
|
|
* which bundles we apply from the list.
|
|
|
|
*/
|
2023-01-31 13:29:18 +00:00
|
|
|
if (cur < 0) {
|
|
|
|
struct strbuf value = STRBUF_INIT;
|
|
|
|
strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken);
|
|
|
|
if (repo_config_set_multivar_gently(ctx.r,
|
|
|
|
"fetch.bundleCreationToken",
|
|
|
|
value.buf, NULL, 0))
|
|
|
|
warning(_("failed to store maximum creation token"));
|
|
|
|
|
|
|
|
strbuf_release(&value);
|
|
|
|
}
|
|
|
|
|
|
|
|
free(bundles.items);
|
2023-01-31 13:29:14 +00:00
|
|
|
return cur >= 0;
|
|
|
|
}
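The stored-token short-circuit near the top of the function above can be isolated as a small predicate. This is a minimal sketch, assuming a hypothetical helper name `toy_can_skip_download`; it takes the raw config string and the largest advertised creationToken, and it uses the `SCNu64` scanning macro from `<inttypes.h>`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return 1 when the stored value parses as an unsigned 64-bit integer
 * and the newest advertised token is not strictly greater than it,
 * meaning no bundle in the list is new to this client. A missing or
 * unparsable value never allows skipping.
 */
static int toy_can_skip_download(const char *stored,
				 uint64_t newest_advertised)
{
	uint64_t max_seen;

	if (!stored || sscanf(stored, "%" SCNu64, &max_seen) != 1)
		return 0;
	return newest_advertised <= max_seen;
}
```

For example, a stored value of "5" allows skipping a list whose newest token is 5 but not one whose newest token is 6.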
|
|
|
|
|
2022-10-12 12:52:36 +00:00
|
|
|
static int download_bundle_list(struct repository *r,
|
|
|
|
struct bundle_list *local_list,
|
|
|
|
struct bundle_list *global_list,
|
|
|
|
int depth)
|
|
|
|
{
|
|
|
|
struct bundle_list_context ctx = {
|
|
|
|
.r = r,
|
|
|
|
.list = global_list,
|
|
|
|
.depth = depth + 1,
|
|
|
|
.mode = local_list->mode,
|
|
|
|
};
|
|
|
|
|
|
|
|
return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int fetch_bundle_list_in_config_format(struct repository *r,
|
|
|
|
struct bundle_list *global_list,
|
|
|
|
struct remote_bundle_info *bundle,
|
|
|
|
int depth)
|
|
|
|
{
|
|
|
|
int result;
|
|
|
|
struct bundle_list list_from_bundle;
|
|
|
|
|
|
|
|
init_bundle_list(&list_from_bundle);
|
|
|
|
|
|
|
|
if ((result = bundle_uri_parse_config_format(bundle->uri,
|
|
|
|
bundle->file,
|
|
|
|
&list_from_bundle)))
|
|
|
|
goto cleanup;
|
|
|
|
|
|
|
|
if (list_from_bundle.mode == BUNDLE_MODE_NONE) {
|
|
|
|
warning(_("unrecognized bundle mode from URI '%s'"),
|
|
|
|
bundle->uri);
|
|
|
|
result = -1;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
bundle-uri: download in creationToken order
The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.
The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.
During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.
Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.
However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:
 ---------------- bundle-4
       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)
In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.
A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.
Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.
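The download order described in this message can be modelled with a small sketch. It is a simplified illustration, not the code in bundle-uri.c: `struct sketch_bundle`, `sketch_try_unbundle()` and `sketch_fetch_by_token()` are hypothetical names, and the object store is reduced to a bitmask in which each bundle contributes one bit. The sketch sorts by creationToken in decreasing order, "downloads" one bundle at a time, retries every downloaded bundle after each download, and stops once nothing downloaded is still pending.

```c
#include <stdlib.h>

/* Hypothetical model of one advertised bundle; not Git's remote_bundle_info. */
struct sketch_bundle {
	unsigned long token; /* advertised creationToken */
	unsigned requires;   /* bitmask of bundles whose commits must exist first */
	unsigned id_bit;     /* bit that marks this bundle's commits as present */
	int unbundled;       /* set once the bundle applied cleanly */
};

/* qsort() comparator: larger creationToken first. */
static int sketch_by_token_desc(const void *a, const void *b)
{
	const struct sketch_bundle *x = a, *y = b;
	return (x->token < y->token) - (x->token > y->token);
}

/* Stand-in for unbundling: fails while required commits are missing. */
static int sketch_try_unbundle(struct sketch_bundle *b, unsigned *store)
{
	if ((b->requires & *store) != b->requires)
		return -1;
	*store |= b->id_bit;
	b->unbundled = 1;
	return 0;
}

/*
 * "Download" bundles newest first and stop as soon as every downloaded
 * bundle has been unbundled. Returns the number of downloads.
 */
static int sketch_fetch_by_token(struct sketch_bundle *list, size_t nr,
				 unsigned *store)
{
	size_t i, j;
	int downloads = 0;

	qsort(list, nr, sizeof(*list), sketch_by_token_desc);

	for (i = 0; i < nr; i++) {
		int progress, pending = 0;

		downloads++; /* bundle i is now available locally */

		/* Retry all downloaded bundles until nothing new applies. */
		do {
			progress = 0;
			for (j = 0; j <= i; j++)
				if (!list[j].unbundled &&
				    !sketch_try_unbundle(&list[j], store))
					progress = 1;
		} while (progress);

		for (j = 0; j <= i; j++)
			if (!list[j].unbundled)
				pending = 1;
		if (!pending)
			break; /* all tips applied; older bundles are not needed */
	}
	return downloads;
}
```

With a list shaped like the t5558 example and a store that already contains bundle-1, this model downloads bundle-4, bundle-3 and bundle-2, and never downloads bundle-1. Stopping at the first successful unbundle would leave bundle-4 unapplied, which is why the model keeps going until nothing is pending.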
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 13:29:14 +00:00
|
|
|
/*
|
|
|
|
* If this list uses the creationToken heuristic, then the URIs
|
|
|
|
* it advertises are expected to be bundles, not nested lists.
|
|
|
|
* We can drop 'global_list' and 'depth'.
|
|
|
|
*/
|
|
|
|
if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
|
|
|
|
result = fetch_bundles_by_token(r, &list_from_bundle);
|
|
|
|
global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
|
|
|
|
} else if ((result = download_bundle_list(r, &list_from_bundle,
|
2022-10-12 12:52:36 +00:00
|
|
|
global_list, depth)))
|
|
|
|
goto cleanup;
|
|
|
|
|
|
|
|
cleanup:
|
|
|
|
clear_bundle_list(&list_from_bundle);
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:34 +00:00
|
|
|
/**
|
|
|
|
* This limits the recursion on fetch_bundle_uri_internal() when following
|
|
|
|
* bundle lists.
|
|
|
|
*/
|
|
|
|
static int max_bundle_uri_depth = 4;
|
|
|
|
|
2022-10-12 12:52:36 +00:00
|
|
|
/**
|
|
|
|
* Recursively download all bundles advertised at the given URI
|
|
|
|
* to files. If the file is a bundle, then add it to the given
|
|
|
|
* 'list'. Otherwise, expect a bundle list and recurse on the
|
|
|
|
* URIs in that list according to the list mode (ANY or ALL).
|
|
|
|
*/
|
2022-10-12 12:52:34 +00:00
|
|
|
static int fetch_bundle_uri_internal(struct repository *r,
|
2022-10-12 12:52:36 +00:00
|
|
|
struct remote_bundle_info *bundle,
|
|
|
|
int depth,
|
|
|
|
struct bundle_list *list)
|
2022-08-09 13:11:40 +00:00
|
|
|
{
|
|
|
|
int result = 0;
|
2022-10-12 12:52:36 +00:00
|
|
|
struct remote_bundle_info *bcopy;
|
2022-08-09 13:11:40 +00:00
|
|
|
|
2022-10-12 12:52:34 +00:00
|
|
|
if (depth >= max_bundle_uri_depth) {
|
|
|
|
warning(_("exceeded bundle URI recursion limit (%d)"),
|
|
|
|
max_bundle_uri_depth);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:36 +00:00
|
|
|
if (!bundle->file &&
|
|
|
|
!(bundle->file = find_temp_filename())) {
|
2022-10-12 12:52:28 +00:00
|
|
|
result = -1;
|
2022-08-09 13:11:40 +00:00
|
|
|
goto cleanup;
|
2022-10-12 12:52:28 +00:00
|
|
|
}
|
2022-08-09 13:11:40 +00:00
|
|
|
|
2022-10-12 12:52:36 +00:00
|
|
|
if ((result = copy_uri_to_file(bundle->file, bundle->uri))) {
|
|
|
|
warning(_("failed to download bundle from URI '%s'"), bundle->uri);
|
2022-08-09 13:11:40 +00:00
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:36 +00:00
|
|
|
if ((result = !is_bundle(bundle->file, 1))) {
|
|
|
|
result = fetch_bundle_list_in_config_format(
|
|
|
|
r, list, bundle, depth);
|
|
|
|
if (result)
|
|
|
|
warning(_("file at URI '%s' is not a bundle or bundle list"),
|
|
|
|
bundle->uri);
|
2022-08-09 13:11:40 +00:00
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:36 +00:00
|
|
|
/* Copy the bundle and insert it into the global list. */
|
|
|
|
CALLOC_ARRAY(bcopy, 1);
|
|
|
|
bcopy->id = xstrdup(bundle->id);
|
|
|
|
bcopy->file = xstrdup(bundle->file);
|
|
|
|
hashmap_entry_init(&bcopy->ent, strhash(bcopy->id));
|
|
|
|
hashmap_add(&list->bundles, &bcopy->ent);
|
2022-08-09 13:11:40 +00:00
|
|
|
|
|
|
|
cleanup:
|
2022-10-12 12:52:36 +00:00
|
|
|
if (result && bundle->file)
|
|
|
|
unlink(bundle->file);
|
2022-08-09 13:11:40 +00:00
|
|
|
return result;
|
|
|
|
}
|
2022-10-12 12:52:31 +00:00
|
|
|
|
2022-10-12 12:52:36 +00:00
|
|
|
/**
|
|
|
|
* This loop iterator breaks the loop with nonzero return code on the
|
|
|
|
* first successful unbundling of a bundle.
|
|
|
|
*/
|
|
|
|
static int attempt_unbundle(struct remote_bundle_info *info, void *data)
|
|
|
|
{
|
|
|
|
struct repository *r = data;
|
|
|
|
|
|
|
|
if (!info->file || info->unbundled)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!unbundle_from_file(r, info->file)) {
|
|
|
|
info->unbundled = 1;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int unbundle_all_bundles(struct repository *r,
|
|
|
|
struct bundle_list *list)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Iterate through all bundles looking for ones that can
|
|
|
|
* successfully unbundle. If any succeed, then perhaps another
|
|
|
|
* will succeed in the next attempt.
|
|
|
|
*
|
|
|
|
* Keep in mind that a non-zero result for the loop here means
|
|
|
|
* the loop terminated early on a successful unbundling, which
|
|
|
|
* signals that we can try again.
|
|
|
|
*/
|
|
|
|
while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-08-29 23:45:39 +00:00
|
|
|
static int unlink_bundle(struct remote_bundle_info *info, void *data UNUSED)
|
2022-10-12 12:52:36 +00:00
|
|
|
{
|
|
|
|
if (info->file)
|
|
|
|
unlink_or_warn(info->file);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-01-31 13:29:15 +00:00
|
|
|
int fetch_bundle_uri(struct repository *r, const char *uri,
|
|
|
|
int *has_heuristic)
|
2022-10-12 12:52:34 +00:00
|
|
|
{
|
2022-10-12 12:52:36 +00:00
|
|
|
int result;
|
|
|
|
struct bundle_list list;
|
|
|
|
struct remote_bundle_info bundle = {
|
|
|
|
.uri = xstrdup(uri),
|
|
|
|
.id = xstrdup(""),
|
|
|
|
};
|
|
|
|
|
|
|
|
init_bundle_list(&list);
|
|
|
|
|
2023-03-31 15:59:04 +00:00
|
|
|
/*
|
2023-04-22 13:56:46 +00:00
|
|
|
* Do not fetch an empty bundle URI. An empty bundle URI
|
2023-03-31 15:59:04 +00:00
|
|
|
* could signal that a configured bundle URI has been disabled.
|
|
|
|
*/
|
2023-04-22 13:56:46 +00:00
|
|
|
if (!*uri) {
|
2023-03-31 15:59:04 +00:00
|
|
|
result = 0;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:36 +00:00
|
|
|
/* If a bundle is added to this global list, then it is required. */
|
|
|
|
list.mode = BUNDLE_MODE_ALL;
|
|
|
|
|
|
|
|
if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list)))
|
|
|
|
goto cleanup;
|
|
|
|
|
|
|
|
result = unbundle_all_bundles(r, &list);
|
|
|
|
|
|
|
|
cleanup:
|
2023-01-31 13:29:15 +00:00
|
|
|
if (has_heuristic)
|
|
|
|
*has_heuristic = (list.heuristic != BUNDLE_HEURISTIC_NONE);
|
2022-10-12 12:52:36 +00:00
|
|
|
for_all_bundles_in_list(&list, unlink_bundle, NULL);
|
|
|
|
clear_bundle_list(&list);
|
|
|
|
clear_remote_bundle_info(&bundle, NULL);
|
|
|
|
return result;
|
2022-10-12 12:52:34 +00:00
|
|
|
}
|
|
|
|
|
2022-12-22 15:14:16 +00:00
|
|
|
int fetch_bundle_list(struct repository *r, struct bundle_list *list)
|
|
|
|
{
|
|
|
|
int result;
|
|
|
|
struct bundle_list global_list;
|
|
|
|
|
bundle-uri: download in creationToken order
2023-01-31 13:29:14 +00:00
|
|
|
/*
|
|
|
|
* If the creationToken heuristic is used, then the URIs
|
|
|
|
* advertised by 'list' are not nested lists and instead
|
|
|
|
* direct bundles. We do not need to use global_list.
|
|
|
|
*/
|
|
|
|
if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
|
|
|
|
return fetch_bundles_by_token(r, list);
|
|
|
|
|
2022-12-22 15:14:16 +00:00
|
|
|
init_bundle_list(&global_list);
|
|
|
|
|
|
|
|
/* If a bundle is added to this global list, then it is required. */
|
|
|
|
global_list.mode = BUNDLE_MODE_ALL;
|
|
|
|
|
|
|
|
if ((result = download_bundle_list(r, list, &global_list, 0)))
|
|
|
|
goto cleanup;
|
|
|
|
|
bundle-uri: download in creationToken order
2023-01-31 13:29:14 +00:00
|
|
|
if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
|
|
|
|
result = fetch_bundles_by_token(r, list);
|
|
|
|
else
|
|
|
|
result = unbundle_all_bundles(r, &global_list);
|
2022-12-22 15:14:16 +00:00
|
|
|
|
|
|
|
cleanup:
|
|
|
|
for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
|
|
|
|
clear_bundle_list(&global_list);
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2022-12-22 15:14:07 +00:00
|
|
|
/**
|
|
|
|
* API for serve.c.
|
|
|
|
*/
|
|
|
|
|
|
|
|
int bundle_uri_advertise(struct repository *r, struct strbuf *value UNUSED)
|
|
|
|
{
|
|
|
|
static int advertise_bundle_uri = -1;
|
|
|
|
|
|
|
|
if (advertise_bundle_uri != -1)
|
|
|
|
goto cached;
|
|
|
|
|
|
|
|
advertise_bundle_uri = 0;
|
|
|
|
repo_config_get_maybe_bool(r, "uploadpack.advertisebundleuris", &advertise_bundle_uri);
|
|
|
|
|
|
|
|
cached:
|
|
|
|
return advertise_bundle_uri;
|
|
|
|
}
|
|
|
|
|
config: add ctx arg to config_fn_t
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
  This is indirectly called by git_config_set() so that the trace2
  machinery can notice the new config values and update its settings
  using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
  This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
  This might be worth refactoring away in the future, since
  git_xmerge_config() can call git_default_config(), which can do much
  more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
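The callback shape described in this message can be illustrated with a small, self-contained sketch. Every name prefixed with `sketch_` is hypothetical: the structs are reduced stand-ins for `struct config_context` and `struct key_value_info`, not Git's definitions, and the iterator fills in the context right away, whereas the commit itself only changes signatures and passes NULL as a placeholder.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for key_value_info and config_context. */
struct sketch_kvi {
	const char *filename; /* where the value was read from */
	int linenr;           /* line of the entry in that source */
};

struct sketch_config_context {
	struct sketch_kvi kvi;
};

/* Callback type with the extra "ctx" argument before "data". */
typedef int (*sketch_config_fn_t)(const char *key, const char *value,
				  const struct sketch_config_context *ctx,
				  void *data);

struct sketch_entry {
	const char *key;
	const char *value;
};

/* Call fn once per entry; stop on the first nonzero return value. */
static int sketch_for_each_config(const struct sketch_entry *entries,
				  size_t nr, const char *filename,
				  sketch_config_fn_t fn, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		struct sketch_config_context ctx = { { filename, (int)i + 1 } };
		int ret = fn(entries[i].key, entries[i].value, &ctx, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* A callback that ignores ctx, like the UNUSED cases in the patch. */
static int sketch_count_bundle_keys(const char *key, const char *value,
				    const struct sketch_config_context *ctx,
				    void *data)
{
	int *count = data;

	(void)value;
	(void)ctx;
	if (!strncmp(key, "bundle.", strlen("bundle.")))
		(*count)++;
	return 0;
}
```

Because the metadata travels in a struct, new members can be added later without touching every callback signature again, which is the design argument the message makes for `config_context` over a bare `key_value_info` argument.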
2023-06-28 19:26:22 +00:00
|
|
|
static int config_to_packet_line(const char *key, const char *value,
|
|
|
|
const struct config_context *ctx UNUSED,
|
|
|
|
void *data)
|
2022-12-22 15:14:13 +00:00
|
|
|
{
|
|
|
|
struct packet_reader *writer = data;
|
|
|
|
|
convert trivial uses of strncmp() to starts_with()
It's more readable to use starts_with() instead of strncmp() to match a
prefix, as the latter requires a manually-computed length, and has the
funny "matching is zero" return value common to cmp functions. This
patch converts several cases which were found with:
    git grep 'strncmp(.*, [0-9]*)'
But note that it doesn't convert all such cases. There are several where
the magic length number is repeated elsewhere in the code, like:
    /* handle "buf" which isn't NUL-terminated and might be too small */
    if (len >= 3 && !strncmp(buf, "foo", 3))
or:
    /* exact match for "foo", but within a larger string */
    if (end - buf == 3 && !strncmp(buf, "foo", 3))
While it would not produce the wrong outcome to use starts_with() in
these cases, we'd still be left with one instance of "3". We're better
off leaving them for now, as the repeated "3" makes it clear that the two
are linked (there may be other refactorings that handle both, but
they're out of scope for this patch).
A few things to note while reading the patch:
- all cases but one are trying to match, and so lose the extra "!".
The case in the first hunk of urlmatch.c is not-matching, and hence
gains a "!".
- the case in remote-fd.c is matching the beginning of "connect foo",
but we never look at str+8 to parse the "foo" part (which would make
this a candidate for skip_prefix(), not starts_with()). This seems
at first glance like a bug, but is a limitation of how remote-fd
works.
- the second hunk in urlmatch.c shows some cases adjacent to other
strncmp() calls that are left. These are of the "exact match within
a larger string" type, as described above.
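The readability argument above can be seen side by side in a short sketch. `sketch_starts_with()` is a hypothetical stand-in written for this example; it is not Git's own `starts_with()` implementation, only a function with the same contract (nonzero when `str` begins with `prefix`).

```c
#include <string.h>

/* Hypothetical stand-in with the same contract as starts_with(). */
static int sketch_starts_with(const char *str, const char *prefix)
{
	return !strncmp(str, prefix, strlen(prefix));
}

/* Before: hand-counted length and the inverted "0 means match" result. */
static int sketch_is_bundle_key_old(const char *key)
{
	return !strncmp(key, "bundle.", 7);
}

/* After: no magic number, and the condition reads as written. */
static int sketch_is_bundle_key_new(const char *key)
{
	return sketch_starts_with(key, "bundle.");
}
```

Both helpers return the same result for any NUL-terminated key; the second form simply removes the length that has to be kept in sync with the literal.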
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-07 13:26:18 +00:00
|
|
|
if (starts_with(key, "bundle."))
|
2022-12-22 15:14:13 +00:00
|
|
|
packet_write_fmt(writer->fd, "%s=%s", key, value);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-12-22 15:14:07 +00:00
|
|
|
int bundle_uri_command(struct repository *r,
|
|
|
|
struct packet_reader *request)
|
|
|
|
{
|
|
|
|
struct packet_writer writer;
|
|
|
|
packet_writer_init(&writer, 1);
|
|
|
|
|
|
|
|
while (packet_reader_read(request) == PACKET_READ_NORMAL)
|
|
|
|
die(_("bundle-uri: unexpected argument: '%s'"), request->line);
|
|
|
|
if (request->status != PACKET_READ_FLUSH)
|
|
|
|
die(_("bundle-uri: expected flush after arguments"));
|
|
|
|
|
2022-12-22 15:14:13 +00:00
|
|
|
/*
|
|
|
|
* Send all "bundle.*" config lines to the client as key=value
|
|
|
|
* packet lines.
|
|
|
|
*/
|
2023-02-24 06:38:10 +00:00
|
|
|
repo_config(r, config_to_packet_line, &writer);
|
2022-12-22 15:14:07 +00:00
|
|
|
|
|
|
|
packet_writer_flush(&writer);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-10-12 12:52:31 +00:00
|
|
|
/**
|
|
|
|
* General API for {transport,connect}.c etc.
|
|
|
|
*/
|
|
|
|
int bundle_uri_parse_line(struct bundle_list *list, const char *line)
|
|
|
|
{
|
|
|
|
int result;
|
|
|
|
const char *equals;
|
|
|
|
struct strbuf key = STRBUF_INIT;
|
|
|
|
|
|
|
|
if (!strlen(line))
|
|
|
|
return error(_("bundle-uri: got an empty line"));
|
|
|
|
|
|
|
|
equals = strchr(line, '=');
|
|
|
|
|
|
|
|
if (!equals)
|
|
|
|
return error(_("bundle-uri: line is not of the form 'key=value'"));
|
|
|
|
if (line == equals || !*(equals + 1))
|
|
|
|
return error(_("bundle-uri: line has empty key or value"));
|
|
|
|
|
|
|
|
strbuf_add(&key, line, equals - line);
|
|
|
|
result = bundle_list_update(key.buf, equals + 1, list);
|
|
|
|
strbuf_release(&key);
|
|
|
|
|
|
|
|
return result;
|
|
|
|
}
|