git/transport.h

239 lines
7.6 KiB
C
Raw Normal View History

#ifndef TRANSPORT_H
#define TRANSPORT_H
#include "cache.h"
#include "run-command.h"
#include "remote.h"
struct git_transport_options {
unsigned thin : 1;
unsigned keep : 1;
unsigned followtags : 1;
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-26 01:16:17 +00:00
unsigned check_self_contained_and_connected : 1;
unsigned self_contained_and_connected : 1;
unsigned update_shallow : 1;
int depth;
const char *uploadpack;
const char *receivepack;
struct push_cas_option *cas;
};
enum transport_family {
TRANSPORT_FAMILY_ALL = 0,
TRANSPORT_FAMILY_IPV4,
TRANSPORT_FAMILY_IPV6
};
struct transport {
struct remote *remote;
const char *url;
void *data;
const struct ref *remote_refs;
/**
* Indicates whether we already called get_refs_list(); set by
* transport.c::transport_get_remote_refs().
*/
unsigned got_remote_refs : 1;
fetch: work around "transport-take-over" hack A Git-aware "connect" transport allows the "transport_take_over" to redirect generic transport requests like fetch(), push_refs() and get_refs_list() to the native Git transport handling methods. The take-over process replaces transport->data with a fake data that these method implementations understand. While this hack works OK for a single request, it breaks when the transport needs to make more than one requests. transport->data that used to hold necessary information for the specific helper to work correctly is destroyed during the take-over process. One codepath that this matters is "git fetch" in auto-follow mode; when it does not get all the tags that ought to point at the history it got (which can be determined by looking at the peeled tags in the initial advertisement) from the primary transfer, it internally makes a second request to complete the fetch. Because "take-over" hack has already destroyed the data necessary to talk to the transport helper by the time this happens, the second request cannot make a request to the helper to make another connection to fetch these additional tags. Mark such a transport as "cannot_reuse", and use a separate transport to perform the backfill fetch in order to work around this breakage. Note that this problem does not manifest itself when running t5802, because our upload-pack gives you all the necessary auto-followed tags during the primary transfer. You would need to step through "git fetch" in a debugger, stop immediately after the primary transfer finishes and writes these auto-followed tags, remove the tag references and repack/prune the repository to convince the "find-non-local-tags" procedure that the primary transfer failed to give us all the necessary tags, and then let it continue, in order to trigger the bug in the secondary transfer this patch fixes. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-07 22:47:18 +00:00
/*
* Transports that call take-over destroys the data specific to
* the transport type while doing so, and cannot be reused.
*/
unsigned cannot_reuse : 1;
/*
* A hint from caller that it will be performing a clone, not
* normal fetch. IOW the repository is guaranteed empty.
*/
unsigned cloning : 1;
/**
* Returns 0 if successful, positive if the option is not
* recognized or is inapplicable, and negative if the option
* is applicable but the value is invalid.
**/
int (*set_option)(struct transport *connection, const char *name,
const char *value);
/**
* Returns a list of the remote side's refs. In order to allow
* the transport to try to share connections, for_push is a
* hint as to whether the ultimate operation is a push or a fetch.
*
* If the transport is able to determine the remote hash for
* the ref without a huge amount of effort, it should store it
* in the ref's old_sha1 field; otherwise it should be all 0.
**/
struct ref *(*get_refs_list)(struct transport *transport, int for_push);
/**
* Fetch the objects for the given refs. Note that this gets
* an array, and should ignore the list structure.
*
* If the transport did not get hashes for refs in
* get_refs_list(), it should set the old_sha1 fields in the
* provided refs now.
**/
int (*fetch)(struct transport *transport, int refs_nr, struct ref **refs);
/**
* Push the objects and refs. Send the necessary objects, and
* then, for any refs where peer_ref is set and
* peer_ref->new_oid is different from old_oid, tell the
* remote side to update each ref in the list from old_oid to
* peer_ref->new_oid.
*
* Where possible, set the status for each ref appropriately.
*
* The transport must modify new_sha1 in the ref to the new
* value if the remote accepted the change. Note that this
* could be a different value from peer_ref->new_oid if the
* process involved generating new commits.
**/
int (*push_refs)(struct transport *transport, struct ref *refs, int flags);
int (*push)(struct transport *connection, int refspec_nr, const char **refspec, int flags);
int (*connect)(struct transport *connection, const char *name,
const char *executable, int fd[2]);
/** get_refs_list(), fetch(), and push_refs() can keep
* resources (such as a connection) reserved for further
* use. disconnect() releases these resources.
**/
int (*disconnect)(struct transport *connection);
char *pack_lockfile;
signed verbose : 3;
/**
* Transports should not set this directly, and should use this
* value without having to check isatty(2), -q/--quiet
* (transport->verbose < 0), etc. - checking has already been done
* in transport_set_verbosity().
**/
unsigned progress : 1;
/*
* If transport is at least potentially smart, this points to
* git_transport_options structure to use in case transport
* actually turns out to be smart.
*/
struct git_transport_options *smart_options;
enum transport_family family;
};
#define TRANSPORT_PUSH_ALL 1
#define TRANSPORT_PUSH_FORCE 2
#define TRANSPORT_PUSH_DRY_RUN 4
#define TRANSPORT_PUSH_MIRROR 8
#define TRANSPORT_PUSH_PORCELAIN 16
#define TRANSPORT_PUSH_SET_UPSTREAM 32
#define TRANSPORT_RECURSE_SUBMODULES_CHECK 64
#define TRANSPORT_PUSH_PRUNE 128
#define TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND 256
#define TRANSPORT_PUSH_NO_HOOK 512
#define TRANSPORT_PUSH_FOLLOW_TAGS 1024
#define TRANSPORT_PUSH_CERT_ALWAYS 2048
#define TRANSPORT_PUSH_CERT_IF_ASKED 4096
#define TRANSPORT_PUSH_ATOMIC 8192
#define TRANSPORT_SUMMARY_WIDTH (2 * DEFAULT_ABBREV + 3)
#define TRANSPORT_SUMMARY(x) (int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - gettext_width(x)), (x)
/* Returns a transport suitable for the url */
struct transport *transport_get(struct remote *, const char *);
/*
* Check whether a transport is allowed by the environment. Type should
* generally be the URL scheme, as described in Documentation/git.txt
*/
int is_transport_allowed(const char *type);
transport: add a protocol-whitelist environment variable If we are cloning an untrusted remote repository into a sandbox, we may also want to fetch remote submodules in order to get the complete view as intended by the other side. However, that opens us up to attacks where a malicious user gets us to clone something they would not otherwise have access to (this is not necessarily a problem by itself, but we may then act on the cloned contents in a way that exposes them to the attacker). Ideally such a setup would sandbox git entirely away from high-value items, but this is not always practical or easy to set up (e.g., OS network controls may block multiple protocols, and we would want to enable some but not others). We can help this case by providing a way to restrict particular protocols. We use a whitelist in the environment. This is more annoying to set up than a blacklist, but defaults to safety if the set of protocols git supports grows). If no whitelist is specified, we continue to default to allowing all protocols (this is an "unsafe" default, but since the minority of users will want this sandboxing effect, it is the only sensible one). A note on the tests: ideally these would all be in a single test file, but the git-daemon and httpd test infrastructure is an all-or-nothing proposition rather than a test-by-test prerequisite. By putting them all together, we would be unable to test the file-local code on machines without apache. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-16 17:12:52 +00:00
/*
* Check whether a transport is allowed by the environment,
* and die otherwise.
transport: add a protocol-whitelist environment variable If we are cloning an untrusted remote repository into a sandbox, we may also want to fetch remote submodules in order to get the complete view as intended by the other side. However, that opens us up to attacks where a malicious user gets us to clone something they would not otherwise have access to (this is not necessarily a problem by itself, but we may then act on the cloned contents in a way that exposes them to the attacker). Ideally such a setup would sandbox git entirely away from high-value items, but this is not always practical or easy to set up (e.g., OS network controls may block multiple protocols, and we would want to enable some but not others). We can help this case by providing a way to restrict particular protocols. We use a whitelist in the environment. This is more annoying to set up than a blacklist, but defaults to safety if the set of protocols git supports grows). If no whitelist is specified, we continue to default to allowing all protocols (this is an "unsafe" default, but since the minority of users will want this sandboxing effect, it is the only sensible one). A note on the tests: ideally these would all be in a single test file, but the git-daemon and httpd test infrastructure is an all-or-nothing proposition rather than a test-by-test prerequisite. By putting them all together, we would be unable to test the file-local code on machines without apache. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-16 17:12:52 +00:00
*/
void transport_check_allowed(const char *type);
/*
* Returns true if the user has attempted to turn on protocol
* restrictions at all.
*/
int transport_restrict_protocols(void);
/* Transport options which apply to git:// and scp-style URLs */
/* The program to use on the remote side to send a pack */
#define TRANS_OPT_UPLOADPACK "uploadpack"
/* The program to use on the remote side to receive a pack */
#define TRANS_OPT_RECEIVEPACK "receivepack"
/* Transfer the data as a thin pack if not null */
#define TRANS_OPT_THIN "thin"
/* Check the current value of the remote ref */
#define TRANS_OPT_CAS "cas"
/* Keep the pack that was transferred if not null */
#define TRANS_OPT_KEEP "keep"
/* Limit the depth of the fetch if not null */
#define TRANS_OPT_DEPTH "depth"
/* Aggressively fetch annotated tags if possible */
#define TRANS_OPT_FOLLOWTAGS "followtags"
/* Accept refs that may update .git/shallow without --depth */
#define TRANS_OPT_UPDATE_SHALLOW "updateshallow"
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 18:17:07 +00:00
/* Send push certificates */
#define TRANS_OPT_PUSH_CERT "pushcert"
/**
* Returns 0 if the option was used, non-zero otherwise. Prints a
* message to stderr if the option is not used.
**/
int transport_set_option(struct transport *transport, const char *name,
const char *value);
void transport_set_verbosity(struct transport *transport, int verbosity,
int force_progress);
#define REJECT_NON_FF_HEAD 0x01
#define REJECT_NON_FF_OTHER 0x02
#define REJECT_ALREADY_EXISTS 0x04
push: introduce REJECT_FETCH_FIRST and REJECT_NEEDS_FORCE When we push to update an existing ref, if: * the object at the tip of the remote is not a commit; or * the object we are pushing is not a commit, it won't be correct to suggest to fetch, integrate and push again, as the old and new objects will not "merge". We should explain that the push must be forced when there is a non-committish object is involved in such a case. If we do not have the current object at the tip of the remote, we do not even know that object, when fetched, is something that can be merged. In such a case, suggesting to pull first just like non-fast-forward case may not be technically correct, but in practice, most such failures are seen when you try to push your work to a branch without knowing that somebody else already pushed to update the same branch since you forked, so "pull first" would work as a suggestion most of the time. And if the object at the tip is not a commit, "pull first" will fail, without making any permanent damage. As a side effect, it also makes the error message the user will get during the next "push" attempt easier to understand, now the user is aware that a non-commit object is involved. In these cases, the current code already rejects such a push on the client end, but we used the same error and advice messages as the ones used when rejecting a non-fast-forward push, i.e. pull from there and integrate before pushing again. Introduce new rejection reasons and reword the messages appropriately. [jc: with help by Peff on message details] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-23 21:55:30 +00:00
#define REJECT_FETCH_FIRST 0x08
#define REJECT_NEEDS_FORCE 0x10
int transport_push(struct transport *connection,
int refspec_nr, const char **refspec, int flags,
unsigned int * reject_reasons);
const struct ref *transport_get_remote_refs(struct transport *transport);
int transport_fetch_refs(struct transport *transport, struct ref *refs);
void transport_unlock_pack(struct transport *transport);
int transport_disconnect(struct transport *transport);
char *transport_anonymize_url(const char *url);
void transport_take_over(struct transport *transport,
struct child_process *child);
int transport_connect(struct transport *transport, const char *name,
const char *exec, int fd[2]);
/* Transport methods defined outside transport.c */
int transport_helper_init(struct transport *transport, const char *name);
int bidirectional_transfer_loop(int input, int output);
/* common methods used by transport.c and builtin/send-pack.c */
void transport_verify_remote_names(int nr_heads, const char **heads);
void transport_update_tracking_ref(struct remote *remote, struct ref *ref, int verbose);
int transport_refs_pushed(struct ref *ref);
void transport_print_push_status(const char *dest, struct ref *refs,
int verbose, int porcelain, unsigned int *reject_reasons);
typedef void alternate_ref_fn(const struct ref *, void *);
extern void for_each_alternate_ref(alternate_ref_fn, void *);
#endif