development/git - HydraGit

mirror of https://github.com/git/git synced 2024-07-17 02:57:48 +00:00

Author	SHA1	Message	Date
Jeff King	beed336c3e	http: never use curl_easy_perform We currently don't reuse http connections when fetching via the smart-http protocol. This is bad because the TCP handshake introduces latency, and especially because SSL connection setup may be non-trivial. We can fix it by consistently using curl's "multi" interface. The reason is rather complicated: Our http code has two ways of being used: queuing many "slots" to be fetched in parallel, or fetching a single request in a blocking manner. The parallel code is built on curl's "multi" interface. Most of the single-request code uses http_request, which is built on top of the parallel code (we just feed it one slot, and wait until it finishes). However, one could also accomplish the single-request scheme by avoiding curl's multi interface entirely and just using curl_easy_perform. This is simpler, and is used by post_rpc in the smart-http protocol. It does work to use the same curl handle in both contexts, as long as it is not at the same time. However, internally curl may not share all of the cached resources between both contexts. In particular, a connection formed using the "multi" code will go into a reuse pool connected to the "multi" object. Further requests using the "easy" interface will not be able to reuse that connection. The smart http protocol does ref discovery via http_request, which uses the "multi" interface, and then follows up with the "easy" interface for its rpc calls. As a result, we make two HTTP connections rather than reusing a single one. We could teach the ref discovery to use the "easy" interface. But it is only once we have done this discovery that we know whether the protocol will be smart or dumb. If it is dumb, then our further requests, which want to fetch objects in parallel, will not be able to reuse the same connection. Instead, this patch switches post_rpc to build on the parallel interface, which means that we use it consistently everywhere. It's a little more complicated to use, but since we have the infrastructure already, it doesn't add any code; we can just factor out the relevant bits from http_request. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-18 15:50:57 -08:00
Junio C Hamano	92251b1b5b	Merge branch 'nd/shallow-clone' Fetching from a shallow-cloned repository used to be forbidden, primarily because the codepaths involved were not carefully vetted and we did not bother supporting such usage. This attempts to allow object transfer out of a shallow-cloned repository in a controlled way (i.e. the receiver become a shallow repository with truncated history). * nd/shallow-clone: (31 commits) t5537: fix incorrect expectation in test case 10 shallow: remove unused code send-pack.c: mark a file-local function static git-clone.txt: remove shallow clone limitations prune: clean .git/shallow after pruning objects clone: use git protocol for cloning shallow repo locally send-pack: support pushing from a shallow clone via http receive-pack: support pushing to a shallow clone via http smart-http: support shallow fetch/clone remote-curl: pass ref SHA-1 to fetch-pack as well send-pack: support pushing to a shallow clone receive-pack: allow pushes that update .git/shallow connected.c: add new variant that runs with --shallow-file add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses receive/send-pack: support pushing from a shallow clone receive-pack: reorder some code in unpack() fetch: add --update-shallow to accept refs that update .git/shallow upload-pack: make sure deepening preserves shallow roots fetch: support fetching from a shallow repository clone: support remote shallow repository ...	2014-01-17 12:21:20 -08:00
Junio C Hamano	ad70448576	Merge branch 'cc/starts-n-ends-with' Remove a few duplicate implementations of prefix/suffix comparison functions, and rename them to starts_with and ends_with. * cc/starts-n-ends-with: replace {pre,suf}fixcmp() with {starts,ends}_with() strbuf: introduce starts_with() and ends_with() builtin/remote: remove postfixcmp() and use suffixcmp() instead environment: normalize use of prefixcmp() by removing " != 0"	2013-12-17 12:02:44 -08:00
Nguyễn Thái Ngọc Duy	16094885ca	smart-http: support shallow fetch/clone Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-10 16:14:18 -08:00
Nguyễn Thái Ngọc Duy	58f2ed051f	remote-curl: pass ref SHA-1 to fetch-pack as well Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-10 16:14:18 -08:00
Nguyễn Thái Ngọc Duy	b06dcd7d68	connect.c: teach get_remote_heads to parse "shallow" lines No callers pass a non-empty pointer as shallow_points at this stage. As a result, all clients still refuse to talk to shallow repository on the other end. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-10 16:14:16 -08:00
Christian Couder	5955654823	replace {pre,suf}fixcmp() with {starts,ends}_with() Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c \| grep -v strbuf\\.c \| xargs perl -pi -e ' s\|!prefixcmp\(\|starts_with\(\|g; s\|prefixcmp\(\|!starts_with\(\|g; s\|!suffixcmp\(\|ends_with\(\|g; s\|suffixcmp\(\|!ends_with\(\|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-05 14:13:21 -08:00
Junio C Hamano	c5a77e8f92	Merge branch 'bc/http-100-continue' Issue "100 Continue" responses to help use of GSS-Negotiate authentication scheme over HTTP transport when needed. * bc/http-100-continue: remote-curl: fix large pushes with GSSAPI remote-curl: pass curl slot_results back through run_slot http: return curl's AUTHAVAIL via slot_results	2013-12-05 12:58:59 -08:00
Brian M. Carlson	c80d96ca0c	remote-curl: fix large pushes with GSSAPI Due to an interaction between the way libcurl handles GSSAPI authentication over HTTP and the way git uses libcurl, large pushes (those over http.postBuffer bytes) would fail due to an authentication failure requiring a rewind of the curl buffer. Such a rewind was not possible because the data did not fit into the entire buffer. Enable the use of the Expect: 100-continue header for large requests where the server offers GSSAPI authentication to avoid this issue, since the request would otherwise fail. This allows git to get the authentication data right before sending the pack contents. Existing cases where pushes would succeed, including small requests using GSSAPI, still disable the use of 100 Continue, as it causes problems for some remote HTTP implementations (servers and proxies). Signed-off-by: Brian M. Carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2013-10-31 10:13:40 -07:00
Jeff King	3a347ed707	remote-curl: pass curl slot_results back through run_slot Some callers may want to know more than just the integer error code we return. Let them optionally pass a slot_results struct to fill in (or NULL if they do not care). In either case we continue to return the integer code. We can also give probe_rpc the same treatment (since it builds directly on run_slot). Signed-off-by: Jeff King <peff@peff.net>	2013-10-31 10:05:59 -07:00
Junio C Hamano	177f0a4009	Merge branch 'jk/http-auth-redirects' Handle the case where http transport gets redirected during the authorization request better. * jk/http-auth-redirects: http.c: Spell the null pointer as NULL remote-curl: rewrite base url from info/refs redirects remote-curl: store url as a strbuf remote-curl: make refs_url a strbuf http: update base URLs when we see redirects http: provide effective url to callers http: hoist credential request out of handle_curl_result http: refactor options to http_get_* http_request: factor out curlinfo_strbuf http_get_file: style fixes	2013-10-30 12:09:53 -07:00
Jeff King	050ef3655c	remote-curl: rewrite base url from info/refs redirects For efficiency and security reasons, an earlier commit in this series taught http_get_* to re-write the base url based on redirections we saw while making a specific request. This commit wires that option into the info/refs request, meaning that a redirect from http://example.com/foo.git/info/refs to https://example.com/bar.git/info/refs will behave as if "https://example.com/bar.git" had been provided to git in the first place. The tests bear some explanation. We introduce two new hierearchies into the httpd test config: 1. Requests to /smart-redir-limited will work only for the initial info/refs request, but not any subsequent requests. As a result, we can confirm whether the client is re-rooting its requests after the initial contact, since otherwise it will fail (it will ask for "repo.git/git-upload-pack", which is not redirected). 2. Requests to smart-redir-auth will redirect, and require auth after the redirection. Since we are using the redirected base for further requests, we also update the credential struct, in order not to mislead the user (or credential helpers) about which credential is needed. We can therefore check the GIT_ASKPASS prompts to make sure we are prompting for the new location. Because we have neither multiple servers nor https support in our test setup, we can only redirect between paths, meaning we need to turn on credential.useHttpPath to see the difference. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>	2013-10-14 17:01:34 -07:00
Jeff King	b227bbc43a	remote-curl: store url as a strbuf We use a strbuf to generate the string containing the remote URL, but then detach it to a bare pointer. This makes it harder to later manipulate the URL, as we have forgotten the length (and the allocation semantics are not as clear). Let's instead keep the strbuf around. As a bonus, this eliminates a confusing double-use of the "buf" strbuf in main(). Prior to this, it was used both for constructing the url, and for reading commands from stdin. The downside is that we have to update each call site to refer to "url.buf" rather than just "url" when they want the C string. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>	2013-10-14 17:01:15 -07:00
Jeff King	c65d5692cd	remote-curl: make refs_url a strbuf In the discover_refs function, we use a strbuf named "buffer" for multiple purposes. First we build the info/refs URL in it, and then detach that to a bare pointer. Then, we use the same strbuf to store the result of fetching the refs. Let's instead keep a separate refs_url strbuf. This is less confusing, as the "buffer" strbuf is now used for only one thing. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>	2013-10-14 16:57:04 -07:00
Jeff King	2501aff8b7	http: hoist credential request out of handle_curl_result When we are handling a curl response code in http_request or in the remote-curl RPC code, we use the handle_curl_result helper to translate curl's response into an easy-to-use code. When we see an HTTP 401, we do one of two things: 1. If we already had a filled-in credential, we mark it as rejected, and then return HTTP_NOAUTH to indicate to the caller that we failed. 2. If we didn't, then we ask for a new credential and tell the caller HTTP_REAUTH to indicate that they may want to try again. Rejecting in the first case makes sense; it is the natural result of the request we just made. However, prompting for more credentials in the second step does not always make sense. We do not know for sure that the caller is going to make a second request, and nor are we sure that it will be to the same URL. Logically, the prompt belongs not to the request we just finished, but to the request we are (maybe) about to make. In practice, it is very hard to trigger any bad behavior. Currently, if we make a second request, it will always be to the same URL (even in the face of redirects, because curl handles the redirects internally). And we almost always retry on HTTP_REAUTH these days. The one exception is if we are streaming a large RPC request to the server (e.g., a pushed packfile), in which case we cannot restart. It's extremely unlikely to see a 401 response at this stage, though, as we would typically have seen it when we sent a probe request, before streaming the data. This patch drops the automatic prompt out of case 2, and instead requires the caller to do it. This is a few extra lines of code, and the bug it fixes is unlikely to come up in practice. But it is conceptually cleaner, and paves the way for better handling of credentials across redirects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>	2013-10-14 16:55:13 -07:00
Jeff King	1bbcc224cc	http: refactor options to http_get_* Over time, the http_get_strbuf function has grown several optional parameters. We now have a bitfield with multiple boolean options, as well as an optional strbuf for returning the content-type of the response. And a future patch in this series is going to add another strbuf option. Treating these as separate arguments has a few downsides: 1. Most call sites need to add extra NULLs and 0s for the options they aren't interested in. 2. The http_get_* functions are actually wrappers around 2 layers of low-level implementation functions. We have to pass these options through individually. 3. The http_get_strbuf wrapper learned these options, but nobody bothered to do so for http_get_file, even though it is backed by the same function that does understand the options. Let's consolidate the options into a single struct. For the common case of the default options, we'll allow callers to simply pass a NULL for the options struct. The resulting code is often a few lines longer, but it ends up being easier to read (and to change as we add new options, since we do not need to update each call site). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>	2013-09-30 17:21:59 -07:00
Junio C Hamano	2233ad4534	Merge branch 'jc/push-cas' Allow a safer "rewind of the remote tip" push than blind "--force", by requiring that the overwritten remote ref to be unchanged since the new history to replace it was prepared. The machinery is more or less ready. The "--force" option is again the big red button to override any safety, thanks to J6t's sanity (the original round allowed --lockref to defeat --force). The logic to choose the default implemented here is fragile (e.g. "git fetch" after seeing a failure will update the remote-tracking branch and will make the next "push" pass, defeating the safety pretty easily). It is suitable only for the simplest workflows, and it may hurt users more than it helps them. * jc/push-cas: push: teach --force-with-lease to smart-http transport send-pack: fix parsing of --force-with-lease option t5540/5541: smart-http does not support "--force-with-lease" t5533: test "push --force-with-lease" push --force-with-lease: tie it all together push --force-with-lease: implement logic to populate old_sha1_expect[] remote.c: add command line option parser for "--force-with-lease" builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN cache.h: move remote/connect API out of it	2013-09-09 14:30:29 -07:00
Junio C Hamano	711b276974	Merge branch 'nd/clone-connectivity-shortcut' * nd/clone-connectivity-shortcut: smart http: use the same connectivity check on cloning	2013-09-09 14:30:01 -07:00
Junio C Hamano	05c1eb1034	push: teach --force-with-lease to smart-http transport We have been passing enough information to enable the compare-and-swap logic down to the transport layer, but the transport helper was not passing it to smart-http transport. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-02 16:11:06 -07:00
Nguyễn Thái Ngọc Duy	9ba380481c	smart http: use the same connectivity check on cloning This is an extension of `c6807a4` (clone: open a shortcut for connectivity check - 2013-05-26) to reduce the cost of connectivity check at clone time, this time with smart http protocol. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-23 12:18:18 -07:00
Junio C Hamano	222b1212c1	remote-http: use argv-array Instead of using a hand-managed argument array, use argv-array API to manage dynamically formulated command line. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-09 12:34:16 -07:00
Jeff King	de89f0b25a	remote-curl: die directly with http error messages When we encounter an unknown http error (e.g., a 403), we hand the error code to http_error, which then prints it with error(). After that we die with the redundant message "HTTP request failed". Instead, let's just drop http_error entirely, which does nothing but pass arguments to error(), and instead die directly with a useful message. So before: $ git clone https://example.com/repo.git Cloning into 'repo'... error: unable to access 'https://example.com/repo.git': The requested URL returned error: 403 Forbidden fatal: HTTP request failed and after: $ git clone https://example.com/repo.git Cloning into 'repo'... fatal: unable to access 'https://example.com/repo.git': The requested URL returned error: 403 Forbidden Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-06 18:56:45 -07:00
Jeff King	67d2a7b5c5	http: simplify http_error helper function This helper function should really be a one-liner that prints an error message, but it has ended up unnecessarily complicated: 1. We call error() directly when we fail to start the curl request, so we must later avoid printing a duplicate error in http_error(). It would be much simpler in this case to just stuff the error message into our usual curl_errorstr buffer rather than printing it ourselves. This means that http_error does not even have to care about curl's exit value (the interesting part is in the errorstr buffer already). 2. We return the "ret" value passed in to us, but none of the callers actually cares about our return value. We can just drop this entirely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-06 18:56:44 -07:00
Jeff King	d5ccbe4dfb	remote-curl: consistently report repo url for http errors When we report http errors in fetching the initial ref advertisement, we show the full URL we attempted to use, including "info/refs?service=git-upload-pack". While this may be useful for debugging a broken server, it is unnecessarily verbose and confusing for most cases, in which the client user is not even the same person as the owner of the repository. Let's just show the repository URL; debugging can happen with GIT_CURL_VERBOSE, which shows way more useful information, anyway. At the same time, let's also make sure to mention the repository URL when we report failed authentication (previously we said only "Authentication failed"). Knowing the URL can help the user realize why authentication failed (e.g., they meant to push to remote A, not remote B). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-06 18:56:43 -07:00
Jeff King	cfa0f4040d	remote-curl: always show friendlier 404 message When we get an http 404 trying to get the initial list of refs from the server, we try to be helpful and remind the user that update-server-info may need to be run. This looks like: $ git clone https://github.com/non/existent Cloning into 'existent'... fatal: https://github.com/non/existent/info/refs?service=git-upload-pack not found: did you run git update-server-info on the server? Suggesting update-server-info may be a good suggestion for users who are in control of the server repo and who are planning to set up dumb http. But for users of smart http, and especially users who are not in control of the server repo, the advice is useless and confusing. Since most people are expected to use smart http these days, it does not make sense to keep the update-server-info hint. We not only drop the mention of update-server-info, but also show only the main repo URL, not the full "info/refs" and service parameter. These elements may be useful for debugging a broken server configuration, but in the majority of cases, users are not fetching from their own repositories, but rather from other people's repositories; they have neither the power nor interest to fix a broken configuration, and the extra components just make the message more confusing. Users who do want to debug can and should use GIT_CURL_VERBOSE to get more complete information on the actual URLs visited. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-06 18:56:43 -07:00
Jeff King	110bcdc3d0	remote-curl: let servers override http 404 advice When we get an http 404 trying to get the initial list of refs from the server, we try to be helpful and remind the user that update-server-info may need to be run. This looks like: $ git clone https://github.com/non/existent Cloning into 'existent'... fatal: https://github.com/non/existent/info/refs?service=git-upload-pack not found: did you run git update-server-info on the server? Suggesting update-server-info may be a good suggestion for users who are in control of the server repo and who are planning to set up dumb http. But for users of smart http, and especially users who are not in control of the server repo, the advice is useless and confusing. The previous patch taught remote-curl to show custom advice from the server when it is available. When we have shown messages from the server, we can also drop our custom advice; what the server has to say is likely to be more accurate and helpful. We not only drop the mention of update-server-info, but also show only the main repo URL, not the full "info/refs" and service parameter. These elements may be useful for debugging a broken server configuration, but again, anything the server has provided is likely to be more useful (and one can still use GIT_CURL_VERBOSE to get much more complete debugging information). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-06 18:56:42 -07:00
Jeff King	426e70d4a1	remote-curl: show server content on http errors If an http request to a remote git server fails, we show only the http response code, or sometimes a custom message for particular codes. This gives the server no opportunity to offer a more detailed explanation of the reason for the failure, or to give extra advice. This patch teaches remote-curl to record and display the body content of a failed http response. We only display such responses when the content-type is advertised as text/plain, as it is the most likely to look presentable on the user's terminal (and it is hoped to be a good indication that the message is intended for git clients, and not for a web browser). Each line of the new output is prepended with "remote:". Example output may look like this (assuming the server is configured to display such a helpful message): $ GIT_SMART_HTTP=0 git clone https://example.com/some/repo.git Cloning into 'repo'... remote: Sorry, fetching via dumb http is forbidden. remote: Please upgrade your git client to v1.6.6 or greater remote: and make sure that smart-http is enabled. error: The requested URL returned error: 403 while accessing http://localhost:5001/some/repo.git/info/refs fatal: HTTP request failed For the sake of simplicity, we only record and display these errors during the initial fetch of the ref list, as that is the initial contact with the server and where the most common, interesting errors happen (and there is already precedent, as that is the only place we currently massage http error codes into more helpful messages). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-06 18:56:42 -07:00
Jeff King	2a4552021a	remote-curl: always parse incoming refs When remote-curl receives a list of refs from a server, it keeps the whole buffer intact. When we get a "list" command, we feed the result to get_remote_heads, and when we get a "fetch" or "push" command, we feed it to fetch-pack or send-pack, respectively. If the HTTP response from the server is truncated for any reason, we will get an incomplete ref advertisement. If we then feed this incomplete list to fetch-pack, one of a few things may happen: 1. If the truncation is in a packet header, fetch-pack will notice the bogus line and complain. 2. If the truncation is inside a packet, fetch-pack will keep waiting for us to send the rest of the packet, which we never will. 3. If the truncation is at a packet boundary, fetch-pack will keep waiting for us to send the next packet, which we never will. As a result, fetch-pack hangs, waiting for input. However, remote-curl believes it has sent all of the advertisement, and therefore waits for fetch-pack to speak. The two processes end up in a deadlock. We do notice the broken ref list if we feed it to get_remote_heads. So if git asks the helper to do a "list" followed by a "fetch", we are safe; we'll abort during the list operation, which parses the refs. This patch teaches remote-curl to always parse and save the incoming ref list when we read the ref advertisement from a server. That means that we will always verify and abort before even running fetch-pack (or send-pack) when reading a corrupted list, even if we do not run the "list" command explicitly. Since we save the result, in the common case of running "list" then "fetch", we do not do any extra parsing at all. In the case of just a "fetch", we do an extra round of parsing, but only once. Note also that the "fetch" case will now also initialize server_capabilities from the remote (in remote-curl; we already would do so inside fetch-pack). Doing "list+fetch" already does this. It doesn't actually matter now, but the new behavior is arguably more correct, should remote-curl ever start caring about the server's capability list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-24 00:17:38 -08:00
Jeff King	b8054bbee7	remote-curl: move ref-parsing code up in file The ref-parsing functions are static. Let's move them up in the file to be available to more functions, which will help us with later refactoring. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-24 00:17:38 -08:00
Jeff King	5dbf43602d	remote-curl: pass buffer straight to get_remote_heads Until recently, get_remote_heads only knew how to read refs from a file descriptor. To hack around this, we spawned a thread (or forked a process) to write the buffer back to us. Now that we can just pass it our buffer directly, we don't have to use this hack anymore. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-24 00:17:38 -08:00
Jeff King	85edf4f58b	teach get_remote_heads to read from a memory buffer Now that we can read packet data from memory as easily as a descriptor, get_remote_heads can take either one as a source. This will allow further refactoring in remote-curl. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-24 00:17:38 -08:00
Jeff King	4981fe750b	pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-24 00:14:15 -08:00
Jeff King	819b929d33	pkt-line: teach packet_read_line to chomp newlines The packets sent during ref negotiation are all terminated by newline; even though the code to chomp these newlines is short, we end up doing it in a lot of places. This patch teaches packet_read_line to auto-chomp the trailing newline; this lets us get rid of a lot of inline chomping code. As a result, some call-sites which are not reading line-oriented data (e.g., when reading chunks of packfiles alongside sideband) transition away from packet_read_line to the generic packet_read interface. This patch converts all of the existing callsites. Since the function signature of packet_read_line does not change (but its behavior does), there is a possibility of new callsites being introduced in later commits, silently introducing an incompatibility. However, since a later patch in this series will change the signature, such a commit would have to be merged directly into this commit, not to the tip of the series; we can therefore ignore the issue. This is an internal cleanup and should produce no change of behavior in the normal case. However, there is one corner case to note. Callers of packet_read_line have never been able to tell the difference between a flush packet ("0000") and an empty packet ("0004"), as both cause packet_read_line to return a length of 0. Readers treat them identically, even though Documentation/technical/protocol-common.txt says we must not; it also says that implementations should not send an empty pkt-line. By stripping out the newline before the result gets to the caller, we will now treat the newline-only packet ("0005\n") the same as an empty packet, which in turn gets treated like a flush packet. In practice this doesn't matter, as neither empty nor newline-only packets are part of git's protocols (at least not for the line-oriented bits, and readers who are not expecting line-oriented packets will be calling packet_read directly, anyway). But even if we do decide to care about the distinction later, it is orthogonal to this patch. The right place to tighten would be to stop treating empty packets as flush packets, and this change does not make doing so any harder. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-20 13:42:21 -08:00
Jeff King	cdf4fb8e33	pkt-line: drop safe_write function This is just write_or_die by another name. The one distinction is that write_or_die will treat EPIPE specially by suppressing error messages. That's fine, as we die by SIGPIPE anyway (and in the off chance that it is disabled, write_or_die will simulate it). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-20 13:42:21 -08:00
Shawn Pearce	4656bf47fc	Verify Content-Type from smart HTTP servers Before parsing a suspected smart-HTTP response verify the returned Content-Type matches the standard. This protects a client from attempting to process a payload that smells like a smart-HTTP server response. JGit has been doing this check on all responses since the dawn of time. I mistakenly failed to include it in git-core when smart HTTP was introduced. At the time I didn't know how to get the Content-Type from libcurl. I punted, meant to circle back and fix this, and just plain forgot about it. Signed-off-by: Shawn Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-04 10:22:36 -08:00
Junio C Hamano	fda800f0b1	Merge branch 'jk/maint-http-half-auth-fetch' Finishing touches to squelch a compiler warning. * jk/maint-http-half-auth-fetch: remote-curl.c: Fix a compiler warning	2012-11-21 11:59:29 -08:00
Ramsay Jones	377115493a	remote-curl.c: Fix a compiler warning In particular, gcc issues an "'gzip_size' might be used uninitialized" warning (-Wuninitialized). However, this warning is a false positive, since the 'gzip_size' variable would not, in fact, be used uninitialized. In order to suppress the warning, we simply initialise the variable to zero in it's declaration. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-11-21 11:54:32 -08:00
Junio C Hamano	a9bb4e55a3	Merge branch 'jk/maint-http-half-auth-fetch' Fixes fetch from servers that ask for auth only during the actual packing phase. This is not really a recommended configuration, but it cleans up the code at the same time. * jk/maint-http-half-auth-fetch: remote-curl: retry failed requests for auth even with gzip remote-curl: hoist gzip buffer size to top of post_rpc	2012-11-20 10:30:17 -08:00
Jeff King	2e736fd5e9	remote-curl: retry failed requests for auth even with gzip Commit `b81401c` taught the post_rpc function to retry the http request after prompting for credentials. However, it did not handle two cases: 1. If we have a large request, we do not retry. That's OK, since we would have sent a probe (with retry) already. 2. If we are gzipping the request, we do not retry. That was considered OK, because the intended use was for push (e.g., listing refs is OK, but actually pushing objects is not), and we never gzip on push. This patch teaches post_rpc to retry even a gzipped request. This has two advantages: 1. It is possible to configure a "half-auth" state for fetching, where the set of refs and their sha1s are advertised, but one cannot actually fetch objects. This is not a recommended configuration, as it leaks some information about what is in the repository (e.g., an attacker can try brute-forcing possible content in your repository and checking whether it matches your branch sha1). However, it can be slightly more convenient, since a no-op fetch will not require a password at all. 2. It future-proofs us should we decide to ever gzip more requests. Signed-off-by: Jeff King <peff@peff.net>	2012-10-31 07:45:13 -04:00
Jeff King	df126e108b	remote-curl: hoist gzip buffer size to top of post_rpc When we gzip the post data for a smart-http rpc request, we compute the gzip body and its size inside the "use_gzip" conditional. We keep track of the body after the conditional ends, but not the size. Let's remember both, which will enable us to retry failed gzip requests in a future patch. Signed-off-by: Jeff King <peff@peff.net>	2012-10-31 07:45:08 -04:00
Jeff King	58f3f9893d	Merge branch 'jk/maint-http-init-not-in-result-handler' Further clean-up to the http codepath that picks up results after cURL library is done with one request slot. * jk/maint-http-init-not-in-result-handler: http: do not set up curl auth after a 401 remote-curl: do not call run_slot repeatedly	2012-10-29 04:13:09 -04:00
Junio C Hamano	053a08f5bb	Merge branch 'jk/maint-http-half-auth-push' Fixes a regression in maint-1.7.11 (v1.7.11.7), maint (v1.7.12.1) and master (v1.8.0-rc0). * jk/maint-http-half-auth-push: http: fix segfault in handle_curl_result	2012-10-16 11:44:37 -07:00
Jeff King	1960897ebc	http: do not set up curl auth after a 401 When we get an http 401, we prompt for credentials and put them in our global credential struct. We also feed them to the curl handle that produced the 401, with the intent that they will be used on a retry. When the code was originally introduced in commit `42653c0`, this was a necessary step. However, since `dfa1725`, we always feed our global credential into every curl handle when we initialize the slot with get_active_slot. So every further request already feeds the credential to curl. Moreover, accessing the slot here is somewhat dubious. After the slot has produced a response, we don't actually control it any more. If we are using curl_multi, it may even have been re-initialized to handle a different request. It just so happens that we will reuse the curl handle within the slot in such a case, and that because we only keep one global credential, it will be the one we want. So the current code is not buggy, but it is misleading. By cleaning it up, we can remove the slot argument entirely from handle_curl_result, making it much more obvious that slots should not be accessed after they are marked as finished. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-10-12 09:45:15 -07:00
Jeff King	abf8df869c	remote-curl: do not call run_slot repeatedly Commit `b81401c` (http: prompt for credentials on failed POST) taught post_rpc to call run_slot in a loop in order to retry a request after asking the user for credentials. However, after a call to run_slot we will have called finish_active_slot. This means we have released the slot, and we should no longer look at it. As it happens, this does not cause any bugs in the current code, since we know that we are not using curl_multi in this code path, and therefore nobody will have taken over our slot in the meantime. However, it is good form to actually call get_active_slot again. It also future proofs us against changes in the http code. We can do this by jumping back to a retry label at the top of our function. We just need to reorder a few setup lines that should not be repeated; everything else within the loop is either idempotent, needs to be repeated, or in a path we do not follow (e.g., we do not even try when large_request is set, because we don't know how much data we might have streamed from our helper program). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-10-12 09:45:13 -07:00
Jeff King	188923f0d1	http: fix segfault in handle_curl_result When we create an http active_request_slot, we can set its "results" pointer back to local storage. The http code will fill in the details of how the request went, and we can access those details even after the slot has been cleaned up. Commit `8809703` (http: factor out http error code handling) switched us from accessing our local results struct directly to accessing it via the "results" pointer of the slot. That means we're accessing the slot after it has been marked as finished, defeating the whole purpose of keeping the results storage separate. Most of the time this doesn't matter, as finishing the slot does not actually clean up the pointer. However, when using curl's multi interface with the dumb-http revision walker, we might actually start a new request before handing control back to the original caller. In that case, we may reuse the slot, zeroing its results pointer, and leading the original caller to segfault while looking for its results inside the slot. Instead, we need to pass a pointer to our local results storage to the handle_curl_result function, rather than relying on the pointer in the slot struct. This matches what the original code did before the refactoring (which did not use a separate function, and therefore just accessed the results struct directly). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-10-12 09:42:31 -07:00
Junio C Hamano	fb1e4a85bf	Merge branch 'jk/smart-http-switch' Allows users to turn off smart-http when talking to dumb-only servers. * jk/smart-http-switch: remote-curl: let users turn off smart http remote-curl: rename is_http variable	2012-09-29 22:28:25 -07:00
Junio C Hamano	c318040a36	Merge branch 'sp/maint-http-enable-gzip' Allows a more common 'gzip' Accept-Encoding to be used. * sp/maint-http-enable-gzip: Enable info/refs gzip decompression in HTTP client	2012-09-29 22:28:20 -07:00
Jeff King	02572c2e3a	remote-curl: let users turn off smart http Usually there is no need for users to specify whether an http remote is smart or dumb; the protocol is designed so that a single initial request is made, and the client can determine the server's capability from the response. However, some misconfigured dumb-only servers may not like the initial request by a smart client, as it contains a query string. Until recently, commit `703e6e7` worked around this by making a second request. However, that commit was recently reverted due to its side effect of masking the initial request's error code. Since git has had that workaround for several years, we don't know exactly how many such misconfigured servers are out there. The reversion of `703e6e7` assumes they are rare enough not to worry about. Still, that reversion leaves somebody who does run into such a server with no escape hatch at all. Let's give them an environment variable they can tweak to perform the "dumb" request. This is intentionally not a documented interface. It's overly simple and is really there for debugging in case somebody does complain about git not working with their server. A real user-facing interface would entail a per-remote or per-URL config variable. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-09-21 10:33:11 -07:00
Jeff King	243c329c1e	remote-curl: rename is_http variable We don't actually care whether the connection is http or not; what we care about is whether it might be smart http. Rename the variable to be more accurate, which will make it easier to later make smart-http optional. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-09-20 10:48:45 -07:00
Shawn O. Pearce	aa90b9697f	Enable info/refs gzip decompression in HTTP client Some HTTP servers try to use gzip compression on the /info/refs request to save transfer bandwidth. Repositories with many tags may find the /info/refs request can be gzipped to be 50% of the original size due to the few but often repeated bytes used (hex SHA-1 and commonly digits in tag names). For most HTTP requests enable "Accept-Encoding: gzip" ensuring the /info/refs payload can use this encoding format. Only request gzip encoding from servers. Although deflate is supported by libcurl, most servers have standardized on gzip encoding for compression as that is what most browsers support. Asking for deflate increases request sizes by a few bytes, but is unlikely to ever be used by a server. Disable the Accept-Encoding header on probe RPCs as response bodies are supposed to be exactly 4 bytes long, "0000". The HTTP headers requesting and indicating compression use more space than the data transferred in the body. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-09-20 10:26:50 -07:00

1 2 3

124 commits