24 commits

Author SHA1 Message Date
Junio C Hamano
777e75b605 Merge branch 'jk/http-backend-deadlock'
Communication between the HTTP server and the http-backend process can
lead to a deadlock when relaying a large ref negotiation request.
Diagnose the situation better, and mitigate it by reading such a
request first into core (to a reasonable limit).

* jk/http-backend-deadlock:
  http-backend: spool ref negotiation requests to buffer
  t5551: factor out tag creation
  http-backend: fix die recursion with custom handler
2015-06-01 12:45:09 -07:00
Jeff King
6bc0cb5176 http-backend: spool ref negotiation requests to buffer
When http-backend spawns "upload-pack" to do ref
negotiation, it streams the http request body to
upload-pack, which then streams the http response back to the
client as it reads. In theory, git can go full-duplex; the
client can consume our response while it is still sending
the request.  In practice, however, HTTP is a half-duplex
protocol. Even if our client is ready to read and write
simultaneously, we may have other HTTP infrastructure in the
way, including the webserver that spawns our CGI, or any
intermediate proxies.

In at least one documented case[1], this leads to deadlock
when trying a fetch over http. What happens is basically:

  1. Apache proxies the request to the CGI, http-backend.

  2. http-backend gzip-inflates the data and sends
     the result to upload-pack.

  3. upload-pack acts on the data and generates output over
     the pipe back to Apache. Apache isn't reading because
     it's busy writing (step 1).

This works fine most of the time, because the upload-pack
output ends up in a system pipe buffer, and Apache reads
it as soon as it finishes writing. But if both the request
and the response exceed the system pipe buffer size, then we
deadlock (Apache blocks writing to http-backend,
http-backend blocks writing to upload-pack, and upload-pack
blocks writing to Apache).

We need to break the deadlock by spooling either the input
or the output. In this case, it's ideal to spool the input,
because Apache does not start reading either stdout _or_
stderr until we have consumed all of the input. So until we
do so, we cannot even get an error message out to the
client.

The solution is fairly straightforward: we read the request
body into an in-memory buffer in http-backend, freeing up
Apache, and then feed the data ourselves to upload-pack. But
there are a few important things to note:

  1. We limit the in-memory buffer to prevent an obvious
     denial-of-service attack. This is a new hard limit on
     requests, but it's unlikely to come into play. The
     default value is 10MB, which covers even the ridiculous
     100,000-ref negotiation in the included test (that
     actually caps out just over 5MB). But it's configurable
     on the off chance that you don't mind spending some
     extra memory to make even ridiculous requests work.

  2. We must take care only to buffer when we have to. For
     pushes, the incoming packfile may be of arbitrary
     size, and we should connect the input directly to
     receive-pack. There's no deadlock problem here, though,
     because we do not produce any output until the whole
     packfile has been read.

     For upload-pack's initial ref advertisement, we
     similarly do not need to buffer. Even though we may
     generate a lot of output, there is no request body at
     all (i.e., it is a GET, not a POST).
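
The spooling step described above can be sketched roughly as follows.
This is a minimal Python illustration, not the C code in http-backend:
the names spool_request, RequestTooLarge and DEFAULT_MAX_REQUEST_BUFFER
are hypothetical, and only the 10MB default comes from the message
above.

```python
import io

# Hypothetical constant mirroring the 10MB default described above.
DEFAULT_MAX_REQUEST_BUFFER = 10 * 1024 * 1024


class RequestTooLarge(Exception):
    """Raised when the request body exceeds the configured buffer limit."""


def spool_request(stream, limit=DEFAULT_MAX_REQUEST_BUFFER, chunk_size=8192):
    """Read the whole request body into memory before any output is produced.

    Draining the input first lets the web server finish writing, so it can
    start reading our response. The hard limit bounds memory use.
    """
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of request body
        buf.extend(chunk)
        if len(buf) > limit:
            raise RequestTooLarge(f"request body larger than {limit} bytes")
    return bytes(buf)


# Usage: spool a small fake negotiation request, then hand it to the child.
body = spool_request(io.BytesIO(b"want deadbeef\n" * 3), limit=1024)
```

Only once the whole body is in hand would the buffered data be written
to the child process, so the server side is never blocked on a full
pipe while the response is being generated.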

[1] http://article.gmane.org/gmane.comp.version-control.git/269020

Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 20:43:18 -07:00
Jeff King
d595bdc17f doc: put example URLs and emails inside literal backticks
This makes sure that AsciiDoc does not turn them into links.
Regular AsciiDoc does not catch these cases, but AsciiDoctor
does treat them as links.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-12 22:14:46 -07:00
Junio C Hamano
efb4ec68b8 Merge commit 'doc/http-backend: missing accent grave in literal mark-up'
* commit '5df05146d5cb94628a3dfc53063c802ee1152cec':
  doc/http-backend: missing accent grave in literal mark-up
2014-04-09 11:45:04 -07:00
Thomas Ackermann
5df05146d5 doc/http-backend: missing accent grave in literal mark-up
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-09 11:43:56 -07:00
Michael Haggerty
8169007468 doc: remove author/documentation sections from more pages
We decided at 48bb914e (doc: drop author/documentation sections from
most pages, 2011-03-11) to remove "author" and "documentation"
sections from our documentation.  Remove a few stragglers.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-27 08:34:34 -08:00
Jeff King
b0808819e5 doc/http-backend: match query-string in apache half-auth example
When setting up a "half-auth" repository in which reads can
be done anonymously but writes require authentication, it is
best if the server can require authentication for both the
ref advertisement and the actual receive-pack POSTs. This
alleviates the need for the admin to set http.receivepack in
the repositories, and means that the client is challenged
for credentials immediately, instead of partway through the
push process (and git clients older than v1.7.11.7 had
trouble handling these challenges).

Since detecting a push during the ref advertisement requires
matching the query string, and this is non-trivial to do in
Apache, we have traditionally punted and instructed users to
just protect "/git-receive-pack$".  This patch provides the
mod_rewrite recipe to actually match the ref advertisement,
which is preferred.

While we're at it, let's add the recipe to our test scripts
so that we can be sure that it works, and doesn't get broken
(either by our changes or by changes in Apache).
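
The matching rule the recipe has to express can be sketched outside
Apache. The Python below is only an illustration of the decision, not
the mod_rewrite recipe itself; the function name push_needs_auth is
hypothetical.

```python
from urllib.parse import parse_qs


def push_needs_auth(path, query_string):
    """Return True if the request belongs to a push and should be challenged.

    Two requests make up a push: the ref advertisement
    (GET .../info/refs?service=git-receive-pack) and the POST to
    .../git-receive-pack. Matching the first requires the query string.
    """
    if path.endswith("/git-receive-pack"):
        return True
    params = parse_qs(query_string)
    is_advertisement = path.endswith("/info/refs")
    return is_advertisement and "git-receive-pack" in params.get("service", [])
```

With this rule a fetch (service=git-upload-pack) stays anonymous, while
both halves of a push are challenged for credentials up front.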

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-13 22:27:06 -07:00
Jeff King
3813a33de5 doc/http-backend: give some lighttpd config examples
The examples in the documentation are all for Apache. Let's
at least cover the basics: an anonymous server, an
authenticated server, and a "half auth" server with
anonymous read and authenticated write.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-11 07:33:21 -07:00
Jeff King
fdae191003 doc/http-backend: clarify "half-auth" repo configuration
When the http-backend is set up to allow anonymous read but
authenticated write, the http-backend manual suggests
catching only the "/git-receive-pack" POST of the packfile,
not the initial "info/refs?service=git-receive-pack" GET in
which we advertise refs.

This does work and is secure, as we do not allow any write
during the info/refs request, and the information in the ref
advertisement is the same that you would get from a fetch.

However, the configuration required by the server is
slightly more complex. The default `http.receivepack`
setting is to allow pushes if the webserver tells us that
the user authenticated, and otherwise to return a 403
("Forbidden"). That works fine if authentication is turned
on completely; the initial request requires authentication,
and http-backend realizes it is OK to do a push.

But for this "half-auth" state, no authentication has
occurred during the initial ref advertisement. The
http-backend CGI therefore does not think that pushing
should be enabled, and responds with a 403. The client
cannot continue, even though the server would have allowed
it to run if it had provided credentials.

It would be much better if the server responded with a 401,
asking for credentials during the initial contact. But
git-http-backend does not know about the server's auth
configuration (so a 401 would be confusing in the case of a
true anonymous server). Unfortunately, configuring Apache to
recognize the query string and apply the auth appropriately
to receive-pack (but not upload-pack) initial requests is
non-trivial.

The site admin can work around this by just turning on
http.receivepack explicitly in its repositories. Let's
document this workaround.
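
The default behaviour described above can be sketched as a small
decision function. This is a Python illustration under assumptions:
the function names are hypothetical, and it assumes the web server
signals authentication by passing a non-empty REMOTE_USER to the CGI.

```python
def receive_pack_allowed(config_value, remote_user):
    """Decide whether pushing is enabled for this request.

    config_value: True or False when http.receivepack is set explicitly,
                  None when it is unset (the default case).
    remote_user:  value of REMOTE_USER from the web server, or None
                  (assumption: this is how authentication is conveyed).
    """
    if config_value is not None:
        return config_value  # an explicit setting always wins
    return bool(remote_user)  # default: only authenticated users may push


def push_status(config_value, remote_user):
    """HTTP status for a receive-pack request: 200 if allowed, else 403."""
    return 200 if receive_pack_allowed(config_value, remote_user) else 403
```

In the "half-auth" setup the ref advertisement arrives without
credentials, so the unset case yields 403; setting http.receivepack to
true explicitly makes the same request succeed.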

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-11 07:33:07 -07:00
Thomas Ackermann
2de9b71138 Documentation: the name of the system is 'Git', not 'git'
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-01 13:53:33 -08:00
Josh Triplett
d49483f0ca ref namespaces: documentation
Document the namespace mechanism in a new gitnamespaces(7) page.
Reference it from receive-pack and upload-pack.

Document the new --namespace option and GIT_NAMESPACE environment
variable in git(1), and reference gitnamespaces(7).

Add a sample Apache configuration to http-backend(1) to support
namespaced repositories, and reference gitnamespaces(7).

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jamey Sharp <jamey@minilop.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-11 09:35:46 -07:00
Greg Bacon
09f53b16bc Documentation: Clarify support for smart HTTP backend
In the description of http.getanyfile, replace the vague "older Git
clients" with the earliest release whose client is able to use the
upload pack service.

Signed-off-by: Greg Bacon <gbacon@dbresearch.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-30 16:49:19 -07:00
Ralf Wildenhues
6a5d0b0a90 Fix typos in technical documentation.
Signed-off-by: Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-31 10:24:53 -08:00
Junio C Hamano
add0951ab0 Merge remote branch 'remotes/trast-doc/for-next'
* remotes/trast-doc/for-next:
  Documentation: spell 'git cmd' without dash throughout
  Documentation: format full commands in typewriter font
  Documentation: warn prominently against merging with dirty trees
  Documentation/git-merge: reword references to "remote" and "pull"

Conflicts:
	Documentation/config.txt
	Documentation/git-config.txt
	Documentation/git-merge.txt
2010-01-20 20:28:49 -08:00
Thomas Rast
0b444cdb19 Documentation: spell 'git cmd' without dash throughout
The documentation was quite inconsistent when spelling 'git cmd' if it
only refers to the program, not to some specific invocation syntax:
both 'git-cmd' and 'git cmd' spellings exist.

The current trend goes towards dashless forms, and there is precedent
in 647ac70 (git-svn.txt: stop using dash-form of commands.,
2009-07-07) to actively eliminate the dashed variants.

Replace 'git-cmd' with 'git cmd' throughout, except where git-shell,
git-cvsserver, git-upload-pack, git-receive-pack, and
git-upload-archive are concerned, because those really live in the
$PATH.
2010-01-10 13:01:28 +01:00
Tarmigan Casebolt
8b2bd7cdac Smart-http: check if repository is OK to export before serving it
Similar to how git-daemon checks whether a repository is OK to be
exported, smart-http should also check.  This check can be satisfied
in two different ways: the environment variable GIT_HTTP_EXPORT_ALL
may be set to export all repositories, or the individual repository
may have the file git-daemon-export-ok.
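
The two ways of satisfying the check can be sketched as below. This is
a Python illustration rather than the actual C implementation; the
function name is_export_ok is hypothetical, and it assumes the variable
only needs to be present, whatever its value.

```python
import os
import tempfile

EXPORT_MARKER = "git-daemon-export-ok"


def is_export_ok(repo_dir, environ=None):
    """Return True if the repository may be served over smart HTTP."""
    env = os.environ if environ is None else environ
    # Way 1: the server exports every repository.
    if "GIT_HTTP_EXPORT_ALL" in env:
        return True
    # Way 2: this particular repository carries the marker file.
    return os.path.exists(os.path.join(repo_dir, EXPORT_MARKER))


# Usage: a directory without the marker is refused until the file appears.
with tempfile.TemporaryDirectory() as repo:
    before = is_export_ok(repo, environ={})
    open(os.path.join(repo, EXPORT_MARKER), "w").close()
    after = is_export_ok(repo, environ={})
    print(before, after)  # False True
```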

Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Tarmigan Casebolt <tarmigan+git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-06 01:16:50 -08:00
Shawn O. Pearce
5abb013b3d http-backend: Use http.getanyfile to disable dumb HTTP serving
Some repository owners may wish to enable smart HTTP, but disallow
dumb content serving.  Disallowing dumb serving might be because
the owners want to rely upon reachability to control which objects
clients may access from the repository, or they just want to
encourage clients to use the more bandwidth efficient transport.

If http.getanyfile is set to false the backend CGI will return with
'403 Forbidden' when an object file is accessed by a dumb client.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-04 17:58:16 -08:00
Mark Lodato
f5ba2d18f9 http-backend: more explicit LocationMatch
In the git-http-backend examples, only match git-receive-pack within
/git/.

Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-04 17:58:15 -08:00
Mark Lodato
8127f778a0 http-backend: add example for gitweb on same URL
In the git-http-backend documentation, add an example of how to set up
gitweb and git-http-backend on the same URL by using a series of
mod_alias commands.

Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-04 17:58:15 -08:00
Mark Lodato
0ebb1fa78e http-backend: use mod_alias instead of mod_rewrite
In the git-http-backend documentation, use mod_alias exclusively, instead
of using a combination of mod_alias and mod_rewrite.  This makes the
example slightly shorter and a bit more clear.

Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-04 17:58:15 -08:00
Mark Lodato
b9af4ab3cd http-backend: reword some documentation
Clarify some of the git-http-backend documentation, particularly:

* In the Description, state that smart/dumb HTTP fetch and smart HTTP
  push are supported, state that authenticated clients allow push, and
  remove the note that this is only suited for read-only updates.

* At the start of Examples, state explicitly what URL is mapping to what
  location on disk.

Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-04 17:58:15 -08:00
Mark Lodato
917adc0360 http-backend: add GIT_PROJECT_ROOT environment var
Add a new environment variable, GIT_PROJECT_ROOT, to override the
method of using PATH_TRANSLATED to find the git repository on disk.
This makes it much easier to configure the web server, especially when
the web server's DocumentRoot does not contain the git repositories,
which is the usual case.
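
The lookup order can be sketched as follows. This Python illustration
is not the CGI's own code: the function name resolve_repo_path is
hypothetical, and joining GIT_PROJECT_ROOT with PATH_INFO is an
assumption about how the override is applied.

```python
def resolve_repo_path(environ):
    """Map a CGI request to a path on disk.

    If GIT_PROJECT_ROOT is set it takes precedence and the request path
    (PATH_INFO) is appended to it; otherwise the path the web server
    derived from its DocumentRoot (PATH_TRANSLATED) is used.
    """
    root = environ.get("GIT_PROJECT_ROOT")
    if root:
        path_info = environ.get("PATH_INFO", "")
        return root.rstrip("/") + "/" + path_info.lstrip("/")
    return environ["PATH_TRANSLATED"]
```

With the variable set, the repositories can live anywhere on disk
without the web server's DocumentRoot having to contain them.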

Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-04 17:58:15 -08:00
Shawn O. Pearce
556cfa3b6d Smart fetch and push over HTTP: server side
Requests for $GIT_URL/git-receive-pack and $GIT_URL/git-upload-pack
are forwarded to the corresponding backend process by directly
executing it and leaving stdin and stdout connected to the invoking
web server.  Prior to starting the backend process the HTTP response
headers are sent, thereby freeing the backend from needing to know
about the HTTP protocol.

Requests that are encoded with Content-Encoding: gzip are
automatically inflated before being streamed into the backend.
This is primarily useful for the git-upload-pack backend, which
receives highly repetitive text data from clients that easily
compresses to 50% of its original size.
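
The inflate step can be sketched as below. This is a Python
illustration of the idea only: the function name read_request_body is
hypothetical, and it inflates the whole body at once rather than
streaming it as the backend does.

```python
import gzip


def read_request_body(raw_body, content_encoding):
    """Return the request body as the backend process should see it.

    Bodies sent with Content-Encoding: gzip are inflated; anything else
    is passed through untouched.
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(raw_body)
    return raw_body


# Usage: a repetitive negotiation-style payload round-trips unchanged.
payload = b"have 0123456789abcdef0123456789abcdef01234567\n" * 200
compressed = gzip.compress(payload)
restored = read_request_body(compressed, "gzip")
```

The backend process itself never sees the compressed form, which keeps
it free of any HTTP-specific handling.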

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-04 17:58:15 -08:00
Shawn O. Pearce
2f4038ab33 Git-aware CGI to provide dumb HTTP transport
The git-http-backend CGI can be configured into any Apache server
using ScriptAlias, such as with the following configuration:

  LoadModule cgi_module /usr/libexec/apache2/mod_cgi.so
  LoadModule alias_module /usr/libexec/apache2/mod_alias.so
  ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

Repositories are accessed via the translated PATH_INFO.

The CGI is backwards compatible with the dumb client, allowing all
older HTTP clients to continue to download repositories which are
managed by the CGI.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-04 17:58:04 -08:00