Commit graph

25427 commits

Author SHA1 Message Date
Junio C Hamano b84c783917 streaming filter: ident filter
Add support for "ident" filter on the output codepath. This does not yet
work together with the lf-to-crlf filter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
Junio C Hamano e322ee38ad Add LF-to-CRLF streaming conversion
If we do not have to guess or validate by scanning the input, we can
just stream this through.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
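As a rough illustration of why this conversion can be streamed, here is a Python sketch (not git's C implementation). The class name LfToCrlf and its feed() method are hypothetical, and leaving an existing CRLF pair untouched is an assumption of this sketch rather than something the commit message states. The only state carried from one chunk to the next is whether the previous byte was a CR.

```python
class LfToCrlf:
    """Hypothetical streaming LF -> CRLF converter (illustration only)."""

    def __init__(self):
        # True when the last byte seen so far was a carriage return.
        self.prev_cr = False

    def feed(self, chunk: bytes) -> bytes:
        out = bytearray()
        for b in chunk:
            # Insert a CR before a bare LF; assume an existing CRLF stays as-is.
            if b == 0x0A and not self.prev_cr:
                out.append(0x0D)
            out.append(b)
            self.prev_cr = (b == 0x0D)
        return bytes(out)
```

Because each output byte depends only on the current byte and one remembered bit, the input never has to be scanned ahead or held in memory as a whole.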
Junio C Hamano 4ae6670444 stream filter: add "no more input" to the filters
Some filters may need to buffer the input and look-ahead inside it
to decide what to output, and they may consume more than zero bytes
of input and still not produce any output. After feeding all the
input, pass NULL as input and keep calling stream_filter() to let
such filters know there is no more input coming, and it is time for
them to produce the remaining output based on the buffered input.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
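The "no more input" convention described above can be sketched in Python as follows. This is an illustration under assumptions, not git's code: CrlfToLf, filter() and drain() are hypothetical names, None stands in for the NULL input, and a single final call is assumed to be enough to flush (the commit message allows for repeated calls). The filter must hold back a trailing CR because it cannot decide what to emit until it sees the next byte.

```python
from typing import Iterable, Optional


class CrlfToLf:
    """Hypothetical look-ahead filter: turns CRLF into LF."""

    def __init__(self):
        self.held_cr = False  # a CR was consumed but not yet emitted

    def filter(self, data: Optional[bytes]) -> bytes:
        out = bytearray()
        if data is None:
            # No more input is coming: emit whatever is still buffered.
            if self.held_cr:
                out.append(0x0D)
                self.held_cr = False
            return bytes(out)
        for b in data:
            if self.held_cr:
                self.held_cr = False
                if b != 0x0A:
                    out.append(0x0D)  # the held CR was not part of a CRLF
            if b == 0x0D:
                self.held_cr = True   # decide once the next byte is known
            else:
                out.append(b)
        return bytes(out)


def drain(flt: CrlfToLf, chunks: Iterable[bytes]) -> bytes:
    """Feed every chunk, then signal end of input with None."""
    out = b"".join(flt.filter(c) for c in chunks)
    return out + flt.filter(None)
```

Feeding the single byte b"\r" consumes input but yields no output, which is exactly the case the commit message describes; only the final None call releases it.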
Junio C Hamano b6691092d7 Add streaming filter API
This introduces an API to plug custom filters to an input stream.

The caller calls get_stream_filter("path") to obtain an appropriate
filter for the path, and then uses it when opening an input stream
via open_istream().  After that, the caller can read from the stream
with read_istream(), and close it with close_istream(), just like an
unfiltered stream.

This only adds a "null" filter that is a pass-thru filter, but later
changes can add LF-to-CRLF and other filters, and the callers of the
streaming API do not have to change.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
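The call sequence described above might look roughly like this in Python. It is a sketch under assumptions, not the C API: the function get_stream_filter() and the classes NullFilter and FilteredStream are stand-ins named after the commit message, and io.BytesIO plays the role of the underlying object stream.

```python
import io


class NullFilter:
    """Pass-through filter: returns its input unchanged."""

    def filter(self, data: bytes) -> bytes:
        return data


def get_stream_filter(path: str) -> NullFilter:
    # Assumption: only the null filter exists, so the path is not inspected.
    return NullFilter()


class FilteredStream:
    """Wraps a raw stream so that reads come back filtered."""

    def __init__(self, raw, flt):
        self.raw = raw
        self.flt = flt

    def read(self, size: int) -> bytes:
        return self.flt.filter(self.raw.read(size))

    def close(self) -> None:
        self.raw.close()


def read_all(raw, path: str, chunk_size: int = 4) -> bytes:
    """Caller's view: same read loop as for an unfiltered stream."""
    stream = FilteredStream(raw, get_stream_filter(path))
    parts = []
    while True:
        buf = stream.read(chunk_size)
        if not buf:  # with the null filter, an empty read means end of stream
            break
        parts.append(buf)
    stream.close()
    return b"".join(parts)
```

The point of the design is visible in read_all(): swapping NullFilter for another filter would not change the caller's loop.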
Junio C Hamano d1bf0e0831 convert.h: move declarations for conversion from cache.h
Before adding the streaming filter API to the conversion layer,
move the existing declarations related to the conversion to its
own header file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
Jim Meyering 23c7df6bdd sha1_file: use the correct type (ssize_t, not size_t) for read-style function
Using an unsigned type, we would fail to detect a read error and then
proceed to try to write (size_t)-1 bytes.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 11:25:59 -07:00
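The effect of storing a read-style return value in an unsigned type can be reproduced with Python's ctypes module. This is only an illustration of the type issue; the helper read_failed() is hypothetical and is not taken from sha1_file.c.

```python
import ctypes


def read_failed(result) -> bool:
    """Mimics the usual 'if (n < 0)' error check after a read-style call."""
    return result.value < 0


# The same error return, -1, stored in a signed and in an unsigned type.
as_signed = ctypes.c_ssize_t(-1)
as_unsigned = ctypes.c_size_t(-1)

# Largest value a size_t can hold on this platform.
SIZE_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_size_t)) - 1
```

With the signed type the check fires; with the unsigned type -1 wraps around to SIZE_MAX, the check never fires, and the caller would go on to treat that huge number as a byte count.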
Junio C Hamano 93aa7bd595 streaming: read loose objects incrementally
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 23:16:53 -07:00
Junio C Hamano f0270efd46 sha1_file.c: expose helpers to read loose objects
Make map_sha1_file(), parse_sha1_header() and unpack_sha1_header()
available to the streaming read API by exporting them via cache.h header
file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 23:16:53 -07:00
Junio C Hamano 7ef2d9a260 streaming: read non-delta incrementally from a pack
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 23:16:53 -07:00
Junio C Hamano de6182db67 streaming_write_entry(): support files with holes
One typical use of a large binary file is to hold a sparse on-disk hash
table with a lot of holes. Help preserve the holes with lseek().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 23:16:53 -07:00
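One way to preserve holes while streaming, sketched in Python with os.lseek(). This is an assumption-laden illustration rather than the code in streaming_write_entry(): the function names are hypothetical, and a "hole" is detected simply as a chunk consisting entirely of zero bytes.

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> None:
    """Write every byte, retrying on short writes."""
    while data:
        written = os.write(fd, data)
        data = data[written:]


def write_with_holes(fd: int, chunks) -> None:
    pending = 0  # zero bytes skipped so far but not yet written
    for chunk in chunks:
        if not any(chunk):
            # All-zero (or empty) chunk: remember its length, write nothing.
            pending += len(chunk)
            continue
        if pending:
            # Seek forward over the zeros instead of writing them.
            os.lseek(fd, pending, os.SEEK_CUR)
            pending = 0
        write_all(fd, chunk)
    if pending:
        # A trailing hole needs one real byte so the file gets its full length.
        os.lseek(fd, pending - 1, os.SEEK_CUR)
        write_all(fd, b"\0")
```

Reading the file back yields zeros where the seeks happened; whether those ranges actually occupy no disk blocks depends on the filesystem.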
Junio C Hamano b0d9c69f5e convert: CRLF_INPUT is a no-op in the output codepath
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 23:16:53 -07:00
Junio C Hamano dd8e912190 streaming_write_entry(): use streaming API in write_entry()
When the output to a path does not have to be converted, we can read from
the object database from the streaming API and write to the file in the
working tree, without having to hold everything in the memory.

The ident, auto- and safe- crlf conversions inherently require you to read
the whole thing before deciding what to do, so while it is technically
possible to support them by using a buffer of an unbounded size or rewinding
and reading the stream twice, it is less practical than the traditional
"read the whole thing in core and convert" approach.

Adding streaming filters for the other conversions on top of this should
be doable by tweaking the can_bypass_conversion() function (it should be
renamed to can_filter_stream() when it happens). Then the streaming API
can be extended to wrap the git_istream streaming_write_entry() opens on
the underlying object in another git_istream that reads from it, filters
what is read, and lets streaming_write_entry() read the filtered
result. But that is outside the scope of this series.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:46:58 -07:00
Junio C Hamano 46bf043807 streaming: a new API to read from the object store
Given an object name, use open_istream() to get a git_istream handle
that you can read_istream() from as if you are using read(2) to read
the contents of the object, and close it with close_istream() when
you are done.

Currently, we do not do anything fancy--it just calls read_sha1_file()
and keeps the contents in memory as a whole, and carves it out as you
request with read_istream().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:46:55 -07:00
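The initial "keep everything in memory and carve it out" behaviour can be pictured with a small Python sketch. The class InCoreStream is hypothetical and only mirrors the open/read/close shape described above; it is not the git_istream structure.

```python
class InCoreStream:
    """Hypothetical in-memory stream: holds the whole object, serves slices."""

    def __init__(self, contents: bytes):
        self.buf = contents  # the entire object, read up front
        self.pos = 0         # how much has been handed out so far

    def read(self, size: int) -> bytes:
        # Like read(2): return at most `size` bytes, empty bytes at the end.
        piece = self.buf[self.pos:self.pos + size]
        self.pos += len(piece)
        return piece

    def close(self) -> None:
        self.buf = b""
        self.pos = 0
```

A caller written against this interface does not need to change when a later implementation reads the object incrementally instead of holding it whole.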
Junio C Hamano fd5db55d8b write_entry(): separate two helper functions out
In the write-out codepath, a block of code determines what file in the
working tree to write to, and opens an output file descriptor to it.

After writing the contents out to the file, another block of code runs
fstat() on the file descriptor when appropriate.

Separate these blocks out to open_output_fd() and fstat_output()
helper functions.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:54 -07:00
Junio C Hamano f8c8abc5b7 unpack_object_header(): make it public
This function is used to read and skip over the per-object header
in a packfile.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:54 -07:00
Junio C Hamano 5266d369b2 sha1_object_info_extended(): hint about objects in delta-base cache
An object found in the delta-base cache is not guaranteed to
stay there, but we know it came from a pack and it is likely
to give us quick access if we read_sha1_file() it right now,
which is a piece of useful information.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:50 -07:00
Junio C Hamano 9a49059022 sha1_object_info_extended(): expose a bit more info
The original interface for sha1_object_info() takes an object name and
gives back a type and its size (the latter is given only when it was
asked).  The new interface wraps its implementation and exposes a few
more pieces of information that the interface used to discard, namely:

 - where the object is stored (loose? cached? packed?)
 - if packed, in which packfile and where in it?

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * In the earlier round, this used u.pack.delta to record the length of
   the delta chain, but the caller is not necessarily interested in the
   length of the delta chain per-se, but may only want to know if it is a
   delta against another object or is stored as deflated data. Calling
   packed_object_info_detail() involves walking the reverse index chain to
   compute the store size of the object and is unnecessarily expensive.

   We could resurrect the code if a new caller wants to know, but I doubt
   it.
2011-05-19 14:22:47 -07:00
Junio C Hamano b9a62cbeb9 packed_object_info_detail(): do not return a string
Instead return an integer that can be given to typename() if
the caller wants a string, just like everybody else does.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-16 22:13:34 -07:00
Junio C Hamano 02071b27f1 Merge branches 'jc/convert', 'jc/bigfile' and 'jc/replacing' into jc/streaming
* jc/convert:
  convert: make it harder to screw up adding a conversion attribute
  convert: make it safer to add conversion attributes
  convert: give saner names to crlf/eol variables, types and functions
  convert: rename the "eol" global variable to "core_eol"

* jc/bigfile:
  Bigfile: teach "git add" to send a large file straight to a pack
  index_fd(): split into two helper functions
  index_fd(): turn write_object and format_check arguments into one flag

* jc/replacing:
  read_sha1_file(): allow selective bypassing of replacement mechanism
  inline lookup_replace_object() calls
  read_sha1_file(): get rid of read_sha1_file_repl() madness
  t6050: make sure we test not just commit replacement
  Declare lookup_replace_object() in cache.h, not in commit.h
2011-05-15 16:30:13 -07:00
Junio C Hamano c565cb452c Sync release notes for 1.7.6 to exclude what are in maintenance track
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 16:19:16 -07:00
Junio C Hamano f574cb3404 Merge branch 'maint'
* maint:
  Update draft release notes to 1.7.5.2
  git_open_noatime(): drop unused parameter
  sha1_file: typofix
2011-05-15 16:16:56 -07:00
Junio C Hamano 96dbe93da5 Update draft release notes to 1.7.5.2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 16:11:55 -07:00
Junio C Hamano 06c0f42f94 Merge branch 'cn/format-patch-quiet' into maint
* cn/format-patch-quiet:
  format-patch: document --quiet option
  format-patch: don't pass on the --quiet flag
2011-05-15 16:10:49 -07:00
Junio C Hamano ab02095ccd Merge branch 'jm/mergetool-submodules' into maint
* jm/mergetool-submodules:
  mergetool: Teach about submodules
2011-05-15 15:57:16 -07:00
Junio C Hamano 92b501f2a0 Merge branch 'jk/format-patch-quote-special-in-from' into maint
* jk/format-patch-quote-special-in-from:
  pretty: quote rfc822 specials in email addresses
2011-05-15 15:56:44 -07:00
Junio C Hamano e5c1650b27 Merge branch 'vh/git-svn-doc' into maint
* vh/git-svn-doc:
  git-svn.txt: small typeface improvements
  git-svn.txt: move option descriptions
  git-svn.txt: fix usage of --add-author-from
2011-05-15 15:52:40 -07:00
Junio C Hamano f4e516834e git_open_noatime(): drop unused parameter
Since commit c793430 (Limit file descriptors used by packs, 2011-02-28),
the extra parameter added in f2e872aa (Work around EMFILE when there are
too many pack files, 2010-11-01) is not used anymore.

Remove it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-15 15:24:52 -07:00
Junio C Hamano ccf5ace0dc sha1_file: typofix
The number zero is spelled "zero", not "zer0".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:24:36 -07:00
Junio C Hamano 5bf29b9500 read_sha1_file(): allow selective bypassing of replacement mechanism
The way the "object replacement" mechanism was tucked into the read_sha1_file()
interface was suboptimal in a couple of ways:

 - Callers that want it to die with useful diagnosis upon seeing a corrupt
   object do not have a way to say that they do not want any object
   replacement.

 - Callers who do not want it to die but want to handle the errors
   themselves are told to arrange to call read_object(), but the function
   does not use the replacement mechanism, and also it is a file scope
   static function that not many callers can call to begin with.

This adds a read_sha1_file_extended() that takes a set of flags; the
callers of read_sha1_file() pass a flag READ_SHA1_FILE_REPLACE to ask
for the object replacement mechanism to kick in.

Later, we could add another flag bit to tell the function to return an
error instead of dying and then remove the misguided "call read_object()
yourself".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:34 -07:00
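The flag-based interface described above can be sketched in Python. Everything here is a stand-in: READ_REPLACE is modeled on READ_SHA1_FILE_REPLACE, the two functions are hypothetical, and plain dictionaries replace the object store and the replacement table.

```python
READ_REPLACE = 1 << 0  # modeled on READ_SHA1_FILE_REPLACE; value is illustrative


def read_object_extended(store: dict, replacements: dict, name: str, flags: int = 0):
    """Look up an object; apply replacement only when the caller asks for it."""
    if flags & READ_REPLACE:
        name = replacements.get(name, name)
    return store[name]


def read_object(store: dict, replacements: dict, name: str):
    """The traditional entry point: always asks for replacement."""
    return read_object_extended(store, replacements, name, READ_REPLACE)
```

A caller that wants the original object, for example to diagnose corruption, would call the extended function with flags left at 0.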
Junio C Hamano e1111cef23 inline lookup_replace_object() calls
In a repository without object replacement, lookup_replace_object() should
be a no-op. Check the flag "read_replace_refs" on the side of the caller,
and bypass a function call when we know we are not dealing with replacement.

Also, even when we are set up to replace objects, if we do not find any
replacement defined, flip that flag off to avoid function call overhead
for all the later object accesses.

As this changes the semantics of the flag from "do we need to read the
replacement definition?" to "do we need to check with the lookup table?"
the flag needs to be renamed later to something saner, e.g. "use_replace",
when the codebase is calmer, but not now.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:33 -07:00
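The optimisation described above, checking a flag on the caller's side and switching it off once no replacements are found, can be sketched in Python. The class ReplaceState and its members are hypothetical; the lookups counter exists only to make the saved calls visible.

```python
class ReplaceState:
    """Hypothetical model of the caller-side flag check."""

    def __init__(self, table: dict):
        self.table = table
        self.enabled = True  # plays the role of the "read_replace_refs" flag
        self.lookups = 0     # how many times the slow path actually ran

    def _lookup(self, name: str) -> str:
        self.lookups += 1
        if not self.table:
            # Nothing is defined: turn the flag off for all later accesses.
            self.enabled = False
            return name
        return self.table.get(name, name)

    def resolve(self, name: str) -> str:
        # The cheap flag test happens here, before any function call.
        return self._lookup(name) if self.enabled else name
```

With an empty table, only the first resolve() pays for a lookup; every later call returns immediately.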
Junio C Hamano 4bbf5a2615 read_sha1_file(): get rid of read_sha1_file_repl() madness
Most callers want to silently get a replacement object, and they do not
care what the real name of the replacement object is.  Worse yet, no sane
interface to return the underlying object without replacement is provided.

Remove the function and make only the few callers that want the name of
the replacement object find it themselves.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:33 -07:00
Junio C Hamano abb25ac365 t6050: make sure we test not just commit replacement
The replacement mechanism should affect all types of objects not
just commits, so make sure it deals with at least a blob.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:32 -07:00
Junio C Hamano fea33a1ef3 Declare lookup_replace_object() in cache.h, not in commit.h
The declaration is misplaced as the replace API is supposed to affect
not just commits, but all types of objects.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:31 -07:00
Junio C Hamano 2f80de956f Merge branch 'maint'
* maint:
  add, merge, diff: do not use strcasecmp to compare config variable names
2011-05-14 20:44:09 -07:00
Jonathan Nieder 8c2be75fe1 add, merge, diff: do not use strcasecmp to compare config variable names
The config machinery already makes section and variable names
lowercase when parsing them, so using strcasecmp for comparison just
feels wasteful.  No noticeable change intended.

Noticed-by: Jay Soffian <jaysoffian@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-14 18:53:39 -07:00
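Why a plain comparison is enough can be shown with a short Python sketch. The function normalize_key() is hypothetical and only imitates the lowercasing the commit message attributes to the config machinery; it assumes the key contains at least one dot and leaves any subsection between the first and last dot untouched.

```python
def normalize_key(raw: str) -> str:
    """Lowercase the section and variable name of a dotted config key."""
    first = raw.index(".")
    last = raw.rindex(".")
    section = raw[:first].lower()
    middle = raw[first:last]          # ".subsection", or "" when there is none
    variable = raw[last + 1:].lower()
    return section + middle + "." + variable


def wants_renames(key: str) -> bool:
    # After normalization an exact match suffices; no case folding is needed.
    return key == "diff.renames"
```

Once keys have passed through normalization, a case-insensitive comparison in every handler repeats work that has already been done.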
Junio C Hamano 4dd1fbc7b1 Bigfile: teach "git add" to send a large file straight to a pack
When adding new content to the repository, we have always slurped
the blob in its entirety in-core first, and computed the object name
and compressed it into a loose object file.  Handling large binary
files (e.g.  video and audio assets for games) has been problematic
because of this design.

At the middle level of "git add" callchain is an internal API
index_fd() that takes an open file descriptor to read from the
working tree file being added with its size. Teach it to call out to
fast-import when adding a large blob.

The write-out codepath in entry.c::write_entry() should be taught to
stream, instead of reading everything in core. This should not be so
hard to implement, especially if we limit ourselves only to loose
object files and non-delta representation in packfiles.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-13 16:11:18 -07:00
Junio C Hamano 2de58b398b Update draft release notes to 1.7.6
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-13 11:14:07 -07:00
Junio C Hamano e4ae6efb78 Merge branch 'bf/commit-template-no-cleanup'
* bf/commit-template-no-cleanup:
  Do not strip empty lines / trailing spaces from a commit message template
2011-05-13 11:03:08 -07:00
Junio C Hamano d6ad4ff120 Merge branch 'jc/t1506-shell-param-expansion-gotcha'
* jc/t1506-shell-param-expansion-gotcha:
  t1507: avoid "${parameter<op>'word'}" inside double-quotes
2011-05-13 11:02:47 -07:00
Junio C Hamano ad29f71d53 Merge branch 'rr/rerere-libify-clear-gc'
* rr/rerere-libify-clear-gc:
  rerere: libify rerere_clear() and rerere_gc()
2011-05-13 11:02:40 -07:00
Junio C Hamano e9c1a3a426 Merge branch 'js/maint-send-pack-stateless-rpc-deadlock-fix'
* js/maint-send-pack-stateless-rpc-deadlock-fix:
  send-pack: unbreak push over stateless rpc
  send-pack: avoid deadlock when pack-object dies early
2011-05-13 11:02:29 -07:00
Junio C Hamano df54e2bfd6 Merge branch 'jh/dirstat-lines'
* jh/dirstat-lines:
  Mark dirstat error messages for translation
  Improve error handling when parsing dirstat parameters
  New --dirstat=lines mode, doing dirstat analysis based on diffstat
  Allow specifying --dirstat cut-off percentage as a floating point number
  Add config variable for specifying default --dirstat behavior
  Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
  Make --dirstat=0 output directories that contribute < 0.1% of changes
  Add several testcases for --dirstat and friends
2011-05-13 11:01:32 -07:00
Junio C Hamano f7d59e2568 Merge branch 'jc/fix-add-u-unmerged'
* jc/fix-add-u-unmerged:
  Fix "add -u" that sometimes fails to resolve unmerged paths
2011-05-13 11:01:15 -07:00
Junio C Hamano 3e1a363b1f Merge branch 'jn/setup-revisions-glob-and-friends-passthru'
* jn/setup-revisions-glob-and-friends-passthru:
  revisions: allow --glob and friends in parse_options-enabled commands
  revisions: split out handle_revision_pseudo_opt function
2011-05-13 11:00:25 -07:00
Junio C Hamano 2f3e3f573d Merge branch 'cn/log-parse-opt'
* cn/log-parse-opt:
  log: convert to parse-options
2011-05-13 10:59:57 -07:00
Junio C Hamano 32341b9df5 Merge branch 'maint'
* maint:
  Prepare for 1.7.5.2
  t5400: Fix a couple of typos

Conflicts:
	RelNotes
2011-05-13 10:58:10 -07:00
Junio C Hamano 375f8a032e Prepare for 1.7.5.2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-13 10:57:09 -07:00
Junio C Hamano e31b018372 Merge branch 'aw/maint-rebase-i-p-no-ff' into maint
* aw/maint-rebase-i-p-no-ff:
  git-rebase--interactive.sh: preserve-merges fails on merges created with no-ff
2011-05-13 10:45:21 -07:00
Junio C Hamano bc67ad8c37 Merge branch 'js/blame-parsename' into maint
* js/blame-parsename:
  t/annotate-tests: Use echo & cat instead of sed
  blame: tolerate bogus e-mail addresses a bit better
2011-05-13 10:45:00 -07:00
Junio C Hamano 978471dcce Merge branch 'gr/cvsimport-alternative-cvspass-location' into maint
* gr/cvsimport-alternative-cvspass-location:
  Look for password in both CVS and CVSNT password files.
2011-05-13 10:44:54 -07:00