Commit graph

113 commits

Author SHA1 Message Date
Jonathan Nieder 3f790003a3 vcs-svn: suppress a -Wtype-limits warning
On 32-bit architectures with 64-bit file offsets, gcc 4.3 and earlier
produce the following warning:

	    CC vcs-svn/sliding_window.o
	vcs-svn/sliding_window.c: In function `check_overflow':
	vcs-svn/sliding_window.c:36: warning: comparison is always false \
	    due to limited range of data type

The warning appears even when gcc is run without any warning flags
(this is gcc bug 12963).  In later versions the same warning can be
reproduced with -Wtype-limits, which is implied by -Wextra.

On 64-bit architectures it really is possible for a size_t not to be
representable as an off_t so the check this is warning about is not
actually redundant.  But even false positives are distracting.  Avoid
the warning by making the "len" argument to check_overflow a
uintmax_t; no functional change intended.

Reported-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 11:05:18 -08:00
Jonathan Nieder 150f75467c vcs-svn: allow import of > 4GiB files
There is no reason in principle that an svn-format dump would not be
able to represent a file whose length does not fit in a 32-bit
integer.  Use off_t consistently to represent file lengths (in place
of using uint32_t in some contexts) so we can handle that.

Most svn-fe code is already ready to do that without this patch and
passes values of type off_t around.  The type mismatch from stragglers
was noticed with gcc -Wtype-limits.

While at it, tighten the parsing of the Text-content-length field to
make sure it is a number and does not overflow, and tighten other
overflow checks as that value is passed around and manipulated.

Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 11:03:30 -08:00
Ramsay Jones 173223aa62 vcs-svn: rename check_overflow arguments for clarity
Code using the argument names a and b just doesn't look right (not
sure why!).  Use more explicit names "offset" and "len" to make their
type and meaning clearer.

Also rename check_overflow() to check_offset_overflow() to clarify
that we are making sure that "len" bytes beyond "offset" still fits
the type to represent an offset.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:53:18 -08:00
Junio C Hamano 58ebd9865d vcs-svn/svndiff.c: squelch false "unused" warning from gcc
Curiously, pre_len given to read_length() does not trigger the same warning
even though the code structure is the same. Most likely this is because
read_offset() is used only once and inlining it will make gcc realize that
it has a chance to do more flow analysis. Alas, the analysis is flawed, so
it does not help X-<.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-27 11:58:56 -08:00
Junio C Hamano d475536658 Merge branch 'svn-fe' of git://repo.or.cz/git/jrn into jn/svn-fe
This simplifies svn-fe a great deal and fulfills a longstanding wish:
support for dumps with deltas in them, and incremental imports.

The cost is that commandline usage of the svn-fe tool becomes a little
more complicated since it no longer keeps state itself but instead reads
blobs back from fast-import in order to copy them between revisions and
apply deltas to them.

Also removes a couple of custom data structures and replaces them with
strbufs like other parts of Git.

* 'svn-fe' of git://repo.or.cz/git/jrn: (32 commits)
  vcs-svn: reset first_commit_done in fast_export_init
  vcs-svn: do not initialize report_buffer twice
  vcs-svn: avoid hangs from corrupt deltas
  vcs-svn: guard against overflow when computing preimage length
  vcs-svn: cap number of bytes read from sliding view
  test-svn-fe: split off "test-svn-fe -d" into a separate function
  vcs-svn: implement text-delta handling
  vcs-svn: let deltas use data from preimage
  vcs-svn: let deltas use data from postimage
  vcs-svn: verify that deltas consume all inline data
  vcs-svn: implement copyfrom_data delta instruction
  vcs-svn: read instructions from deltas
  vcs-svn: read inline data from deltas
  vcs-svn: read the preimage when applying deltas
  vcs-svn: parse svndiff0 window header
  vcs-svn: skeleton of an svn delta parser
  vcs-svn: make buffer_read_binary API more convenient
  vcs-svn: learn to maintain a sliding view of a file
  Makefile: list one vcs-svn/xdiff object or header per line
  vcs-svn: avoid using ls command twice
  ...

Conflicts:
	Makefile
	contrib/svn-fe/svn-fe.txt
2012-01-27 11:20:00 -08:00
Ævar Arnfjörð Bjarmason 952fba9c63 Fix a bitwise negation assignment issue spotted by Sun Studio
Change direct and indirect assignments of the bitwise negation of 0 to
uint32_t variables to have a "U" suffix. I.e. ~0U instead of ~0. This
eliminates warnings under Sun Studio 12 Update 1:

    "vcs-svn/string_pool.c", line 11: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "vcs-svn/string_pool.c", line 81: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "vcs-svn/repo_tree.c", line 112: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "vcs-svn/repo_tree.c", line 112: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "test-treap.c", line 34: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)

The semantics are still the same as demonstrated by this program:

    $ cat test.c && make test && ./test
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t foo = ~0;
        uint32_t bar = ~0U;

        printf("foo = <%u> bar = <%u>\n", foo, bar);

        return 0;
    }
    cc     test.c   -o test
    "test.c", line 5: warning: initializer will be sign-extended: -1
    foo = <4294967295> bar = <4294967295>

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-21 10:19:40 -08:00
Dmitry Ivankov c5bcbcdcfa vcs-svn: reset first_commit_done in fast_export_init
first_commit_done has zero as a default value, but it
is not reset back to zero in fast_export_init.

Reset it back to zero so that each export will have
proper initial state.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-23 10:04:36 -05:00
Dmitry Ivankov c5f1fbe7bc vcs-svn: do not initialize report_buffer twice
When importing from a dump with deltas, first fast_export_init calls
buffer_fdinit, and then init_report_buffer calls fdopen once again
when processing the first delta.  The second initialization is
redundant and leaks a FILE *.

Remove the redundant on-demand initialization to fix this.
Initializing directly in fast_export_init is simpler and lets the
caller pass an int specifying which fd to use instead of hard-coding
REPORT_FILENO.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-21 05:02:57 -05:00
Jonathan Nieder 3ac10b2e3f vcs-svn: avoid hangs from corrupt deltas
A corrupt Subversion-format delta can request reads past the end of
the preimage.  Set sliding_view::max_off so such corruption is caught
when it appears rather than blocking in an impossible-to-fulfill
read() when input is coming from a socket or pipe.

Inspired-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-15 02:32:50 -05:00
Jonathan Nieder abe27c0cbd vcs-svn: guard against overflow when computing preimage length
Signed integer overflow produces undefined behavior in C and off_t is
a signed type.  For predictable behavior, add some checks to protect
in advance against overflow.

On 32-bit systems ftell as called by buffer_tmpfile_prepare_to_read
is likely to fail with EOVERFLOW when reading the corresponding
postimage, and this patch does not fix that.  So it's more of a
futureproofing measure than a complete fix.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-15 02:32:50 -05:00
Jonathan Nieder 157415a9a9 Merge branch 'db/delta-applier' into db/text-delta
* db/delta-applier:
  vcs-svn: cap number of bytes read from sliding view
  test-svn-fe: split off "test-svn-fe -d" into a separate function
2011-06-15 02:31:35 -05:00
Jonathan Nieder fbdd4f6fb4 vcs-svn: cap number of bytes read from sliding view
Introduce a "max_off" field in struct sliding_view, roughly
representing a maximum number of bytes that can be read from "file".
If it is set to a nonnegative integer, a call to move_window()
attempting to put the right endpoint beyond that offset will return
an error instead.

The idea is to use this when applying Subversion-format deltas to
prevent reads past the end of the preimage (which has known length).
Without such a check, corrupt deltas would cause svn-fe to block
indefinitely when data in the input pipe is exhausted.

Inspired-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-15 02:15:22 -05:00
David Barr 7a75e661c5 vcs-svn: implement text-delta handling
Handle input in Subversion's dumpfile format, version 3.  This is the
format produced by "svnrdump dump" and "svnadmin dump --deltas", and
the main difference between v3 dumpfiles and the dumpfiles already
handled is that these can include nodes whose properties and text are
expressed relative to some other node.

To handle such nodes, we find which node the text and properties are
based on, handle its property changes, use the cat-blob command to
request the basis blob from the fast-import backend, use the
svndiff0_apply() helper to apply the text delta on the fly, writing
output to a temporary file, and then measure that postimage file's
length and write its content to the fast-import stream.

The temporary postimage file is shared between delta-using nodes to
avoid some file system overhead.

The svn-fe interface needs to be more complicated to accomodate the
backward flow of information from the fast-import backend to svn-fe.
The backflow fd is not needed when parsing streams without deltas,
though, so existing scripts using svn-fe on v2 dumps should
continue to work.

NEEDSWORK: generalize interface so caller sets the backflow fd, close
temporary file before exiting

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-05-26 02:28:04 -05:00
Jonathan Nieder e9f3f8b6f4 Merge branch 'db/delta-applier' into db/text-delta
* db/delta-applier:
  vcs-svn: let deltas use data from preimage
  vcs-svn: let deltas use data from postimage
  vcs-svn: verify that deltas consume all inline data
  vcs-svn: implement copyfrom_data delta instruction
  vcs-svn: read instructions from deltas
  vcs-svn: read inline data from deltas
  vcs-svn: read the preimage when applying deltas
  vcs-svn: parse svndiff0 window header
  vcs-svn: skeleton of an svn delta parser
  vcs-svn: make buffer_read_binary API more convenient
  vcs-svn: learn to maintain a sliding view of a file
  Makefile: list one vcs-svn/xdiff object or header per line

Conflicts:
	Makefile
	vcs-svn/LICENSE
2011-05-26 02:27:48 -05:00
Jonathan Nieder c19d653c4f Merge branch 'db/svn-fe-code-purge' into svn-fe
* db/svn-fe-code-purge:
  vcs-svn: drop obj_pool
  vcs-svn: drop treap
  vcs-svn: drop string_pool
  vcs-svn: pass paths through to fast-import

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/repo_tree.h
	vcs-svn/string_pool.c
	vcs-svn/svndump.c
	vcs-svn/trp.txt
2011-05-26 02:12:14 -05:00
Jonathan Nieder 9ecfa8ae4c Merge branch 'db/vcs-svn-incremental' into svn-fe
This teaches svn-fe to incrementally import into an existing
repository (at last!) at the expense of less convenient UI.  Think of
it as growing pains.  This opens the door to many excellent things,
and it would be a bad idea to discourage people from building on it
for much longer.

* db/vcs-svn-incremental:
  vcs-svn: avoid using ls command twice
  vcs-svn: use mark from previous import for parent commit
  vcs-svn: handle filenames with dq correctly
  vcs-svn: quote paths correctly for ls command
  vcs-svn: eliminate repo_tree structure
  vcs-svn: add a comment before each commit
  vcs-svn: save marks for imported commits
  vcs-svn: use higher mark numbers for blobs
  vcs-svn: set up channel to read fast-import cat-blob response

Conflicts:
	t/t9010-svn-fe.sh
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/svndump.c
2011-05-26 02:02:44 -05:00
Junio C Hamano 793066e790 Merge branch 'rj/sparse'
* rj/sparse:
  sparse: Fix some "symbol not declared" warnings
  sparse: Fix errors due to missing target-specific variables
  sparse: Fix an "symbol 'merge_file' not decared" warning
  sparse: Fix an "symbol 'format_subject' not declared" warning
  sparse: Fix some "Using plain integer as NULL pointer" warnings
  sparse: Fix an "symbol 'cmd_index_pack' not declared" warning
  Makefile: Use cgcc rather than sparse in the check target
2011-04-27 11:36:42 -07:00
Ramsay Jones c51477229e sparse: Fix some "symbol not declared" warnings
In particular, sparse issues the "symbol 'a_symbol' was not declared.
Should it be static?" warnings for the following symbols:

    attr.c:468:12: 'git_etc_gitattributes'
    attr.c:476:5:  'git_attr_system'
    vcs-svn/svndump.c:282:6: 'svndump_read'
    vcs-svn/svndump.c:417:5: 'svndump_init'
    vcs-svn/svndump.c:432:6: 'svndump_deinit'
    vcs-svn/svndump.c:445:6: 'svndump_reset'

The symbols in attr.c only require file scope, so we add the static
modifier to their declaration.

The symbols in vcs-svn/svndump.c are external symbols, and they
already have extern declarations in the "svndump.h" header file,
so we simply include the header in svndump.c.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-22 10:04:27 -07:00
Jim Meyering 0353a0c4ec remove doubled words, e.g., s/to to/to/, and fix related typos
I found that some doubled words had snuck back into projects from which
I'd already removed them, so now there's a "syntax-check" makefile rule in
gnulib to help prevent recurrence.

Running the command below spotted a few in git, too:

  git ls-files | xargs perl -0777 -n \
    -e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt])\s+\1\b/gims)' \
    -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \
    -e 'print "$ARGV:$n:$v\n"}'

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-13 11:59:11 -07:00
Junio C Hamano 44a9cedb00 Merge branch 'svn-fe' of git://repo.or.cz/git/jrn
* 'svn-fe' of git://repo.or.cz/git/jrn:
  tests: kill backgrounded processes more robustly
  vcs-svn: a void function shouldn't try to return something
  tests: make sure input to sed is newline terminated
  vcs-svn: add missing cast to printf argument
2011-03-30 10:49:13 -07:00
Michael Witten 9e113988d3 vcs-svn: a void function shouldn't try to return something
As v1.7.4-rc0~184 (2010-10-04) and C99 §6.8.6.4.1 remind us, standard
C does not permit returning an expression of type void, even for a
tail call.

Noticed with gcc -pedantic:

 vcs-svn/svndump.c: In function 'handle_node':
 vcs-svn/svndump.c:213:3: warning: ISO C forbids 'return' with expression,
  in function returning void [-pedantic]

[jn: with simplified log message]

Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-29 14:47:02 -05:00
Jonathan Nieder 8cc299daf2 vcs-svn: add missing cast to printf argument
gcc -m32 correctly warns:

 vcs-svn/fast_export.c: In function 'fast_export_commit':
 vcs-svn/fast_export.c:54:2: warning: format '%llu' expects
   argument of type 'long long unsigned int', but argument 2
   has type 'unsigned int' [-Wformat]

Fix it.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-28 09:46:44 -07:00
Jonathan Nieder c846e41078 vcs-svn: let deltas use data from preimage
The copyfrom_source instruction appends data from the preimage buffer
to the end of output.  Its arguments are a length and an offset
relative to the beginning of the source view.

With this change, the delta applier is able to reproduce all 5,636,613
blobs in the early history of the ASF repository.  Tested with

	mkfifo backflow
	svn-fe <svn-asf-public-r0:940166 3<backflow |
	git fast-import --cat-blob-fd=3 3>backflow

with svn-asf-public-r0:940166 produced by whatever version of
Subversion the dumps in /dump/ on svn.apache.org use (presumably
1.6.something).

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-28 00:33:48 -05:00
Jonathan Nieder d3f131b57e vcs-svn: let deltas use data from postimage
The copyfrom_target instruction copies appends data that is already
present in the current output view to the end of output.  (The offset
argument is relative to the beginning of output produced in the
current window.)

The region copied is allowed to run past the end of the existing
output.  To support that case, copy one character at a time rather
than calling memcpy or memmove.  This allows copyfrom_target to be
used once to repeat a string many times.  For example:

	COPYFROM_DATA 2
	COPYFROM_OUTPUT 10, 0
	DATA "ab"

would produce the output "ababababababababababab".

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:28:27 -05:00
Jonathan Nieder 4c9b93ed76 vcs-svn: verify that deltas consume all inline data
By constraining the format of deltas, we can more easily detect
corruption and other breakage.

Requiring deltas not to provide unconsumed data also opens the
possibility of ignoring the declared amount of novel data and simply
streaming the data as needed to fulfill copyfrom_data requests.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:28:02 -05:00
Jonathan Nieder ec71aa2e1f vcs-svn: implement copyfrom_data delta instruction
The copyfrom_data instruction copies a few bytes verbatim from the
novel text section of a window to the postimage.

[jn: with memory leak fix from David]

Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:27:55 -05:00
Jonathan Nieder ef2ac77e9f vcs-svn: read instructions from deltas
Buffer the instruction section upon encountering it for later
interpretation.

An alternative design would involve parsing the instructions
at this point and buffering them in some processed form.  Using
the unprocessed form is simpler.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:02:05 -05:00
Jonathan Nieder fc4ae43b2c vcs-svn: read inline data from deltas
Each window of an svndiff0-format delta includes a section for novel
text to be copied to the postimage (in the order it appears in the
window, possibly interspersed with other data).

Slurp in this data when encountering it.  It is not actually necessary
to do so --- it would be just as easy to copy from delta to output
as part of interpreting the relevant instructions --- but this way,
the code that interprets svndiff0 instructions can proceed very
quickly because it does not require I/O.

Subversion's svndiff0 parser rejects deltas that do not consume all
the novel text that was provided.  Omit that check for now so we can
test the new functionality right away, rather than waiting to learn
instructions that consume data.

Do check for truncated data sections.  Subversion's parser rejects
deltas that end in the middle of a declared novel-text section, so it
should be safe for us to reject them, too.

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 22:51:00 -05:00
Jonathan Nieder bcd254621f vcs-svn: read the preimage when applying deltas
The source view offset heading each svndiff0 window represents a
number of bytes past the beginning of the preimage.  Together with the
source view length, it dictates to the delta applier what portion of
the preimage instructions will refer to.  Read that portion right away
using the sliding window code.

Maybe some day we will use mmap to read data more lazily.

Subversion's implementation tolerates source view offsets pointing
past the end of the preimage file but we do not, for simplicity.

This does not teach the delta applier to read instructions or copy
data from the source view.  Deltas that could produce nonempty output
will still be rejected.

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 22:50:01 -05:00
Jonathan Nieder 252712111f vcs-svn: parse svndiff0 window header
Each window in a subversion delta (svndiff0-format file) starts with a
window header, consisting of five integers with variable-length
representation:

	source view offset
	source view length
	output length
	instructions length
	auxiliary data length

Parse it.  The result is not usable for deltas with nonempty postimage
yet; in fact, this only adds support for deltas without any
instructions or auxiliary data.  This is a good place to stop, though,
since that little support lets us add some simple passing tests
concerning error handling to the test suite.

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 22:42:53 -05:00
Jonathan Nieder ddcc8c5b46 vcs-svn: skeleton of an svn delta parser
A delta in the subversion delta (svndiff0) format consists of the
magic bytes SVN\0 followed by a sequence of windows of a certain well
specified format (starting with five integers).

Add an svndiff0_apply function and test-svn-fe -d commandline tool to
parse such a delta in the special case of not including any windows.

Later patches will add features to turn this into a fully functional
delta applier for svn-fe to use to parse the streams produced by
"svnrdump dump" and "svnadmin dump --deltas".

The content of symlinks starts with the word "link " in Subversion's
worldview, so we need to be able to prepend that text to input for the
sake of delta application.  So initialization of the input state of
the delta preimage is left to the calling program, giving callers a
chance to seed the buffer with text of their choice.

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 22:41:38 -05:00
Jonathan Nieder 896e4bfcec vcs-svn: make buffer_read_binary API more convenient
buffer_read_binary is a thin wrapper around fread, but its signature
is wrong:

 - fread can fill an arbitrary in-memory buffer.  buffer_read_binary
   is limited to buffers whose size is representable by a 32-bit
   integer.
 - The result from fread is the number of bytes actually read.
   buffer_read_binary only reports the number of bytes read by
   incrementing sb->len by that amount and returns void.

Fix both: let buffer_read_binary accept a size_t instead of uint32_t
for the number of bytes to read and as a convenience return the number
of bytes actually read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 22:37:05 -05:00
Jonathan Nieder 9d2f5ddfe5 vcs-svn: learn to maintain a sliding view of a file
Each section of a Subversion-format delta only requires examining (and
keeping in random-access memory) a small portion of the preimage.  At
any moment, this portion starts at a certain file offset and has a
well-defined length, and as the delta is applied, the portion advances
from the beginning to the end of the preimage.  Add a move_window
function to keep track of this view into the preimage.

You can use it like this:

	buffer_init(f, NULL);
	struct sliding_view window = SLIDING_VIEW_INIT(f);
	move_window(&window, 3, 7);	/* (1) */
	move_window(&window, 5, 5);	/* (2) */
	move_window(&window, 12, 2);	/* (3) */
	strbuf_release(&window.buf);
	buffer_deinit(f);

The data structure is called sliding_view instead of _window to
prevent confusion with svndiff0 Windows.

In this example, (1) reads 10 bytes and discards the first 3;
(2) discards the first 2, which are not needed any more; and (3) skips
2 bytes and reads 2 new bytes to work with.

When move_window returns, the file position indicator is at position
window->off + window->width and the data from positions window->off to
the current file position are stored in window->buf.

This function performs only sequential access from the input file and
never seeks, so it can be safely used on pipes and sockets.

On end-of-file, move_window silently reads less than the caller
requested.  On other errors, it prints a message and returns -1.

Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 20:23:32 -05:00
Jonathan Nieder 41e6b91f01 vcs-svn: add missing cast to printf argument
gcc -m32 correctly warns:

 vcs-svn/fast_export.c: In function 'fast_export_commit':
 vcs-svn/fast_export.c:54:2: warning: format '%llu' expects
   argument of type 'long long unsigned int', but argument 2
   has type 'unsigned int' [-Wformat]

Fix it.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 12:21:12 -05:00
Junio C Hamano a080fdd1b1 Merge branch 'svn-fe' of git://repo.or.cz/git/jrn
* 'svn-fe' of git://repo.or.cz/git/jrn:
  vcs-svn: handle log message with embedded NUL
  vcs-svn: avoid unnecessary copying of log message and author
  vcs-svn: remove buffer_read_string
  vcs-svn: make reading of properties binary-safe
2011-03-26 11:35:41 -07:00
David Barr 43155cfe14 vcs-svn: avoid using ls command twice
Currently there are two functions to retrieve the mode and content
at a path:

	const char *repo_read_path(const uint32_t *path);
	uint32_t repo_read_mode(const uint32_t *path)

Replace them with a single function with two return values.  This
means we can use one round-trip to get the same information from
fast-import that previously took two.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 01:00:05 -05:00
Jonathan Nieder 195b7ca6f2 vcs-svn: handle log message with embedded NUL
Pass the log message by strbuf instead of as a C-style string and use
fwrite instead of printf to write it to fast-import so embedded '\0'
bytes can be preserved.

Currently "git log" doesn't show the embedded NULs but "git cat-file
commit" can.

While at it, stop including system headers from repo_tree.h.  git
source files need to include git-compat-util.h (or cache.h or
builtin.h) sooner to ensure the appropriate feature test macros are
defined.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:49:37 -05:00
Jonathan Nieder 4c3169b03e vcs-svn: avoid unnecessary copying of log message and author
Use strbuf_swap when storing the svn:log and svn:author properties, so
pointers to rather than the contents of buffers get copied.  The main
effect should be to make the code a little easier to read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:41:38 -05:00
Jonathan Nieder 7e2fe3a9fc vcs-svn: remove buffer_read_string
All previous users of buffer_read_string have already been converted
to use the more intuitive buffer_read_binary, so remove the old API to
avoid some confusion.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:17:35 -05:00
Jonathan Nieder e7d04ee147 vcs-svn: make reading of properties binary-safe
svn-fe errors out on revision 59151 of the ASF repository:

 fatal: invalid dump: unexpected end of file

The proximate cause is a property with an embedded NUL character.
Previously such anomalies were ignored but commit c9d1c8ba
(2010-12-28) introduced a check strlen(val) == len to avoid reading
uninitialized data when a property list ends early and unfortunately
this test does not distinguish between "foo" followed by EOF and the
string "foo\0bar\0baz".

Fix it by using buffer_read_binary to read to a strbuf and checking
the actual length read.  Most consumers of properties still use
C-style strings, so in practice an author or log message with embedded
NULs will be truncated, but a least this way svn-fe won't error out
(fixing the regression).

Reported-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:15:10 -05:00
Junio C Hamano 785d6989da Merge branch 'svn-fe' of git://repo.or.cz/git/jrn
* 'svn-fe' of git://repo.or.cz/git/jrn:
  vcs-svn: use strchr to find RFC822 delimiter
  vcs-svn: implement perfect hash for top-level keys
  vcs-svn: implement perfect hash for node-prop keys
  vcs-svn: use strbuf for author, UUID, and URL
  vcs-svn: use strbuf for revision log
  vcs-svn: improve reporting of input errors
  vcs-svn: make buffer_copy_bytes return length read
  vcs-svn: make buffer_skip_bytes return length read
  vcs-svn: improve support for reading large files
  vcs-svn: allow input errors to be detected promptly
  vcs-svn: simplify repo_modify_path and repo_copy
  vcs-svn: handle_node: use repo_read_path
  vcs-svn: introduce repo_read_path to check the content at a path
2011-03-22 20:51:07 -07:00
Jonathan Nieder 41b9dd9d4f Merge branch 'db/length-as-hash' into svn-fe
* db/length-as-hash:
  vcs-svn: use strchr to find RFC822 delimiter
  vcs-svn: implement perfect hash for top-level keys
  vcs-svn: implement perfect hash for node-prop keys

Conflicts:
	vcs-svn/svndump.c
2011-03-22 18:44:49 -05:00
David Barr cba3546a43 vcs-svn: drop obj_pool
This reverts commit 4709455db3 (Add
memory pool library, 2010-08-09).  svn-fe uses strbufs to avoid memory
allocation overhead nowadays.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:39:53 -05:00
David Barr 5db348dbd5 vcs-svn: drop treap
This reverts commit 951f316470
(Add treap implementation, 2010-08-09).  The string_pool was
trp.h's last user.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:34:44 -05:00
David Barr 28c5d9ed2a vcs-svn: drop string_pool
This reverts commit 1d73b52f5b
(Add string-specific memory pool, 2010-08-09).  Now that svn-fe
does not need to maintain a growing collection of strings (paths)
over a long period of time, the string_pool is not needed.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:32:58 -05:00
David Barr 030879718f vcs-svn: pass paths through to fast-import
Now that there is no internal representation of the repo, it is not
necessary to tokenise paths.  Use strbuf instead and bypass
string_pool.

This means svn-fe can handle arbitrarily long paths (as long as a
strbuf can fit them), with arbitrarily many path components.

While at it, since we now treat paths in their entirety, only quote
when necessary.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:32:58 -05:00
Jonathan Nieder fa6c4bceab Merge branch 'db/strbufs-for-metadata' into db/svn-fe-code-purge
* db/strbufs-for-metadata:
  vcs-svn: use strbuf for author, UUID, and URL
  vcs-svn: use strbuf for revision log

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/svndump.c
2011-03-22 18:19:46 -05:00
Jonathan Nieder 5c674860eb Merge branch 'db/length-as-hash' (early part) into db/svn-fe-code-purge
* 'db/length-as-hash' (early part):
  vcs-svn: implement perfect hash for top-level keys
  vcs-svn: implement perfect hash for node-prop keys
  vcs-svn: improve reporting of input errors
  vcs-svn: make buffer_copy_bytes return length read
  vcs-svn: make buffer_skip_bytes return length read
  vcs-svn: improve support for reading large files

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/svndump.c
2011-03-22 18:11:59 -05:00
David Barr f1602054e3 vcs-svn: use strchr to find RFC822 delimiter
This is a small optimisation (4% reduction in user time) but is the
largest artifact within the parsing portion of svndump.c

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:09:05 -05:00
David Barr 90c0a3cfe3 vcs-svn: implement perfect hash for top-level keys
Instead of interning property names and comparing their string_pool
keys, look them up in a table by string length, which should be about
as fast.

Another small step towards removing dependence on string_pool
altogether.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:09:05 -05:00