Commit graph

24 commits

Author SHA1 Message Date
Jonathan Nieder cb3f87cf1b vcs-svn: allow input from file descriptor
Based-on-patch-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder cc193f1f0b vcs-svn: allow character-oriented input
buffer_read_char can be used in place of buffer_read_string(1) to
avoid consuming valuable static buffer space.  The delta applier will
use this to read variable-length integers one byte at a time.

Underneath, it is fgetc, wrapped so the line_buffer library can
maintain its role as gatekeeper of input.

Later it might be worth checking if fgetc_unlocked is faster ---
most line_buffer functions are not thread-safe anyway.

Helpd-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder e832f43c1d vcs-svn: add binary-safe read function
buffer_read_string works well for non line-oriented input except for
one problem: it does not tell the caller how many bytes were actually
written.  This means that unless one is very careful about checking
for errors (and eof) the calling program cannot tell the difference
between the string "foo" followed by an early end of file and the
string "foo\0bar\0baz".

So introduce a variant that reports the length, too, a thinner wrapper
around strbuf_fread.  Its result is written to a strbuf so the caller
does not need to keep track of the number of bytes read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder e5e45ca1e3 vcs-svn: teach line_buffer to handle multiple input files
Collect the line_buffer state in a newly public line_buffer struct.
Callers can use multiple line_buffers to manage input from multiple
files at a time.

svn-fe's delta applier will use this to stream a delta from svnrdump
and the preimage it applies to from fast-import at the same time.

The tests don't take advantage of the new features, but I think that's
okay.  It is easier to find lingering examples of nonreentrant code by
searching for "static" in line_buffer.c.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:59 -06:00
Jonathan Nieder d350822fa7 vcs-svn: collect line_buffer data in a struct
Prepare for the line_buffer lib to support input from multiple files,
by collecting global state in a struct that can be easily passed
around.

No API change yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:58 -06:00
Jonathan Nieder deadcef4c1 vcs-svn: replace buffer_read_string memory pool with a strbuf
obj_pool is inherently global and does not use the standard growing
factor alloc_nr, which makes it feel out of place in the git codebase.
Plus it is overkill for this application: all that is needed is a
buffer that can grow between requests to accomodate larger strings.
Use a strbuf instead.

As a side effect, this improves the error handling: allocation
failures will result in a clean exit instead of segfaults.  It would
be nice to add a test case (using ulimit or failmalloc) but that can
wait for another day.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:58 -06:00
Jonathan Nieder 4d21bec0d2 vcs-svn: eliminate global byte_buffer
The data stored in byte_buffer[] is always either discarded or
written to stdout immediately.  No need for it to persist between
function calls.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:58 -06:00
Ramsay Jones 5ee5f5a65d svndump.c: Fix a printf format compiler warning
In particular, on systems that define uint32_t as an unsigned long,
gcc complains as follows:

        CC vcs-svn/svndump.o
    vcs-svn/svndump.c: In function `svndump_read':
    vcs-svn/svndump.c:215: warning: int format, uint32_t arg (arg 2)

In order to suppress the warning we use the C99 format specifier
macro PRIu32 from <inttypes.h>.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18 16:48:47 -08:00
Junio C Hamano bf9b46c16d Merge branch 'jn/svn-fe' (early part)
* 'jn/svn-fe' (early part):
  vcs-svn: Error out for v3 dumps

Conflicts:
	t/t9010-svn-fe.sh
2011-01-05 13:34:43 -08:00
Junio C Hamano b720c75afd Merge branch 'jn/maint-svn-fe'
* jn/maint-svn-fe:
  t9010 fails when no svn is available
  vcs-svn: fix intermittent repo_tree corruption
  treap: make treap_insert return inserted node
  t9010 (svn-fe): Eliminate dependency on svn perl bindings
2010-12-16 12:49:35 -08:00
Jonathan Nieder 3c93983875 vcs-svn: fix intermittent repo_tree corruption
Pointers to directory entries do not remain valid after a call to
dent_insert.

Noticed in the course of importing a small Subversion repository
(~1000 revs); after setting up a dirent for a certain path as a
placeholder, by luck dent_insert would trigger a realloc that
shifted around addresses, resulting in an import with that file
replaced by a directory.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-07 16:04:02 -08:00
Jonathan Nieder 97a5e3453a treap: make treap_insert return inserted node
Suppose I try the following:

	struct int_node *node = node_pointer(node_alloc(1));
	node->n = 5;
	treap_insert(&root, node);
	printf("%d\n", node->n);

Usually the result will be 5.  But since treap_insert draws memory
from the node pool, if the caller is unlucky then (1) the node pool
will be full and (2) realloc will be forced to move the node pool to
find room, so the node address becomes invalid and the result of
dereferencing it is undefined.

So we ought to use offsets in preference to pointers for references
that would remain valid after a call to treap_insert.  Tweak the
signature to hint at a certain special case: since the inserted node
can change address (though not offset), as a convenience teach
treap_insert to return its new address.

So the motivational example could be fixed by adding "node =".

	struct int_node *node = node_pointer(node_alloc(1));
	node->n = 5;
	node = treap_insert(&root, node);
	printf("%d\n", node->n);

Based on a true story.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-07 16:03:55 -08:00
Jonathan Nieder b3e5bce1aa vcs-svn: Error out for v3 dumps
By ignoring the Text-Delta and Prop-Delta node fields, current svn-fe
happily mistakes deltas for full text and instead of cleanly erroring
out, it produces a valid but semantically bogus fast-import stream
when fed a dump file in the modern "svnadmin dump --deltas" format.

Dump file parsers are supposed to ignore header fields they don't
understand (to allow for backward-compatible extensions), but they are
also supposed to check the SVN-fs-dump-format-version header to
prevent misinterpretation of non backward-compatible extensions.
Do so.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:48:52 -08:00
Ramsay Jones 5418d96ddc vcs-svn: Fix some printf format compiler warnings
In particular, on systems that define uint32_t as an unsigned long,
gcc complains as follows:

      CC vcs-svn/fast_export.o
  vcs-svn/fast_export.c: In function `fast_export_modify':
  vcs-svn/fast_export.c:28: warning: unsigned int format, uint32_t arg (arg 2)
  vcs-svn/fast_export.c:28: warning: int format, uint32_t arg (arg 3)
  vcs-svn/fast_export.c: In function `fast_export_commit':
  vcs-svn/fast_export.c:42: warning: int format, uint32_t arg (arg 5)
  vcs-svn/fast_export.c:62: warning: int format, uint32_t arg (arg 2)
  vcs-svn/fast_export.c: In function `fast_export_blob':
  vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 2)
  vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 3)
      CC vcs-svn/svndump.o
  vcs-svn/svndump.c: In function `svndump_read':
  vcs-svn/svndump.c:260: warning: int format, uint32_t arg (arg 3)

In order to suppress the warnings we use the C99 format specifier
macros PRIo32 and PRIu32 from <inttypes.h>.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-12 10:24:55 -07:00
Jonathan Nieder 6117abae56 vcs-svn: Avoid %z in format string
In the spirit of v1.6.4-rc0~124 (MinGW: Fix compiler warning in
merge-recursive, 2009-05-23), use a 32-bit integer instead; the
dump file parser does not support any better, anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:38 -07:00
Jonathan Nieder 68b4cfbc91 vcs-svn: Rename dirent pool to build on Windows
dirent is #define’d to mingw_dirent in compat/mingw.h, with the
result that

 obj_pool_gen(dirent, struct repo_dirent, 4096)

creates functions with names like mingw_dirent_alloc and
references to dirent_alloc go unresolved.  Rename the functions
to dent_* to avoid this problem.

Reported-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:38 -07:00
Jonathan Nieder 6ad263ce7a treap: style fix
Missing spaces in while (0) and trpn_pointer(a, b).

Remove parentheses around return value.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:38 -07:00
David Barr 21746aa34f SVN dump parser
svndump parses data that is in SVN dumpfile format produced by
`svnadmin dump` with the help of line_buffer and uses repo_tree and
fast_export to emit a git fast-import stream.

Based roughly on com.hydrografix.svndump 0.92 from the SvnToCCase
project at <http://svn2cc.sarovar.org/>, by Stefan Hegny and
others.

[rr: allow input from files other than stdin]
[jn: with test, more error reporting]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:38 -07:00
David Barr c0e6c23dca Infrastructure to write revisions in fast-export format
repo_tree maintains the exporter's state and provides a facility to to
call fast_export, which writes objects to stdout suitable for
consumption by fast-import.

The exported functions roughly correspond to Subversion FS operations.

 . repo_add, repo_modify, repo_copy, repo_replace, and repo_delete
   update the current commit, based roughly on the corresponding
   Subversion FS operation.

 . repo_commit calls out to fast_export to write the current commit to
   the fast-import stream in stdout.

 . repo_diff is used by the fast_export module to write the changes
   for a commit.

 . repo_reset erases the exporter's state, so valgrind can be happy.

[rr: squelched compiler warnings]
[jn: removed support for maintaining state on-disk, though we may
want to add it back later]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
David Barr 3bbaec00a8 Add stream helper library
This library provides thread-unsafe fgets()- and fread()-like
functions where the caller does not have to supply a buffer.  It
maintains a couple of static buffers and provides an API to use
them.

[rr: allow input from files other than stdin]
[jn: with tests, documentation, and error handling improvements]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
David Barr 1d73b52f5b Add string-specific memory pool
Intern strings so they can be compared by address and stored without
wasting space.

This library uses the macros in the obj_pool.h and trp.h to create a
memory pool for strings and expose an API for handling them.

[rr: added API docs]
[jn: with some API simplifications, new documentation and tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
Jason Evans 951f316470 Add treap implementation
Provide macros to generate a type-specific treap implementation and
various functions to operate on it. It uses obj_pool.h to store memory
nodes in a treap.  Previously committed nodes are never removed from
the pool; after any *_commit operation, it is assumed (correctly, in
the case of svn-fast-export) that someone else must care about them.

Treaps provide a memory-efficient binary search tree structure.
Insertion/deletion/search are about as about as fast in the average
case as red-black trees and the chances of worst-case behavior are
vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
behavior is a small price to pay, given that treaps are much simpler
to implement.

>From http://www.canonware.com/download/trp/trp_hash/trp.h

[db: Altered to reference nodes by offset from a common base pointer]
[db: Bob Jenkins' hashing implementation dropped for Knuth's]
[db: Methods unnecessary for search and insert dropped]
[rr: Squelched compiler warnings]
[db: Added support for immutable treap nodes]
[jn: Reintroduced treap_nsearch(); with tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
David Barr 4709455db3 Add memory pool library
Add a memory pool library implemented using C macros. The
obj_pool_gen() macro creates a type-specific memory pool.

The memory pool library is distinguished from the existing specialized
allocators in alloc.c by using a contiguous block for all allocations.
This means that on one hand, long-lived pointers have to be written as
offsets, since the base address changes as the pool grows, but on the
other hand, the entire pool can be easily written to the file system.
This could allow the memory pool to persist between runs of an
application.

For the svn importer, such a facility is useful because each svn
revision can copy trees and files from any previous revision.  The
relevant information for all revisions has to persist somehow to
support incremental runs.

[rr: minor cleanups]
[jn: added tests; removed file system backing for now]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
Jonathan Nieder 3f527372d9 Introduce vcs-svn lib
Teach the build system to build a separate library for the
upcoming subversion interop support.

The resulting vcs-svn/lib.a does not contain any code, nor is
it built during a normal build.  This is just scaffolding for
later changes.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00