Commit graph

493 commits

Author SHA1 Message Date
Junio C Hamano 05d88e6f7e Merge branch 'di/fast-import-blob-tweak'
* di/fast-import-blob-tweak:
  fast-import: treat cat-blob as a delta base hint for next blob
  fast-import: count and report # of calls to diff_delta in stats
2011-08-28 21:18:47 -07:00
Junio C Hamano 45792b64c1 Merge branch 'di/fast-import-deltified-tree'
* di/fast-import-deltified-tree:
  fast-import: prevent producing bad delta
  fast-import: add a test for tree delta base corruption
2011-08-28 21:18:47 -07:00
Junio C Hamano 0b98954975 Merge branch 'di/fast-import-ident'
* di/fast-import-ident:
  fsck: improve committer/author check
  fsck: add a few committer name tests
  fast-import: check committer name more strictly
  fast-import: don't fail on omitted committer name
  fast-import: add input format tests
2011-08-28 21:18:47 -07:00
Dmitry Ivankov 6c447f633c fast-import: allow to tag newly created objects
fast-import allows to tag objects by sha1 and to query sha1 of objects
being imported. So it should allow to tag these objects, make it do so.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-23 11:25:59 -07:00
Dmitry Ivankov 2efe38e7da fast-import: add tests for tagging blobs
fast-import allows to create an annotated tag that annotates a blob,
via mark or direct sha1 specification.

For mark it works, for sha1 it tries to read the object. It tries to
do so via read_sha1_file, and then checks the size to be at least 46.

That's weird, let's just allow to (annotated) tag any object referenced
by sha1. If the object originates from our packfile, we still fail though.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-23 11:25:56 -07:00
Dmitry Ivankov a7e9c34126 fast-import: treat cat-blob as a delta base hint for next blob
Delta base for blobs is chosen as a previously saved blob. If we
treat cat-blob's blob as a delta base for the next blob, nothing
is likely to become worse.

For fast-import stream producer like svn-fe cat-blob is used like
following:
- svn-fe reads file delta in svn format
- to apply it, svn-fe asks cat-blob 'svn delta base'
- applies 'svn delta' to the response
- produces a blob command to store the result

Currently there is no way for svn-fe to give fast-import a hint on
object delta base. While what's requested in cat-blob is most of
the time a best delta base possible. Of course, it could be not a
good delta base, but we don't know any better one anyway.

So do treat cat-blob's result as a delta base for next blob. The
profit is nice: 2x to 7x reduction in pack size AND 1.2x to 3x
time speedup due to diff_delta being faster on good deltas. git gc
--aggressive can compress it even more, by 10% to 70%, utilizing
more cpu time, real time and 3 cpu cores.

Tested on 213M and 2.7G fast-import streams, resulting packs are 22M
and 113M, import time is 7s and 60s, both streams are produced by
svn-fe, sniffed and then used as raw input for fast-import.

For git-fast-export produced streams there is no change as it doesn't
use cat-blob and doesn't try to reorder blobs in some smart way to
make successive deltas small.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Acked-by: David Barr <davidbarr@google.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-22 11:57:07 -07:00
Dmitry Ivankov 94c3b48247 fast-import: count and report # of calls to diff_delta in stats
It's an interesting number, how often do we try to deltify each type of
objects and how often do we succeed. So do add it to stats.

Success doesn't mean much gain in pack size though. As we allow delta to
be as big as (data.len - 20). And delta close to data.len gains nothing
compared to no delta at all even after zlib compression (delta is pretty
much the same as data, just with few modifications).

We should try to make less attempts that result in huge deltas as these
consume more cpu than trivial small deltas. Either by choosing a better
delta base or reducing delta size upper bound or doing less delta attempts
at all.

Currently, delta base for blobs is a waste literally. Each blob delta
base is chosen as a previously stored blob. Disabling deltas for blobs
doesn't increase pack size and reduce import time, or at least doesn't
increase time for all fast-import streams I've tried.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Acked-by: David Barr <davidbarr@google.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-22 11:57:06 -07:00
Dmitry Ivankov 8fb3ad76b1 fast-import: prevent producing bad delta
To produce deltas for tree objects fast-import tracks two versions
of tree's entries - base and current one. Base version stands both
for a delta base of this tree, and for a entry inside a delta base
of a parent tree. So care should be taken to keep it in sync.

tree_content_set cuts away a whole subtree and replaces it with a
new one (or NULL for lazy load of a tree with known sha1). It
keeps a base sha1 for this subtree (needed for parent tree). And
here is the problem, 'subtree' tree root doesn't have the implied
base version entries.

Adjusting the subtree to include them would mean a deep rewrite of
subtree. Invalidating the subtree base version would mean recursive
invalidation of parents' base versions. So just mark this tree as
do-not-delta me. Abuse setuid bit for this purpose.

tree_content_replace is the same as tree_content_set except that is
is used to replace the root, so just clearing base sha1 here (instead
of setting the bit) is fine.

[di: log message]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-14 14:40:01 -07:00
Dmitry Ivankov 4b4963c0e1 fast-import: check committer name more strictly
The documentation declares following identity format:
(<name> SP)? LT <email> GT
where name is any string without LF and LT characters.
But fast-import just accepts any string up to first GT
instead of checking the whole format, and moreover just
writes it as is to the commit object.

git-fsck checks for [^<\n]* <[^<>\n]*> format. Note that the
space is mandatory. And the space quirk is already handled via
extending the string to the left when needed.

Modify fast-import input identity format to a slightly stricter
one - deny LF, LT and GT in both <name> and <email>. And check
for it.

This is stricter then git-fsck as fsck accepts "Name> <email>"
currently, but soon fsck check will be adjusted likewise.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11 12:21:03 -07:00
Dmitry Ivankov 17fb00721b fast-import: don't fail on omitted committer name
fast-import format declares 'committer_name SP' to be optional in
'committer_name SP LT email GT'. But for a (commit) object SP is
obligatory while zero length committer_name is ok. git-fsck checks
that SP is present, so fast-import must prepend it if the name SP
part is omitted. It doesn't do so and thus for "LT email GT" ident
it writes a bad object.

Name cannot contain LT or GT, ident always comes after SP in fast-import.
So if ident starts with LT reuse the SP as if a valid 'SP LT email GT'
ident was passed.

This fixes a ident parsing bug for a well-formed fast-import input.
Though the parsing is still loose and can accept a ill-formed input.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11 12:20:56 -07:00
Junio C Hamano 59d9ba869e Merge branch 'sr/transport-helper-fix'
* sr/transport-helper-fix: (21 commits)
  transport-helper: die early on encountering deleted refs
  transport-helper: implement marks location as capability
  transport-helper: Use capname for refspec capability too
  transport-helper: change import semantics
  transport-helper: update ref status after push with export
  transport-helper: use the new done feature where possible
  transport-helper: check status code of finish_command
  transport-helper: factor out push_update_refs_status
  fast-export: support done feature
  fast-import: introduce 'done' command
  git-remote-testgit: fix error handling
  git-remote-testgit: only push for non-local repositories
  remote-curl: accept empty line as terminator
  remote-helpers: export GIT_DIR variable to helpers
  git_remote_helpers: push all refs during a non-local export
  transport-helper: don't feed bogus refs to export push
  git-remote-testgit: import non-HEAD refs
  t5800: document some non-functional parts of remote helpers
  t5800: use skip_all instead of prereq
  t5800: factor out some ref tests
  ...
2011-08-01 15:00:14 -07:00
Sverre Rabbelier be56862f19 fast-import: introduce 'done' command
Add a 'done' command that causes fast-import to stop reading from the
stream and exit.

If the new --done command line flag was passed on the command line
(or a "feature done" declaration included at the start of the stream),
make the 'done' command mandatory.  So "git fast-import --done"'s
input format will be prefix-free, making errors easier to detect when
they show up as early termination at some convenient time of the
upstream of a pipe writing to fast-import.

Another possible application of the 'done' command would to be allow a
fast-import stream that is only a small part of a larger encapsulating
stream to be easily parsed, leaving the file offset after the "done\n"
so the other application can pick up from there.  This patch does not
teach fast-import to do that --- fast-import still uses buffered input
(stdio).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-19 11:17:47 -07:00
Junio C Hamano d907bf8ef3 Merge branch 'jc/index-pack'
* jc/index-pack:
  verify-pack: use index-pack --verify
  index-pack: show histogram when emulating "verify-pack -v"
  index-pack: start learning to emulate "verify-pack -v"
  index-pack: a miniscule refactor
  index-pack --verify: read anomalous offsets from v2 idx file
  write_idx_file: need_large_offset() helper function
  index-pack: --verify
  write_idx_file: introduce a struct to hold idx customization options
  index-pack: group the delta-base array entries also by type

Conflicts:
	builtin/verify-pack.c
	cache.h
	sha1_file.c
2011-07-19 09:54:51 -07:00
Junio C Hamano ef49a7a012 zlib: zlib can only process 4GB at a time
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.

Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:52:15 -07:00
Junio C Hamano 225a6f1068 zlib: wrap deflateBound() too
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:18:17 -07:00
Junio C Hamano 55bb5c9147 zlib: wrap deflate side of the API
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().

There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:10:29 -07:00
Sverre Rabbelier 4cce4ef2d5 fast-import: fix option parser for no-arg options
While refactoring the options parser in bc3c79a (fast-import: add
(non-)relative-marks feature, 2009-12-04), it was made too lenient
for options that take no argument, fix that.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-05 21:21:24 -07:00
Junio C Hamano f28d2e33c6 Merge branch 'jc/pack-objects-bigfile' into maint
* jc/pack-objects-bigfile:
  Teach core.bigfilethreashold to pack-objects
2011-05-04 14:57:38 -07:00
Junio C Hamano 15366280c2 Teach core.bigfilethreashold to pack-objects
The pack-objects command should take notice of the object file and
refrain from attempting to delta large ones, to be consistent with
the fast-import command.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-05 20:25:49 -07:00
Stephen Boyd c2e86addb8 Fix sparse warnings
Fix warnings from 'make check'.

 - These files don't include 'builtin.h' causing sparse to complain that
   cmd_* isn't declared:

   builtin/clone.c:364, builtin/fetch-pack.c:797,
   builtin/fmt-merge-msg.c:34, builtin/hash-object.c:78,
   builtin/merge-index.c:69, builtin/merge-recursive.c:22
   builtin/merge-tree.c:341, builtin/mktag.c:156, builtin/notes.c:426
   builtin/notes.c:822, builtin/pack-redundant.c:596,
   builtin/pack-refs.c:10, builtin/patch-id.c:60, builtin/patch-id.c:149,
   builtin/remote.c:1512, builtin/remote-ext.c:240,
   builtin/remote-fd.c:53, builtin/reset.c:236, builtin/send-pack.c:384,
   builtin/unpack-file.c:25, builtin/var.c:75

 - These files have symbols which should be marked static since they're
   only file scope:

   submodule.c:12, diff.c:631, replace_object.c:92, submodule.c:13,
   submodule.c:14, trace.c:78, transport.c:195, transport-helper.c:79,
   unpack-trees.c:19, url.c:3, url.c:18, url.c:104, url.c:117, url.c:123,
   url.c:129, url.c:136, thread-utils.c:21, thread-utils.c:48

 - These files redeclare symbols to be different types:

   builtin/index-pack.c:210, parse-options.c:564, parse-options.c:571,
   usage.c:49, usage.c:58, usage.c:63, usage.c:72

 - These files use a literal integer 0 when they really should use a NULL
   pointer:

   daemon.c:663, fast-import.c:2942, imap-send.c:1072, notes-merge.c:362

While we're in the area, clean up some unused #includes in builtin files
(mostly exec_cmd.h).

Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22 10:16:54 -07:00
Junio C Hamano b2f6eab402 Merge branch 'maint'
* maint:
  Prepare draft release notes to 1.7.4.2
  gitweb: highlight: replace tabs with spaces
  make_absolute_path: return the input path if it points to our buffer
  valgrind: ignore SSE-based strlen invalid reads
  diff --submodule: split into bite-sized pieces
  cherry: split off function to print output lines
  branch: split off function that writes tracking info and commit subject
  standardize brace placement in struct definitions
  compat: make gcc bswap an inline function
  enums: omit trailing comma for portability

Conflicts:
	RelNotes
2011-03-16 16:59:30 -07:00
Jonathan Nieder 9cba13ca5d standardize brace placement in struct definitions
In a struct definitions, unlike functions, the prevailing style is for
the opening brace to go on the same line as the struct name, like so:

 struct foo {
	int bar;
	char *baz;
 };

Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many
matches as 'struct [a-z_]*$'.

Linus sayeth:

 Heretic people all over the world have claimed that this inconsistency
 is ...  well ...  inconsistent, but all right-thinking people know that
 (a) K&R are _right_ and (b) K&R are right.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16 12:49:02 -07:00
Junio C Hamano 674ef90904 Merge branch 'sp/maint-fd-limit'
* sp/maint-fd-limit:
  sha1_file.c: Don't retain open fds on small packs
  mingw: add minimum getrlimit() compatibility stub
  Limit file descriptors used by packs
2011-03-15 14:22:23 -07:00
Shawn O. Pearce d131b7afea sha1_file.c: Don't retain open fds on small packs
If a pack file is small enough that its entire contents fits within
one mmap window, mmap the file and then immediately close its file
descriptor.  This reduces the number of file descriptors that are
needed to read from repositories with many tiny pack files, such
as one that has received 1000 pushes (and created 1000 small pack
files) since its last repack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-02 11:25:30 -08:00
Jonathan Nieder 6288e3e180 fast-import: make code "-Wpointer-arith" clean
The dereference() function to peel a tree-ish and find the underlying
tree expects arithmetic to (void *) to work on byte addresses.  We
should be reading the text of objects through a char * anyway.

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-28 15:25:12 -06:00
Junio C Hamano ebcfb3791a write_idx_file: introduce a struct to hold idx customization options
Remove two globals, pack_idx_default version and pack_idx_off32_limit,
and place them in a pack_idx_option structure.  Allow callers to pass
it to write_idx_file() as a parameter.

Adjust all callers to the API change.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27 23:29:03 -08:00
David Barr 8dc6a373d2 fast-import: add 'ls' command
Lazy fast-import frontend authors that want to rely on the backend to
keep track of the content of the imported trees _almost_ have what
they need in the 'cat-blob' command (v1.7.4-rc0~30^2~3, 2010-11-28).
But it is not quite enough, since

 (1) cat-blob can be used to retrieve the content of files, but
     not their mode, and

 (2) using cat-blob requires the frontend to keep track of a name
     (mark number or object id) for each blob to be retrieved

Introduce an 'ls' command to complement cat-blob and take care of the
remaining needs.  The 'ls' command finds what is at a given path
within a given tree-ish (tag, commit, or tree):

	'ls' SP <dataref> SP <path> LF

or in fast-import's active commit:

	'ls' SP <path> LF

The response is a single line sent through the cat-blob channel,
imitating ls-tree output.  So for example:

	FE> ls :1 Documentation
	gfi> 040000 tree 9e6c2b599341d28a2a375f8207507e0a2a627fe9	Documentation
	FE> ls 9e6c2b599341d28a2a375f8207507e0a2a627fe9 git-fast-import.txt
	gfi> 100644 blob 4f92954396e3f0f97e75b6838a5635b583708870	git-fast-import.txt
	FE> ls :1 RelNotes
	gfi> 120000 blob b942e49944	RelNotes
	FE> cat-blob b942e49944
	gfi> b942e49944 blob 32
	gfi> Documentation/RelNotes/1.7.4.txt

The most interesting parts of the reply are the first word, which is
a 6-digit octal mode (regular file, executable, symlink, directory,
or submodule), and the part from the second space to the tab, which is
a <dataref> that can be used in later cat-blob, ls, and filemodify (M)
commands to refer to the content (blob, tree, or commit) at that path.

If there is nothing there, the response is "missing some/path".

The intent is for this command to be used to read files from the
active commit, so a frontend can apply patches to them, and to copy
files and directories from previous revisions.

For example, proposed updates to svn-fe use this command in place of
its internal representation of the repository directory structure.
This simplifies the frontend a great deal and means support for
resuming an import in a separate fast-import run (i.e., incremental
import) is basically free.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Junio C Hamano <gitster@pobox.com>
Improved-by: Sverre Rabbelier <srabbelier@gmail.com>
2011-02-26 04:57:58 -06:00
Junio C Hamano fc180d98a2 Merge branch 'rr/fi-import-marks-if-exists'
* rr/fi-import-marks-if-exists:
  fast-import: Introduce --import-marks-if-exists
2011-02-09 16:41:16 -08:00
Junio C Hamano a8e4a5943a Merge branch 'maint-1.7.0' into maint
* maint-1.7.0:
  fast-import: introduce "feature notes" command
  fast-import: clarify documentation of "feature" command

Conflicts:
	Documentation/git-fast-import.txt
2011-02-09 16:40:12 -08:00
Jonathan Nieder 547e8b9205 fast-import: introduce "feature notes" command
Here is a 'feature' command for streams to use to require support for
the notemodify (N) command.

When the 'feature' facility was introduced (v1.7.0-rc0~95^2~4,
2009-12-04), the notes import feature was old news (v1.6.6-rc0~21^2~8,
2009-10-09) and it was not obvious it deserved to be a named feature.
But now that is clear, since all major non-git fast-import backends
lack support for it.

Details: on git version with this patch applied, any "feature notes"
command in the features/options section at the beginning of a stream
will be treated as a no-op.  On fast-import implementations without
the feature (and older git versions), the command instead errors out
with a message like

	This version of fast-import does not support feature notes.

So by declaring use of notes at the beginning of a stream, frontends
can avoid wasting time and other resources when the backend does not
support notes.  (This would be especially important for backends that
do not support rewinding history after a botched import.)

Improved-by: Thomas Rast <trast@student.ethz.ch>
Improved-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-09 16:06:51 -08:00
Junio C Hamano 99e63ef24e Merge branch 'maint'
* maint:
  rebase -i: clarify in-editor documentation of "exec"
  tests: sanitize more git environment variables
  fast-import: treat filemodify with empty tree as delete
  rebase: give a better error message for bogus branch
  rebase: use explicit "--" with checkout

Conflicts:
	t/t9300-fast-import.sh
2011-01-27 10:27:49 -08:00
Junio C Hamano 5ce3258122 Merge branch 'jn/fast-import-empty-tree-removal' into maint
* jn/fast-import-empty-tree-removal:
  fast-import: treat filemodify with empty tree as delete
2011-01-27 10:23:53 -08:00
Jonathan Nieder 8fe533f686 fast-import: treat filemodify with empty tree as delete
Normal git processes do not allow one to build a tree with an empty
subtree entry without trying hard at it.  This is in keeping with the
general UI philosophy: git tracks content, not empty directories.

v1.7.3-rc0~75^2 (2010-06-30) changed that by making it easy to include
an empty subtree in fast-import's active commit:

	M 040000 4b825dc642 subdir

One can trigger this by reading an empty tree (for example, the tree
corresponding to an empty root commit) and trying to move it to a
subtree.  It is better and more closely analogous to 'git read-tree
--prefix' to treat such commands as requests to remove the subtree.

Noticed-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-27 10:22:37 -08:00
Junio C Hamano 267684f0b7 Merge branch 'jn/maint-fast-import-object-reuse' into maint
* jn/maint-fast-import-object-reuse:
  fast-import: insert new object entries at start of hash bucket
2011-01-19 08:25:46 -08:00
Ramkumar Ramachandra dded4f12a4 fast-import: Introduce --import-marks-if-exists
When a frontend uses a marks file to ensure its state persists between
runs, it may represent "clean slate" when bootstrapping with "no marks
yet". In such a case, feeding the last state with --import-marks and
saving the state after the current run with --export-marks would be a
natural thing to do.

The --import-marks option however errors out when the specified marks file
doesn't exist; this makes bootstrapping a bit difficult.  The location of
the marks file becomes backend-dependent when --relative-marks is in
effect, and the frontend cannot check for the existence of the file in
such a case.

The --import-marks-if-exists option does the same thing as --import-marks
but does not flag an error if the named file does not exist yet to help
these frontends.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18 07:07:01 -08:00
Junio C Hamano 914584266c Merge branch 'jn/fast-import-blob-access'
* jn/fast-import-blob-access:
  t9300: avoid short reads from dd
  t9300: remove unnecessary use of /dev/stdin
  fast-import: Allow cat-blob requests at arbitrary points in stream
  fast-import: let importers retrieve blobs
  fast-import: clarify documentation of "feature" command
  fast-import: stricter parsing of integer options

Conflicts:
	fast-import.c
2010-12-16 12:58:38 -08:00
Junio C Hamano c835288be4 Merge branch 'jn/maint-fast-import-object-reuse'
* jn/maint-fast-import-object-reuse:
  fast-import: insert new object entries at start of hash bucket
2010-12-16 12:49:16 -08:00
Junio C Hamano f73c3e9704 Merge branch 'jn/fast-import-ondemand-checkpoint'
* jn/fast-import-ondemand-checkpoint:
  fast-import: treat SIGUSR1 as a request to access objects early
2010-12-16 12:49:11 -08:00
Junio C Hamano 5e738ae820 Merge branch 'jj/icase-directory'
* jj/icase-directory:
  Support case folding in git fast-import when core.ignorecase=true
  Support case folding for git add when core.ignorecase=true
  Add case insensitivity support when using git ls-files
  Add case insensitivity support for directories when using git status
  Case insensitivity support for .gitignore via core.ignorecase
  Add string comparison functions that respect the ignore_case variable.
  Makefile & configure: add a NO_FNMATCH_CASEFOLD flag
  Makefile & configure: add a NO_FNMATCH flag

Conflicts:
	Makefile
	config.mak.in
	configure.ac
	fast-import.c
2010-12-03 16:10:34 -08:00
Jonathan Nieder 777f80d742 fast-import: Allow cat-blob requests at arbitrary points in stream
The new rule: a "cat-blob" can be inserted wherever a comment is
allowed, which means at the start of any line except in the middle of
a "data" command.

This saves frontends from having to loop over everything they want to
commit in the next commit and cat-ing the necessary objects in
advance.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-01 13:28:04 -08:00
David Barr 85c62395b1 fast-import: let importers retrieve blobs
New objects written by fast-import are not available immediately.
Until a checkpoint has been started and finishes writing the pack
index, any new blobs will not be accessible using standard git tools.

So introduce a new way to access them: a "cat-blob" command in the
command stream requests for fast-import to print a blob to stdout or a
file descriptor specified by the argument to --cat-blob-fd.  The value
for cat-blob-fd cannot be specified in the stream because that would
be a layering violation: the decision of where to direct a stream has
to be made when fast-import is started anyway, so we might as well
make the stream format is independent of that detail.

Output uses the same format as "git cat-file --batch".

Thanks to Sverre Rabbelier and Sam Vilain for guidance in designing
the protocol.

Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-01 13:27:37 -08:00
Jonathan Nieder a9ff277e58 fast-import: stricter parsing of integer options
Check the result from strtoul to avoid accepting arguments like
--depth=-1 and --active-branches=foo,bar,baz.

Requested-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-01 13:26:52 -08:00
Junio C Hamano ed8298dc34 Merge branch 'jn/fast-import-fix'
* jn/fast-import-fix:
  fast-import: do not clear notes in do_change_note_fanout()
  t9300 (fast-import): another test for the "replace root" feature
  fast-import: tighten M 040000 syntax
  fast-import: filemodify after M 040000 <tree> "" crashes
2010-11-29 17:52:32 -08:00
Jonathan Nieder dc01f59d21 fast-import: treat SIGUSR1 as a request to access objects early
It can be tedious to wait for a multi-million-revision import.
Unfortunately it is hard to spy on the import because fast-import
works by continuously streaming out objects, without updating the pack
index or refs until a checkpoint command or the end of the stream.

So allow the impatient operator to request checkpoints by sending a
signal, like so:

	killall -USR1 git-fast-import

When receiving such a signal, fast-import would schedule a checkpoint
to take place after the current top-level command (usually a "commit"
or "blob" request) finishes.

Caveats: just like ordinary checkpoint commands, such requests slow
down the import.  Switching to a new pack at a suboptimal moment is
also likely to result in a less dense initial collection of packs.
That's the price.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 15:01:24 -08:00
David Barr b7c1ce4f14 fast-import: insert new object entries at start of hash bucket
More often than not, find_object is called for recently inserted objects.
Optimise for this case by inserting new entries at the start of the chain.
This doesn't affect the cost of new inserts but reduces the cost of find
and insert for existing object entries.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 11:25:16 -08:00
Junio C Hamano d4c4369752 Sync with 1.7.3.2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-21 17:16:10 -07:00
Jonathan Nieder b21241253b fast-import: do not clear notes in do_change_note_fanout()
Commit 5edde51 (fast-import: filemodify after M 040000 <tree> ""
crashes, 2010-10-17) taught fast-import to load trees from the
object db as needed when it is time to access them.

But it went too far.  In change_note_fanout(), an empty,
not-loaded tree is not meant to destroy notes, so calling
load_tree() at that point is exactly the wrong thing to do.

Kudos to Johan Herland for t9301, which caught this failure.

Reported-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-20 14:35:58 -07:00
Jonathan Nieder 3421578393 fast-import: tighten M 040000 syntax
When tree_content_set() is asked to modify the path "foo/bar/",
it first recurses like so:

	tree_content_set(root, "foo/bar/", sha1, S_IFDIR) ->
	 tree_content_set(root:foo, "bar/", ...) ->
	  tree_content_set(root:foo/bar, "", ...)

And as a side-effect of 2794ad5 (fast-import: Allow filemodify to set
the root, 2010-10-10), this last call is accepted and changes
the tree entry for root:foo/bar to refer to the specified tree.

That seems safe enough but let's reject the new syntax (we never meant
to support it) and make it harder for frontends to introduce pointless
incompatibilities with git fast-import 1.7.3.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-18 16:42:26 -07:00
Jonathan Nieder 5edde51018 fast-import: filemodify after M 040000 <tree> "" crashes
Until M 040000 <tree> "" syntax was introduced in commit 2794ad5
(fast-import: Allow filemodify to set the root, 2010-10-10), it
was impossible for the root entry to refer to an unloaded tree.
Update various functions to take that possibility into account.
Otherwise

	M 040000 <tree> ""
	M 100644 :1 "foo"

and similar commands (using D, C, or R after resetting the root
tree) segfault.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-18 16:41:27 -07:00
David Barr 2794ad5244 fast-import: Allow filemodify to set the root
v1.7.3-rc0~75^2 (Teach fast-import to import subtrees named by tree id,
2010-06-30) has a shortcoming - it doesn't allow the root to be set.
Extend this behaviour by allowing the root to be referenced as the
empty path, "".

For a command (like filter-branch --subdirectory-filter) that wants
to commit a lot of trees that already exist in the object db, writing
undeltified objects as loose files only to repack them later can
involve a significant amount of overhead.
(23% slow-down observed on Linux 2.6.35, worse on Mac OS X 10.6)

Fortunately we have fast-import (which is one of the only git commands
that will write to a pack directly) but there is not an advertised way
to tell fast-import to commit a given tree without unpacking it.

This patch changes that, by allowing

	M 040000 <tree id> ""

as a filemodify line in a commit to reset to a particular tree without
any need to parse it.  For example,

	M 040000 4b825dc642 ""

is a synonym for the deleteall command and the fast-import equivalent of

	git read-tree 4b825dc642

Signed-off-by: David Barr <david.barr@cordelta.com>
Commit-message-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Sverre Rabbelier <srabbelier@gmail.com>
Tested-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-13 15:10:31 -07:00
Štěpán Němec 62b4698e55 Use angles for placeholders consistently
Signed-off-by: Štěpán Němec <stepnem@gmail.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-08 12:29:52 -07:00
Joshua Jensen 50906e04e8 Support case folding in git fast-import when core.ignorecase=true
When core.ignorecase=true, imported file paths will be folded to match
existing directory case.

Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06 11:21:57 -07:00
Junio C Hamano 3f29dd6c23 Merge branch 'en/d-f-conflict-fix'
* en/d-f-conflict-fix:
  merge-recursive: Avoid excessive output for and reprocessing of renames
  merge-recursive: Fix multiple file rename across D/F conflict
  t6031: Add a testcase covering multiple renames across a D/F conflict
  merge-recursive: Fix typo
  Mark tests that use symlinks as needing SYMLINKS prerequisite
  t/t6035-merge-dir-to-symlink.sh: Remove TODO on passing test
  fast-import: Improve robustness when D->F changes provided in wrong order
  fast-export: Fix output order of D/F changes
  merge_recursive: Fix renames across paths below D/F conflicts
  merge-recursive: Fix D/F conflicts
  Add a rename + D/F conflict testcase
  Add additional testcases for D/F conflicts

Conflicts:
	merge-recursive.c
2010-08-31 16:23:58 -07:00
Junio C Hamano ebb561bcfc Merge branch 'jn/fast-import-subtree'
* jn/fast-import-subtree:
  Teach fast-import to import subtrees named by tree id
2010-08-18 12:14:41 -07:00
Raja R Harinath 7e7db5e452 fast-import: export correctly marks larger than 2^20-1
dump_marks_helper() has a bug when dumping marks larger than 2^20-1,
i.e., when the sparse array has more than two levels.  The bug was
that the 'base' counter was being shifted by 20 bits at level 3, and
then again by 10 bits at level 2, rather than a total shift of 20 bits
in this argument to the recursive call:

  (base + k) << m->shift

There are two ways to fix this correctly, the elegant:

  (base + k) << 10

and the one I chose due to edit distance:

  base + (k << m->shift)

Signed-off-by: Raja R Harinath <harinath@hurrynot.org>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-11 10:45:15 -07:00
Elijah Newren 253fb5f889 fast-import: Improve robustness when D->F changes provided in wrong order
When older versions of fast-export came across a directory changing to a
symlink (or regular file), it would output the changes in the form
  M 120000 :239821 dir-changing-to-symlink
  D dir-changing-to-symlink/filename1
When fast-import sees the first line, it deletes the directory named
dir-changing-to-symlink (and any files below it) and creates a symlink in
its place.  When fast-import came across the second line, it was previously
trying to remove the file and relevant leading directories in
tree_content_remove(), and as a side effect it would delete the symlink
that was just created.  This resulted in the symlink silently missing from
the resulting repository.

To improve robustness, we ignore file deletions underneath directory names
that correspond to non-directories.  This can also be viewed as a minor
optimization: since there cannot be a file and a directory with the same
name in the same directory, the file clearly can't exist so nothing needs
to be done to delete it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-09 16:16:47 -07:00
Jonathan Nieder 334fba656b Teach fast-import to import subtrees named by tree id
To simulate the svn cp command, it would be very useful to be
replace an arbitrary file in the current revision by an
arbitrary directory from a previous one.  Modify the filemodify
command to allow that:

 M 040000 <tree id> pathname

This would be most useful in combination with a facility to
print the commit ids for new revisions as they are written.

Cc: Shawn O. Pearce <spearce@spearce.org>
Cc: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-05 12:11:33 -07:00
Junio C Hamano 8d676d85f7 Merge branch 'gv/portable'
* gv/portable:
  test-lib: use DIFF definition from GIT-BUILD-OPTIONS
  build: propagate $DIFF to scripts
  Makefile: Tru64 portability fix
  Makefile: HP-UX 10.20 portability fixes
  Makefile: HPUX11 portability fixes
  Makefile: SunOS 5.6 portability fix
  inline declaration does not work on AIX
  Allow disabling "inline"
  Some platforms lack socklen_t type
  Make NO_{INET_NTOP,INET_PTON} configured independently
  Makefile: some platforms do not have hstrerror anywhere
  git-compat-util.h: some platforms with mmap() lack MAP_FAILED definition
  test_cmp: do not use "diff -u" on platforms that lack one
  fixup: do not unconditionally disable "diff -u"
  tests: use "test_cmp", not "diff", when verifying the result
  Do not use "diff" found on PATH while building and installing
  enums: omit trailing comma for portability
  Makefile: -lpthread may still be necessary when libc has only pthread stubs
  Rewrite dynamic structure initializations to runtime assignment
  Makefile: pass CPPFLAGS through to fllow customization

Conflicts:
	Makefile
	wt-status.h
2010-06-21 06:02:44 -07:00
Gary V. Vaughan 4b05548fc0 enums: omit trailing comma for portability
Without this patch at least IBM VisualAge C 5.0 (I have 5.0.2) on AIX
5.1 fails to compile git.

enum style is inconsistent already, with some enums declared on one
line, some over 3 lines with the enum values all on the middle line,
sometimes with 1 enum value per line... and independently of that the
trailing comma is sometimes present and other times absent, often
mixing with/without trailing comma styles in a single file, and
sometimes in consecutive enum declarations.

Clearly, omitting the comma is the more portable style, and this patch
changes all enum declarations to use the portable omitted dangling
comma style consistently.

Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-31 16:59:27 -07:00
Sverre Rabbelier 580d5f83e7 fast-import: always create marks_file directories
CC: "Shawn O. Pearce" <spearce@spearce.org>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-31 09:37:26 -07:00
Michael Lukashov 1b22b6c897 refactor duplicated encode_header in pack-objects and fast-import
The following function is duplicated:

  encode_header

Move this function to sha1_file.c and rename it 'encode_in_pack_object_header',
as suggested by Junio C Hamano

Signed-off-by: Michael Lukashov <michael.lukashov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 15:30:20 -08:00
Nicolas Pitre b500d5e11e fast-import: use the diff_delta() max_delta_size argument
This let diff_delta() abort early if it is going to bust the given
size limit.  Also, only objects larger than 20 bytes are considered
as objects smaller than that are most certainly going to produce
larger deltas than the original object due to the additional headers.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:44 -08:00
Nicolas Pitre 8c2ca8dd8a fast-import: honor pack.indexversion and pack.packsizelimit config vars
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:44 -08:00
Nicolas Pitre 89e0a3a131 fast-import: make default pack size unlimited
Now that fast-import is creating packs with index version 2, there is
no point limiting the pack size by default.  A pack split will still
happen if off_t is not sufficiently large to hold large offsets.

While updating the doc, let's remove the "packfiles fit on CDs"
suggestion.  Pack files created by fast-import are still suboptimal and
a 'git repack -a -f -d' or even 'git gc --aggressive' would be a pretty
good idea before considering storage on CDs.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:43 -08:00
Nicolas Pitre 427cb22c40 fast-import: use write_idx_file() instead of custom code
This allows for the creation of pack index version 2 with its object
CRC and the possibility for a pack to be larger than 4 GB.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:43 -08:00
Nicolas Pitre 212818160d fast-import: use sha1write() for pack data
This is in preparation for using write_idx_file().  Also, by using
sha1write() we get some buffering to reduces the number of write
syscalls, and the written data is SHA1 summed which allows for the extra
data integrity validation check performed in fixup_pack_header_footer()
(details on this in commit abeb40e5aa).

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:42 -08:00
Nicolas Pitre 3fc366bdbb fast-import: start using struct pack_idx_entry
This is in preparation for using write_idx_file().

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:42 -08:00
Junio C Hamano 4d0cc22437 fast-import: count --max-pack-size in bytes
Similar in spirit to 07cf0f2 (make --max-pack-size argument to 'git
pack-object' count in bytes, 2010-02-03) which made the option by the same
name to pack-objects, this counts the pack size limit in bytes.

In order not to cause havoc with people used to the previous megabyte
scale an integer smaller than 8192 is interpreted in megabytes but the
user gets a warning.  Also a minimum size of 1 MiB is enforced to avoid an
explosion of pack files.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
2010-02-04 15:12:17 -08:00
Junio C Hamano 76ea93ccb5 fast-import.c: Fix big-file-threshold parsing bug
Manual merge made at 844ad3d (Merge branch 'sp/maint-fast-import-large-blob'
into sp/fast-import-large-blob, 2010-02-01) did not correctly reflect the change
of unit in which this variable's value is counted from its previous version.

Now it counts in bytes, not in megabytes.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
2010-02-04 09:09:50 -08:00
Junio C Hamano 844ad3d9a0 Merge branch 'sp/maint-fast-import-large-blob' into sp/fast-import-large-blob
* sp/maint-fast-import-large-blob:
  fast-import: Stream very large blobs directly to pack
  bash: don't offer remote transport helpers as subcommands

Conflicts:
	fast-import.c
2010-02-01 12:42:00 -08:00
Shawn O. Pearce 5eef828bc0 fast-import: Stream very large blobs directly to pack
If a blob is larger than the configured big-file-threshold, instead
of reading it into a single buffer obtained from malloc, stream it
onto the end of the current pack file.  Streaming the larger objects
into the pack avoids the 4+ GiB memory footprint that occurs when
fast-import is processing 2+ GiB blobs.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-01 12:09:47 -08:00
Junio C Hamano 5fc9df08b5 Merge branch 'jh/notes' (early part)
* 'jh/notes' (early part):
  Add more testcases to test fast-import of notes
  Rename t9301 to t9350, to make room for more fast-import tests
  fast-import: Proper notes tree manipulation
2010-01-20 20:28:49 -08:00
Junio C Hamano e33fd3c326 Merge branch 'maint'
* maint:
  Update draft release notes to 1.6.6.1
  grep: NUL terminate input from a file
  fast-import: tag may point to any object type
2010-01-18 18:16:19 -08:00
Junio C Hamano 6304c4068e Merge branch 'dp/maint-1.6.5-fast-import-non-commit-tag' into maint
* dp/maint-1.6.5-fast-import-non-commit-tag:
  fast-import: tag may point to any object type
2010-01-18 18:15:12 -08:00
Junio C Hamano fa232d457e Merge branch 'sr/gfi-options'
* sr/gfi-options:
  fast-import: add (non-)relative-marks feature
  fast-import: allow for multiple --import-marks= arguments
  fast-import: test the new option command
  fast-import: add option command
  fast-import: add feature command
  fast-import: put marks reading in its own function
  fast-import: put option parsing code in separate functions
2010-01-17 15:58:11 -08:00
Dmitry Potapov 8db751a8f9 fast-import: tag may point to any object type
If you tried to export the official git repository, and then to import it
back then git-fast-import would die complaining that "Mark :1 not a commit".

Accordingly to a generated crash file, Mark 1 is not a commit but a blob,
which is pointed by junio-gpg-pub tag. Because git-tag allows to create such
tags, git-fast-import should import them.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-14 21:51:06 -08:00
Shawn O. Pearce 74fbd1182a fast-import: Document author/committer/tagger name is optional
The fast-import parser does not validate that the author, committer
or tagger name component contains both a name and an email address.
Therefore the name component has always been optional.  Correct the
documentation to match the implementation.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-31 14:46:08 -08:00
Johan Herland 2a113aee9b fast-import: Proper notes tree manipulation
This patch teaches 'git fast-import' to automatically organize note objects
in a fast-import stream into an appropriate fanout structure. The notes API
in notes.h is NOT used to accomplish this, because trying to keep the
fast-import and notes data structures in sync would yield a significantly
larger patch with higher complexity.

Note objects are added with the 'N' command, and accounted for with a
per-branch counter, which is used to trigger fanout restructuring when
needed. Note that when restructuring the branch tree, _any_ entry whose
path consists of 40 hex chars (not including directory separators) will
be recognized as a note object. It is therefore not advisable to
manipulate note entries with M/D/R/C commands.

Since note objects are stored in the same tree structure as other objects,
the unloading and reloading of a fast-import branches handle note objects
transparently.

This patch has been improved by the following contributions:
- Shawn O. Pearce: Several style- and logic-related improvements

Cc: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-07 13:52:52 -08:00
Sverre Rabbelier bc3c79aefc fast-import: add (non-)relative-marks feature
After specifying 'feature relative-marks' the paths specified with
'feature import-marks' and 'feature export-marks' are relative to an
internal directory in the current repository.

In git-fast-import this means that the paths are relative to the
'.git/info/fast-import' directory. However, other importers may use a
different location.

Add 'feature non-relative-marks' to disable this behavior, this way
it is possible to, for example, specify the import-marks location as
relative, and the export-marks location as non-relative.

Also add tests to verify this behavior.

Cc: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-05 12:43:24 -08:00
Sverre Rabbelier 081751c882 fast-import: allow for multiple --import-marks= arguments
The --import-marks= option may be specified multiple times on the
commandline and should result in all marks being read in. Only one
import-marks feature may be specified in the stream, which is
overriden by any --import-marks= commandline options.

If one wishes to specify import-marks files in addition to the one
specified in the stream, it is easy to repeat the stream option as a
--import-marks= commandline option.

Also verify this behavior with tests.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-04 16:10:59 -08:00
Sverre Rabbelier 9c8398f0c9 fast-import: add option command
This allows the frontend to specify any of the supported options as
long as no non-option command has been given. This way the
user does not have to include any frontend-specific options, but
instead she can rely on the frontend to tell fast-import what it
needs.

Also factor out parsing of argv and have it execute when we reach the
first non-option command, or after all commands have been read and
no non-option command has been encountered.

Non-git options are ignored, unrecognised options result in an error.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-04 16:10:22 -08:00
Sverre Rabbelier f963bd5d71 fast-import: add feature command
This allows the fronted to require a specific feature to be supported
by the backend, or abort.

Also add support for four initial feature, date-format=, force=,
import-marks=, export-marks=.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-04 16:08:55 -08:00
Sverre Rabbelier 07cd9328b6 fast-import: put marks reading in its own function
All options do nothing but set settings, with the exception of the
--input-marks option. Delay the reading of the marks file till after
all options have been parsed.

Also, rename mark_file to export_marks_file as it is now ambiguous.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-04 16:08:54 -08:00
Sverre Rabbelier 0f6927c229 fast-import: put option parsing code in separate functions
Putting the options in their own functions increases readability of
the option parsing block and makes it easier to reuse the option
parsing code later on.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-04 16:08:53 -08:00
Junio C Hamano 885d492f69 Merge branch 'jh/notes' (early part)
* 'jh/notes' (early part):
  Add selftests verifying concatenation of multiple notes for the same commit
  Refactor notes code to concatenate multiple notes annotating the same object
  Add selftests verifying that we can parse notes trees with various fanouts
  Teach the notes lookup code to parse notes trees with various fanout schemes
  Teach notes code to free its internal data structures on request
  Add '%N'-format for pretty-printing commit notes
  Add flags to get_commit_notes() to control the format of the note string
  t3302-notes-index-expensive: Speed up create_repo()
  fast-import: Add support for importing commit notes
  Teach "-m <msg>" and "-F <file>" to "git notes edit"
  Add an expensive test for git-notes
  Speed up git notes lookup
  Add a script to edit/inspect notes
  Introduce commit notes

Conflicts:
	.gitignore
	Documentation/pretty-formats.txt
	pretty.c
2009-11-20 23:53:55 -08:00
Jonathan Nieder 71a04a8b52 Show usage string for 'git fast-import -h'
Let "git fast-import -h" (with no other arguments) print usage
before exiting, even when run outside any repository.

Cc: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-10 11:06:56 -08:00
Johan Herland a8dd2e7d2b fast-import: Add support for importing commit notes
Introduce a 'notemodify' subcommand of the 'commit' command. This subcommand
is similar to 'filemodify', except that no mode is supplied (all notes have
mode 0644), and the path is set to the hex SHA1 of the given "comittish".

This enables fast import of note objects along with their associated commits,
since the notes can now be named using the mark references of their
corresponding commits.

The patch also includes a test case of the added functionality.

Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-19 19:00:24 -07:00
Junio C Hamano 1cd749cc07 fast-import.c::validate_raw_date(): really validate the value
When reading the "raw format" timestamp from the input stream, make sure
that the timezone offset is a reasonable value by imitating 7122f82
(date.c: improve guess between timezone offset and year., 2006-06-08).

We _might_ want to also check if the timestamp itself is reasonable, but
that is left for a separate commit.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-07 13:05:03 -07:00
Thomas Rast 0721c314a5 Use die_errno() instead of die() when checking syscalls
Lots of die() calls did not actually report the kind of error, which
can leave the user confused as to the real problem.  Use die_errno()
where we check a system/library call that sets errno on failure, or
one of the following that wrap such calls:

  Function              Passes on error from
  --------              --------------------
  odb_pack_keep         open
  read_ancestry         fopen
  read_in_full          xread
  strbuf_read           xread
  strbuf_read_file      open or strbuf_read_file
  strbuf_readlink       readlink
  write_in_full         xwrite

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Thomas Rast d824cbba02 Convert existing die(..., strerror(errno)) to die_errno()
Change calls to die(..., strerror(errno)) to use the new die_errno().

In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Junio C Hamano 36587681b4 Merge branch 'ar/unlink-err'
* ar/unlink-err:
  print unlink(2) errno in copy_or_link_directory
  replace direct calls to unlink(2) with unlink_or_warn
  Introduce an unlink(2) wrapper which gives warning if unlink failed
2009-05-18 09:01:06 -07:00
Felipe Contreras 4b25d091ba Fix a bunch of pointer declarations (codestyle)
Essentially; s/type* /type */ as per the coding guidelines.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-01 15:17:31 -07:00
Alex Riesen 691f1a28bf replace direct calls to unlink(2) with unlink_or_warn
This helps to notice when something's going wrong, especially on
systems which lock open files.

I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which already printing something or is
  called from such a function
- it is in a static function, returning void and the function is only
  called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obvously cleaning up the lockfiles

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-29 18:37:41 -07:00
Michael J Gruber b18cc5a3b2 Fix more typos/spelling in comments
A few more fixes on top of the automatic spell checker generated ones.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-22 19:03:39 -07:00
Mike Ralphson 3ea3c215c0 Fix typos / spelling in comments
Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-22 19:02:12 -07:00
Junio C Hamano 03a39a9184 Merge branch 'jc/shared-literally'
* jc/shared-literally:
  t1301: loosen test for forced modes
  set_shared_perm(): sometimes we know what the final mode bits should look like
  move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath
  Move chmod(foo, 0444) into move_temp_to_file()
  "core.sharedrepository = 0mode" should set, not loosen
2009-04-06 00:42:52 -07:00
Johan Herland fb8b193670 Move chmod(foo, 0444) into move_temp_to_file()
When writing out a loose object or a pack (index), move_temp_to_file() is
called to finalize the resulting file. These files (loose files and packs)
should all have permission mode 0444 (modulo adjust_shared_perm()).
Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite
(or even forgetting to chmod() at all), do the chmod() call from within
move_temp_to_file().

Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-27 22:10:58 -07:00
Elijah Newren 98e1a4186a Correct missing SP characters in grammar comment at top of fast-import.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-25 16:36:16 -07:00
Benjamin Kramer eb3a9dd327 Remove unused function scope local variables
These variables were unused and can be removed safely:

  builtin-clone.c::cmd_clone(): use_local_hardlinks, use_separate_remote
  builtin-fetch-pack.c::find_common(): len
  builtin-remote.c::mv(): symref
  diff.c::show_stats():show_stats(): total
  diffcore-break.c::should_break(): base_size
  fast-import.c::validate_raw_date(): date, sign
  fsck.c::fsck_tree(): o_sha1, sha1
  xdiff-interface.c::parse_num(): read_some

Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 20:52:17 -08:00
Junio C Hamano bb0cebd7d0 Merge branch 'jc/maint-1.6.0-pack-directory'
* jc/maint-1.6.0-pack-directory:
  Make sure objects/pack exists before creating a new pack
2009-02-25 14:50:05 -08:00
Junio C Hamano 6e180cdcec Make sure objects/pack exists before creating a new pack
In a repository created with git older than f49fb35 (git-init-db: create
"pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is
not created upon initialization.  It was Ok because subdirectories are
created as needed inside directories init-db creates, and back then,
packfiles were recent invention.

After the said commit, new codepaths started relying on the presense of
objects/pack/ directory in the repository.  This was exacerbated with
8b4eb6b (Do not perform cross-directory renames when creating packs,
2008-09-22) that moved the location temporary pack files are created from
objects/ directory to objects/pack/ directory, because moving temporary to
the final location was done carefully with lazy leading directory creation.

Many packfile related operations in such an old repository can fail
mysteriously because of this.

This commit introduces two helper functions to make things work better.

 - odb_mkstemp() is a specialized version of mkstemp() to refactor the
   code and teach it to create leading directories as needed;

 - odb_pack_keep() refactors the code to create a ".keep" file while
   create leading directories as needed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-25 14:39:42 -08:00
Junio C Hamano ba19a808aa Drop double-semicolon in C
The worst offenders are "continue;;" and "break;;" in switch statements.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-10 22:26:37 -08:00
Junio C Hamano fd8475d9fb Merge branch 'maint'
* maint:
  Clear the delta base cache during fast-import checkpoint
2009-02-10 21:30:45 -08:00
Junio C Hamano 9b27ea9518 Merge branch 'maint-1.6.0' into maint
* maint-1.6.0:
  Clear the delta base cache during fast-import checkpoint
2009-02-10 15:32:26 -08:00
Shawn O. Pearce 3d20c636af Clear the delta base cache during fast-import checkpoint
Otherwise we may reuse the same memory address for a totally
different "struct packed_git", and a previously cached object from
the prior occupant might be returned when trying to unpack an object
from the new pack.

Found-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-10 15:30:59 -08:00
Steffen Prohaska 2fb3f6db96 Add calls to git_extract_argv0_path() in programs that call git_config_*
Programs that use git_config need to find the global configuration.
When runtime prefix computation is enabled, this requires that
git_extract_argv0_path() is called early in the program's main().

This commit adds the necessary calls.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-26 00:26:05 -08:00
Junio C Hamano 4f8b8992ef Merge branch 'maint-1.6.0' into maint
* maint-1.6.0:
  fast-import: Cleanup mode setting.
  Git.pm: call Error::Simple() properly
2009-01-13 23:10:50 -08:00
Felipe Contreras 3d1d81eba2 fast-import: Cleanup mode setting.
"S_IFREG | mode" makes only sense for 0644 and 0755.

Even though doing (S_IFREG | mode) may not hurt when mode is any other
supported value, that is only true because S_IFREG mode bit happens to
be already on for S_IFLNK or S_IFGITLINK.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-13 22:57:12 -08:00
René Scharfe c55fae43c4 fast-import.c: stricter strtoul check, silence compiler warning
Store the return value of strtoul() in order to avoid compiler
warnings on Ubuntu 8.10.

Also check errno after each call, which is the only way to notice
an overflow without making ULONG_MAX an illegal date.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-21 01:48:26 -08:00
Junio C Hamano efe05b019c Merge branch 'maint' to sync with GIT 1.6.0.6
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-19 19:35:55 -08:00
Junio C Hamano 88fbf67b78 fast-import: make tagger information optional
Even though newer Porcelain tools always record the tagger information
when creating new tags, export/import pair should be able to faithfully
reproduce ancient tag objects that lack tagger information.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
2008-12-19 19:25:06 -08:00
Junio C Hamano 90c3302173 Merge branch 'maint'
* maint:
  fast-import: close pack before unlinking it
  pager: do not dup2 stderr if it is already redirected
  git-show: do not segfault when showing a bad tag
2008-12-15 23:06:13 -08:00
Johannes Schindelin 87c8a56e4f fast-import: close pack before unlinking it
This is sort of a companion patch to 4723ee9(Close files opened by
lock_file() before unlinking.): on Windows, you cannot delete what
is still open.

This makes test 9300-fast-import pass on Windows for me; quite a few
fast-imports leave temporary packs until the test "blank lines not
necessary after other commands" actually tests for the number of files
in .git/objects/pack/, which has a few temporary packs now.

I guess that 8b4eb6b(Do not perform cross-directory renames when
creating packs) was "responsible" for the breakage.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-15 23:04:48 -08:00
YONETANI Tomokazu 2fad5329f4 git-fast-import possible memory corruption problem
Internal "allocate in bulk, we will never free this memory anyway"
allocator used in fast-import had a logic to round up the size of the
requested memory block in a wrong place (it computed if the available
space is enough to fit the request first, and then carved a chunk of
memory by size rounded up to the alignment, which could go beyond the
actually available space).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-14 16:41:32 -08:00
Nicolas Pitre 9126f0091f fix openssl headers conflicting with custom SHA1 implementations
On ARM I have the following compilation errors:

    CC fast-import.o
In file included from cache.h:8,
                 from builtin.h:6,
                 from fast-import.c:142:
arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
/usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
/usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
/usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
/usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
make: *** [fast-import.o] Error 1

This is because openssl header files are always included in
git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not
set, which somehow brings in <openssl/sha1.h> clashing with the custom
ARM version.  Compilation of git is probably broken on PPC too for the
same reason.

Turns out that the only file requiring openssl/ssl.h and openssl/err.h
is imap-send.c.  But only moving those problematic includes there
doesn't solve the issue as it also includes cache.h which brings in the
conflicting local SHA1 header file.

As suggested by Jeff King, the best solution is to rename our references
to SHA1 functions and structure to something git specific, and define those
according to the implementation used.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-02 18:06:56 -07:00
Junio C Hamano c4275591fb Merge branch 'maint'
* maint:
  builtin-prune.c: prune temporary packs in <object_dir>/pack directory
  Do not perform cross-directory renames when creating packs
2008-09-23 02:05:35 -07:00
Petr Baudis 8b4eb6b6cd Do not perform cross-directory renames when creating packs
A comment on top of create_tmpfile() describes caveats ('can have
problems on various systems (FAT, NFS, Coda)') that should apply
in this situation as well.  This in the end did not end up solving
any of my personal problems, but it might be a useful cleanup patch
nevertheless.

Signed-off-by: Petr Baudis <pasky@suse.cz>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-22 12:19:14 -07:00
Junio C Hamano 53b543ab82 Merge branch 'np/maint-safer-pack'
* np/maint-safer-pack:
  fixup_pack_header_footer(): use nicely aligned buffer sizes
  index-pack: use fixup_pack_header_footer()'s validation mode
  pack-objects: use fixup_pack_header_footer()'s validation mode
  improve reliability of fixup_pack_header_footer()
  pack-objects: improve returned information from write_one()
2008-09-02 17:46:48 -07:00
David Soria Parra 85e7283069 cast pid_t's to uintmax_t to improve portability
Some systems (like e.g. OpenSolaris) define pid_t as long,
therefore all our sprintf that use %i/%d cause a compiler warning
beacuse of the implicit long->int cast. To make sure that
we fit the limits, we display pids as PRIuMAX and cast them explicitly
to uintmax_t.

Signed-off-by: David Soria Parra <dsp@php.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-31 16:56:22 -07:00
Nicolas Pitre abeb40e5aa improve reliability of fixup_pack_header_footer()
Currently, this function has the potential to read corrupted pack data
from disk and give it a valid SHA1 checksum.  Let's add the ability to
validate SHA1 checksum of existing data along the way, including before
and after any arbitrary point in the pack.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-29 21:51:27 -07:00
Alexander Gavrilov 03db4525d3 Support gitlinks in fast-import.
Currently fast-import/export cannot be used for
repositories with submodules. This patch extends
the relevant programs to make them correctly
process gitlinks.

Links can be represented by two forms of the
Modify command:

M 160000 SHA1 some/path

which sets the link target explicitly, or

M 160000 :mark some/path

where the mark refers to a commit. The latter
form can be used by importing tools to build
all submodules simultaneously in one physical
repository, and then simply fetch them apart.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-19 11:25:51 -07:00
Stephan Beyer 1b1dd23f2d Make usage strings dash-less
When you misuse a git command, you are shown the usage string.
But this is currently shown in the dashed form.  So if you just
copy what you see, it will not work, when the dashed form
is no longer supported.

This patch makes git commands show the dash-less version.

For shell scripts that do not specify OPTIONS_SPEC, git-sh-setup.sh
generates a dash-less usage string now.

Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-13 14:12:48 -07:00
Linus Torvalds 4c81b03e30 Make pack creation always fsync() the result
This means that we can depend on packs always being stable on disk,
simplifying a lot of the object serialization worries.  And unlike loose
objects, serializing pack creation IO isn't going to be a performance
killer.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-31 14:46:57 -07:00
Junio C Hamano 9bd81e4249 Merge branch 'js/config-cb'
* js/config-cb:
  Provide git_config with a callback-data parameter

Conflicts:

	builtin-add.c
	builtin-cat-file.c
2008-05-25 14:25:02 -07:00
Miklos Vajna b30317819d git-fast-import: rename cmd_*() functions to parse_*()
There is a cmd_merge() function in fast-import that will conflict with
builtin-merge's cmd_merge() function. To keep it consistent, rename all
cmd_*() function to parse_*()

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-16 12:40:09 -07:00
Johannes Schindelin ef90d6d420 Provide git_config with a callback-data parameter
git_config() only had a function parameter, but no callback data
parameter.  This assumes that all callback functions only modify
global variables.

With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-14 12:34:44 -07:00
Eyvind Bernhardsen 198724ad4e fast-import: Allow "reset" to delete a new branch without error
Creating a branch in fast-import and then resetting it without making
any further commits to it currently causes an error message at the
end of the import.

This error is triggered by cvs2svn's git backend, which uses a
temporary fixup branch when it creates tags, because the fixup branch
is reset after each tag.

This patch prevents the error, allowing "reset" to be used to delete
temporary branches.

Signed-off-by: Eyvind Bernhardsen <eyvind-git@orakel.ntnu.no>
Acked-by: Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-16 14:24:32 -07:00
Junio C Hamano ad416ed433 Merge branch 'maint' to sync with 1.5.4.4
* maint:
  GIT 1.5.4.4
  ident.c: reword error message when the user name cannot be determined
  Fix dcommit, rebase when rewriteRoot is in use
  Really make the LF after reset in fast-import optional
2008-03-08 20:07:57 -08:00
Adeodato Simó 655e8515f2 Really make the LF after reset in fast-import optional
cmd_from() ends with a call to read_next_command(), which is needed
when using cmd_from() from commands where from is not the last element.

With reset, however, "from" is the last command, after which the flow
returns to the main loop, which calls read_next_command() again.

Because of this, always set unread_command_buf in cmd_reset_branch(),
even if cmd_from() was successful.

Add a test case for this in t9300-fast-import.sh.

Signed-off-by: Adeodato Simó <dato@net.com.org.es>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-08 10:46:10 -08:00
Jean-Luc Herren 733ee2b7a9 fast-import: exit with proper message if not a git dir
git fast-import expects to be run from an existing (possibly
empty) repository.  It was dying with a suboptimal message if that
wasn't the case.

Signed-off-by: Jean-Luc Herren <jlh@gmx.ch>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
2008-03-02 16:07:41 -08:00
Shawn O. Pearce 118805b920 Finish current packfile during fast-import crash handler
If fast-import is in the middle of crashing due to a protocol error
or something like that then it can be very useful to have the mark
table and all objects up until that point be available for a new
import to resume from.

Currently we just close the active packfile, unkeep all of our
newly created packfiles (so they can be deleted), and dump the
marks table to a temporary file.

We don't attempt to update the refs/tags that the process has in
memory as much of that data can be found in the crash report and I'm
not sure it would be the right thing to do under every type of crash.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-16 00:47:07 -08:00
Shawn O. Pearce 3b08e5b8c9 Include the fast-import marks table in crash reports
If fast-import was not run with --export-marks but we are crashing
the frontend application developer may still benefit from having
that information available to them.  We now include the marks table
as part of the crash report if --export-marks was not supplied on
the command line.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-16 00:47:07 -08:00
Shawn O. Pearce fbc63ea694 Include annotated tags in fast-import crash reports
If annotated tags were created they exist in a different namespace
within the fast-import process' internal memory tables so we did
not export them in the inactive branch table.  Now they are written
out after the branches, in the order that they were defined by the
frontend process.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-16 00:47:07 -08:00
Shawn O. Pearce e8b32e0610 fast-import: check return value from unpack_entry()
If the tree object we have asked for is deltafied in the packfile and
the delta did not apply correctly or was not able to be decompressed
from the packfile then we can get back NULL instead of the tree data.
This is (part of) the reason why read_sha1_file() can return NULL, so
we need to also handle it the same way.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-15 20:11:51 -08:00
Shawn O. Pearce 7422bac441 Document the hairy gfi_unpack_entry part of fast-import
Junio pointed out this part of fast-import wasn't very clear on
initial read, and it took some time for someone who was new to
fast-import's "dirty little tricks" to understand how this was
even working.  So a little bit of commentary in the proper place
may help future readers.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-21 01:04:12 -08:00
Shawn O. Pearce bb23fdfa6c Teach fast-import to honor pack.compression and pack.depth
We now use the configured pack.compression and pack.depth values
within fast-import, as like builtin-pack-objects fast-import is
generating a packfile for consumption by the Git tools.

We use the same behavior as builtin-pack-objects does for these
options, allowing core.compression to supply the default value
for pack.compression.

The default setting for pack.depth within fast-import is still 10
as users will generally repack fast-import generated packfiles by
`repack -f`.  A large delta depth within the fast-import packfile
can significantly slow down such a later repack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-21 01:04:10 -08:00
Jim Meyering 5a7b1b571e fast-import: Don't use a maybe-clobbered errno value
Without this change, each diagnostic could use an errno value
clobbered by the close or unlink in rollback_lock_file.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-18 13:19:37 -08:00
Shawn O. Pearce c9ced051c3 Fix random fast-import errors when compiled with NO_MMAP
fast-import was relying on the fact that on most systems mmap() and
write() are synchronized by the filesystem's buffer cache.  We were
relying on the ability to mmap() 20 bytes beyond the current end
of the file, then later fill in those bytes with a future write()
call, then read them through the previously obtained mmap() address.

This isn't always true with some implementations of NFS, but it is
especially not true with our NO_MMAP=YesPlease build time option used
on some platforms.  If fast-import was built with NO_MMAP=YesPlease
we used the malloc()+pread() emulation and the subsequent write()
call does not update the trailing 20 bytes of a previously obtained
"mmap()" (aka malloc'd) address.

Under NO_MMAP that behavior causes unpack_entry() in sha1_file.c to
be unable to read an object header (or data) that has been unlucky
enough to be written to the packfile at a location such that it
is in the trailing 20 bytes of a window previously opened on that
same packfile.

This bug has gone unnoticed for a very long time as it is highly data
dependent.  Not only does the object have to be placed at the right
position, but it also needs to be positioned behind some other object
that has been accessed due to a branch cache invalidation.  In other
words the stars had to align just right, and if you did run into
this bug you probably should also have purchased a lottery ticket.

Fortunately the workaround is a lot easier than the bug explanation.

Before we allow unpack_entry() to read data from a pack window
that has also (possibly) been modified through write() we force
all existing windows on that packfile to be closed.  By closing
the windows we ensure that any new access via the emulated mmap()
will reread the packfile, updating to the current file content.

This comes at a slight performance degredation as we cannot reuse
previously cached windows when we update the packfile.  But it
is a fairly minor difference as the window closes happen at only
two points:

 - When the packfile is finalized and its .idx is generated:

   At this stage we are getting ready to update the refs and any
   data access into the packfile is going to be random, and is
   going after only the branch tips (to ensure they are valid).
   Our existing windows (if any) are not likely to be positioned
   at useful locations to access those final tip commits so we
   probably were closing them before anyway.

 - When the branch cache missed and we need to reload:

   At this point fast-import is getting change commands for the next
   commit and it needs to go re-read a tree object it previously
   had written out to the packfile.  What windows we had (if any)
   are not likely to cover the tree in question so we probably were
   closing them before anyway.

We do try to avoid unnecessarily closing windows in the second case
by checking to see if the packfile size has increased since the
last time we called unpack_entry() on that packfile.  If the size
has not changed then we have not written additional data, and any
existing window is still vaild.  This nicely handles the cases where
fast-import is going through a branch cache reload and needs to read
many trees at once.  During such an event we are not likely to be
updating the packfile so we do not cycle the windows between reads.

With this change in place t9301-fast-export.sh (which was broken
by c3b0dec509) finally works again.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-17 22:39:20 -08:00
Brandon Casey fb54abd604 fast-import.c: don't try to commit marks file if write failed
We also move the assignment of -1 to the lock file descriptor
up, so that rollback_lock_file() can be called safely after a
possible attempt to fclose(). This matches the contents of
the 'if' statement just above testing success of fdopen().

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-17 22:11:42 -08:00
Brandon Casey 4ed7cd3ab0 Improve use of lockfile API
Remove remaining double close(2)'s.  i.e. close() before
commit_locked_index() or commit_lock_file().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-16 15:35:35 -08:00
Jim Meyering 95693d45ee bundle, fast-import: detect write failure
I noticed some unchecked writes.  This fixes them.

* bundle.c (create_bundle): Die upon write failure.
* fast-import.c (keep_pack): Die upon write or close failure.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-10 01:08:11 -08:00
Junio C Hamano 257f3020f6 Update callers of check_ref_format()
This updates send-pack and fast-import to use symbolic constants
for checking the return values from check_ref_format(), and also
futureproof the logic in lock_any_ref_for_update() to explicitly
name the case that is usually considered an error but is Ok for
this particular use.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-02 11:20:09 -08:00
David S. Miller 69ae517541 fast-import: fix unalinged allocation and access
The specialized pool allocator fast-import uses aligned objects on the
size of a pointer, which was not sufficient at least on Sparc.  Instead,
make the alignment for objects of type unitmax_t.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-14 20:39:16 -08:00
Junio C Hamano fb5fd01148 Merge branch 'maint'
* maint:
  git-clean: honor core.excludesfile
  Documentation: Fix man page breakage with DocBook XSL v1.72
  git-remote.txt: fix typo
  core-tutorial.txt: Fix argument mistake in an example.
  replace reference to git-rm with git-reset in git-commit doc
  Grammar fixes for gitattributes documentation
  Don't allow fast-import tree delta chains to exceed maximum depth
  revert/cherry-pick: allow starting from dirty work tree.
  t/t3404: fix test for a bogus todo file.

Conflicts:

	fast-import.c
2007-11-14 03:37:18 -08:00
Shawn O. Pearce 436e7a74c6 Don't allow fast-import tree delta chains to exceed maximum depth
Brian Downing noticed fast-import can produce tree depths of up
to 6,035 objects and even deeper.  Long delta chains can create
very small packfiles but cause problems during repacking as git
needs to unpack each tree to count the reachable blobs.

What's happening here is the active branch cache isn't big enough.
We're swapping out the branch and thus recycling the tree information
(struct tree_content) back into the free pool.  When we later reload
the tree we set the delta_depth to 0 but we kept the tree we just
reloaded as a delta base.

So if the tree we reloaded was already at the maximum depth we
wouldn't know it and make the new tree a delta.  Multiply the
number of times the branch cache has to swap out the tree times
max_depth (10) and you get the maximum delta depth of a tree created
by fast-import.  In Brian's case above the active branch cache had
to swap the branch out 603/604 times during this import to produce
a tree with a delta depth of 6035.

Acked-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-13 21:57:53 -08:00
Pierre Habouzit c2e6b6d0d1 fast-import.c: fix regression due to strbuf conversion
Without this strbuf_detach(), it yields a double free later, the
command is in fact stashed, and this is not a memory leak.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 15:28:09 -07:00
Shawn O. Pearce 8a37e21dab Merge branch 'maint'
* maint:
  Describe more 1.5.3.5 fixes in release notes
  Fix diffcore-break total breakage
  Fix directory scanner to correctly ignore files without d_type
  Improve receive-pack error message about funny ref creation
  fast-import: Fix argument order to die in file_change_m
  git-gui: Don't display CR within console windows
  git-gui: Handle progress bars from newer gits
  git-gui: Correctly report failures from git-write-tree
  gitk.txt: Fix markup.
  send-pack: respect '+' on wildcard refspecs
  git-gui: accept versions containing text annotations, like 1.5.3.mingw.1
  git-gui: Don't crash when starting gitk from a browser session
  git-gui: Allow gitk to be started on Cygwin with native Tcl/Tk
  git-gui: Ensure .git/info/exclude is honored in Cygwin workdirs
  git-gui: Handle starting on mapped shares under Cygwin
  git-gui: Display message box when we cannot find git in $PATH
  git-gui: Avoid using bold text in entire gui for some fonts
2007-10-21 02:11:45 -04:00
Julian Phillips 2005dbe2a4 fast-import: Fix argument order to die in file_change_m
The arguments to the "Not a blob" die call in file_change_m were
transposed, so that the command was printed as the type, and the type
as the command.  Switch them around so that the error message comes
out correctly.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-20 21:43:35 -04:00
Pierre Habouzit b315c5c081 strbuf change: be sure ->buf is never ever NULL.
For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.

strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.

as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.

API changes:
  * strbuf_setlen now always works, so just make strbuf_reset a convenience
    macro.
  * strbuf_detatch takes a size_t* optional argument (meaning it can be
    NULL) to copy the buffer's len, as it was needed for this refactor to
    make the code more readable, and working like the callers.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-29 02:13:33 -07:00
Pierre Habouzit 7fb1011e61 Rework unquote_c_style to work on a strbuf.
If the gain is not obvious in the diffstat, the resulting code is more
readable, _and_ in checkout-index/update-index we now reuse the same buffer
to unquote strings instead of always freeing/mallocing.

This also is more coherent with the next patch that reworks quoting
functions.

The quoting function is also made more efficient scanning for backslashes
and treating portions of strings without a backslash at once.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
2007-09-20 23:32:18 -07:00
Pierre Habouzit c76689df6c strbuf API additions and enhancements.
Add strbuf_remove, change strbuf_insert:
  As both are special cases of strbuf_splice, implement them as such.
  gcc is able to do the math and generate almost optimal code this way.

Add strbuf_swap:
  Exchange the values of its arguments.
  Use it in fast-import.c

Also fix spacing issues in strbuf.h

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
2007-09-20 23:17:40 -07:00
Pierre Habouzit 182af8343c Use xmemdupz() in many places.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-18 17:42:17 -07:00
Pierre Habouzit 0557656930 fast-import optimization:
Now that cmd_data acts on a strbuf, make last_object stashed buffer be a
strbuf as well. On new stash, don't free the last stashed buffer, rather
swap it with the one you will stash, this way, callers of store_object can
act on static strbufs, and at some point, fast-import won't allocate new
memory for objects buffers.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-18 00:55:25 -07:00
Pierre Habouzit eec813cfc6 fast-import was using dbuf's, replace them with strbuf's.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-18 00:55:15 -07:00
Pierre Habouzit e6c019d0b0 Drop strbuf's 'eof' marker, and make read_line a first class citizen.
read_line is now strbuf_getline, and is a first class citizen, it returns 0
when reading a line worked, EOF else.

The ->eof marker was used non-locally by fast-import.c, mimic the same
behaviour using a static int in "read_next_command", that now returns -1 on
EOF, and avoids to call strbuf_getline when it's in EOF state.

Also no longer automagically strbuf_release the buffer, it's counter
intuitive and breaks fast-import in a very subtle way.

Note: being at EOF implies that command_buf.len == 0.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-18 00:55:10 -07:00
Pierre Habouzit ba3ed09728 Now that cache.h needs strbuf.h, remove useless includes.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-16 17:30:03 -07:00
Pierre Habouzit f1696ee398 Strbuf API extensions and fixes.
* Add strbuf_rtrim to remove trailing spaces.
  * Add strbuf_insert to insert data at a given position.
  * Off-by one fix in strbuf_addf: strbuf_avail() does not counts the final
    \0 so the overflow test for snprintf is the strict comparison. This is
    not critical as the growth mechanism chosen will always allocate _more_
    memory than asked, so the second test will not fail. It's some kind of
    miracle though.
  * Add size extension hints for strbuf_init and strbuf_read. If 0, default
    applies, else:
      + initial buffer has the given size for strbuf_init.
      + first growth checks it has at least this size rather than the
        default 8192.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-10 12:48:24 -07:00
Pierre Habouzit 4a241d79c9 fast-import: Use strbuf API, and simplify cmd_data()
This patch features the use of strbuf_detach, and prevent the programmer
to mess with allocation directly. The code is as efficent as before, just
more concise and more straightforward.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-06 23:57:44 -07:00
Pierre Habouzit b449f4cfc9 Rework strbuf API and semantics.
The gory details are explained in strbuf.h. The change of semantics this
patch enforces is that the embeded buffer has always a '\0' character after
its last byte, to always make it a C-string. The offs-by-one changes are all
related to that very change.

  A strbuf can be used to store byte arrays, or as an extended string
library. The `buf' member can be passed to any C legacy string function,
because strbuf operations always ensure there is a terminating \0 at the end
of the buffer, not accounted in the `len' field of the structure.

  A strbuf can be used to generate a string/buffer whose final size is not
really known, and then "strbuf_detach" can be used to get the built buffer,
and keep the wrapping "strbuf" structure usable for further work again.

  Other interesting feature: strbuf_grow(sb, size) ensure that there is
enough allocated space in `sb' to put `size' new octets of data in the
buffer. It helps avoiding reallocating data for nothing when the problem the
strbuf helps to solve has a known typical size.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-06 23:57:44 -07:00
Alex Riesen 4bf53833db Avoid using va_copy in fast-import: it seems to be unportable.
[sp: minor change to use fputs, thus reducing the patch size]

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-20 21:57:50 -07:00
Junio C Hamano 7e5dcea831 fast-import pull request
* skip_optional_lf() decl is old-style -- please say

	static skip_optional_lf(void)
        {
        	...
	}

* t9300 #14 fails, like this:

* expecting failure: git-fast-import <input
fatal: Branch name doesn't conform to GIT standards: .badbranchname
fast-import: dumping crash report to .git/fast_import_crash_14354
./test-lib.sh: line 143: 14354 Segmentation fault      git-fast-import <input

-- >8 --
Subject: [PATCH] fastimport: Fix re-use of va_list

The va_list is designed to be used only once. The current code
reuses va_list argument may cause segmentation fault.  Copy and
release the arguments to avoid this problem.

While we are at it, fix old-style function declaration of
skip_optional_lf().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 13:11:01 -04:00
Shawn O. Pearce 904b194151 Include recent command history in fast-import crash reports
When we crash the frontend developer (or end-user) may need to know
roughly around what part of the input stream we had a problem with
and aborted on.  Because line numbers aren't very useful in this
sort of application we instead just keep the last 100 commands in
a FIFO queue and print them as part of the crash report.

Currently one problem with this design is a commit that has
more than 100 modified files in it will flood the FIFO and any
context regarding branch/from/committer/mark/comments will be lost.
We really should save only the last few (10?) file changes for the
current commit, ensuring we have some prior higher level commands
in the FIFO when we crash on a file M/D/C/R command.

Another issue with this approach is the FIFO only includes the
commands, it does not include the commit messages.  Yet having a
commit message may be useful to help locate the relevant change in
the source material.  In practice I don't think this is going to be a
major concern as the frontend can always embed its own source change
set identifier as a comment (which will appear in the crash report)
and the commit message(s) for the most recent commits of any given
branch should be obtainable from the (packed) commit objects.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 03:42:41 -04:00
Shawn O. Pearce 8acb3297f3 Generate crash reports on die in fast-import
As fast-import is quite strict about its input and die()'s anytime
something goes wrong it can be difficult for a frontend developer
to troubleshoot why fast-import rejected their input, or to even
determine what input command it rejected.

This change introduces a custom handler for Git's die() routine.
When we receive a die() for any reason (fast-import or a lower level
core Git routine we called) the error is first dumped onto stderr
and then a more extensive crash report file is prepared in GIT_DIR.
Finally we exit the process with status 128, just like the stock
builtin die handler.

An internal flag is set to prevent any further die()'s that may be
invoked during the crash report generator from causing us to enter
into an infinite loop.  We shouldn't die() from our crash report
handler, but just in case someone makes a future code change we are
prepared to gaurd against small mistakes turning into huge problems
for the end-user.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 03:42:41 -04:00
Shawn O. Pearce ac053c0202 Allow frontends to bidirectionally communicate with fast-import
The existing checkpoint command is very useful to force fast-import
to dump the branches out to disk so that standard Git tools can
access them and the objects they refer to.  However there was not a
way to know when fast-import had finished executing the checkpoint
and it was safe to read those refs.

The progress command can be used to make fast-import output any
message of the frontend's choosing to standard out.  The frontend
can scan for these messages using select() or poll() to monitor a
pipe connected to the standard output of fast-import.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 03:38:36 -04:00
Shawn O. Pearce 1fdb649c6a Make trailing LF optional for all fast-import commands
For the same reasons as the prior change we want to allow frontends
to omit the trailing LF that usually delimits commands.  In some
cases these just make the input stream more verbose looking than
it needs to be, and its just simpler for the frontend developer to
get started if our parser is slightly more lenient about where an
LF is required and where it isn't.

To make this optional LF feature work we now have to buffer up to one
line of input in command_buf.  This buffering can happen if we look
at the current input command but don't recognize it at this point
in the code.  In such a case we need to "unget" the entire line,
but we cannot depend upon the stdio library to let us do ungetc()
for that many characters at once.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 03:38:35 -04:00
Shawn O. Pearce 2c570cde98 Make trailing LF following fast-import data commands optional
A few fast-import frontend developers have found it odd that we
require the LF following a `data` command, especially in the exact
byte count format.  Technically we don't need this LF to parse
the stream properly, but having it here does make the stream more
readable to humans.  We can easily make the LF optional by peeking
at the next byte available from the stream and pushing it back into
the buffer if its not LF.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 03:38:35 -04:00
Shawn O. Pearce 401d53fa35 Teach fast-import to ignore lines starting with '#'
Several frontend developers have asked that some form of stream
comments be permitted within a fast-import data stream.  This way
they can include information from their own frontend program about
where specific data was taken from in the source system, or about
a decision that their frontend may have made while creating the
fast-import data stream.

This change introduces comments in the Bourne-shell/Tcl/Perl style.
Lines starting with '#' are ignored, up to and including the LF.
Unlike the above mentioned three languages however we do not look for
and ignore leading whitespace.  This just simplifies the definition
of the comment format and the code that parses them.

To make comments work we had to stop using read_next_command() within
cmd_data() and directly invoke read_line() during the inline variant
of the function.  This is necessary to retain any lines of the
input data that might otherwise look like a comment to fast-import.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 03:38:35 -04:00
Shawn O. Pearce 3149007475 Use handy ALLOC_GROW macro in fast-import when possible
Instead of growing our buffer by hand during the inline variant of
cmd_data() we can save a few lines of code and just use the nifty
new ALLOC_GROW macro already available to us.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 03:38:34 -04:00
Shawn O. Pearce ea08a6fd19 Actually allow TAG_FIXUP branches in fast-import
Michael Haggerty <mhagger@alum.mit.edu> noticed while debugging a
Git backend for cvs2svn that fast-import was barfing when he tried
to use "TAG_FIXUP" as a branch name for temporary work needed to
cleanup the tree prior to creating an annotated tag object.

The reason we were rejecting the branch name was check_ref_format()
returns -2 when there are less than 2 '/' characters in the input
name.  TAG_FIXUP has 0 '/' characters, but is technically just as
valid of a ref as HEAD and MERGE_HEAD, so we really should permit it
(and any other similar looking name) during import.

New test cases have been added to make sure we still detect very
wrong branch names (e.g. containing [ or starting with .) and yet
still permit reasonable names (e.g. TAG_FIXUP).

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 03:38:34 -04:00
Alex Riesen c905e09006 Fix whitespace in "Format of STDIN stream" of fast-import
Something probably assumed that HT indentation is 4 characters.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-08-19 03:38:34 -04:00
Luiz Fernando N. Capitulino 7647b17f1d Use xmkstemp() instead of mkstemp()
xmkstemp() performs error checking and prints a standard error message when
an error occur.

Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-14 22:20:26 -07:00
Shawn O. Pearce b6f3481bb4 Teach fast-import to recursively copy files/directories
Some source material (e.g. Subversion dump files) perform directory
renames by telling us the directory was copied, then deleted in the
same revision.  This makes it difficult for a frontend to convert
such data formats to a fast-import stream, as all the frontend has
on hand is "Copy a/ to b/; Delete a/" with no details about what
files are in a/, unless the frontend also kept track of all files.

The new 'C' subcommand within a commit allows the frontend to make a
recursive copy of one path to another path within the branch, without
needing to keep track of the individual file paths.  The metadata
copy is performed in memory efficiently, but is implemented as a
copy-immediately operation, rather than copy-on-write.

With this new 'C' subcommand frontends could obviously implement an
'R' (rename) on their own as a combination of 'C' and 'D' (delete),
but since we have already offered up 'R' in the past and it is a
trivial thing to keep implemented I'm not going to deprecate it.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-07-15 01:41:23 -04:00
Shawn O. Pearce f39a946a1f Support wholesale directory renames in fast-import
Some source material (e.g. Subversion dump files) perform directory
renames without telling us exactly which files in that subdirectory
were moved.  This makes it hard for a frontend to convert such data
formats to a fast-import stream, as all the frontend has on hand
is "Rename a/ to b/" with no details about what files are in a/,
unless the frontend also kept track of all files.

The new 'R' subcommand within a commit allows the frontend to
rename either a file or an entire subdirectory, without needing to
know the object's SHA-1 or the specific files contained within it.
The rename is performed as efficiently as possible internally,
making it cheaper than a 'D'/'M' pair for a file rename.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-07-09 23:06:16 -04:00
Junio C Hamano 98ee8187e4 Merge branch 'maint'
* maint:
  Fix possible coredump with fast-import --import-marks
  Refactor fast-import branch creation from existing commit
  fast-import: Fix crash when referencing already existing objects
  fast-import: Fix uninitialized variable
  Documentation: fix git-config.xml generation
2007-05-23 22:37:23 -07:00
Shawn O. Pearce aac65ed1bc Fix possible coredump with fast-import --import-marks
When e8438420bb allowed us to reload
the marks table on subsequent runs of fast-import we really broke
things, as we set pack_id to MAX_PACK_ID for any objects we imported
into the marks table.  Creating a branch from that mark should fail
as we attempt to read the object through a non-existant packed_git
pointer.  Instead we have to use the normal Git object system to
locate the older commit, as we ourselves do not have a reference
to the packed_git it resides in.

This bug only occurred because t9300 was not complete enough.
When we added the --import-marks feature we didn't actually test
its implementation enough to verify the function worked as intended.
I have corrected that, and included the changes as part of this fix.
Prior versions of fast-import fail the new test(s); this commit
allows them to pass.

Credit for this bug find goes to Simon Hausmann <simon@lst.de> as
he recently identified a similiar bug in the tree lazy-loading path.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-05-24 00:50:19 -04:00
Shawn O. Pearce 654aaa37ab Refactor fast-import branch creation from existing commit
To resolve a corner case uncovered by Simon Hausmann I need to
reuse the logic for the SHA-1 expression version of the 'from '
command within the mark version of the 'from ' command.  This change
doesn't alter any functionality, but is merely breaking the common
code out to a function that I can reuse.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-05-24 00:11:48 -04:00
Simon Hausmann 20f546a86c fast-import: Fix crash when referencing already existing objects
Commit a5c1780a03 sets the pack_id of existing
objects to MAX_PACK_ID. When the same object is referenced later again it is
found in the local object hash. With such a pack_id fast-import should not try
to locate that object in the newly created pack(s).

Signed-off-by: Simon Hausmann <simon@lst.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-05-23 23:36:47 -04:00
Simon Hausmann b259157f3c fast-import: Fix uninitialized variable
Fix uninitialized last_object->no_free variable that is accessed in
store_object.

Signed-off-by: Simon Hausmann <simon@lst.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-05-23 23:36:47 -04:00
Sven Verdoolaege 68db31cc28 git-update-ref: add --no-deref option for overwriting/detaching ref
git-checkout is also adapted to make use of this new option
instead of the handcrafted command sequence.

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-10 15:24:44 -07:00
Dana L. How 8b0eca7c7b Create pack-write.c for common pack writing code
Include a generalized fixup_pack_header_footer() in this new file.
Needed by git-repack --max-pack-size feature in a later patchset.

[sp: Moved close(pack_fd) to callers, to support index-pack, and
     changed name to better indicate it is for packfiles.]

Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-05-02 13:24:18 -04:00
Junio C Hamano 39231b1c32 Merge branch 'maint'
* maint:
  http.c: Fix problem with repeated calls of http_init
  Add missing reference to GIT_COMMITTER_DATE in git-commit-tree documentation
  Fix import-tars fix.
  Update .mailmap with "Michael"
  Do not barf on too long action description
  Catch empty pathnames in trees during fsck
  Don't allow empty pathnames in fast-import
  import-tars: be nice to wrong directory modes
  git-svn: Added 'find-rev' command
  git shortlog documentation: add long options and fix a typo
2007-04-29 01:52:43 -07:00
Shawn O. Pearce 475d1b333a Don't allow empty pathnames in fast-import
riddochc on #git noticed corruption caused by import-tars.  This
was fixed in the prior commit by Dscho, but fast-import was wrong
to have allowed a tree to be created with an empty string as the
filename.  No operating system allows this, and Git itself doesn't
accept this into the index.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-04-28 20:03:25 -04:00
Sami Farin 00be8dcc1a fast-import: size_t vs ssize_t
size_t is unsigned, so (n < 0) is never true.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-04-24 16:14:48 -04:00
Shawn O. Pearce a5c1780a03 Don't repack existing objects in fast-import
Some users of fast-import have been trying to use it to rewrite
commits and trees, an activity where the all of the relevant blobs
are already available from the existing packfiles.  In such a case
we don't want to repack a blob, even if the frontend application
has supplied us the raw data rather than a mark or a SHA-1 name.

I'm intentionally only checking the packfiles that existed when
fast-import started and am always ignoring all loose object files.

We ignore loose objects because fast-import tends to operate on a
very large number of objects in a very short timespan, and it is
usually creating new objects, not reusing existing ones.  In such
a situtation the majority of the objects will not be found in the
existing packfiles, nor will they be loose object files.  If the
frontend application really wants us to look at loose object files,
then they can just repack the repository before running fast-import.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-04-20 11:23:45 -04:00
Theodore Ts'o 46efd2d93c Rename warn() to warning() to fix symbol conflicts on BSD and Mac OS
This fixes a problem reported by Randal Schwartz:

>I finally tracked down all the (albeit inconsequential) errors I was getting
>on both OpenBSD and OSX.  It's the warn() function in usage.c.  There's
>warn(3) in BSD-style distros.  It'd take a "great rename" to change it, but if
>someone with better C skills than I have could do that, my linker and I would
>appreciate it.

It was annoying to me, too, when I was doing some mergetool testing on
Mac OS X, so here's a fix.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Randal L. Schwartz" <merlyn@stonehenge.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-31 01:11:11 -07:00
Nicolas Pitre 0e55181f29 make it more obvious that temporary files are temporary files
When some operations are interrupted (or "die()'d" or crashed) then the
partial object/pack/index file may remain around.  Make it more obvious
in their name that those files are temporary stuff and can be cleaned up
if no operation is in progress.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-24 22:32:39 -07:00
Shawn O. Pearce 061e35c581 Remove unnecessary casts from fast-import
Jeff King pointed out that these casts are quite unnecessary, as
the compiler should be doing them anyway, and may cause problems
in the future if the size of the argument for to_atom were to ever
be increased.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-03-12 15:48:37 -04:00
Shawn O. Pearce 7f09ac4714 Merge branch 'maint'
* maint:
  fast-import: grow tree storage more aggressively
2007-03-12 15:04:46 -04:00
Jeff King f022f85f6d fast-import: grow tree storage more aggressively
When building up a tree for a commit, fast-import
dynamically allocates memory for the tree entries. When more
space is needed, the allocated memory is increased by a
constant amount. For very large trees, this means
re-allocating and memcpy()ing the memory O(n) times.

To compound this problem, releasing the previous tree
resource does not free the memory; it is kept in a pool
for future trees. This means that each of the O(n)
allocations will consume increasing amounts of memory,
giving O(n^2) memory consumption.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-03-12 15:01:44 -04:00
Junio C Hamano f45fa2a073 Merge branch 'master' of git://repo.or.cz/git/fastimport
* 'master' of git://repo.or.cz/git/fastimport:
  Allow fast-import frontends to reload the marks table
  Use atomic updates to the fast-import mark file
  Preallocate memory earlier in fast-import
2007-03-07 23:10:05 -08:00
Shawn O. Pearce e8438420bb Allow fast-import frontends to reload the marks table
I'm giving fast-import a lesson on how to reload the marks table
using the same format it outputs with --export-marks.  This way
a frontend can reload the marks table from a prior import, making
incremental imports less painful.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-03-07 18:07:26 -05:00
Shawn O. Pearce 60b9004cdb Use atomic updates to the fast-import mark file
When we allow fast-import frontends to reload a mark file from a
prior session we want to let them use the same file as they exported
the marks to.  This makes it very simple for the frontend to save
state across incremental imports.

But we don't want to lose the old marks table if anything goes wrong
while writing our current marks table.  So instead of truncating and
overwriting the path specified to --export-marks we use the standard
lockfile code to write the current marks out to a temporary file,
then rename it over the old marks table.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-03-07 18:05:38 -05:00
Shawn O. Pearce 93e72d8d8f Preallocate memory earlier in fast-import
I'm about to teach fast-import how to reload the marks file created
by a prior session.  The general approach that I want to use is to
immediately parse the marks file when the specific argument is found
in argv, thereby allowing the caller to supply multiple marks files,
as the mark space can be sparsely populated.

To make that work out we need to allocate our object tables before
we parse the command line options.  Since none of these tables
depend on the command line options, we can easily relocate them.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-03-07 17:11:02 -05:00
Shawn O. Pearce 6777a59fcd Use off_t in pack-objects/fast-import when we mean an offset
Always use an off_t value in pack-objects anytime we are dealing
with an offset to some data within a packfile.

Also fixed a minor uintmax_t that was incorrectly defined before.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07 11:06:33 -08:00
Shawn O. Pearce c4001d92be Use off_t when we really mean a file offset.
Not all platforms have declared 'unsigned long' to be a 64 bit value,
but we want to support a 64 bit packfile (or close enough anyway)
in the near future as some projects are getting large enough that
their packed size exceeds 4 GiB.

By using off_t, the POSIX type that is declared to mean an offset
within a file, we support whatever maximum file size the underlying
operating system will handle.  For most modern systems this is up
around 2^60 or higher.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07 11:06:25 -08:00
Shawn O. Pearce 3a55602eec General const correctness fixes
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime.  Likewise we should always be
treating unsigned values as unsigned values, not as signed values.

Most of these are very straightforward.  The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case.  Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07 10:47:10 -08:00
Shawn O. Pearce 6b4318e604 Merge branch 'maint'
* maint:
  fast-import: Fail if a non-existant commit is used for merge
  fast-import: Avoid infinite loop after reset

[sp: Minor evil merge to deal with type_names array moving
 to be private in 'master'.]
2007-03-05 12:50:29 -05:00
Shawn O. Pearce 2f6dc35d2a fast-import: Fail if a non-existant commit is used for merge
Johannes Sixt noticed during one of his own imports that fast-import
did not fail if a non-existant commit is referenced by SHA-1 value
as an argument to the 'merge' command.  This allowed the user to
unknowingly create commits that would fail in fsck, as the commit
contents would not be completely reachable.

A side effect of this bug was that a frontend process could mark
any SHA-1 object (blob, tree, tag) as a parent of a merge commit.
This should also fail in fsck, as the commit is not a valid commit.

We now use the same rule as the 'from' command.  If a commit is
referenced in the 'merge' command by hex formatted SHA-1 then the
SHA-1 must be a commit or a tag that can be peeled back to a commit,
the commit must already exist, and must be readable by the core Git
infrastructure code.  This requirement means that the commit must
have existed prior to fast-import starting, or the commit must have
been flushed out by a prior 'checkpoint' command.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-03-05 12:43:14 -05:00
Shawn O. Pearce 734c91f9e2 fast-import: Avoid infinite loop after reset
Johannes Sixt noticed that a 'reset' command applied to a branch that
is already active in the branch LRU cache can cause fast-import to
relink the same branch into the LRU cache twice.  This will cause
the LRU cache to contain a cycle, making unload_one_branch run in an
infinite loop as it tries to select the oldest branch for eviction.

I have trivially fixed the problem by adding an active bit to
each branch object; this bit indicates if the branch is already
in the LRU and allows us to avoid trying to add it a second time.
Converting the pack_id field into a bitfield makes this change take
up no additional memory.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-03-05 12:31:09 -05:00
Nicolas Pitre 21666f1aae convert object type handling from a string to a number
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value.  One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.

This is an initial step for the removal of the version using a char array
found in object reading code paths.  The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27 01:34:21 -08:00
Nicolas Pitre df8436622f formalize typename(), and add its reverse type_from_string()
Sometime typename() is used, sometimes type_names[] is accessed directly.
Let's enforce typename() all the time which allows for validating the
type.

Also let's add a function to go from a name to a type and use it instead
of manual memcpy() when appropriate.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27 01:34:21 -08:00
Junio C Hamano 599065a3bb prefixcmp(): fix-up mechanical conversion.
Previous step converted use of strncmp() with literal string
mechanically even when the result is only used as a boolean:

    if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo")))

This step manually cleans them up to read:

    if (!prefixcmp(arg, "foo"))

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-20 22:03:15 -08:00
Junio C Hamano cc44c7655f Mechanical conversion to use prefixcmp()
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily.  Leftover from this will be fixed in a separate step, including
idiotic conversions like

    if (!strncmp("foo", arg, 3))

  =>

    if (!(-prefixcmp(arg, "foo")))

This was done by using this script in px.perl

   #!/usr/bin/perl -i.bak -p
   if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
           s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
   }
   if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
           s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
   }

and running:

   $ git grep -l strncmp -- '*.c' | xargs perl px.perl

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-20 22:03:15 -08:00
Junio C Hamano 9360f27ff7 Merge branch 'maint'
* maint:
  Check for PRIuMAX rather than NO_C99_FORMAT in fast-import.c.
2007-02-20 22:02:15 -08:00
Jason Riedy 3efb1f343a Check for PRIuMAX rather than NO_C99_FORMAT in fast-import.c.
Thanks to Simon 'corecode' Schubert <corecode@fs.ei.tum.de> for
the clean-up.  Defining the C99 standard PRIuMAX when necessary
replaces UM_FMT and the awkward UM10_FMT.  There are no direct
C99 translations for other uses of NO_C99_FORMAT in git, alas.

Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-20 19:10:57 -08:00
Junio C Hamano 2b2b892e36 Merge branch 'maint'
* maint:
  Obey NO_C99_FORMAT in fast-import.c.
  Add a compat/strtoumax.c for Solaris 8.
  git-clone: Sync documentation to usage note.
2007-02-19 18:29:41 -08:00
Jason Riedy e326bce65c Obey NO_C99_FORMAT in fast-import.c.
Define UM_FMT and UM10_FMT and use in place of %ju and %10ju,
respectively.  Both format as unsigned long long, so this
assumes the compiler supports long long.

Signed-off-by: Jason Riedy <jason@acm.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-19 18:20:49 -08:00
Junio C Hamano 4a164d48df Merge branch 'jc/merge-base' (early part)
This contains an evil merge to fast-import, in order to
resolve in_merge_bases() update.
2007-02-13 16:54:35 -08:00
Shawn O. Pearce ea5e370aa9 fast-import: Support reusing 'from' and brown paper bag fix reset.
It was suggested on the mailing list that being able to use `from`
in any commit to reset the current branch is useful in some types of
importers, such as a darcs importer.

We originally did not permit resetting an existing branch with a
new `from` command during a `commit` command, but this restriction
was only to help debug the hacked up cvs2svn that Jon Smirl was
developing in parallel with git-fast-import.  It is probably more
of a problem to disallow it than to allow it.  So now we permit a
`from` during any `commit`.

While making the changes required to permit multiple `from`
commands on the same branch, I discovered we no longer needed the
last_commit field to be set to 0 during a reset, so that was removed.
(Reset was originally setting the field to 0 to signal cmd_from()
that it was OK to execute on the branch.)

While poking around in this section of fast-import I also realized
the `reset` command was not working as intended if the corresponding
`from` command was omitted (as allowed by the BNF grammar and the
code).  If `from` was omitted we cleared out the tree but we left
the tree SHA-1 and parent commit SHA-1 intact.  This is not what
the user intended in this case.  Instead they would be trying to
reset the branch to have no parent and to have no tree, making the
branch look new-born during the next commit.  We now clear these
SHA-1 values during `reset`, ensuring the branch looks new-born if
`from` does not get supplied.

New test cases for these were also added.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-12 12:17:31 -05:00
Shawn O. Pearce bdf1c06dc1 fast-import: Hide the pack boundary commits by default.
Most users don't need the pack boundary information that fast-import
was printing to standard output, especially if they were calling
it with --quiet.

Those users who do want this information probably want it captured
so they can go back and use it to repack the imported repository.
So dumping the boundary commits to a log file makes more sense then
printing them to standard output.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-11 19:45:56 -05:00
Johannes Schindelin 40db58b8dc fast-import: Fix compile warnings
Not on all platforms are size_t and unsigned long equivalent.
Since I do not know how portable %z is, I play safe, and just
cast the respective variables to unsigned long.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-07 09:28:23 -08:00
Shawn O. Pearce 22c9f7e4c5 Don't crash fast-import if the marks cannot be exported.
Apparently fast-import used to die a horrible death if we
were unable to open the marks file for output.  This is
slightly less than ideal, especially now that we dump
the marks as part of the `checkpoint` command.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-07 02:46:35 -05:00
Shawn O. Pearce 820b931012 Dump all refs and marks during a checkpoint in fast-import.
If the frontend asks us to checkpoint (via the explicit checkpoint
command) its probably because they are afraid the current import
will crash/fail/whatever and want to make sure they can pickup from
the last checkpoint.  To do that sort of recovery, we will need the
current tip of every branch and tag available at the next startup.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-07 02:42:44 -05:00
Shawn O. Pearce c499d76849 Teach fast-import how to sit quietly in the corner.
Often users will be running fast-import from within a larger frontend
process, and this may be a frequent periodic tool such as a future
edition of `git-svn fetch`.  We don't want to bombard users with our
large stats output if they won't be interested in it, so `--quiet`
is now an option to make gfi be more silent.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-07 02:19:31 -05:00
Shawn O. Pearce 825769a8fe Teach fast-import how to clear the internal branch content.
Some frontends may not be able to (easily) keep track of which files
are included in the branch, and which aren't.  Performing this
tracking can be tedious and error prone for the frontend to do,
especially if its foreign data source cannot supply the changed
path list on a per-commit basis.

fast-import now allows a frontend to request that a branch's tree
be wiped clean (reset to the empty tree) at the start of a commit,
allowing the frontend to feed in all paths which belong on the branch.

This is ideal for a tar-file importer frontend, for example, as
the frontend just needs to reformat the tar data stream into a gfi
data stream, which may be something a few Perl regexps can take
care of. :)

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-07 02:03:03 -05:00
Junio C Hamano 9981b6d915 S_IFLNK != 0140000
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06 16:08:30 -05:00
Shawn O. Pearce 7073e69e38 Don't do non-fastforward updates in fast-import.
If fast-import is being used to update an existing branch of
a repository, the user may not want to lose commits if another
process updates the same ref at the same time.  For example, the
user might be using fast-import to make just one or two commits
against a live branch.

We now perform a fast-forward check during the ref updating process.
If updating a branch would cause commits in that branch to be lost,
we skip over it and display the new SHA1 to standard error.

This new default behavior can be overridden with `--force`, like
git-push and git-fetch.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06 16:08:06 -05:00
Shawn O. Pearce 63e0c8b364 Support RFC 2822 date parsing in fast-import.
Since some frontends may be working with source material where
the dates are only readily available as RFC 2822 strings, it is
more friendly if fast-import exposes Git's parse_date() function
to handle the conversion.  This way the frontend doesn't need
to perform the parsing itself.

The new --date-format option to fast-import can be used by a
frontend to select which format it will supply date strings in.
The default is the standard `raw` Git format, which fast-import
has always supported.  Format rfc2822 can be used to activate the
parse_date() function instead.

Because fast-import could also be useful for creating new, current
commits, the format `now` is also supported to generate the current
system timestamp.  The implementation of `now` is a trivial call
to datestamp(), but is actually a whole whopping 3 lines so that
fast-import can verify the frontend really meant `now`.

As part of this change I have added validation of the `raw` date
format.  Prior to this change fast-import would accept anything
in a `committer` command, even if it was seriously malformed.
Now fast-import requires the '> ' near the end of the string and
verifies the timestamp is formatted properly.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06 14:58:30 -05:00
Shawn O. Pearce e7d06a4b70 Remove unnecessary null pointer checks in fast-import.
There is no need to check for a NULL pointer before invoking free(),
the runtime library automatically performs this check anyway and
does nothing if a NULL pointer is supplied.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06 12:05:51 -05:00
Shawn O. Pearce e5b1444b96 Correct minor style issue in fast-import.
Junio noticed that I was using a different style in fast-import
for returned pointers than the rest of Git.  Before merging this
code into the main git.git tree I'd like to make it consistent,
as this style variation was not intentional.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06 00:43:59 -05:00
Shawn O. Pearce 10e8d68820 Correct compiler warnings in fast-import.
Junio noticed these warnings/errors in fast-import when compiling
with `-Werror -ansi -pedantic`.  A few changes are to reduce compiler
warnings, while one (in cmd_merge) is a bug fix.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06 00:26:49 -05:00
Shawn O. Pearce 0b868e0240 Remove --branch-log from fast-import.
The --branch-log option and its associated code hasn't been used in
several months, as its not really very useful for debugging fast-import
or a frontend.  I don't plan on supporting it in this state long-term,
so I'm killing it now before it gets distributed to a wider audience.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-06 00:15:37 -05:00
Shawn O. Pearce 6c3aac1c69 Don't support shell-quoted refnames in fast-import.
The current implementation of shell-style quoted refnames and
SHA-1 expressions within fast-import contains a bad memory leak.
We leak the unquoted strings used by the `from` and `merge`
commands, maybe others.  Its also just muddling up the docs.

Since Git refnames cannot contain LF, and that is our delimiter
for the end of the refname, and we accept any other character
as-is, there is no reason for these strings to support quoting,
except to be nice to frontends.  But frontends shouldn't be
expecting to use funny refs in Git, and its just as simple to
never quote them as it is to always pass them through the same
quoting filter as pathnames.  So frontends should never quote
refs, or ref expressions.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-05 20:30:37 -05:00
Shawn O. Pearce 10831c5513 Reduce memory usage of fast-import.
Some structs are allocated rather frequently, but were using integer
types which were far larger than required to actually store their
full value range.

As packfiles are limited to 4 GiB we don't need more than 32 bits to
store the offset of an object within that packfile, an `unsigned long`
on a 64 bit system is likely a 64 bit unsigned value.  Saving 4 bytes
per object on a 64 bit system can add up fast on any sizable import.

As atom strings are strictly single components in a path name these
are probably limited to just 255 bytes by the underlying OS.  Going
to that short of a string is probably too restrictive, but certainly
`unsigned int` is far too large for their lengths.  `unsigned short`
is a reasonable limit.

Modes within a tree really only need two bytes to store their whole
value; using `unsigned int` here is vast overkill.  Saving 4 bytes
per file entry in an active branch can add up quickly on a project
with a large number of files.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-05 16:34:56 -05:00
Shawn O. Pearce 8c1f22da9f Include checkpoint command in the BNF.
This command isn't encouraged (as its slow) but it does exist and
is accepted, so it still should be covered in the BNF.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-02-05 16:05:11 -05:00
Shawn O. Pearce 76db9dec81 Merge branch 'master' into sp/gfi
git-fast-import requires use of inttypes.h, but the master branch has
added it to git-compat-util differently than git-fast-import originally
had used it.  This merge back of master to the fast-import topic is to
get (and use) inttypes.h the way master is using it.

This is a partially evil merge to remove the call to setup_ident(),
as the master branch now contains a change which makes this implicit
and therefore removed the function declaration. (commit 01754769).

Conflicts:

	git-compat-util.h
2007-01-30 11:07:24 -05:00
Shawn O. Pearce b715cfbba4 Accept 'inline' file data in fast-import commit structure.
Its very annoying to need to specify the file content ahead of a
commit and use marks to connect the individual blobs to the commit's
file modification entry, especially if the frontend can't/won't
generate the blob SHA1s itself.  Instead it would much easier to
use if we can accept the blob data at the same time as we receive
each file_change line.

Now fast-import accepts 'inline' instead of a mark idnum or blob
SHA1 within the 'M' type file_change command.  If an inline is
detected the very next line must be a 'data n' command, supplying
the file data.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-18 15:17:58 -05:00
Shawn O. Pearce 3b4dce0275 Support delimited data regions in fast-import.
During testing its nice to not have to feed the length of a data
chunk to the 'data' command of fast-import.  Instead we would
prefer to be able to establish a data chunk much like shell's <<
operator and use a line delimiter to denote the end of the input.

So now if a data command is started as 'data <<EOF' we will look
for a terminator line containing only the string EOF on that line.
Once found, we stop the data command.  Everything between the two
lines is used as the data value.

The 'data <<' syntax is slower than 'data n', as we don't know how
many bytes to expect and instead must grow our buffer on the fly.
It also has the problem that the frontend must use a string which
will not appear on a line by itself in the input, and the data
region will always end in an LF.  For these reasons real import
frontends are encouraged to continue to use _only_ 'data n'.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-18 13:25:37 -05:00
Shawn O. Pearce e5808826c4 Remove unnecessary options from fast-import.
The --objects command line option is rather unnecessary.  Internally
we allocate objects in 5000 unit blocks, ensuring that any sort
of malloc overhead is ammortized over the individual objects to
almost nothing.  Since most frontends don't know how many objects
they will need for a given import run (and its hard for them to
predict without just doing the run) we probably won't see anyone
using --objects.  Further since there's really no major benefit
to using the option, most frontends won't even bother supplying
it even if they could estimate the number of objects.  So I'm
removing it.

The --max-objects-per-pack option was probably a mistake to even
have added in the first place.  The packfile format is limited
to 4 GiB today; given that objects need at least 3 bytes of data
(and probably need even more) there's no way we are going to exceed
the limit of 1<<32-1 objects before we reach the file size limit.
So I'm removing it (to slightly reduce the complexity of the code)
before anyone gets any wise ideas and tries to use it.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-18 12:02:37 -05:00
Shawn O. Pearce ebea9dd4f1 Use fixed-size integers when writing out the index in fast-import.
Currently the pack .idx file format uses 32-bit unsigned integers
for the fan-out table and the object offsets.  We had previously
defined these as 'unsigned int', but not every system will define
that type to be a 32 bit value.  To ensure maximum portability we
should always use 'uint32_t'.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-18 11:30:17 -05:00
Shawn O. Pearce 566f44252b Always use struct pack_header for pack header in fast-import.
Previously we were using 'unsigned int' to update the hdr_entries
field of the pack header after the file had been completed and
was being hashed.  This may not be 32 bits on all platforms.
Instead we want to always uint32_t.

I'm actually cheating here by just using the pack_header like the
rest of Git and letting the struct definition declare the correct
type.  Right now that field is still 'unsigned int' (wrong) but a
pending change submitted by Simon 'corecode' Schubert changes it
to uint32_t.  After that change is merged in fast-import will do
the right thing all of the time.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-18 11:26:06 -05:00
Shawn O. Pearce 69e74e7412 Correct packfile edge output in fast-import.
Branches are only contained by a packfile if the branch actually
had its most recent commit in that packfile.  So new branches are
set to MAX_PACK_ID to ensure they don't cause their commit to list
as part of the first packfile when it closes out if the commit was
actually in existance before fast-import started.

Also corrected the type of last_commit to be umaxint_t to prevent
overflow and wraparound on very large imports.  Though that is
highly unlikely to occur as we're talking 4 billion commits, which
no real project has right now.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-17 02:42:43 -05:00
Shawn O. Pearce fd99224eec Declare no-arg functions as (void) in fast-import.
Apparently the git convention is to declare any function which
takes no arguments as taking void.  I did not do this during the
early fast-import development, but should have.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-17 01:47:25 -05:00
Shawn O. Pearce 6f64f6d9d2 Correct a few types to be unsigned in fast-import.
The length of an atom string cannot be negative.  So make it
explicit and declare it as an unsigned value.

The shift width in a mark table node also cannot be negative.
I'm also moving it to after the pointer arrays to prevent any
possible alignment problems on a 64 bit system.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-17 01:13:22 -05:00
Shawn O. Pearce 2104838bf9 Corrected BNF input documentation for fast-import.
Now that fast-import uses uintmax_t (the largest available unsigned
integer type) for marks we don't want to say its an unsigned 32
bit integer in ASCII base 10 notation.  It could be much larger,
especially on 64 bit systems, and especially if a frontend uses
a very large number of marks (1 per file revision on a very, very
large import).

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-17 00:33:18 -05:00
Shawn O. Pearce 2369ed7907 Print out the edge commits for each packfile in fast-import.
To help callers repack very large repositories into a series of
packfiles fast-import now outputs the last commits/tags it wrote to
a packfile when it prints out the packfile name.  This information
can be feed to pack-objects --revs to repack.  For the first pack
of an initial import this is pretty easy (just feed those SHA1s on
stdin) but for subsequent packs you want to feed the subsequent
pack's final SHA1s but also all prior pack's SHA1s prefixed with
the negation operator.  This way the prior pack's data does not
get included into the subsequent pack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16 16:18:44 -05:00
Shawn O. Pearce a7ddc48765 Correct object_count type and stat output in fast-import.
Since object_count is limited to 'unsigned long' (really an
unsigned 32 bit integer value) by the pack file format we may as
well use exactly that type here in fast-import for that counter.
An earlier change by me incorrectly made it uintmax_t.

But since object_count is a counter for the current packfile only,
we don't want to output its value at the end.  Instead we should
sum up the individual type counters and report that total, as that
will cover all of the packfiles.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16 04:55:41 -05:00
Shawn O. Pearce eec11c2484 Correct max_packsize default in fast-import.
Apparently amd64 has defined 'unsigned long' to be a 64 bit value,
which means -1 was way over the 4 GiB packfile limit.  Whoops.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16 04:25:12 -05:00
Shawn O. Pearce 0fcbcae753 Remove unnecessary pack_fd global in fast-import.
Much like the pack_sha1 the pack_fd is an unnecessary global
variable, we already have the fd stored in our struct packed_git
*pack_data so that the core library functions in sha1_file.c are
able to lookup and decompress object data that we have previously
written.  Keeping an extra copy of this value in our own variable
is just a hold-over from earlier versions of fast-import and is
now completely unnecessary.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16 01:20:57 -05:00
Shawn O. Pearce 1280158738 Ensure we close the packfile after creating it in fast-import.
Because we are renaming the packfile into its file destination we
need to be sure its not open when the rename is called, otherwise
some operating systems (e.g. Windows) may prevent the rename from
occurring.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16 01:17:47 -05:00
Shawn O. Pearce 8455e48476 Use .keep files in fast-import during processing.
Because fast-import automatically updates all references (heads
and tags) at the end of its run the repository is corrupt unless
the objects are available in the .git/objects/pack directory prior
to the refs being modified.  The easiest way to ensure that is true
is to move the packfile and its associated index directly into the
.git/objects/pack directory as soon as we have finished output to it.

But the only safe way to do this is to create the a temporary .keep
file for that pack, so we use the same tricks that index-pack uses
when its being invoked by receive-pack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16 01:15:31 -05:00
Shawn O. Pearce 09543c96bb Reuse sha1 in packed_git in fast-import.
Rather than maintaing our own packfile level sha1 variable we
can make use of the one already available in struct packed_git.
Its meant for the SHA1 of the index but it can also hold the
SHA1 of the packfile itself between final checksumming of the
packfile and creation of the index.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16 00:44:48 -05:00
Shawn O. Pearce 6cf0926193 Replace redundant yread() with read_in_full() in fast-import.
Prior to git having read_in_full() fast-import used its own private
function yread to perform the header reading task.  No sense in
keeping that around now that read_in_full is a public, stable
function.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16 00:35:41 -05:00
Shawn O. Pearce 0ea9f045f4 Use uintmax_t for marks in fast-import.
If a frontend wants to use a mark per file revision and per commit
and is doing a truly huge import (such as a 32 GiB SVN repository)
we may need more than 2**32 unique mark values, especially if the
frontend is unable (or unwilling) to recycle mark values.  For mark
idnums we should use the largest unsigned integer type available,
hoping that will be at least 64 bits when we are compiled as a 64
bit executable.  This way we may consume huge amounts of memory
storing our mark table, but we'll at least be able to process
the entire import without failing.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-16 00:33:36 -05:00
Shawn O. Pearce 5d6f3ef641 Corrected buffer overflow during automatic checkpoint in fast-import.
If we previously were using a delta but we needed to checkpoint the
current packfile and switch to a new packfile we need to throw away
the delta and compress the raw object by itself, as delta chains
cannot span non-thin packfiles.  Unfortunately the output buffer
in this case needs to grow, as the size of the compressed object
may be quite a bit larger than the size of the compressed delta.

I've also avoided recompressing the object if we are checkpointing
and we didn't use a delta.  In this case the output buffer is the
correct size and has already been populated with the right data,
we just need to close out the current packfile and open a new one.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15 23:40:27 -05:00
Shawn O. Pearce 9d1b1b5ed7 Print the packfile names to stdout from fast-import.
Caller scripts may want to know what packfiles the fast-import
process just wrote out for them.  This is now output to stdout,
one packfile name per line, after we checkpoint each packfile.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15 08:05:01 -05:00
Shawn O. Pearce d9ee53ce45 Implemented automatic checkpoints within fast-import.
When the number of objects or number of bytes gets close to the limit
allowed by the packfile format (or configured on the command line by
our caller) we should automatically checkpoint the current packfile
and start a new one before writing the object out.  This does however
require that we abandon the delta (if we had one) as its not valid
in a new packfile.

I also added the simple rule that if we got a delta back but the
delta itself is the same size as or larger than the uncompressed
object to ignore the delta and just store the object data.  This
should avoid some really bad behavior caused by our current delta
strategy.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15 08:00:49 -05:00
Shawn O. Pearce 2fce1f3c86 Optimize index creation on large object sets in fast-import.
When we are generating multiple packfiles at once we only need
to scan the blocks of object_entry structs which contain objects
for the current packfile.  Because the most recent blocks are at
the front of the linked list, and because all new objects going
into the current file are allocated from the front of that list,
we can stop scanning for objects as soon as we identify one which
doesn't belong to the current packfile.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15 07:12:23 -05:00
Shawn O. Pearce 3e005baf85 Don't create a final empty packfile in fast-import.
If the last packfile is going to be empty (has 0 objects) then it
shouldn't be kept after the import has terminated, as there is no
point to the packfile.  So rather than hashing it and making the
index file, just delete the packfile.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15 06:39:39 -05:00
Shawn O. Pearce 7bfe6e2613 Implemented manual packfile switching in fast-import.
To help importers which are dealing with massive amounts of data
fast-import needs to be able to close the packfile it is currently
writing to and open a new packfile for any additional data that
will be received.  A new 'checkpoint' command has been introduced
which can be used by the frontend import process to force this
to occur at any time.  This may be useful to ensure a very long
running import doesn't lose any work due to unexpected failures.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-01-15 06:35:41 -05:00