Commit graph

363 commits

Author SHA1 Message Date
Jeff King cbfd5e1cbb drop some obsolete "x = x" compiler warning hacks
In cases where the setting and access of a variable are
protected by the same conditional flag, older versions of
gcc would generate a "might be used unitialized" warning. We
silence the warning by initializing the variable to itself,
a hack that gcc recognizes.

Modern versions of gcc are smart enough to get this right,
going back to at least version 4.3.5. gcc 4.1 does get it
wrong in both cases, but is sufficiently old that we
probably don't need to care about it anymore.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-21 14:06:38 -07:00
Jeff King 4db34cc134 fast-import: use pointer-to-pointer to keep list tail
This is shorter, idiomatic, and it means the compiler does
not get confused about whether our "e" pointer is valid,
letting us drop the "e = e" hack.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-21 14:06:19 -07:00
Junio C Hamano 34f5130af8 Merge branch 'jc/merge-bases'
Optimise the "merge-base" computation a bit, and also update its
users that do not need the full merge-base information to call a
cheaper subset.

* jc/merge-bases:
  reduce_heads(): reimplement on top of remove_redundant()
  merge-base: "--is-ancestor A B"
  get_merge_bases_many(): walk from many tips in parallel
  in_merge_bases(): use paint_down_to_common()
  merge_bases_many(): split out the logic to paint history
  in_merge_bases(): omit unnecessary redundant common ancestor reduction
  http-push: use in_merge_bases() for fast-forward check
  receive-pack: use in_merge_bases() for fast-forward check
  in_merge_bases(): support only one "other" commit
2012-09-11 11:36:05 -07:00
Junio C Hamano a20efee9cf in_merge_bases(): support only one "other" commit
In early days of its life, I planned to make it possible to compute
"is a commit contained in all of these other commits?" with this
function, but it turned out that no caller needed it.

Just make it take two commit objects and add a comment to say what
these two functions do.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27 18:36:39 -07:00
Pete Wyckoff 06454cb9a3 fast-import: tighten parsing of datarefs
The syntax for the use of mark references in fast-import
demands either a SP (space) or LF (end-of-line) after
a mark reference.  Fast-import does not complain when garbage
appears after a mark reference in some cases.

Factor out parsing of mark references and complain if
errant characters are found.  Also be a little more careful
when parsing "inline" and SHA1s, complaining if extra
characters appear or if the form of the dataref is unrecognized.

Buggy input can cause fast-import to produce the wrong output,
silently, without error.  This makes it difficult to track
down buggy generators of fast-import streams.  An example is
seen in the last line of this commit command:

    commit refs/heads/S2
    committer Name <name@example.com> 1112912893 -0400
    data <<COMMIT
    commit message
    COMMIT
    from :1M 100644 :103 hello.c

It is missing a newline and should be:

    [...]
    from :1
    M 100644 :103 hello.c

What fast-import does is to produce a commit with the same
contents for hello.c as in refs/heads/S2^.  What the buggy
program was expecting was the contents of blob :103.  While
the resulting commit graph looked correct, the contents in
some commits were wrong.

Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-10 14:34:02 -07:00
Junio C Hamano 79efeae69d Merge branch 'jn/maint-fast-import-empty-ls' into maint
* jn/maint-fast-import-empty-ls:
  fast-import: don't allow 'ls' of path with empty components
  fast-import: leakfix for 'ls' of dirty trees
2012-03-26 12:10:25 -07:00
Jonathan Nieder 178e1deaae fast-import: don't allow 'ls' of path with empty components
As the fast-import manual explains:

	The value of <path> must be in canonical form. That is it must
	not:
	. contain an empty directory component (e.g. foo//bar is invalid),
	. end with a directory separator (e.g. foo/ is invalid),
	. start with a directory separator (e.g. /foo is invalid),

Unfortunately the "ls" command accepts these invalid syntaxes and
responds by declaring that the indicated path is missing.  This is too
subtle and causes importers to silently misbehave; better to error out
so the operator knows what's happening.

The C, R, and M commands already error out for such paths.

Reported-by: Andrew Sayers <andrew-git@pileofstuff.org>
Analysis-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-03-09 22:07:22 -06:00
Jonathan Nieder c27e559da5 fast-import: leakfix for 'ls' of dirty trees
When the chosen directory has changed since it was last written to
pack, "tree_content_get" makes a deep copy of its content to scribble
on while computing the tree name, which we forgot to free.

This leak has been present since the 'ls' command was introduced in
v1.7.5-rc0~3^2~33 (fast-import: add 'ls' command, 2010-12-02).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-03-09 22:02:44 -06:00
Thomas Rast a8ea1b7a55 fast-import: zero all of 'struct tag' to silence valgrind
When running t9300, valgrind (correctly) complains about an
uninitialized value in write_crash_report:

  ==2971== Use of uninitialised value of size 8
  ==2971==    at 0x4164F4: sha1_to_hex (hex.c:70)
  ==2971==    by 0x4073E4: die_nicely (fast-import.c:468)
  ==2971==    by 0x43284C: die (usage.c:86)
  ==2971==    by 0x40420D: main (fast-import.c:2731)
  ==2971==  Uninitialised value was created by a heap allocation
  ==2971==    at 0x4C29B3D: malloc (vg_replace_malloc.c:263)
  ==2971==    by 0x433645: xmalloc (wrapper.c:35)
  ==2971==    by 0x405DF5: pool_alloc (fast-import.c:619)
  ==2971==    by 0x407755: pool_calloc.constprop.14 (fast-import.c:634)
  ==2971==    by 0x403F33: main (fast-import.c:3324)

Fix this by zeroing all of the 'struct tag'.  We would only need to
zero out the 'sha1' field, but this way seems more future-proof.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-05 09:36:09 -08:00
Ævar Arnfjörð Bjarmason ab1900a36e Appease Sun Studio by renaming "tmpfile"
On Solaris the system headers define the "tmpfile" name, which'll
cause Git compiled with Sun Studio 12 Update 1 to whine about us
redefining the name:

    "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile     (E_PRAGMA_REDEFINE_STATIC)
    "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
    "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile   (E_PRAGMA_REDEFINE_STATIC)
    "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)

Just renaming the "tmpfile" variable to "tmp_file" in the relevant
places is the easiest way to fix this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-21 10:21:04 -08:00
Junio C Hamano 2dccad3c6f Merge branch 'ab/enable-i18n'
* ab/enable-i18n:
  i18n: add infrastructure for translating Git with gettext

Conflicts:
	Makefile
2011-12-19 16:06:41 -08:00
Junio C Hamano 48b303675a Merge branch 'jc/stream-to-pack'
* jc/stream-to-pack:
  bulk-checkin: replace fast-import based implementation
  csum-file: introduce sha1file_checkpoint
  finish_tmp_packfile(): a helper function
  create_tmp_packfile(): a helper function
  write_pack_header(): a helper function

Conflicts:
	pack.h
2011-12-16 22:33:40 -08:00
Ævar Arnfjörð Bjarmason 5e9637c629 i18n: add infrastructure for translating Git with gettext
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.

This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.

This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.

The rest of the commit message goes into detail about various
sub-parts of this commit.

= Installation

Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.

= Perl

Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.

Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.

Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.

I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.

See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.

= Shell

Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.

If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.

If neither are present we'll use a dumb printf(1) fall-through
wrapper.

= About libcharset.h and langinfo.h

We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.

The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.

GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.

=Credits

This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.

[jc: squashed a small Makefile fix from Ramsay]

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-05 20:46:55 -08:00
Junio C Hamano 6c52614864 csum-file: introduce sha1file_checkpoint
It is useful to be able to rewind a check-summed file to a certain
previous state after writing data into it using sha1write() API. The
fast-import command does this after streaming a blob data to the packfile
being generated and then noticing that the same blob has already been
written, and it does this with a private code truncate_pack() that is
commented as "Yes, this is a layering violation".

Introduce two API functions, sha1file_checkpoint(), that allows the caller
to save a state of a sha1file, and then later revert it to the saved state.
Use it to reimplement truncate_pack().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-30 14:27:59 -08:00
Johan Herland 1838685780 fast-import: Fix incorrect fanout level when modifying existing notes refs
This fixes the bug uncovered by the tests added in the previous two patches.

When an existing notes ref was loaded into the fast-import machinery, the
num_notes counter associated with that ref remained == 0, even though the
true number of notes in the loaded ref was higher. This caused a fanout
level of 0 to be used, although the actual fanout of the tree could be > 0.
Manipulating the notes tree at an incorrect fanout level causes removals to
silently fail, and modifications of existing notes to instead produce an
additional note (leaving the old object in place at a different fanout level).

This patch fixes the bug by explicitly counting the number of notes in the
notes tree whenever it looks like the num_notes counter could be wrong (when
num_notes == 0). There may be false positives (i.e. triggering the counting
when the notes tree is truly empty), but in those cases, the counting should
not take long.

Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-28 16:38:46 -08:00
Junio C Hamano c13975e7fd Merge branch 'di/fast-import-empty-tag-note-fix'
* di/fast-import-empty-tag-note-fix:
  fast-import: don't allow to note on empty branch
  fast-import: don't allow to tag empty branch
2011-10-13 19:03:19 -07:00
Michael Haggerty 8d9c50105f Change check_ref_format() to take a flags argument
Change check_ref_format() to take a flags argument that indicates what
is acceptable in the reference name (analogous to "git
check-ref-format"'s "--allow-onelevel" and "--refspec-pattern").  This
is more convenient for callers and also fixes a failure in the test
suite (and likely elsewhere in the code) by enabling "onelevel" and
"refspec-pattern" to be allowed independently of each other.

Also rename check_ref_format() to check_refname_format() to make it
obvious that it deals with refnames rather than references themselves.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-05 13:45:29 -07:00
Dmitry Ivankov 0bc69881a6 fast-import: don't allow to note on empty branch
'reset' command makes fast-import start a branch from scratch. It's name
is kept in lookup table but it's sha1 is null_sha1 (special value).
'notemodify' command can be used to add a note on branch head given it's
name. lookup_branch() is used it that case and it doesn't check for
null_sha1. So fast-import writes a note for null_sha1 object instead of
giving a error.

Add a check to deny adding a note on empty branch and add a corresponding
test.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-22 13:30:59 -07:00
Dmitry Ivankov 2c9c8ee2de fast-import: don't allow to tag empty branch
'reset' command makes fast-import start a branch from scratch. It's name
is kept in lookup table but it's sha1 is null_sha1 (special value).
'tag' command can be used to tag a branch by it's name. lookup_branch()
is used it that case and it doesn't check for null_sha1. So fast-import
writes a tag for null_sha1 object instead of giving a error.

Add a check to deny tagging an empty branch and add a corresponding test.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-22 13:30:57 -07:00
Junio C Hamano 0dc691a4f3 Merge branch 'di/fast-import-tagging'
* di/fast-import-tagging:
  fast-import: allow to tag newly created objects
  fast-import: add tests for tagging blobs
2011-08-28 21:18:48 -07:00
Junio C Hamano 05d88e6f7e Merge branch 'di/fast-import-blob-tweak'
* di/fast-import-blob-tweak:
  fast-import: treat cat-blob as a delta base hint for next blob
  fast-import: count and report # of calls to diff_delta in stats
2011-08-28 21:18:47 -07:00
Junio C Hamano 45792b64c1 Merge branch 'di/fast-import-deltified-tree'
* di/fast-import-deltified-tree:
  fast-import: prevent producing bad delta
  fast-import: add a test for tree delta base corruption
2011-08-28 21:18:47 -07:00
Junio C Hamano 0b98954975 Merge branch 'di/fast-import-ident'
* di/fast-import-ident:
  fsck: improve committer/author check
  fsck: add a few committer name tests
  fast-import: check committer name more strictly
  fast-import: don't fail on omitted committer name
  fast-import: add input format tests
2011-08-28 21:18:47 -07:00
Dmitry Ivankov 6c447f633c fast-import: allow to tag newly created objects
fast-import allows to tag objects by sha1 and to query sha1 of objects
being imported. So it should allow to tag these objects, make it do so.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-23 11:25:59 -07:00
Dmitry Ivankov 2efe38e7da fast-import: add tests for tagging blobs
fast-import allows to create an annotated tag that annotates a blob,
via mark or direct sha1 specification.

For mark it works, for sha1 it tries to read the object. It tries to
do so via read_sha1_file, and then checks the size to be at least 46.

That's weird, let's just allow to (annotated) tag any object referenced
by sha1. If the object originates from our packfile, we still fail though.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-23 11:25:56 -07:00
Dmitry Ivankov a7e9c34126 fast-import: treat cat-blob as a delta base hint for next blob
Delta base for blobs is chosen as a previously saved blob. If we
treat cat-blob's blob as a delta base for the next blob, nothing
is likely to become worse.

For fast-import stream producer like svn-fe cat-blob is used like
following:
- svn-fe reads file delta in svn format
- to apply it, svn-fe asks cat-blob 'svn delta base'
- applies 'svn delta' to the response
- produces a blob command to store the result

Currently there is no way for svn-fe to give fast-import a hint on
object delta base. While what's requested in cat-blob is most of
the time a best delta base possible. Of course, it could be not a
good delta base, but we don't know any better one anyway.

So do treat cat-blob's result as a delta base for next blob. The
profit is nice: 2x to 7x reduction in pack size AND 1.2x to 3x
time speedup due to diff_delta being faster on good deltas. git gc
--aggressive can compress it even more, by 10% to 70%, utilizing
more cpu time, real time and 3 cpu cores.

Tested on 213M and 2.7G fast-import streams, resulting packs are 22M
and 113M, import time is 7s and 60s, both streams are produced by
svn-fe, sniffed and then used as raw input for fast-import.

For git-fast-export produced streams there is no change as it doesn't
use cat-blob and doesn't try to reorder blobs in some smart way to
make successive deltas small.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Acked-by: David Barr <davidbarr@google.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-22 11:57:07 -07:00
Dmitry Ivankov 94c3b48247 fast-import: count and report # of calls to diff_delta in stats
It's an interesting number, how often do we try to deltify each type of
objects and how often do we succeed. So do add it to stats.

Success doesn't mean much gain in pack size though. As we allow delta to
be as big as (data.len - 20). And delta close to data.len gains nothing
compared to no delta at all even after zlib compression (delta is pretty
much the same as data, just with few modifications).

We should try to make less attempts that result in huge deltas as these
consume more cpu than trivial small deltas. Either by choosing a better
delta base or reducing delta size upper bound or doing less delta attempts
at all.

Currently, delta base for blobs is a waste literally. Each blob delta
base is chosen as a previously stored blob. Disabling deltas for blobs
doesn't increase pack size and reduce import time, or at least doesn't
increase time for all fast-import streams I've tried.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Acked-by: David Barr <davidbarr@google.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-22 11:57:06 -07:00
Dmitry Ivankov 8fb3ad76b1 fast-import: prevent producing bad delta
To produce deltas for tree objects fast-import tracks two versions
of tree's entries - base and current one. Base version stands both
for a delta base of this tree, and for a entry inside a delta base
of a parent tree. So care should be taken to keep it in sync.

tree_content_set cuts away a whole subtree and replaces it with a
new one (or NULL for lazy load of a tree with known sha1). It
keeps a base sha1 for this subtree (needed for parent tree). And
here is the problem, 'subtree' tree root doesn't have the implied
base version entries.

Adjusting the subtree to include them would mean a deep rewrite of
subtree. Invalidating the subtree base version would mean recursive
invalidation of parents' base versions. So just mark this tree as
do-not-delta me. Abuse setuid bit for this purpose.

tree_content_replace is the same as tree_content_set except that is
is used to replace the root, so just clearing base sha1 here (instead
of setting the bit) is fine.

[di: log message]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-14 14:40:01 -07:00
Dmitry Ivankov 4b4963c0e1 fast-import: check committer name more strictly
The documentation declares following identity format:
(<name> SP)? LT <email> GT
where name is any string without LF and LT characters.
But fast-import just accepts any string up to first GT
instead of checking the whole format, and moreover just
writes it as is to the commit object.

git-fsck checks for [^<\n]* <[^<>\n]*> format. Note that the
space is mandatory. And the space quirk is already handled via
extending the string to the left when needed.

Modify fast-import input identity format to a slightly stricter
one - deny LF, LT and GT in both <name> and <email>. And check
for it.

This is stricter then git-fsck as fsck accepts "Name> <email>"
currently, but soon fsck check will be adjusted likewise.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11 12:21:03 -07:00
Dmitry Ivankov 17fb00721b fast-import: don't fail on omitted committer name
fast-import format declares 'committer_name SP' to be optional in
'committer_name SP LT email GT'. But for a (commit) object SP is
obligatory while zero length committer_name is ok. git-fsck checks
that SP is present, so fast-import must prepend it if the name SP
part is omitted. It doesn't do so and thus for "LT email GT" ident
it writes a bad object.

Name cannot contain LT or GT, ident always comes after SP in fast-import.
So if ident starts with LT reuse the SP as if a valid 'SP LT email GT'
ident was passed.

This fixes a ident parsing bug for a well-formed fast-import input.
Though the parsing is still loose and can accept a ill-formed input.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11 12:20:56 -07:00
Junio C Hamano 59d9ba869e Merge branch 'sr/transport-helper-fix'
* sr/transport-helper-fix: (21 commits)
  transport-helper: die early on encountering deleted refs
  transport-helper: implement marks location as capability
  transport-helper: Use capname for refspec capability too
  transport-helper: change import semantics
  transport-helper: update ref status after push with export
  transport-helper: use the new done feature where possible
  transport-helper: check status code of finish_command
  transport-helper: factor out push_update_refs_status
  fast-export: support done feature
  fast-import: introduce 'done' command
  git-remote-testgit: fix error handling
  git-remote-testgit: only push for non-local repositories
  remote-curl: accept empty line as terminator
  remote-helpers: export GIT_DIR variable to helpers
  git_remote_helpers: push all refs during a non-local export
  transport-helper: don't feed bogus refs to export push
  git-remote-testgit: import non-HEAD refs
  t5800: document some non-functional parts of remote helpers
  t5800: use skip_all instead of prereq
  t5800: factor out some ref tests
  ...
2011-08-01 15:00:14 -07:00
Sverre Rabbelier be56862f19 fast-import: introduce 'done' command
Add a 'done' command that causes fast-import to stop reading from the
stream and exit.

If the new --done command line flag was passed on the command line
(or a "feature done" declaration included at the start of the stream),
make the 'done' command mandatory.  So "git fast-import --done"'s
input format will be prefix-free, making errors easier to detect when
they show up as early termination at some convenient time of the
upstream of a pipe writing to fast-import.

Another possible application of the 'done' command would to be allow a
fast-import stream that is only a small part of a larger encapsulating
stream to be easily parsed, leaving the file offset after the "done\n"
so the other application can pick up from there.  This patch does not
teach fast-import to do that --- fast-import still uses buffered input
(stdio).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-19 11:17:47 -07:00
Junio C Hamano d907bf8ef3 Merge branch 'jc/index-pack'
* jc/index-pack:
  verify-pack: use index-pack --verify
  index-pack: show histogram when emulating "verify-pack -v"
  index-pack: start learning to emulate "verify-pack -v"
  index-pack: a miniscule refactor
  index-pack --verify: read anomalous offsets from v2 idx file
  write_idx_file: need_large_offset() helper function
  index-pack: --verify
  write_idx_file: introduce a struct to hold idx customization options
  index-pack: group the delta-base array entries also by type

Conflicts:
	builtin/verify-pack.c
	cache.h
	sha1_file.c
2011-07-19 09:54:51 -07:00
Junio C Hamano ef49a7a012 zlib: zlib can only process 4GB at a time
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.

Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:52:15 -07:00
Junio C Hamano 225a6f1068 zlib: wrap deflateBound() too
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:18:17 -07:00
Junio C Hamano 55bb5c9147 zlib: wrap deflate side of the API
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().

There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:10:29 -07:00
Sverre Rabbelier 4cce4ef2d5 fast-import: fix option parser for no-arg options
While refactoring the options parser in bc3c79a (fast-import: add
(non-)relative-marks feature, 2009-12-04), it was made too lenient
for options that take no argument, fix that.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-05 21:21:24 -07:00
Junio C Hamano f28d2e33c6 Merge branch 'jc/pack-objects-bigfile' into maint
* jc/pack-objects-bigfile:
  Teach core.bigfilethreashold to pack-objects
2011-05-04 14:57:38 -07:00
Junio C Hamano 15366280c2 Teach core.bigfilethreashold to pack-objects
The pack-objects command should take notice of the object file and
refrain from attempting to delta large ones, to be consistent with
the fast-import command.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-05 20:25:49 -07:00
Stephen Boyd c2e86addb8 Fix sparse warnings
Fix warnings from 'make check'.

 - These files don't include 'builtin.h' causing sparse to complain that
   cmd_* isn't declared:

   builtin/clone.c:364, builtin/fetch-pack.c:797,
   builtin/fmt-merge-msg.c:34, builtin/hash-object.c:78,
   builtin/merge-index.c:69, builtin/merge-recursive.c:22
   builtin/merge-tree.c:341, builtin/mktag.c:156, builtin/notes.c:426
   builtin/notes.c:822, builtin/pack-redundant.c:596,
   builtin/pack-refs.c:10, builtin/patch-id.c:60, builtin/patch-id.c:149,
   builtin/remote.c:1512, builtin/remote-ext.c:240,
   builtin/remote-fd.c:53, builtin/reset.c:236, builtin/send-pack.c:384,
   builtin/unpack-file.c:25, builtin/var.c:75

 - These files have symbols which should be marked static since they're
   only file scope:

   submodule.c:12, diff.c:631, replace_object.c:92, submodule.c:13,
   submodule.c:14, trace.c:78, transport.c:195, transport-helper.c:79,
   unpack-trees.c:19, url.c:3, url.c:18, url.c:104, url.c:117, url.c:123,
   url.c:129, url.c:136, thread-utils.c:21, thread-utils.c:48

 - These files redeclare symbols to be different types:

   builtin/index-pack.c:210, parse-options.c:564, parse-options.c:571,
   usage.c:49, usage.c:58, usage.c:63, usage.c:72

 - These files use a literal integer 0 when they really should use a NULL
   pointer:

   daemon.c:663, fast-import.c:2942, imap-send.c:1072, notes-merge.c:362

While we're in the area, clean up some unused #includes in builtin files
(mostly exec_cmd.h).

Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22 10:16:54 -07:00
Junio C Hamano b2f6eab402 Merge branch 'maint'
* maint:
  Prepare draft release notes to 1.7.4.2
  gitweb: highlight: replace tabs with spaces
  make_absolute_path: return the input path if it points to our buffer
  valgrind: ignore SSE-based strlen invalid reads
  diff --submodule: split into bite-sized pieces
  cherry: split off function to print output lines
  branch: split off function that writes tracking info and commit subject
  standardize brace placement in struct definitions
  compat: make gcc bswap an inline function
  enums: omit trailing comma for portability

Conflicts:
	RelNotes
2011-03-16 16:59:30 -07:00
Jonathan Nieder 9cba13ca5d standardize brace placement in struct definitions
In a struct definitions, unlike functions, the prevailing style is for
the opening brace to go on the same line as the struct name, like so:

 struct foo {
	int bar;
	char *baz;
 };

Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many
matches as 'struct [a-z_]*$'.

Linus sayeth:

 Heretic people all over the world have claimed that this inconsistency
 is ...  well ...  inconsistent, but all right-thinking people know that
 (a) K&R are _right_ and (b) K&R are right.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16 12:49:02 -07:00
Junio C Hamano 674ef90904 Merge branch 'sp/maint-fd-limit'
* sp/maint-fd-limit:
  sha1_file.c: Don't retain open fds on small packs
  mingw: add minimum getrlimit() compatibility stub
  Limit file descriptors used by packs
2011-03-15 14:22:23 -07:00
Shawn O. Pearce d131b7afea sha1_file.c: Don't retain open fds on small packs
If a pack file is small enough that its entire contents fits within
one mmap window, mmap the file and then immediately close its file
descriptor.  This reduces the number of file descriptors that are
needed to read from repositories with many tiny pack files, such
as one that has received 1000 pushes (and created 1000 small pack
files) since its last repack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-02 11:25:30 -08:00
Jonathan Nieder 6288e3e180 fast-import: make code "-Wpointer-arith" clean
The dereference() function to peel a tree-ish and find the underlying
tree expects arithmetic to (void *) to work on byte addresses.  We
should be reading the text of objects through a char * anyway.

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-28 15:25:12 -06:00
Junio C Hamano ebcfb3791a write_idx_file: introduce a struct to hold idx customization options
Remove two globals, pack_idx_default version and pack_idx_off32_limit,
and place them in a pack_idx_option structure.  Allow callers to pass
it to write_idx_file() as a parameter.

Adjust all callers to the API change.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27 23:29:03 -08:00
David Barr 8dc6a373d2 fast-import: add 'ls' command
Lazy fast-import frontend authors that want to rely on the backend to
keep track of the content of the imported trees _almost_ have what
they need in the 'cat-blob' command (v1.7.4-rc0~30^2~3, 2010-11-28).
But it is not quite enough, since

 (1) cat-blob can be used to retrieve the content of files, but
     not their mode, and

 (2) using cat-blob requires the frontend to keep track of a name
     (mark number or object id) for each blob to be retrieved

Introduce an 'ls' command to complement cat-blob and take care of the
remaining needs.  The 'ls' command finds what is at a given path
within a given tree-ish (tag, commit, or tree):

	'ls' SP <dataref> SP <path> LF

or in fast-import's active commit:

	'ls' SP <path> LF

The response is a single line sent through the cat-blob channel,
imitating ls-tree output.  So for example:

	FE> ls :1 Documentation
	gfi> 040000 tree 9e6c2b599341d28a2a375f8207507e0a2a627fe9	Documentation
	FE> ls 9e6c2b599341d28a2a375f8207507e0a2a627fe9 git-fast-import.txt
	gfi> 100644 blob 4f92954396e3f0f97e75b6838a5635b583708870	git-fast-import.txt
	FE> ls :1 RelNotes
	gfi> 120000 blob b942e49944	RelNotes
	FE> cat-blob b942e49944
	gfi> b942e49944 blob 32
	gfi> Documentation/RelNotes/1.7.4.txt

The most interesting parts of the reply are the first word, which is
a 6-digit octal mode (regular file, executable, symlink, directory,
or submodule), and the part from the second space to the tab, which is
a <dataref> that can be used in later cat-blob, ls, and filemodify (M)
commands to refer to the content (blob, tree, or commit) at that path.

If there is nothing there, the response is "missing some/path".

The intent is for this command to be used to read files from the
active commit, so a frontend can apply patches to them, and to copy
files and directories from previous revisions.

For example, proposed updates to svn-fe use this command in place of
its internal representation of the repository directory structure.
This simplifies the frontend a great deal and means support for
resuming an import in a separate fast-import run (i.e., incremental
import) is basically free.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Junio C Hamano <gitster@pobox.com>
Improved-by: Sverre Rabbelier <srabbelier@gmail.com>
2011-02-26 04:57:58 -06:00
Junio C Hamano fc180d98a2 Merge branch 'rr/fi-import-marks-if-exists'
* rr/fi-import-marks-if-exists:
  fast-import: Introduce --import-marks-if-exists
2011-02-09 16:41:16 -08:00
Junio C Hamano a8e4a5943a Merge branch 'maint-1.7.0' into maint
* maint-1.7.0:
  fast-import: introduce "feature notes" command
  fast-import: clarify documentation of "feature" command

Conflicts:
	Documentation/git-fast-import.txt
2011-02-09 16:40:12 -08:00
Jonathan Nieder 547e8b9205 fast-import: introduce "feature notes" command
Here is a 'feature' command for streams to use to require support for
the notemodify (N) command.

When the 'feature' facility was introduced (v1.7.0-rc0~95^2~4,
2009-12-04), the notes import feature was old news (v1.6.6-rc0~21^2~8,
2009-10-09) and it was not obvious it deserved to be a named feature.
But now that is clear, since all major non-git fast-import backends
lack support for it.

Details: on git version with this patch applied, any "feature notes"
command in the features/options section at the beginning of a stream
will be treated as a no-op.  On fast-import implementations without
the feature (and older git versions), the command instead errors out
with a message like

	This version of fast-import does not support feature notes.

So by declaring use of notes at the beginning of a stream, frontends
can avoid wasting time and other resources when the backend does not
support notes.  (This would be especially important for backends that
do not support rewinding history after a botched import.)

Improved-by: Thomas Rast <trast@student.ethz.ch>
Improved-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-09 16:06:51 -08:00
Junio C Hamano 99e63ef24e Merge branch 'maint'
* maint:
  rebase -i: clarify in-editor documentation of "exec"
  tests: sanitize more git environment variables
  fast-import: treat filemodify with empty tree as delete
  rebase: give a better error message for bogus branch
  rebase: use explicit "--" with checkout

Conflicts:
	t/t9300-fast-import.sh
2011-01-27 10:27:49 -08:00
Junio C Hamano 5ce3258122 Merge branch 'jn/fast-import-empty-tree-removal' into maint
* jn/fast-import-empty-tree-removal:
  fast-import: treat filemodify with empty tree as delete
2011-01-27 10:23:53 -08:00
Jonathan Nieder 8fe533f686 fast-import: treat filemodify with empty tree as delete
Normal git processes do not allow one to build a tree with an empty
subtree entry without trying hard at it.  This is in keeping with the
general UI philosophy: git tracks content, not empty directories.

v1.7.3-rc0~75^2 (2010-06-30) changed that by making it easy to include
an empty subtree in fast-import's active commit:

	M 040000 4b825dc642 subdir

One can trigger this by reading an empty tree (for example, the tree
corresponding to an empty root commit) and trying to move it to a
subtree.  It is better and more closely analogous to 'git read-tree
--prefix' to treat such commands as requests to remove the subtree.

Noticed-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-27 10:22:37 -08:00
Junio C Hamano 267684f0b7 Merge branch 'jn/maint-fast-import-object-reuse' into maint
* jn/maint-fast-import-object-reuse:
  fast-import: insert new object entries at start of hash bucket
2011-01-19 08:25:46 -08:00
Ramkumar Ramachandra dded4f12a4 fast-import: Introduce --import-marks-if-exists
When a frontend uses a marks file to ensure its state persists between
runs, it may represent "clean slate" when bootstrapping with "no marks
yet". In such a case, feeding the last state with --import-marks and
saving the state after the current run with --export-marks would be a
natural thing to do.

The --import-marks option however errors out when the specified marks file
doesn't exist; this makes bootstrapping a bit difficult.  The location of
the marks file becomes backend-dependent when --relative-marks is in
effect, and the frontend cannot check for the existence of the file in
such a case.

The --import-marks-if-exists option does the same thing as --import-marks
but does not flag an error if the named file does not exist yet to help
these frontends.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18 07:07:01 -08:00
Junio C Hamano 914584266c Merge branch 'jn/fast-import-blob-access'
* jn/fast-import-blob-access:
  t9300: avoid short reads from dd
  t9300: remove unnecessary use of /dev/stdin
  fast-import: Allow cat-blob requests at arbitrary points in stream
  fast-import: let importers retrieve blobs
  fast-import: clarify documentation of "feature" command
  fast-import: stricter parsing of integer options

Conflicts:
	fast-import.c
2010-12-16 12:58:38 -08:00
Junio C Hamano c835288be4 Merge branch 'jn/maint-fast-import-object-reuse'
* jn/maint-fast-import-object-reuse:
  fast-import: insert new object entries at start of hash bucket
2010-12-16 12:49:16 -08:00
Junio C Hamano f73c3e9704 Merge branch 'jn/fast-import-ondemand-checkpoint'
* jn/fast-import-ondemand-checkpoint:
  fast-import: treat SIGUSR1 as a request to access objects early
2010-12-16 12:49:11 -08:00
Junio C Hamano 5e738ae820 Merge branch 'jj/icase-directory'
* jj/icase-directory:
  Support case folding in git fast-import when core.ignorecase=true
  Support case folding for git add when core.ignorecase=true
  Add case insensitivity support when using git ls-files
  Add case insensitivity support for directories when using git status
  Case insensitivity support for .gitignore via core.ignorecase
  Add string comparison functions that respect the ignore_case variable.
  Makefile & configure: add a NO_FNMATCH_CASEFOLD flag
  Makefile & configure: add a NO_FNMATCH flag

Conflicts:
	Makefile
	config.mak.in
	configure.ac
	fast-import.c
2010-12-03 16:10:34 -08:00
Jonathan Nieder 777f80d742 fast-import: Allow cat-blob requests at arbitrary points in stream
The new rule: a "cat-blob" can be inserted wherever a comment is
allowed, which means at the start of any line except in the middle of
a "data" command.

This saves frontends from having to loop over everything they want to
commit in the next commit and cat-ing the necessary objects in
advance.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-01 13:28:04 -08:00
David Barr 85c62395b1 fast-import: let importers retrieve blobs
New objects written by fast-import are not available immediately.
Until a checkpoint has been started and finishes writing the pack
index, any new blobs will not be accessible using standard git tools.

So introduce a new way to access them: a "cat-blob" command in the
command stream requests for fast-import to print a blob to stdout or a
file descriptor specified by the argument to --cat-blob-fd.  The value
for cat-blob-fd cannot be specified in the stream because that would
be a layering violation: the decision of where to direct a stream has
to be made when fast-import is started anyway, so we might as well
make the stream format is independent of that detail.

Output uses the same format as "git cat-file --batch".

Thanks to Sverre Rabbelier and Sam Vilain for guidance in designing
the protocol.

Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-01 13:27:37 -08:00
Jonathan Nieder a9ff277e58 fast-import: stricter parsing of integer options
Check the result from strtoul to avoid accepting arguments like
--depth=-1 and --active-branches=foo,bar,baz.

Requested-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-01 13:26:52 -08:00
Junio C Hamano ed8298dc34 Merge branch 'jn/fast-import-fix'
* jn/fast-import-fix:
  fast-import: do not clear notes in do_change_note_fanout()
  t9300 (fast-import): another test for the "replace root" feature
  fast-import: tighten M 040000 syntax
  fast-import: filemodify after M 040000 <tree> "" crashes
2010-11-29 17:52:32 -08:00
Jonathan Nieder dc01f59d21 fast-import: treat SIGUSR1 as a request to access objects early
It can be tedious to wait for a multi-million-revision import.
Unfortunately it is hard to spy on the import because fast-import
works by continuously streaming out objects, without updating the pack
index or refs until a checkpoint command or the end of the stream.

So allow the impatient operator to request checkpoints by sending a
signal, like so:

	killall -USR1 git-fast-import

When receiving such a signal, fast-import would schedule a checkpoint
to take place after the current top-level command (usually a "commit"
or "blob" request) finishes.

Caveats: just like ordinary checkpoint commands, such requests slow
down the import.  Switching to a new pack at a suboptimal moment is
also likely to result in a less dense initial collection of packs.
That's the price.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 15:01:24 -08:00
David Barr b7c1ce4f14 fast-import: insert new object entries at start of hash bucket
More often than not, find_object is called for recently inserted objects.
Optimise for this case by inserting new entries at the start of the chain.
This doesn't affect the cost of new inserts but reduces the cost of find
and insert for existing object entries.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 11:25:16 -08:00
Junio C Hamano d4c4369752 Sync with 1.7.3.2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-21 17:16:10 -07:00
Jonathan Nieder b21241253b fast-import: do not clear notes in do_change_note_fanout()
Commit 5edde51 (fast-import: filemodify after M 040000 <tree> ""
crashes, 2010-10-17) taught fast-import to load trees from the
object db as needed when it is time to access them.

But it went too far.  In change_note_fanout(), an empty,
not-loaded tree is not meant to destroy notes, so calling
load_tree() at that point is exactly the wrong thing to do.

Kudos to Johan Herland for t9301, which caught this failure.

Reported-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-20 14:35:58 -07:00
Jonathan Nieder 3421578393 fast-import: tighten M 040000 syntax
When tree_content_set() is asked to modify the path "foo/bar/",
it first recurses like so:

	tree_content_set(root, "foo/bar/", sha1, S_IFDIR) ->
	 tree_content_set(root:foo, "bar/", ...) ->
	  tree_content_set(root:foo/bar, "", ...)

And as a side-effect of 2794ad5 (fast-import: Allow filemodify to set
the root, 2010-10-10), this last call is accepted and changes
the tree entry for root:foo/bar to refer to the specified tree.

That seems safe enough but let's reject the new syntax (we never meant
to support it) and make it harder for frontends to introduce pointless
incompatibilities with git fast-import 1.7.3.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-18 16:42:26 -07:00
Jonathan Nieder 5edde51018 fast-import: filemodify after M 040000 <tree> "" crashes
Until M 040000 <tree> "" syntax was introduced in commit 2794ad5
(fast-import: Allow filemodify to set the root, 2010-10-10), it
was impossible for the root entry to refer to an unloaded tree.
Update various functions to take that possibility into account.
Otherwise

	M 040000 <tree> ""
	M 100644 :1 "foo"

and similar commands (using D, C, or R after resetting the root
tree) segfault.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-18 16:41:27 -07:00
David Barr 2794ad5244 fast-import: Allow filemodify to set the root
v1.7.3-rc0~75^2 (Teach fast-import to import subtrees named by tree id,
2010-06-30) has a shortcoming - it doesn't allow the root to be set.
Extend this behaviour by allowing the root to be referenced as the
empty path, "".

For a command (like filter-branch --subdirectory-filter) that wants
to commit a lot of trees that already exist in the object db, writing
undeltified objects as loose files only to repack them later can
involve a significant amount of overhead.
(23% slow-down observed on Linux 2.6.35, worse on Mac OS X 10.6)

Fortunately we have fast-import (which is one of the only git commands
that will write to a pack directly) but there is not an advertised way
to tell fast-import to commit a given tree without unpacking it.

This patch changes that, by allowing

	M 040000 <tree id> ""

as a filemodify line in a commit to reset to a particular tree without
any need to parse it.  For example,

	M 040000 4b825dc642 ""

is a synonym for the deleteall command and the fast-import equivalent of

	git read-tree 4b825dc642

Signed-off-by: David Barr <david.barr@cordelta.com>
Commit-message-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Sverre Rabbelier <srabbelier@gmail.com>
Tested-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-13 15:10:31 -07:00
Štěpán Němec 62b4698e55 Use angles for placeholders consistently
Signed-off-by: Štěpán Němec <stepnem@gmail.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-08 12:29:52 -07:00
Joshua Jensen 50906e04e8 Support case folding in git fast-import when core.ignorecase=true
When core.ignorecase=true, imported file paths will be folded to match
existing directory case.

Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06 11:21:57 -07:00
Junio C Hamano 3f29dd6c23 Merge branch 'en/d-f-conflict-fix'
* en/d-f-conflict-fix:
  merge-recursive: Avoid excessive output for and reprocessing of renames
  merge-recursive: Fix multiple file rename across D/F conflict
  t6031: Add a testcase covering multiple renames across a D/F conflict
  merge-recursive: Fix typo
  Mark tests that use symlinks as needing SYMLINKS prerequisite
  t/t6035-merge-dir-to-symlink.sh: Remove TODO on passing test
  fast-import: Improve robustness when D->F changes provided in wrong order
  fast-export: Fix output order of D/F changes
  merge_recursive: Fix renames across paths below D/F conflicts
  merge-recursive: Fix D/F conflicts
  Add a rename + D/F conflict testcase
  Add additional testcases for D/F conflicts

Conflicts:
	merge-recursive.c
2010-08-31 16:23:58 -07:00
Junio C Hamano ebb561bcfc Merge branch 'jn/fast-import-subtree'
* jn/fast-import-subtree:
  Teach fast-import to import subtrees named by tree id
2010-08-18 12:14:41 -07:00
Raja R Harinath 7e7db5e452 fast-import: export correctly marks larger than 2^20-1
dump_marks_helper() has a bug when dumping marks larger than 2^20-1,
i.e., when the sparse array has more than two levels.  The bug was
that the 'base' counter was being shifted by 20 bits at level 3, and
then again by 10 bits at level 2, rather than a total shift of 20 bits
in this argument to the recursive call:

  (base + k) << m->shift

There are two ways to fix this correctly, the elegant:

  (base + k) << 10

and the one I chose due to edit distance:

  base + (k << m->shift)

Signed-off-by: Raja R Harinath <harinath@hurrynot.org>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-11 10:45:15 -07:00
Elijah Newren 253fb5f889 fast-import: Improve robustness when D->F changes provided in wrong order
When older versions of fast-export came across a directory changing to a
symlink (or regular file), it would output the changes in the form
  M 120000 :239821 dir-changing-to-symlink
  D dir-changing-to-symlink/filename1
When fast-import sees the first line, it deletes the directory named
dir-changing-to-symlink (and any files below it) and creates a symlink in
its place.  When fast-import came across the second line, it was previously
trying to remove the file and relevant leading directories in
tree_content_remove(), and as a side effect it would delete the symlink
that was just created.  This resulted in the symlink silently missing from
the resulting repository.

To improve robustness, we ignore file deletions underneath directory names
that correspond to non-directories.  This can also be viewed as a minor
optimization: since there cannot be a file and a directory with the same
name in the same directory, the file clearly can't exist so nothing needs
to be done to delete it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-09 16:16:47 -07:00
Jonathan Nieder 334fba656b Teach fast-import to import subtrees named by tree id
To simulate the svn cp command, it would be very useful to be
replace an arbitrary file in the current revision by an
arbitrary directory from a previous one.  Modify the filemodify
command to allow that:

 M 040000 <tree id> pathname

This would be most useful in combination with a facility to
print the commit ids for new revisions as they are written.

Cc: Shawn O. Pearce <spearce@spearce.org>
Cc: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-05 12:11:33 -07:00
Junio C Hamano 8d676d85f7 Merge branch 'gv/portable'
* gv/portable:
  test-lib: use DIFF definition from GIT-BUILD-OPTIONS
  build: propagate $DIFF to scripts
  Makefile: Tru64 portability fix
  Makefile: HP-UX 10.20 portability fixes
  Makefile: HPUX11 portability fixes
  Makefile: SunOS 5.6 portability fix
  inline declaration does not work on AIX
  Allow disabling "inline"
  Some platforms lack socklen_t type
  Make NO_{INET_NTOP,INET_PTON} configured independently
  Makefile: some platforms do not have hstrerror anywhere
  git-compat-util.h: some platforms with mmap() lack MAP_FAILED definition
  test_cmp: do not use "diff -u" on platforms that lack one
  fixup: do not unconditionally disable "diff -u"
  tests: use "test_cmp", not "diff", when verifying the result
  Do not use "diff" found on PATH while building and installing
  enums: omit trailing comma for portability
  Makefile: -lpthread may still be necessary when libc has only pthread stubs
  Rewrite dynamic structure initializations to runtime assignment
  Makefile: pass CPPFLAGS through to fllow customization

Conflicts:
	Makefile
	wt-status.h
2010-06-21 06:02:44 -07:00
Gary V. Vaughan 4b05548fc0 enums: omit trailing comma for portability
Without this patch at least IBM VisualAge C 5.0 (I have 5.0.2) on AIX
5.1 fails to compile git.

enum style is inconsistent already, with some enums declared on one
line, some over 3 lines with the enum values all on the middle line,
sometimes with 1 enum value per line... and independently of that the
trailing comma is sometimes present and other times absent, often
mixing with/without trailing comma styles in a single file, and
sometimes in consecutive enum declarations.

Clearly, omitting the comma is the more portable style, and this patch
changes all enum declarations to use the portable omitted dangling
comma style consistently.

Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-31 16:59:27 -07:00
Sverre Rabbelier 580d5f83e7 fast-import: always create marks_file directories
CC: "Shawn O. Pearce" <spearce@spearce.org>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-31 09:37:26 -07:00
Michael Lukashov 1b22b6c897 refactor duplicated encode_header in pack-objects and fast-import
The following function is duplicated:

  encode_header

Move this function to sha1_file.c and rename it 'encode_in_pack_object_header',
as suggested by Junio C Hamano

Signed-off-by: Michael Lukashov <michael.lukashov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 15:30:20 -08:00
Nicolas Pitre b500d5e11e fast-import: use the diff_delta() max_delta_size argument
This let diff_delta() abort early if it is going to bust the given
size limit.  Also, only objects larger than 20 bytes are considered
as objects smaller than that are most certainly going to produce
larger deltas than the original object due to the additional headers.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:44 -08:00
Nicolas Pitre 8c2ca8dd8a fast-import: honor pack.indexversion and pack.packsizelimit config vars
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:44 -08:00
Nicolas Pitre 89e0a3a131 fast-import: make default pack size unlimited
Now that fast-import is creating packs with index version 2, there is
no point limiting the pack size by default.  A pack split will still
happen if off_t is not sufficiently large to hold large offsets.

While updating the doc, let's remove the "packfiles fit on CDs"
suggestion.  Pack files created by fast-import are still suboptimal and
a 'git repack -a -f -d' or even 'git gc --aggressive' would be a pretty
good idea before considering storage on CDs.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:43 -08:00
Nicolas Pitre 427cb22c40 fast-import: use write_idx_file() instead of custom code
This allows for the creation of pack index version 2 with its object
CRC and the possibility for a pack to be larger than 4 GB.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:43 -08:00
Nicolas Pitre 212818160d fast-import: use sha1write() for pack data
This is in preparation for using write_idx_file().  Also, by using
sha1write() we get some buffering to reduces the number of write
syscalls, and the written data is SHA1 summed which allows for the extra
data integrity validation check performed in fixup_pack_header_footer()
(details on this in commit abeb40e5aa).

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:42 -08:00
Nicolas Pitre 3fc366bdbb fast-import: start using struct pack_idx_entry
This is in preparation for using write_idx_file().

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-17 11:08:42 -08:00
Junio C Hamano 4d0cc22437 fast-import: count --max-pack-size in bytes
Similar in spirit to 07cf0f2 (make --max-pack-size argument to 'git
pack-object' count in bytes, 2010-02-03) which made the option by the same
name to pack-objects, this counts the pack size limit in bytes.

In order not to cause havoc with people used to the previous megabyte
scale an integer smaller than 8192 is interpreted in megabytes but the
user gets a warning.  Also a minimum size of 1 MiB is enforced to avoid an
explosion of pack files.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
2010-02-04 15:12:17 -08:00
Junio C Hamano 76ea93ccb5 fast-import.c: Fix big-file-threshold parsing bug
Manual merge made at 844ad3d (Merge branch 'sp/maint-fast-import-large-blob'
into sp/fast-import-large-blob, 2010-02-01) did not correctly reflect the change
of unit in which this variable's value is counted from its previous version.

Now it counts in bytes, not in megabytes.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
2010-02-04 09:09:50 -08:00
Junio C Hamano 844ad3d9a0 Merge branch 'sp/maint-fast-import-large-blob' into sp/fast-import-large-blob
* sp/maint-fast-import-large-blob:
  fast-import: Stream very large blobs directly to pack
  bash: don't offer remote transport helpers as subcommands

Conflicts:
	fast-import.c
2010-02-01 12:42:00 -08:00
Shawn O. Pearce 5eef828bc0 fast-import: Stream very large blobs directly to pack
If a blob is larger than the configured big-file-threshold, instead
of reading it into a single buffer obtained from malloc, stream it
onto the end of the current pack file.  Streaming the larger objects
into the pack avoids the 4+ GiB memory footprint that occurs when
fast-import is processing 2+ GiB blobs.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-01 12:09:47 -08:00
Junio C Hamano 5fc9df08b5 Merge branch 'jh/notes' (early part)
* 'jh/notes' (early part):
  Add more testcases to test fast-import of notes
  Rename t9301 to t9350, to make room for more fast-import tests
  fast-import: Proper notes tree manipulation
2010-01-20 20:28:49 -08:00
Junio C Hamano e33fd3c326 Merge branch 'maint'
* maint:
  Update draft release notes to 1.6.6.1
  grep: NUL terminate input from a file
  fast-import: tag may point to any object type
2010-01-18 18:16:19 -08:00
Junio C Hamano 6304c4068e Merge branch 'dp/maint-1.6.5-fast-import-non-commit-tag' into maint
* dp/maint-1.6.5-fast-import-non-commit-tag:
  fast-import: tag may point to any object type
2010-01-18 18:15:12 -08:00
Junio C Hamano fa232d457e Merge branch 'sr/gfi-options'
* sr/gfi-options:
  fast-import: add (non-)relative-marks feature
  fast-import: allow for multiple --import-marks= arguments
  fast-import: test the new option command
  fast-import: add option command
  fast-import: add feature command
  fast-import: put marks reading in its own function
  fast-import: put option parsing code in separate functions
2010-01-17 15:58:11 -08:00
Dmitry Potapov 8db751a8f9 fast-import: tag may point to any object type
If you tried to export the official git repository, and then to import it
back then git-fast-import would die complaining that "Mark :1 not a commit".

Accordingly to a generated crash file, Mark 1 is not a commit but a blob,
which is pointed by junio-gpg-pub tag. Because git-tag allows to create such
tags, git-fast-import should import them.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-14 21:51:06 -08:00
Shawn O. Pearce 74fbd1182a fast-import: Document author/committer/tagger name is optional
The fast-import parser does not validate that the author, committer
or tagger name component contains both a name and an email address.
Therefore the name component has always been optional.  Correct the
documentation to match the implementation.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-31 14:46:08 -08:00
Johan Herland 2a113aee9b fast-import: Proper notes tree manipulation
This patch teaches 'git fast-import' to automatically organize note objects
in a fast-import stream into an appropriate fanout structure. The notes API
in notes.h is NOT used to accomplish this, because trying to keep the
fast-import and notes data structures in sync would yield a significantly
larger patch with higher complexity.

Note objects are added with the 'N' command, and accounted for with a
per-branch counter, which is used to trigger fanout restructuring when
needed. Note that when restructuring the branch tree, _any_ entry whose
path consists of 40 hex chars (not including directory separators) will
be recognized as a note object. It is therefore not advisable to
manipulate note entries with M/D/R/C commands.

Since note objects are stored in the same tree structure as other objects,
the unloading and reloading of a fast-import branches handle note objects
transparently.

This patch has been improved by the following contributions:
- Shawn O. Pearce: Several style- and logic-related improvements

Cc: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-07 13:52:52 -08:00
Sverre Rabbelier bc3c79aefc fast-import: add (non-)relative-marks feature
After specifying 'feature relative-marks' the paths specified with
'feature import-marks' and 'feature export-marks' are relative to an
internal directory in the current repository.

In git-fast-import this means that the paths are relative to the
'.git/info/fast-import' directory. However, other importers may use a
different location.

Add 'feature non-relative-marks' to disable this behavior, this way
it is possible to, for example, specify the import-marks location as
relative, and the export-marks location as non-relative.

Also add tests to verify this behavior.

Cc: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-05 12:43:24 -08:00
Sverre Rabbelier 081751c882 fast-import: allow for multiple --import-marks= arguments
The --import-marks= option may be specified multiple times on the
commandline and should result in all marks being read in. Only one
import-marks feature may be specified in the stream, which is
overriden by any --import-marks= commandline options.

If one wishes to specify import-marks files in addition to the one
specified in the stream, it is easy to repeat the stream option as a
--import-marks= commandline option.

Also verify this behavior with tests.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-04 16:10:59 -08:00