Commit graph

428 commits

Author SHA1 Message Date
Jeff King
34fa79a6cd prefer memcpy to strcpy
When we already know the length of a string (e.g., because
we just malloc'd to fit it), it's nicer to use memcpy than
strcpy, as it makes it more obvious that we are not going to
overflow the buffer (because the size we pass matches the
size in the allocation).

This also eliminates calls to strcpy, which make auditing
the code base harder.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Jeff King
c7ab0ba340 avoid sprintf and strcpy with flex arrays
When we are allocating a struct with a FLEX_ARRAY member, we
generally compute the size of the array and then sprintf or
strcpy into it. Normally we could improve a dynamic allocation
like this by using xstrfmt, but it doesn't work here; we
have to account for the size of the rest of the struct.

But we can improve things a bit by storing the length that
we use for the allocation, and then feeding it to xsnprintf
or memcpy, which makes it more obvious that we are not
writing more than the allocated number of bytes.

It would be nice if we had some kind of helper for
allocating generic flex arrays, but it doesn't work that
well:

 - the call signature is a little bit unwieldy:

      d = flex_struct(sizeof(*d), offsetof(d, path), fmt, ...);

   You need offsetof here instead of just writing to the
   end of the base size, because we don't know how the
   struct is packed (partially this is because FLEX_ARRAY
   might not be zero, though we can account for that; but
   the size of the struct may actually be rounded up for
   alignment, and we can't know that).

 - some sites do clever things, like over-allocating because
   they know they will write larger things into the buffer
   later (e.g., struct packed_git here).

So we're better off to just write out each allocation (or
add type-specific helpers, though many of these are one-off
allocations anyway).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Jeff King
ef1286d3c0 use xsnprintf for generating git object headers
We generally use 32-byte buffers to format git's "type size"
header fields. These should not generally overflow unless
you can produce some truly gigantic objects (and our types
come from our internal array of constant strings). But it is
a good idea to use xsnprintf to make sure this is the case.

Note that we slightly modify the interface to
write_sha1_file_prepare, which nows uses "hdrlen" as an "in"
parameter as well as an "out" (on the way in it stores the
allocated size of the header, and on the way out it returns
the ultimate size of the header).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Jeff King
547ed71636 fast-import: switch crash-report date to iso8601
When fast-import emits a crash report, it does so in the
user's local timezone. But because we omit the timezone
completely for DATE_LOCAL, a reader of the report does not
immediately know which time zone was used. Let's switch this
to ISO8601 instead, which includes the time zone.

This does mean we will show the time in UTC, but that's not
a big deal. A crash report like this will either be looked
at immediately (in which case nobody even looks at the
timestamp), or it will be passed along to a developer to
debug, in which case the original timezone is less likely to
be of interest.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-03 15:36:34 -07:00
Junio C Hamano
8c9155e031 Merge branch 'jk/git-path'
git_path() and mkpath() are handy helper functions but it is easy
to misuse, as the callers need to be careful to keep the number of
active results below 4.  Their uses have been reduced.

* jk/git-path:
  memoize common git-path "constant" files
  get_repo_path: refactor path-allocation
  find_hook: keep our own static buffer
  refs.c: remove_empty_directories can take a strbuf
  refs.c: avoid git_path assignment in lock_ref_sha1_basic
  refs.c: avoid repeated git_path calls in rename_tmp_log
  refs.c: simplify strbufs in reflog setup and writing
  path.c: drop git_path_submodule
  refs.c: remove extra git_path calls from read_loose_refs
  remote.c: drop extraneous local variable from migrate_file
  prefer mkpathdup to mkpath in assignments
  prefer git_pathdup to git_path in some possibly-dangerous cases
  add_to_alternates_file: don't add duplicate entries
  t5700: modernize style
  cache.h: complete set of git_path_submodule helpers
  cache.h: clarify documentation for git_path, et al
2015-08-19 14:48:56 -07:00
Junio C Hamano
51a22ce147 Merge branch 'jc/finalize-temp-file'
Long overdue micro clean-up.

* jc/finalize-temp-file:
  sha1_file.c: rename move_temp_to_file() to finalize_object_file()
2015-08-19 14:48:55 -07:00
Jeff King
fcd12db6af prefer git_pathdup to git_path in some possibly-dangerous cases
Because git_path uses a static buffer that is shared with
calls to git_path, mkpath, etc, it can be dangerous to
assign the result to a variable or pass it to a non-trivial
function. The value may change unexpectedly due to other
calls.

None of the cases changed here has a known bug, but they're
worth converting away from git_path because:

  1. It's easy to use git_pathdup in these cases.

  2. They use constructs (like assignment) that make it
     hard to tell whether they're safe or not.

The extra malloc overhead should be trivial, as an
allocation should be an order of magnitude cheaper than a
system call (which we are clearly about to make, since we
are constructing a filename). The real cost is that we must
remember to free the result.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 15:37:12 -07:00
Junio C Hamano
cb5add5868 sha1_file.c: rename move_temp_to_file() to finalize_object_file()
Since 5a688fe4 ("core.sharedrepository = 0mode" should set, not
loosen, 2009-03-25), we kept reminding ourselves:

    NEEDSWORK: this should be renamed to finalize_temp_file() as
    "moving" is only a part of what it does, when no patch between
    master to pu changes the call sites of this function.

without doing anything about it.  Let's do so.

The purpose of this function was not to move but to finalize.  The
detail of the primarily implementation of finalizing was to link the
temporary file to its final name and then to unlink, which wasn't
even "moving".  The alternative implementation did "move" by calling
rename(2), which is a fun tangent.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 11:10:37 -07:00
Junio C Hamano
d939af12bd Merge branch 'jk/date-mode-format'
Teach "git log" and friends a new "--date=format:..." option to
format timestamps using system's strftime(3).

* jk/date-mode-format:
  strbuf: make strbuf_addftime more robust
  introduce "format" date-mode
  convert "enum date_mode" into a struct
  show-branch: use DATE_RELATIVE instead of magic number
2015-08-03 11:01:27 -07:00
Junio C Hamano
3ecca8879a Merge branch 'mh/fast-import-optimize-current-from'
Often a fast-import stream builds a new commit on top of the
previous commit it built, and it often unconditionally emits a
"from" command to specify the first parent, which can be omitted in
such a case.  This caused fast-import to forget the tree of the
previous commit and then re-read it from scratch, which was
inefficient.  Optimize for this common case.

* mh/fast-import-optimize-current-from:
  fast-import: do less work when given "from" matches current branch head
2015-08-03 11:01:24 -07:00
Junio C Hamano
c0d503433f Merge branch 'mh/fast-import-get-mark'
"git fast-import" learned to respond to the get-mark command via
its cat-blob-fd interface.

* mh/fast-import-get-mark:
  fast-import: add a get-mark command
2015-08-03 11:01:23 -07:00
Mike Hommey
0df3245721 fast-import: do less work when given "from" matches current branch head
When building a fast-import stream, it's easy to forget the fact
that for non-merge commits happening on top of the current branch
head, there is no need for a "from" command. That is corroborated by
the fact that at least git-p4, hg-fast-export and felipec's
git-remote-hg all unconditionally use a "from" command.

Unfortunately, giving a "from" command always resets the branch
tree, forcing it to be re-read, and in many cases, the pack is also
closed and reopened through gfi_unpack_entry.  Both are unnecessary
overhead, and the latter is particularly slow at least on OSX.

Avoid resetting the tree when it's unmodified, and avoid calling
gfi_unpack_entry when the given mark points to the same commit as
the current branch head.

Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-13 09:36:05 -07:00
Michael Haggerty
28c7b1f7b7 fast-import: add a get-mark command
It is sometimes useful for importers to be able to read the SHA-1
corresponding to a mark that they have created via fast-import. For
example, they might want to embed the SHA-1 into the commit message of
a later commit. Or it might be useful for internal bookkeeping uses,
or for logging.

Add a "get-mark" command to "git fast-import" that allows the importer
to ask for the value of a mark that has been created earlier.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-01 09:29:59 -07:00
Jeff King
a5481a6c94 convert "enum date_mode" into a struct
In preparation for adding date modes that may carry extra
information beyond the mode itself, this patch converts the
date_mode enum into a struct.

Most of the conversion is fairly straightforward; we pass
the struct as a pointer and dereference the type field where
necessary. Locations that declare a date_mode can use a "{}"
constructor.  However, the tricky case is where we use the
enum labels as constants, like:

  show_date(t, tz, DATE_NORMAL);

Ideally we could say:

  show_date(t, tz, &{ DATE_NORMAL });

but of course C does not allow that. Likewise, we cannot
cast the constant to a struct, because we need to pass an
actual address. Our options are basically:

  1. Manually add a "struct date_mode d = { DATE_NORMAL }"
     definition to each caller, and pass "&d". This makes
     the callers uglier, because they sometimes do not even
     have their own scope (e.g., they are inside a switch
     statement).

  2. Provide a pre-made global "date_normal" struct that can
     be passed by address. We'd also need "date_rfc2822",
     "date_iso8601", and so forth. But at least the ugliness
     is defined in one place.

  3. Provide a wrapper that generates the correct struct on
     the fly. The big downside is that we end up pointing to
     a single global, which makes our wrapper non-reentrant.
     But show_date is already not reentrant, so it does not
     matter.

This patch implements 3, along with a minor macro to keep
the size of the callers sane.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-29 11:39:07 -07:00
Michael Haggerty
a1c9eb918b update_ref(): don't read old reference value before delete
If we are deleting the reference, then we don't need to read the
reference's old value. It doesn't provide any race safety, because the
value read just before the delete is no "better" than the value that
would be read under lock during the delete. And even if the reference
previously didn't exist, we can call delete_ref() on it if we don't
provide an old_sha1 value.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 13:17:13 -07:00
Junio C Hamano
68a2e6a2c8 Merge branch 'nd/multiple-work-trees'
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.

* nd/multiple-work-trees: (41 commits)
  prune --worktrees: fix expire vs worktree existence condition
  t1501: fix test with split index
  t2026: fix broken &&-chain
  t2026 needs procondition SANITY
  git-checkout.txt: a note about multiple checkout support for submodules
  checkout: add --ignore-other-wortrees
  checkout: pass whole struct to parse_branchname_arg instead of individual flags
  git-common-dir: make "modules/" per-working-directory directory
  checkout: do not fail if target is an empty directory
  t2025: add a test to make sure grafts is working from a linked checkout
  checkout: don't require a work tree when checking out into a new one
  git_path(): keep "info/sparse-checkout" per work-tree
  count-objects: report unused files in $GIT_DIR/worktrees/...
  gc: support prune --worktrees
  gc: factor out gc.pruneexpire parsing code
  gc: style change -- no SP before closing parenthesis
  checkout: clean up half-prepared directories in --to mode
  checkout: reject if the branch is already checked out elsewhere
  prune: strategies for linked checkouts
  checkout: support checking out into a new working directory
  ...
2015-05-11 14:23:39 -07:00
Junio C Hamano
6902c4da58 Merge branch 'rs/deflate-init-cleanup'
Code simplification.

* rs/deflate-init-cleanup:
  zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-17 16:01:26 -07:00
René Scharfe
9a6f1287fb zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
Clear the git_zstream variable at the start of git_deflate_init() etc.
so that callers don't have to do that.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-05 15:46:03 -08:00
Junio C Hamano
fd9de868c3 Merge branch 'mh/refs-have-new'
Simplify the ref transaction API around how "the ref should be
pointing at this object" is specified.

* mh/refs-have-new:
  refs.h: remove duplication in function docstrings
  update_ref(): improve documentation
  ref_transaction_verify(): new function to check a reference's value
  ref_transaction_delete(): check that old_sha1 is not null_sha1
  ref_transaction_create(): check that new_sha1 is valid
  commit: avoid race when creating orphan commits
  commit: add tests of commit races
  ref_transaction_delete(): remove "have_old" parameter
  ref_transaction_update(): remove "have_old" parameter
  struct ref_update: move "have_old" into "flags"
  refs.c: change some "flags" to "unsigned int"
  refs: remove the gap in the REF_* constant values
  refs: move REF_DELETING to refs.c
2015-03-05 12:45:39 -08:00
Junio C Hamano
1585dfeda7 Merge branch 'jk/fast-import-die-nicely-fix'
"git fast-import" used to crash when it could not close and
conclude the resulting packfile cleanly.

* jk/fast-import-die-nicely-fix:
  fast-import: avoid running end_packfile recursively
2015-02-25 15:40:15 -08:00
Michael Haggerty
1d147bdff0 ref_transaction_update(): remove "have_old" parameter
Instead, verify the reference's old value if and only if old_sha1 is
non-NULL.

ref_transaction_delete() will get the same treatment in a moment.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-17 11:22:50 -08:00
Jeff King
5e915f3085 fast-import: avoid running end_packfile recursively
When an import has finished, we run end_packfile() to
finalize the data and move the packfile into place. If this
process fails, we call die() and end up in our die_nicely()
handler.  Which unfortunately includes running end_packfile
to save any progress we made. We enter the function again,
and start operating on the pack_data struct while it is in
an inconsistent state, leading to a segfault.

One way to trigger this is to simply start two identical
fast-imports at the same time. They will both create the
same packfiles, which will then try to create identically
named ".keep" files. One will win the race, and the other
will die(), and end up with the segfault.

Since 3c078b9, we already reset the pack_data pointer to
NULL at the end of end_packfile. That covers the case of us
calling die() right after end_packfile, before we have
reinitialized the pack_data pointer. This new problem is
quite similar, except that we are worried about calling
die() _during_ end_packfile, not right after. Ideally we
would simply set pack_data to NULL as soon as we enter the
function, and operate on a copy of the pointer.

Unfortunately, it is not so easy. pack_data is a global, and
end_packfile calls into other functions which operate on the
global directly. We would have to teach each of these to
take an argument, and there is no guarantee that we would
catch all of the spots.

Instead, we can simply use a static flag to avoid
recursively entering the function. This is a little less
elegant, but it's short and fool-proof.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-10 10:35:32 -08:00
Nguyễn Thái Ngọc Duy
aaa26805ad fast-import: use git_path() for accessing .git dir instead of get_git_dir()
This allows git_path() to redirect info/fast-import to another place
if needed

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:12 -08:00
Nguyễn Thái Ngọc Duy
dcf692625a path.c: make get_pathname() call sites return const char *
Before the previous commit, get_pathname returns an array of PATH_MAX
length. Even if git_path() and similar functions does not use the
whole array, git_path() caller can, in theory.

After the commit, get_pathname() may return a buffer that has just
enough room for the returned string and git_path() caller should never
write beyond that.

Make git_path(), mkpath() and git_path_submodule() return a const
buffer to make sure callers do not write in it at all.

This could have been part of the previous commit, but the "const"
conversion is too much distraction from the core changes in path.c.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:10 -08:00
Ronnie Sahlberg
db7516ab9f refs.c: pass the ref log message to _create/delete/update instead of _commit
Change the ref transaction API so that we pass the reflog message to the
create/delete/update functions instead of to ref_transaction_commit.
This allows different reflog messages for each ref update in a multi-ref
transaction.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:22 -07:00
Michael Haggerty
f70f0565b3 dump_marks(): reimplement using fdopen_lock_file()
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 14:20:22 -07:00
Michael Haggerty
697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Michael Haggerty
32c3ec258e dump_marks(): remove a redundant call to rollback_lock_file()
When commit_lock_file() fails, it now always calls
rollback_lock_file() internally, so there is no need to call that
function here.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:48:59 -07:00
Junio C Hamano
5b830a8588 Merge branch 'mb/fast-import-delete-root' into maint
An attempt to remove the entire tree in the "git fast-import" input
stream caused it to misbehave.

* mb/fast-import-delete-root:
  fast-import: fix segfault in store_tree()
  t9300: test filedelete command
2014-09-29 22:09:48 -07:00
Junio C Hamano
1c2ea2cdc0 Merge branch 'rs/realloc-array'
Code cleanup.

* rs/realloc-array:
  use REALLOC_ARRAY for changing the allocation size of arrays
  add macro REALLOC_ARRAY
2014-09-26 14:39:45 -07:00
Junio C Hamano
04481347ec Merge branch 'jk/fast-import-fixes' into maint
* jk/fast-import-fixes:
  fast-import: fix buffer overflow in dump_tags
  fast-import: clean up pack_data pointer in end_packfile
2014-09-19 14:05:12 -07:00
Junio C Hamano
73da5a1e85 Merge branch 'mb/fast-import-delete-root'
An attempt to remove the entire tree in the "git fast-import" input
stream caused it to misbehave.

* mb/fast-import-delete-root:
  fast-import: fix segfault in store_tree()
  t9300: test filedelete command
2014-09-19 11:38:34 -07:00
Junio C Hamano
9ff700ebac Merge branch 'jk/commit-author-parsing'
Code clean-up.

* jk/commit-author-parsing:
  determine_author_info(): copy getenv output
  determine_author_info(): reuse parsing functions
  date: use strbufs in date-formatting functions
  record_author_date(): use find_commit_header()
  record_author_date(): fix memory leak on malformed commit
  commit: provide a function to find a header in a buffer
2014-09-19 11:38:33 -07:00
René Scharfe
2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18 09:13:42 -07:00
Junio C Hamano
b6a1261751 Merge branch 'jk/fast-import-fixes'
With sufficiently long refnames, fast-import could have overflown
an on-stack buffer.

* jk/fast-import-fixes:
  fast-import: fix buffer overflow in dump_tags
  fast-import: clean up pack_data pointer in end_packfile
2014-09-11 10:33:34 -07:00
Junio C Hamano
01d678a226 Merge branch 'rs/ref-transaction-1'
The second batch of the transactional ref update series.

* rs/ref-transaction-1: (22 commits)
  update-ref --stdin: pass transaction around explicitly
  update-ref --stdin: narrow scope of err strbuf
  refs.c: make delete_ref use a transaction
  refs.c: make prune_ref use a transaction to delete the ref
  refs.c: remove lock_ref_sha1
  refs.c: remove the update_ref_write function
  refs.c: remove the update_ref_lock function
  refs.c: make lock_ref_sha1 static
  walker.c: use ref transaction for ref updates
  fast-import.c: use a ref transaction when dumping tags
  receive-pack.c: use a reference transaction for updating the refs
  refs.c: change update_ref to use a transaction
  branch.c: use ref transaction for all ref updates
  fast-import.c: change update_branch to use ref transactions
  sequencer.c: use ref transactions for all ref updates
  commit.c: use ref transactions for updates
  replace.c: use the ref transaction functions for updates
  tag.c: use ref transactions when doing updates
  refs.c: add transaction.status and track OPEN/CLOSED
  refs.c: make ref_transaction_begin take an err argument
  ...
2014-09-11 10:33:31 -07:00
Ronnie Sahlberg
3f09ba7543 fast-import.c: use a ref transaction when dumping tags
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:14 -07:00
Ronnie Sahlberg
de7e86f522 fast-import.c: change update_branch to use ref transactions
Change update_branch() to use ref transactions for updates.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:12 -07:00
Maxim Bublis
2668d692eb fast-import: fix segfault in store_tree()
Branch tree is NULLified by filedelete command if we are trying
to delete root tree. Add sanity check and use load_tree() in that case.

Signed-off-by: Maxim Bublis <satori@yandex-team.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 10:31:14 -07:00
Jeff King
c33ddc2e33 date: use strbufs in date-formatting functions
Many of the date functions write into fixed-size buffers.
This is a minor pain, as we have to take special
precautions, and frequently end up copying the result into a
strbuf or heap-allocated buffer anyway (for which we
sometimes use strcpy!).

Let's instead teach parse_date, datestamp, etc to write to a
strbuf. The obvious downside is that we might need to
perform a heap allocation where we otherwise would not need
to. However, it turns out that the only two new allocations
required are:

  1. In test-date.c, where we don't care about efficiency.

  2. In determine_author_info, which is not performance
     critical (and where the use of a strbuf will help later
     refactoring).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 10:32:56 -07:00
Jeff King
c252785982 fast-import: fix buffer overflow in dump_tags
When creating a new annotated tag, we sprintf the refname
into a static-sized buffer. If we have an absurdly long
tagname, like:

  git init repo &&
  cd repo &&
  git commit --allow-empty -m foo &&
  git tag -m message mytag &&
  git fast-export mytag |
  perl -lpe '/^tag/ and s/mytag/"a" x 8192/e' |
  git fast-import <input

we'll overflow the buffer. We can fix it by using a strbuf.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-25 12:20:57 -07:00
Jeff King
3c078b9c86 fast-import: clean up pack_data pointer in end_packfile
We have a global pointer pack_data pointing to the current
pack we have open. Inside end_packfile we have two new
pointers, old_p and new_p. The latter points to pack_data,
and the former points to the new "installed" version of the
packfile we get when we hand the file off to the regular
sha1_file machinery. When then free old_p.

Presumably the extra old_p pointer was there so that we
could overwrite pack_data with new_p and still free old_p,
but we don't do that. We just leave pack_data pointing to
bogus memory, and don't overwrite it until we call
start_packfile again (if ever).

This can cause problems for our die routine, which calls
end_packfile to clean things up. If we die at the wrong
moment, we can end up looking at invalid memory in
pack_data left after the last end_packfile().

Instead, let's make sure we set pack_data to NULL after we
free it, and make calling endfile() again with a NULL
pack_data a noop (there is nothing to end).

We can further make things less confusing by dropping old_p
entirely, and moving new_p closer to its point of use.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-25 12:20:24 -07:00
Tanay Abhra
536900e5b2 fast-import.c: replace git_config() with git_config_get_*() family
Use `git_config_get_*()` family instead of `git_config()` to take
advantage of the config-set API which provides a cleaner control flow.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-13 12:36:02 -07:00
René Scharfe
14576df044 fast-import: use hashcmp() for SHA1 hash comparison
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-18 12:14:47 -07:00
Jeff King
e814c39c2f fast-import: refactor parsing of spaces
When we see a file change in a commit, we expect one of:

  1. A mark.

  2. An "inline" keyword.

  3. An object sha1.

The handling of spaces is inconsistent between the three
options. Option 1 calls a sub-function which checks for the
space, but doesn't parse past it. Option 2 parses the space,
then deliberately avoids moving the pointer past it. Option
3 detects the space locally but doesn't move past it.

This is confusing, because it looks like option 1 forgets to
check for the space (it's just buried). And option 2 checks
for "inline ", but only moves strlen("inline") characters
forward, which looks like a bug but isn't.

We can make this more clear by just having each branch move
past the space as it is checked (and we can replace the
doubled use of "inline" with a call to skip_prefix).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:45:19 -07:00
Jeff King
97313bef2a fast-import: use skip_prefix for parsing input
Fast-import does a lot of parsing of commands and
dispatching to sub-functions. For example, given "option
foo", we might recognize "option " using starts_with, and
then hand it off to parse_option() to do the rest.

However, we do not let parse_option know that we have parsed
the first part already. It gets the full buffer, and has to
skip past the uninteresting bits. Some functions simply add
a magic constant:

  char *option = command_buf.buf + 7;

Others use strlen:

  char *option = command_buf.buf + strlen("option ");

And others use strchr:

  char *option = strchr(command_buf.buf, ' ') + 1;

All of these are brittle and easy to get wrong (especially
given that the starts_with call and the code that assumes
the presence of the prefix are far apart). Instead, we can
use skip_prefix, and just pass each handler a pointer to its
arguments.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:44:45 -07:00
Jeff King
ae021d8791 use skip_prefix to avoid magic numbers
It's a common idiom to match a prefix and then skip past it
with a magic number, like:

  if (starts_with(foo, "bar"))
	  foo += 3;

This is easy to get wrong, since you have to count the
prefix string yourself, and there's no compiler check if the
string changes.  We can use skip_prefix to avoid the magic
numbers here.

Note that some of these conversions could be much shorter.
For example:

  if (starts_with(arg, "--foo=")) {
	  bar = arg + 6;
	  continue;
  }

could become:

  if (skip_prefix(arg, "--foo=", &bar))
	  continue;

However, I have left it as:

  if (skip_prefix(arg, "--foo=", &v)) {
	  bar = v;
	  continue;
  }

to visually match nearby cases which need to actually
process the string. Like:

  if (skip_prefix(arg, "--foo=", &v)) {
	  bar = atoi(v);
	  continue;
  }

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:44:45 -07:00
Jeff King
ff45c0d4a3 fast-import: fix read of uninitialized argv memory
Fast-import shares code between its command-line parser and
the "option" command. To do so, it strips the "--" from any
command-line options and passes them to the option parser.
However, it does not confirm that the option even begins
with "--" before blindly passing "arg + 2".

It does confirm that the option starts with "-", so the only
affected case was:

  git fast-import -

which would read uninitialized memory after the argument. We
can fix it by using skip_prefix and checking the result. As
a bonus, this gets rid of some magic numbers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:44:44 -07:00
Felipe Contreras
4ee1b225b9 fast-import: add support to delete refs
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-21 11:47:34 -07:00
Rohit Mani
2c5495f7b6 use strchrnul() in place of strchr() and strlen()
Avoid scanning strings twice, once with strchr() and then with
strlen(), by using strchrnul().

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rohit Mani <rohit.mani@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-10 08:35:30 -07:00
Christian Couder
5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Junio C Hamano
9a86b89941 Merge branch 'bk/refs-multi-update'
Give "update-refs" a "--stdin" option to read multiple update
requests and perform them in an all-or-none fashion.

* bk/refs-multi-update:
  update-ref: add test cases covering --stdin signature
  update-ref: support multiple simultaneous updates
  refs: add update_refs for multiple simultaneous updates
  refs: add function to repack without multiple refs
  refs: factor delete_ref loose ref step into a helper
  refs: factor update_ref steps into helpers
  refs: report ref type from lock_any_ref_for_update
  reset: rename update_refs to reset_refs
2013-09-20 12:36:12 -07:00
Junio C Hamano
89dde7882f Merge branch 'rh/ishes-doc'
We liberally use "committish" and "commit-ish" (and "treeish" and
"tree-ish"); as these are non-words, let's unify these terms to
their dashed form.  More importantly, clarify the documentation on
object peeling using these terms.

* rh/ishes-doc:
  glossary: fix and clarify the definition of 'ref'
  revisions.txt: fix and clarify <rev>^{<type>}
  glossary: more precise definition of tree-ish (a.k.a. treeish)
  use 'commit-ish' instead of 'committish'
  use 'tree-ish' instead of 'treeish'
  glossary: define commit-ish (a.k.a. committish)
  glossary: mention 'treeish' as an alternative to 'tree-ish'
2013-09-17 11:42:51 -07:00
Richard Hansen
a8a5406ab3 use 'commit-ish' instead of 'committish'
Replace 'committish' in documentation and comments with 'commit-ish'
to match gitglossary(7) and to be consistent with 'tree-ish'.

The only remaining instances of 'committish' are:
  * variable, function, and macro names
  * "(also committish)" in the definition of commit-ish in
    gitglossary[7]

Signed-off-by: Richard Hansen <rhansen@bbn.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-04 15:03:03 -07:00
Richard Hansen
bb8040f9f9 use 'tree-ish' instead of 'treeish'
Replace 'treeish' in documentation and comments with 'tree-ish' to
match gitglossary(7).

The only remaining instances of 'treeish' are:
  * variable, function, and macro names
  * "(also treeish)" in the definition of tree-ish in gitglossary(7)

Signed-off-by: Richard Hansen <rhansen@bbn.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-04 15:02:56 -07:00
Junio C Hamano
7e39472020 Merge branch 'jk/fast-import-empty-ls'
* jk/fast-import-empty-ls:
  fast-import: allow moving the root tree
  fast-import: allow ls or filecopy of the root tree
  fast-import: set valid mode on root tree in "ls"
  t9300: document fast-import empty path issues
2013-09-04 12:23:35 -07:00
Brad King
9bbb0fa1fd refs: report ref type from lock_any_ref_for_update
Expose lock_ref_sha1_basic's type_p argument to callers of
lock_any_ref_for_update.  Update all call sites to ignore it by passing
NULL for now.

Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-30 14:57:28 -07:00
John Keeping
62bfa11cc9 fast-import: allow moving the root tree
Because fast-import.c::tree_content_remove does not check for the empty
path, it is not possible to move the root tree to a subdirectory.
Instead the error "Path  not in branch" is produced (note the double
space where the empty path has been inserted).

Fix this by explicitly checking for the empty path and handling it.

Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-23 14:22:28 -07:00
John Keeping
e0eb6b9720 fast-import: allow ls or filecopy of the root tree
Commit 178e1de (fast-import: don't allow 'ls' of path with empty
components, 2012-03-09) restricted paths which:

    . contain an empty directory component (e.g. foo//bar is invalid),
    . end with a directory separator (e.g. foo/ is invalid),
    . start with a directory separator (e.g. /foo is invalid).

However, the implementation also caught the empty path, which should
represent the root tree.  Relax this restriction so that the empty path
is explicitly allowed and refers to the root tree.

Reported-by: Dave Abrahams <dave@boostpro.com>
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-23 14:22:28 -07:00
John Keeping
adefdba536 fast-import: set valid mode on root tree in "ls"
This prevents a failure later when we lift the restriction on ls with
the empty path.

Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-23 14:22:28 -07:00
Junio C Hamano
dbbc93b221 Merge branch 'fc/fast-export-persistent-marks'
Optimization for fast-export by avoiding unnecessarily resolving
arbitrary object name and parsing object when only presence and
type information is necessary, etc.

* fc/fast-export-persistent-marks:
  fast-{import,export}: use get_sha1_hex() to read from marks file
  fast-export: don't parse commits while reading marks file
  fast-export: do not parse non-commit objects while reading marks file
2013-06-02 15:48:28 -07:00
Felipe Contreras
45c5d4a56b fast-{import,export}: use get_sha1_hex() to read from marks file
It's wrong to call get_sha1() if they should be SHA-1s, plus
inefficient.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-07 16:20:40 -07:00
Ramsay Jones
84d32bf767 sparse: Fix mingw_main() argument number/type errors
Sparse issues 68 errors (two errors for each main() function) such
as the following:

      SP git.c
  git.c:510:5: error: too many arguments for function mingw_main
  git.c:510:5: error: symbol 'mingw_main' redeclared with different type \
    (originally declared at git.c:510) - different argument counts

The errors are caused by the 'main' macro used by the MinGW build
to provide a replacement main() function. The original main function
is effectively renamed to 'mingw_main' and is called from the new
main function. The replacement main is used to execute certain actions
common to all git programs on MinGW (e.g. ensure the standard I/O
streams are in binary mode).

In order to suppress the errors, we change the macro to include the
parameters in the declaration of the mingw_main function.

Unfortunately, this change provokes both sparse and gcc to complain
about 9 calls to mingw_main(), such as the following:

      CC git.o
  git.c: In function 'main':
  git.c:510: warning: passing argument 2 of 'mingw_main' from \
    incompatible pointer type
  git.c:510: note: expected 'const char **' but argument is of \
    type 'char **'

In order to suppress these warnings, since both of the main
functions need to be declared with the same prototype, we
change the declaration of the 9 main functions, thus:

    int main(int argc, char **argv)

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-28 12:32:08 -07:00
Ramsay Jones
0a34594c83 fast-import: Fix an gcc -Wuninitialized warning
Commit cbfd5e1c ("drop some obsolete "x = x" compiler warning hacks",
21-03-2013) removed a gcc hack that suppressed an "might be used
uninitialized" warning issued by older versions of gcc.

However, commit 3aa99df8 ('fast-import: clarify "inline" logic in
file_change_m', 21-03-2013) addresses an (almost) identical issue
(with very similar code), but includes additional code in it's
resolution. The solution used by this commit, unlike that used by
commit cbfd5e1c, also suppresses the -Wuninitialized warning on
older versions of gcc.

In order to suppress the warning (against the 'oe' symbol) in the
note_change_n() function, we adopt the same solution used by commit
3aa99df8.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-29 23:46:55 -07:00
Jeff King
3aa99df802 fast-import: clarify "inline" logic in file_change_m
When we read a fast-import line like:

  M 100644 :1 foo.c

we point the local object_entry variable "oe" to the object
named by the mark ":1". When the input uses the "inline"
construct, however, we do not have such an object_entry.

The current code is careful not to access "oe" in the inline
case, but we can make the assumption even more obvious (and
catch violations of it) by setting oe to NULL and adding a
comment. As a bonus, this also squelches an over-zealous gcc
-Wuninitialized warning, which means we can drop the "oe =
oe" initialization hack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-21 14:06:49 -07:00
Jeff King
cbfd5e1cbb drop some obsolete "x = x" compiler warning hacks
In cases where the setting and access of a variable are
protected by the same conditional flag, older versions of
gcc would generate a "might be used unitialized" warning. We
silence the warning by initializing the variable to itself,
a hack that gcc recognizes.

Modern versions of gcc are smart enough to get this right,
going back to at least version 4.3.5. gcc 4.1 does get it
wrong in both cases, but is sufficiently old that we
probably don't need to care about it anymore.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-21 14:06:38 -07:00
Jeff King
4db34cc134 fast-import: use pointer-to-pointer to keep list tail
This is shorter, idiomatic, and it means the compiler does
not get confused about whether our "e" pointer is valid,
letting us drop the "e = e" hack.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-21 14:06:19 -07:00
Junio C Hamano
34f5130af8 Merge branch 'jc/merge-bases'
Optimise the "merge-base" computation a bit, and also update its
users that do not need the full merge-base information to call a
cheaper subset.

* jc/merge-bases:
  reduce_heads(): reimplement on top of remove_redundant()
  merge-base: "--is-ancestor A B"
  get_merge_bases_many(): walk from many tips in parallel
  in_merge_bases(): use paint_down_to_common()
  merge_bases_many(): split out the logic to paint history
  in_merge_bases(): omit unnecessary redundant common ancestor reduction
  http-push: use in_merge_bases() for fast-forward check
  receive-pack: use in_merge_bases() for fast-forward check
  in_merge_bases(): support only one "other" commit
2012-09-11 11:36:05 -07:00
Junio C Hamano
a20efee9cf in_merge_bases(): support only one "other" commit
In early days of its life, I planned to make it possible to compute
"is a commit contained in all of these other commits?" with this
function, but it turned out that no caller needed it.

Just make it take two commit objects and add a comment to say what
these two functions do.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27 18:36:39 -07:00
Pete Wyckoff
06454cb9a3 fast-import: tighten parsing of datarefs
The syntax for the use of mark references in fast-import
demands either a SP (space) or LF (end-of-line) after
a mark reference.  Fast-import does not complain when garbage
appears after a mark reference in some cases.

Factor out parsing of mark references and complain if
errant characters are found.  Also be a little more careful
when parsing "inline" and SHA1s, complaining if extra
characters appear or if the form of the dataref is unrecognized.

Buggy input can cause fast-import to produce the wrong output,
silently, without error.  This makes it difficult to track
down buggy generators of fast-import streams.  An example is
seen in the last line of this commit command:

    commit refs/heads/S2
    committer Name <name@example.com> 1112912893 -0400
    data <<COMMIT
    commit message
    COMMIT
    from :1M 100644 :103 hello.c

It is missing a newline and should be:

    [...]
    from :1
    M 100644 :103 hello.c

What fast-import does is to produce a commit with the same
contents for hello.c as in refs/heads/S2^.  What the buggy
program was expecting was the contents of blob :103.  While
the resulting commit graph looked correct, the contents in
some commits were wrong.

Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-10 14:34:02 -07:00
Junio C Hamano
79efeae69d Merge branch 'jn/maint-fast-import-empty-ls' into maint
* jn/maint-fast-import-empty-ls:
  fast-import: don't allow 'ls' of path with empty components
  fast-import: leakfix for 'ls' of dirty trees
2012-03-26 12:10:25 -07:00
Jonathan Nieder
178e1deaae fast-import: don't allow 'ls' of path with empty components
As the fast-import manual explains:

	The value of <path> must be in canonical form. That is it must
	not:
	. contain an empty directory component (e.g. foo//bar is invalid),
	. end with a directory separator (e.g. foo/ is invalid),
	. start with a directory separator (e.g. /foo is invalid),

Unfortunately the "ls" command accepts these invalid syntaxes and
responds by declaring that the indicated path is missing.  This is too
subtle and causes importers to silently misbehave; better to error out
so the operator knows what's happening.

The C, R, and M commands already error out for such paths.

Reported-by: Andrew Sayers <andrew-git@pileofstuff.org>
Analysis-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-03-09 22:07:22 -06:00
Jonathan Nieder
c27e559da5 fast-import: leakfix for 'ls' of dirty trees
When the chosen directory has changed since it was last written to
pack, "tree_content_get" makes a deep copy of its content to scribble
on while computing the tree name, which we forgot to free.

This leak has been present since the 'ls' command was introduced in
v1.7.5-rc0~3^2~33 (fast-import: add 'ls' command, 2010-12-02).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-03-09 22:02:44 -06:00
Thomas Rast
a8ea1b7a55 fast-import: zero all of 'struct tag' to silence valgrind
When running t9300, valgrind (correctly) complains about an
uninitialized value in write_crash_report:

  ==2971== Use of uninitialised value of size 8
  ==2971==    at 0x4164F4: sha1_to_hex (hex.c:70)
  ==2971==    by 0x4073E4: die_nicely (fast-import.c:468)
  ==2971==    by 0x43284C: die (usage.c:86)
  ==2971==    by 0x40420D: main (fast-import.c:2731)
  ==2971==  Uninitialised value was created by a heap allocation
  ==2971==    at 0x4C29B3D: malloc (vg_replace_malloc.c:263)
  ==2971==    by 0x433645: xmalloc (wrapper.c:35)
  ==2971==    by 0x405DF5: pool_alloc (fast-import.c:619)
  ==2971==    by 0x407755: pool_calloc.constprop.14 (fast-import.c:634)
  ==2971==    by 0x403F33: main (fast-import.c:3324)

Fix this by zeroing all of the 'struct tag'.  We would only need to
zero out the 'sha1' field, but this way seems more future-proof.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-05 09:36:09 -08:00
Ævar Arnfjörð Bjarmason
ab1900a36e Appease Sun Studio by renaming "tmpfile"
On Solaris the system headers define the "tmpfile" name, which'll
cause Git compiled with Sun Studio 12 Update 1 to whine about us
redefining the name:

    "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile     (E_PRAGMA_REDEFINE_STATIC)
    "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
    "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile   (E_PRAGMA_REDEFINE_STATIC)
    "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)

Just renaming the "tmpfile" variable to "tmp_file" in the relevant
places is the easiest way to fix this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-21 10:21:04 -08:00
Junio C Hamano
2dccad3c6f Merge branch 'ab/enable-i18n'
* ab/enable-i18n:
  i18n: add infrastructure for translating Git with gettext

Conflicts:
	Makefile
2011-12-19 16:06:41 -08:00
Junio C Hamano
48b303675a Merge branch 'jc/stream-to-pack'
* jc/stream-to-pack:
  bulk-checkin: replace fast-import based implementation
  csum-file: introduce sha1file_checkpoint
  finish_tmp_packfile(): a helper function
  create_tmp_packfile(): a helper function
  write_pack_header(): a helper function

Conflicts:
	pack.h
2011-12-16 22:33:40 -08:00
Ævar Arnfjörð Bjarmason
5e9637c629 i18n: add infrastructure for translating Git with gettext
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.

This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.

This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.

The rest of the commit message goes into detail about various
sub-parts of this commit.

= Installation

Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.

= Perl

Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.

Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.

Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.

I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.

See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.

= Shell

Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.

If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.

If neither are present we'll use a dumb printf(1) fall-through
wrapper.

= About libcharset.h and langinfo.h

We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.

The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.

GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.

=Credits

This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.

[jc: squashed a small Makefile fix from Ramsay]

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-05 20:46:55 -08:00
Junio C Hamano
6c52614864 csum-file: introduce sha1file_checkpoint
It is useful to be able to rewind a check-summed file to a certain
previous state after writing data into it using sha1write() API. The
fast-import command does this after streaming a blob data to the packfile
being generated and then noticing that the same blob has already been
written, and it does this with a private code truncate_pack() that is
commented as "Yes, this is a layering violation".

Introduce two API functions, sha1file_checkpoint(), that allows the caller
to save a state of a sha1file, and then later revert it to the saved state.
Use it to reimplement truncate_pack().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-30 14:27:59 -08:00
Johan Herland
1838685780 fast-import: Fix incorrect fanout level when modifying existing notes refs
This fixes the bug uncovered by the tests added in the previous two patches.

When an existing notes ref was loaded into the fast-import machinery, the
num_notes counter associated with that ref remained == 0, even though the
true number of notes in the loaded ref was higher. This caused a fanout
level of 0 to be used, although the actual fanout of the tree could be > 0.
Manipulating the notes tree at an incorrect fanout level causes removals to
silently fail, and modifications of existing notes to instead produce an
additional note (leaving the old object in place at a different fanout level).

This patch fixes the bug by explicitly counting the number of notes in the
notes tree whenever it looks like the num_notes counter could be wrong (when
num_notes == 0). There may be false positives (i.e. triggering the counting
when the notes tree is truly empty), but in those cases, the counting should
not take long.

Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-28 16:38:46 -08:00
Junio C Hamano
c13975e7fd Merge branch 'di/fast-import-empty-tag-note-fix'
* di/fast-import-empty-tag-note-fix:
  fast-import: don't allow to note on empty branch
  fast-import: don't allow to tag empty branch
2011-10-13 19:03:19 -07:00
Michael Haggerty
8d9c50105f Change check_ref_format() to take a flags argument
Change check_ref_format() to take a flags argument that indicates what
is acceptable in the reference name (analogous to "git
check-ref-format"'s "--allow-onelevel" and "--refspec-pattern").  This
is more convenient for callers and also fixes a failure in the test
suite (and likely elsewhere in the code) by enabling "onelevel" and
"refspec-pattern" to be allowed independently of each other.

Also rename check_ref_format() to check_refname_format() to make it
obvious that it deals with refnames rather than references themselves.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-05 13:45:29 -07:00
Dmitry Ivankov
0bc69881a6 fast-import: don't allow to note on empty branch
'reset' command makes fast-import start a branch from scratch. It's name
is kept in lookup table but it's sha1 is null_sha1 (special value).
'notemodify' command can be used to add a note on branch head given it's
name. lookup_branch() is used it that case and it doesn't check for
null_sha1. So fast-import writes a note for null_sha1 object instead of
giving a error.

Add a check to deny adding a note on empty branch and add a corresponding
test.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-22 13:30:59 -07:00
Dmitry Ivankov
2c9c8ee2de fast-import: don't allow to tag empty branch
'reset' command makes fast-import start a branch from scratch. It's name
is kept in lookup table but it's sha1 is null_sha1 (special value).
'tag' command can be used to tag a branch by it's name. lookup_branch()
is used it that case and it doesn't check for null_sha1. So fast-import
writes a tag for null_sha1 object instead of giving a error.

Add a check to deny tagging an empty branch and add a corresponding test.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-22 13:30:57 -07:00
Junio C Hamano
0dc691a4f3 Merge branch 'di/fast-import-tagging'
* di/fast-import-tagging:
  fast-import: allow to tag newly created objects
  fast-import: add tests for tagging blobs
2011-08-28 21:18:48 -07:00
Junio C Hamano
05d88e6f7e Merge branch 'di/fast-import-blob-tweak'
* di/fast-import-blob-tweak:
  fast-import: treat cat-blob as a delta base hint for next blob
  fast-import: count and report # of calls to diff_delta in stats
2011-08-28 21:18:47 -07:00
Junio C Hamano
45792b64c1 Merge branch 'di/fast-import-deltified-tree'
* di/fast-import-deltified-tree:
  fast-import: prevent producing bad delta
  fast-import: add a test for tree delta base corruption
2011-08-28 21:18:47 -07:00
Junio C Hamano
0b98954975 Merge branch 'di/fast-import-ident'
* di/fast-import-ident:
  fsck: improve committer/author check
  fsck: add a few committer name tests
  fast-import: check committer name more strictly
  fast-import: don't fail on omitted committer name
  fast-import: add input format tests
2011-08-28 21:18:47 -07:00
Dmitry Ivankov
6c447f633c fast-import: allow to tag newly created objects
fast-import allows to tag objects by sha1 and to query sha1 of objects
being imported. So it should allow to tag these objects, make it do so.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-23 11:25:59 -07:00
Dmitry Ivankov
2efe38e7da fast-import: add tests for tagging blobs
fast-import allows to create an annotated tag that annotates a blob,
via mark or direct sha1 specification.

For mark it works, for sha1 it tries to read the object. It tries to
do so via read_sha1_file, and then checks the size to be at least 46.

That's weird, let's just allow to (annotated) tag any object referenced
by sha1. If the object originates from our packfile, we still fail though.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-23 11:25:56 -07:00
Dmitry Ivankov
a7e9c34126 fast-import: treat cat-blob as a delta base hint for next blob
Delta base for blobs is chosen as a previously saved blob. If we
treat cat-blob's blob as a delta base for the next blob, nothing
is likely to become worse.

For fast-import stream producer like svn-fe cat-blob is used like
following:
- svn-fe reads file delta in svn format
- to apply it, svn-fe asks cat-blob 'svn delta base'
- applies 'svn delta' to the response
- produces a blob command to store the result

Currently there is no way for svn-fe to give fast-import a hint on
object delta base. While what's requested in cat-blob is most of
the time a best delta base possible. Of course, it could be not a
good delta base, but we don't know any better one anyway.

So do treat cat-blob's result as a delta base for next blob. The
profit is nice: 2x to 7x reduction in pack size AND 1.2x to 3x
time speedup due to diff_delta being faster on good deltas. git gc
--aggressive can compress it even more, by 10% to 70%, utilizing
more cpu time, real time and 3 cpu cores.

Tested on 213M and 2.7G fast-import streams, resulting packs are 22M
and 113M, import time is 7s and 60s, both streams are produced by
svn-fe, sniffed and then used as raw input for fast-import.

For git-fast-export produced streams there is no change as it doesn't
use cat-blob and doesn't try to reorder blobs in some smart way to
make successive deltas small.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Acked-by: David Barr <davidbarr@google.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-22 11:57:07 -07:00
Dmitry Ivankov
94c3b48247 fast-import: count and report # of calls to diff_delta in stats
It's an interesting number, how often do we try to deltify each type of
objects and how often do we succeed. So do add it to stats.

Success doesn't mean much gain in pack size though. As we allow delta to
be as big as (data.len - 20). And delta close to data.len gains nothing
compared to no delta at all even after zlib compression (delta is pretty
much the same as data, just with few modifications).

We should try to make less attempts that result in huge deltas as these
consume more cpu than trivial small deltas. Either by choosing a better
delta base or reducing delta size upper bound or doing less delta attempts
at all.

Currently, delta base for blobs is a waste literally. Each blob delta
base is chosen as a previously stored blob. Disabling deltas for blobs
doesn't increase pack size and reduce import time, or at least doesn't
increase time for all fast-import streams I've tried.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Acked-by: David Barr <davidbarr@google.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-22 11:57:06 -07:00
Dmitry Ivankov
8fb3ad76b1 fast-import: prevent producing bad delta
To produce deltas for tree objects fast-import tracks two versions
of tree's entries - base and current one. Base version stands both
for a delta base of this tree, and for a entry inside a delta base
of a parent tree. So care should be taken to keep it in sync.

tree_content_set cuts away a whole subtree and replaces it with a
new one (or NULL for lazy load of a tree with known sha1). It
keeps a base sha1 for this subtree (needed for parent tree). And
here is the problem, 'subtree' tree root doesn't have the implied
base version entries.

Adjusting the subtree to include them would mean a deep rewrite of
subtree. Invalidating the subtree base version would mean recursive
invalidation of parents' base versions. So just mark this tree as
do-not-delta me. Abuse setuid bit for this purpose.

tree_content_replace is the same as tree_content_set except that is
is used to replace the root, so just clearing base sha1 here (instead
of setting the bit) is fine.

[di: log message]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-14 14:40:01 -07:00
Dmitry Ivankov
4b4963c0e1 fast-import: check committer name more strictly
The documentation declares following identity format:
(<name> SP)? LT <email> GT
where name is any string without LF and LT characters.
But fast-import just accepts any string up to first GT
instead of checking the whole format, and moreover just
writes it as is to the commit object.

git-fsck checks for [^<\n]* <[^<>\n]*> format. Note that the
space is mandatory. And the space quirk is already handled via
extending the string to the left when needed.

Modify fast-import input identity format to a slightly stricter
one - deny LF, LT and GT in both <name> and <email>. And check
for it.

This is stricter then git-fsck as fsck accepts "Name> <email>"
currently, but soon fsck check will be adjusted likewise.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11 12:21:03 -07:00
Dmitry Ivankov
17fb00721b fast-import: don't fail on omitted committer name
fast-import format declares 'committer_name SP' to be optional in
'committer_name SP LT email GT'. But for a (commit) object SP is
obligatory while zero length committer_name is ok. git-fsck checks
that SP is present, so fast-import must prepend it if the name SP
part is omitted. It doesn't do so and thus for "LT email GT" ident
it writes a bad object.

Name cannot contain LT or GT, ident always comes after SP in fast-import.
So if ident starts with LT reuse the SP as if a valid 'SP LT email GT'
ident was passed.

This fixes a ident parsing bug for a well-formed fast-import input.
Though the parsing is still loose and can accept a ill-formed input.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11 12:20:56 -07:00
Junio C Hamano
59d9ba869e Merge branch 'sr/transport-helper-fix'
* sr/transport-helper-fix: (21 commits)
  transport-helper: die early on encountering deleted refs
  transport-helper: implement marks location as capability
  transport-helper: Use capname for refspec capability too
  transport-helper: change import semantics
  transport-helper: update ref status after push with export
  transport-helper: use the new done feature where possible
  transport-helper: check status code of finish_command
  transport-helper: factor out push_update_refs_status
  fast-export: support done feature
  fast-import: introduce 'done' command
  git-remote-testgit: fix error handling
  git-remote-testgit: only push for non-local repositories
  remote-curl: accept empty line as terminator
  remote-helpers: export GIT_DIR variable to helpers
  git_remote_helpers: push all refs during a non-local export
  transport-helper: don't feed bogus refs to export push
  git-remote-testgit: import non-HEAD refs
  t5800: document some non-functional parts of remote helpers
  t5800: use skip_all instead of prereq
  t5800: factor out some ref tests
  ...
2011-08-01 15:00:14 -07:00
Sverre Rabbelier
be56862f19 fast-import: introduce 'done' command
Add a 'done' command that causes fast-import to stop reading from the
stream and exit.

If the new --done command line flag was passed on the command line
(or a "feature done" declaration included at the start of the stream),
make the 'done' command mandatory.  So "git fast-import --done"'s
input format will be prefix-free, making errors easier to detect when
they show up as early termination at some convenient time of the
upstream of a pipe writing to fast-import.

Another possible application of the 'done' command would to be allow a
fast-import stream that is only a small part of a larger encapsulating
stream to be easily parsed, leaving the file offset after the "done\n"
so the other application can pick up from there.  This patch does not
teach fast-import to do that --- fast-import still uses buffered input
(stdio).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-19 11:17:47 -07:00
Junio C Hamano
d907bf8ef3 Merge branch 'jc/index-pack'
* jc/index-pack:
  verify-pack: use index-pack --verify
  index-pack: show histogram when emulating "verify-pack -v"
  index-pack: start learning to emulate "verify-pack -v"
  index-pack: a miniscule refactor
  index-pack --verify: read anomalous offsets from v2 idx file
  write_idx_file: need_large_offset() helper function
  index-pack: --verify
  write_idx_file: introduce a struct to hold idx customization options
  index-pack: group the delta-base array entries also by type

Conflicts:
	builtin/verify-pack.c
	cache.h
	sha1_file.c
2011-07-19 09:54:51 -07:00
Junio C Hamano
ef49a7a012 zlib: zlib can only process 4GB at a time
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.

Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:52:15 -07:00
Junio C Hamano
225a6f1068 zlib: wrap deflateBound() too
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:18:17 -07:00