Commit graph

106 commits

Author SHA1 Message Date
Geert Bosch
aa7e44bf57 Unify write_index_file functions
This patch unifies the write_index_file functions in
builtin-pack-objects.c and index-pack.c.  As the name
"index" is overloaded in git, move in the direction of
using "idx" and "pack idx" when refering to the pack index.
There should be no change in functionality.

Signed-off-by: Geert Bosch <bosch@gnat.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-06-02 13:14:18 -07:00
Johan Herland
8a912bcb25 Ensure return value from xread() is always stored into an ssize_t
This patch fixes all calls to xread() where the return value is not
stored into an ssize_t. The patch should not have any effect whatsoever,
other than putting better/more appropriate type names on variables.

Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-15 21:16:03 -07:00
Shawn O. Pearce
cbc6bdab08 Reuse fixup_pack_header_footer in index-pack
Now that fast-import is using a "library function" to handle
correcting its packfile's object count and trailing SHA-1 we
should reuse the same function in index-pack, to reduce the
size of the code we must maintain.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-05-02 13:24:21 -04:00
Nicolas Pitre
13aaf14825 make progress "title" part of the common progress interface
If the progress bar ends up in a box, better provide a title for it too.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-22 22:18:05 -07:00
Nicolas Pitre
96a02f8f6d common progress display support
Instead of having this code duplicated in multiple places, let's have
a common interface for progress display.  If someday someone wishes to
display a cheezy progress bar instead then only one file will have to
be changed.

Note: I left merge-recursive.c out since it has a strange notion of
progress as it apparently increase the expected total number as it goes.
Someone with more intimate knowledge of what that is supposed to mean
might look at converting it to the common progress interface.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-22 22:18:05 -07:00
Nicolas Pitre
4ba7d71153 allow forcing index v2 and 64-bit offset treshold
This is necessary for testing the new capabilities in some automated
way without having an actual 4GB+ pack.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:48:14 -07:00
Nicolas Pitre
d1a46a9eab index-pack: learn about pack index version 2
Like previous patch but for index-pack.

[ There is quite some code duplication between pack-objects and index-pack
  for generating a pack index (and fast-import as well I suppose).  This
  should be reworked into a common function eventually. But not now. ]

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:48:14 -07:00
Nicolas Pitre
ee5743ce19 compute object CRC32 with index-pack
Same as previous patch but for index-pack.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:48:14 -07:00
Nicolas Pitre
d7dd02231f add overflow tests on pack offset variables
Change a few size and offset variables to more appropriate type, then
add overflow tests on those offsets.  This prevents any bad data to be
generated/processed if off_t happens to not be large enough to handle
some big packs.

Better be safe than sorry.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:48:14 -07:00
Nicolas Pitre
8723f21626 make overflow test on delta base offset work regardless of variable size
This patch introduces the MSB() macro to obtain the desired number of
most significant bits from a given variable independently of the variable
type.

It is then used to better implement the overflow test on the OBJ_OFS_DELTA
base offset variable with the property of always working correctly
regardless of the type/size of that variable.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:48:14 -07:00
Nicolas Pitre
bbf4b41baf Plug memory leak in index-pack collision checking codepath. 2007-04-03 19:04:56 -07:00
Nicolas Pitre
0e55181f29 make it more obvious that temporary files are temporary files
When some operations are interrupted (or "die()'d" or crashed) then the
partial object/pack/index file may remain around.  Make it more obvious
in their name that those files are temporary stuff and can be cleaned up
if no operation is in progress.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-24 22:32:39 -07:00
Nicolas Pitre
9096c660a8 index-pack: more validation checks and cleanups
When appending objects to a pack, make sure the appended data is really
what we expect instead of simply loading potentially corrupted objects
and legitimating them by computing a SHA1 of that corrupt data.

With this the sha1_object() can lose its test_for_collision parameter
which is now redundent.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-20 22:09:59 -07:00
Nicolas Pitre
ce9fbf16e0 index-pack: use hash_sha1_file()
Use hash_sha1_file() instead of duplicating code to compute object SHA1.
While at it make it accept a const pointer.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-20 22:09:57 -07:00
Nicolas Pitre
8685da4256 don't ever allow SHA1 collisions to exist by fetching a pack
Waaaaaaay back Git was considered to be secure as it never overwrote
an object it already had.  This was ensured by always unpacking the
packfile received over the network (both in fetch and receive-pack)
and our already existing logic to not create a loose object for an
object we already have.

Lately however we keep "large-ish" packfiles on both fetch and push
by running them through index-pack instead of unpack-objects.  This
would let an attacker perform a birthday attack.

How?  Assume the attacker knows a SHA-1 that has two different
data streams.  He knows the client is likely to have the "good"
one.  So he sends the "evil" variant to the other end as part of
a "large-ish" packfile.  The recipient keeps that packfile, and
indexes it.  Now since this is a birthday attack there is a SHA-1
collision; two objects exist in the repository with the same SHA-1.
They have *very* different data streams.  One of them is "evil".

Currently the poor recipient cannot tell the two objects apart,
short of by examining the timestamp of the packfiles.  But lets
say the recipient repacks before he realizes he's been attacked.
We may wind up packing the "evil" version of the object, and deleting
the "good" one.  This is made *even more likely* by Junio's recent
rearrange_packed_git patch (b867092f).

It is extremely unlikely for a SHA1 collisions to occur, but if it
ever happens with a remote (hence untrusted) object we simply must
not let the fetch succeed.

Normally received packs should not contain objects we already have.
But when they do we must ensure duplicated objects with the same SHA1
actually contain the same data.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-20 22:08:25 -07:00
Shawn O. Pearce
3a55602eec General const correctness fixes
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime.  Likewise we should always be
treating unsigned values as unsigned values, not as signed values.

Most of these are very straightforward.  The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case.  Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07 10:47:10 -08:00
Junio C Hamano
77b50ab009 Merge branch 'js/bundle'
* js/bundle:
  bundle: reword missing prerequisite error message
  git-bundle: record commit summary in the prerequisite data
  git-bundle: fix 'create --all'
  git-bundle: avoid fork() in verify_bundle()
  git-bundle: assorted fixes
  Add git-bundle: move objects and references by archive
2007-02-28 14:38:36 -08:00
Junio C Hamano
597388f6a1 Merge branch 'np/types'
* np/types:
  Cleanup check_valid in commit-tree.
  make sure enum object_type is signed
  get rid of lookup_object_type()
  convert object type handling from a string to a number
  formalize typename(), and add its reverse type_from_string()
  sha1_file.c: don't ignore an error condition in sha1_loose_object_info()
  sha1_file.c: cleanup "offset" usage
  sha1_file.c: cleanup hdr usage
2007-02-28 11:58:27 -08:00
Junio C Hamano
c4f8f82755 Merge branch 'maint'
* maint:
  builtin-fmt-merge-msg: fix bugs in --file option
  index-pack: Loop over pread until data loading is complete.
  blameview: Fix the browse behavior in blameview
  Fix minor typos/grammar in user-manual.txt
  Correct ordering in git-cvsimport's option documentation
  git-show: Reject native ref
  Fix git-show man page formatting in the EXAMPLES section
2007-02-27 22:15:42 -08:00
Shawn O. Pearce
a91d49cd36 index-pack: Loop over pread until data loading is complete.
A filesystem might not be able to completely supply our pread
request in one system call, such as if we are reading data from a
network file system and the requested length is just simply huge.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27 21:58:46 -08:00
Nicolas Pitre
21666f1aae convert object type handling from a string to a number
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value.  One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.

This is an initial step for the removal of the version using a char array
found in object reading code paths.  The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27 01:34:21 -08:00
Nicolas Pitre
df8436622f formalize typename(), and add its reverse type_from_string()
Sometime typename() is used, sometimes type_names[] is accessed directly.
Let's enforce typename() all the time which allows for validating the
type.

Also let's add a function to go from a name to a type and use it instead
of manual memcpy() when appropriate.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27 01:34:21 -08:00
Johannes Schindelin
fa257b0554 git-bundle: assorted fixes
This patch fixes issues mentioned by Junio, Nico and Simon:

- I forgot to convert the usage string when removing the "--" from
  the subcommands,
- a style fix in the bundle_header,
- use xread() instead of read(),
- use write_or_die() instead of write(),
- make the bundle header extensible,
- fail if the whitespace after a sha1 of a reference is missing,
- close() the fds passed to a subprocess,
- in verify_bundle(), do not use "rev-list --stdin", but rather
  pass the revs directly (avoiding a fork()),
- fix a corrupted comment in show_object(), and
- fix the size check in index_pack.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-22 22:30:33 -08:00
Johannes Schindelin
2e0afafebd Add git-bundle: move objects and references by archive
Some workflows require use of repositories on machines that cannot be
connected, preventing use of git-fetch / git-push to transport objects and
references between the repositories.

git-bundle provides an alternate transport mechanism, effectively allowing
git-fetch and git-pull to operate using sneakernet transport. `git-bundle
create` allows the user to create a bundle containing one or more branches
or tags, but with specified basis assumed to exist on the target
repository. At the receiving end, git-bundle acts like git-fetch-pack,
allowing the user to invoke git-fetch or git-pull using the bundle file as
the URL. git-fetch and git-ls-remote determine they have a bundle URL by
checking that the URL points to a file, but are otherwise unchanged in
operation with bundles.

The original patch was done by Mark Levedahl <mdl123@verizon.net>.

It was updated to make git-bundle a builtin, and get rid of the tar
format: now, the first line is supposed to say "# v2 git bundle", the next
lines either contain a prerequisite ("-" followed by the hash of the
needed commit), or a ref (the hash of a commit, followed by the name of
the ref), and finally the pack. As a result, the bundle argument can be
"-" now.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-22 22:30:33 -08:00
Junio C Hamano
cc44c7655f Mechanical conversion to use prefixcmp()
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily.  Leftover from this will be fixed in a separate step, including
idiotic conversions like

    if (!strncmp("foo", arg, 3))

  =>

    if (!(-prefixcmp(arg, "foo")))

This was done by using this script in px.perl

   #!/usr/bin/perl -i.bak -p
   if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
           s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
   }
   if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
           s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
   }

and running:

   $ git grep -l strncmp -- '*.c' | xargs perl px.perl

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-20 22:03:15 -08:00
Junio C Hamano
d1b2ddc863 index-pack: write-or-die instead of unchecked write-in-full.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-11 13:19:31 -08:00
Andy Whitcroft
93822c2239 short i/o: fix calls to write to use xwrite or write_in_full
We have a number of badly checked write() calls.  Often we are
expecting write() to write exactly the size we requested or fail,
this fails to handle interrupts or short writes.  Switch to using
the new write_in_full().  Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xwrite().

Note, the changes to config handling are much larger and handled
in the next patch in the sequence.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-08 15:44:47 -08:00
Andy Whitcroft
93d26e4cb9 short i/o: fix calls to read to use xread or read_in_full
We have a number of badly checked read() calls.  Often we are
expecting read() to read exactly the size we requested or fail, this
fails to handle interrupts or short reads.  Add a read_in_full()
providing those semantics.  Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xread().

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-08 15:44:47 -08:00
Nicolas Pitre
08a19d873c clarify some error messages wrt unknown object types
If ever new object types are added for future extensions then better
have current git version report them as "unknown" instead of
"corrupted".

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 10:46:34 -08:00
Junio C Hamano
85023577a8 simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header file that includes it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 09:51:35 -08:00
Nicolas Pitre
6d2fa7f1b4 index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
It was reported by Randal L. Schwartz <merlyn@stonehenge.com> that
indexing the Linux repository ~150MB pack takes about an hour on OS x
while it's a minute on Linux.  It seems that the OS X mmap()
implementation is more than 2 orders of magnitude slower than the Linux
one.

Linus proposed a patch replacing mmap() with pread() bringing index-pack
performance on OS X in line with the Linux one.  The performances on
Linux also improved by a small margin.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 00:42:10 -08:00
Jim Meyering
554a2636f7 Don't use memcpy when source and dest. buffers may overlap
git-index-pack can call memcpy with overlapping source and destination
buffers.  The patch below makes it use memmove instead.

If you want to demonstrate a failure, add the following two lines

+               if (input_offset < input_len)
+                 abort ();

before the existing memcpy call (shown in the patch below),
and then run this:

  (cd t; sh ./t5500-fetch-pack.sh)

Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-11 14:04:43 -08:00
Rene Scharfe
a6e8a76770 sparse fix: non-ANSI function declaration
The declaration of discard_cache() in cache.h already has its "void".

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-18 11:40:00 -08:00
Nicolas Pitre
576162a45f remove .keep pack lock files when done with refs update
This makes both git-fetch and git-push (fetch-pack and receive-pack)
safe against a possible race with aparallel git-repack -a -d that could
prune the new pack while it is not yet referenced, and remove the .keep
file after refs have been updated.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-03 00:24:07 -08:00
Nicolas Pitre
9ca4a201ea have index-pack create .keep file more carefully
If by chance we receive a pack which content (list of objects) matches
another pack that we already have, and if that pack is marked with a
.keep file, then we should not overwrite it.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-03 00:24:07 -08:00
Nicolas Pitre
bed006fbdd Allow pack header preprocessing before unpack-objects/index-pack.
Some applications which invoke unpack-objects or index-pack --stdin
may want to examine the pack header to determine the number of
objects contained in the pack and use that value to determine which
executable to invoke to handle the rest of the pack stream.

However if the caller consumes the pack header from the input stream
then its no longer available for unpack-objects or index-pack --stdin,
both of which need the version and object count to process the stream.

This change introduces --pack_header=ver,cnt as a command line option
that the caller can supply to indicate it has already consumed the
pack header and what version and object count were found in that
header.  As this option is only meant for low level applications
such as receive-pack we are not documenting it at this time.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-03 00:24:07 -08:00
Shawn Pearce
b807770924 Teach git-index-pack how to keep a pack file.
To prevent a race condition between `index-pack --stdin` and
`repack -a -d` where the repack deletes the newly created pack
file before any refs are updated to reference objects contained
within it we mark the pack file as one that should be kept.  This
removes it from the list of packs that `repack -a -d` will consider
for removal.

Callers such as `receive-pack` which want to invoke `index-pack`
should use this new --keep option to prevent the newly created pack
and index file pair from being deleted before they have finished any
related ref updates.  Only after all ref updates have been finished
should the associated .keep file be removed.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-29 13:45:03 -08:00
Nicolas Pitre
b89c4e93cc index-pack: minor fixes to comment and function name
Use proper english. Be more exact in one comment.

[jc: I threw in a bit of style clean-up as well]

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-27 15:00:19 -07:00
Nicolas Pitre
9bee247851 mimic unpack-objects when --stdin is used with index-pack
It appears that git-unpack-objects writes the last part of the input
buffer to stdout after the pack has been parsed.  This looks a bit
suspicious since the last fill() might have filled the buffer up to
the 4096 byte limit and more data might still be pending on stdin,
but since this is about being a drop-in replacement for unpack-objects
let's simply duplicate the same behavior for now.

[jc: with fix-up appeared in Nico's sleep]

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-26 18:27:44 -07:00
Nicolas Pitre
3c9af36646 add progress status to index-pack
This is more interesting to look at when performing a big fetch.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-26 00:20:13 -07:00
Nicolas Pitre
636171cb80 make index-pack able to complete thin packs.
A new flag, --fix-thin, instructs git-index-pack to append any missing
objects to a thin pack to make it self contained and indexable. Of course
objects missing from the pack must be present elsewhere in the local
repository.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-26 00:20:07 -07:00
Nicolas Pitre
e42797f5b6 enable index-pack streaming capability
A new flag, --stdin, allows for a pack to be received over a stream.
When this flag is provided, the pack content is written to either
the named pack file or directly to the object repository under the
same name as produced by git-repack.  The pack index is written as
well with the corresponding base name, unless the index name is
overriden with -o.

With this patch, git-index-pack could be used instead of
git-unpack-objects when fetching remote objects but only with
non "thin" packs for now.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-25 14:39:07 -07:00
Nicolas Pitre
2d477051ef add the capability for index-pack to read from a stream
This patch only adds the streaming capability to index-pack.  Although
the code is different it has the exact same functionality as before to
make sure nothing broke.

This is in preparation for receiving packs over the net, parse them on
the fly, fix them up if they are "thin" packs, and keep the resulting
pack instead of exploding it into loose objects.  But such functionality
should come separately.

One immediate advantage of this patch is that index-pack can now deal
with packs up to 4GB in size even on 32-bit architectures since the pack
is not entirely mmap()'d all at once anymore.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-20 16:51:46 -07:00
Nicolas Pitre
3c552873c6 index-pack: compare only the first 20-bytes of the key.
The "union delta_base" is a strange beast.  It is a 20-byte
binary blob key to search a binary searchable deltas[] array,
each element of which uses it to represent its base object with
either a full 20-byte SHA-1 or an offset in the pack.  Which
representation is used is determined by another field of the
deltas[] array element, obj->type, so there is no room for
confusion, as long as we make sure we compare the keys for the
same type only with appropriate length.  The code compared the
full union with memcmp().

When storing the in-pack offset, the union was first cleared
before storing an unsigned long, so comparison worked fine.

On 64-bit architectures, however, the union typically is 24-byte
long; the code did not clear the remaining 4-byte alignment
padding when storing a full 20-byte SHA-1 representation.  Using
memcmp() to compare the whole union was wrong.

This fixes the comparison to look at the first 20-bytes of the
union, regardless of the architecture.  As long as ulong is
smaller than 20-bytes this works fine.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-18 10:07:49 -07:00
Nicolas Pitre
53dda6ff62 teach git-index-pack about deltas with offset to base
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 00:12:00 -07:00
Nicolas Pitre
eb32d236df introduce delta objects with offset to base
This adds a new object, namely OBJ_OFS_DELTA, renames OBJ_DELTA to
OBJ_REF_DELTA to better make the distinction between those two delta
objects, and adds support for the handling of those new delta objects
in sha1_file.c only.

The OBJ_OFS_DELTA contains a relative offset from the delta object's
position in a pack instead of the 20-byte SHA1 reference to identify
the base object.  Since the base is likely to be not so far away, the
relative offset is more likely to have a smaller encoding on average
than an absolute offset.  And for those delta objects the base must
always be stored first because there is no way to know the distance of
later objects when streaming a pack.  Hence this relative offset is
always meant to be negative.

The offset encoding is slightly denser than the one used for object
size -- credits to <linux@horizon.com> (whoever this is) for bringing
it to my attention.

This allows for pack size reduction between 3.2% (Linux-2.6) to over 5%
(linux-historic).  Runtime pack access should be faster too since delta
replay does skip a search in the pack index for each delta in a chain.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 00:11:59 -07:00
Shawn Pearce
e702496e43 Convert memcpy(a,b,20) to hashcpy(a,b).
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.

A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*.  This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.

[jc: Splitted the patch to "master" part, to be followed by a
 patch for merge-recursive.c which is not in "master" yet.

 Fixed the cast in the latter hunk to combine-diff.c which was
 wrong in the original.

 Also converted ones left-over in combine-diff.c, diff-lib.c and
 upload-pack.c ]

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23 13:53:10 -07:00
David Rientjes
a89fccd281 Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length.
Introduces global inline:

	hashcmp(const unsigned char *sha1, const unsigned char *sha2)

Uses memcmp for comparison and returns the result based on the length of
the hash name (a future runtime decision).

Acked-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-17 14:23:53 -07:00
Rene Scharfe
5bb1cda5f7 drop length argument of has_extension
As Fredrik points out the current interface of has_extension() is
potentially confusing.  Its parameters include both a nul-terminated
string and a length-limited string.

This patch drops the length argument, requiring two nul-terminated
strings; all callsites are updated.  I checked that all of them indeed
provide nul-terminated strings.  Filenames need to be nul-terminated
anyway if they are to be passed to open() etc.  The performance penalty
of the additional strlen() is negligible compared to the system calls
which inevitably surround has_extension() calls.

Additionally, change has_extension() to use size_t inside instead of
int, as that is the exact type strlen() returns and memcmp() expects.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-11 16:06:34 -07:00
Rene Scharfe
83a2b841d6 Add has_extension()
The little helper has_extension() documents through its name what we are
trying to do and makes sure we don't forget the underrun check.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-10 14:13:53 -07:00
Peter Eriksen
8e44025925 Use blob_, commit_, tag_, and tree_type throughout.
This replaces occurences of "blob", "commit", "tag", and "tree",
where they're really used as type specifiers, which we already
have defined global constants for.

Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04 00:11:19 -07:00
Nicolas Pitre
d60fc1c864 remove delta-against-self bit
After experimenting with code to add the ability to encode a delta
against part of the deltified file, it turns out that resulting packs
are _bigger_ than when this ability is not used.  The raw delta output
might be smaller, but it doesn't compress as well using gzip with a
negative net saving on average.

Said bit would in fact be more useful to allow for encoding the copying
of chunks larger than 64KB providing more savings with large files.
This will correspond to packs version 3.

While the current code still produces packs version 2, it is made future
proof so pack versions 2 and 3 are accepted.  Any pack version 2 are
compatible with version 3 since the redefined bit was never used before.
When enough time has passed, code to use that bit to produce version 3
packs could be added.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-09 21:06:38 -08:00
Junio C Hamano
7e4a2a8483 avoid asking ?alloc() for zero bytes.
Avoid asking for zero bytes when that change simplifies overall
logic.  Later we would change the wrapper to ask for 1 byte on
platforms that return NULL for zero byte request.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-26 17:23:59 -08:00
Pavel Roskin
6689f08735 An off-by-one bug found by valgrind
Insufficient memory is allocated in index-pack.c to hold the *.idx name.
One more byte should be allocated to hold the terminating 0.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-21 13:00:31 -08:00
Junio C Hamano
84c8d8aec5 Fix packname hash generation.
This changes the generation of hash packfiles have in their names, from
"hash of object names as fed to us" to "hash of object names in the
resulting pack, in the order they appear in the index file".  The new
"git-index-pack" command is taught to output the computed hash value
to its standard output.

With this, we can store downloaded pack in a temporary file without
knowing its final name, run git-index-pack to generate idx for it
while finding out its final name, and then rename the pack and idx to
their final names.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-12 18:32:02 -07:00
Sergey Vlasov
9cf6d3357a Add git-index-pack utility
git-index-pack builds a pack index file for an existing packed
archive.  With this utility a packed archive which was transferred
without the corresponding pack index can be added to objects/pack/
without repacking.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-12 18:32:02 -07:00