Commit graph

360 commits

Author SHA1 Message Date
Junio C Hamano 2ac4b4b222 Merge branch 'sp/safecrlf'
* sp/safecrlf:
  safecrlf: Add mechanism to warn about irreversible crlf conversions
2008-02-16 17:59:20 -08:00
Junio C Hamano d5558581d2 Merge branch 'maint'
* maint:
  commit: discard index after setting up partial commit
  filter-branch: handle filenames that need quoting
  diff: Fix miscounting of --check output
  hg-to-git: fix parent analysis
  mailinfo: feed only one line to handle_filter() for QP input
  diff.c: add "const" qualifier to "char *cmd" member of "struct ll_diff_driver"
  Add "const" qualifier to "char *excludes_file".
  Add "const" qualifier to "char *editor_program".
  Add "const" qualifier to "char *pager_program".
  config: add 'git_config_string' to refactor string config variables.
  diff.c: remove useless check for value != NULL
  fast-import: check return value from unpack_entry()
  Validate nicknames of remote branches to prohibit confusing ones
  diff.c: replace a 'strdup' with 'xstrdup'.
  diff.c: fixup garding of config parser from value=NULL
2008-02-16 00:20:37 -08:00
Junio C Hamano 0ef617f4b6 diff: Fix miscounting of --check output
c1795bb (Unify whitespace checking) incorrectly made the
checking function return without incrementing the line numbers
when there is no whitespace problem is found on a '+' line.

This resurrects the earlier behaviour.

Noticed and reported by Jay Soffian.  The test script was stolen
from Jay's independent fix.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-15 23:06:57 -08:00
Christian Couder b20a60d0c0 diff.c: add "const" qualifier to "char *cmd" member of "struct ll_diff_driver"
Also use "git_config_string" to simplify code where "cmd" is set.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-15 21:24:54 -08:00
Christian Couder 2c778210f8 diff.c: remove useless check for value != NULL
It is not necessary to check if value != NULL before calling
'parse_lldiff_command' as there is already a check inside this
function.

By the way this patch also improves the existing check inside
'parse_lldiff_command' by using:
	return config_error_nonbool(var);
instead of:
	return error("%s: lacks value", var);

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-15 21:24:53 -08:00
Christian Couder 8ca496e97d diff.c: replace a 'strdup' with 'xstrdup'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-15 14:11:28 -08:00
Junio C Hamano 588071112c diff.c: fixup garding of config parser from value=NULL
Christian Couder noticed that there still were a handcrafted error()
call that we should have converted to config_error_nonbool() where
parse_lldiff_command() parses the configuration file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-15 09:37:54 -08:00
Junio C Hamano e0197c9aae Merge branch 'lt/in-core-index'
* lt/in-core-index:
  lazy index hashing
  Create pathname-based hash-table lookup into index
  read-cache.c: introduce is_racy_timestamp() helper
  read-cache.c: fix a couple more CE_REMOVE conversion
  Also use unpack_trees() in do_diff_cache()
  Make run_diff_index() use unpack_trees(), not read_tree()
  Avoid running lstat(2) on the same cache entry.
  index: be careful when handling long names
  Make on-disk index representation separate from in-core one
2008-02-11 16:46:20 -08:00
Junio C Hamano 64f30e948b diff.c: guard config parser from value=NULL
diff.external, diff.*.command, diff.color.*, color.diff.* and
diff.*.funcname configuration variables expect a string value.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-11 13:11:36 -08:00
Steffen Prohaska 21e5ad50fc safecrlf: Add mechanism to warn about irreversible crlf conversions
CRLF conversion bears a slight chance of corrupting data.
autocrlf=true will convert CRLF to LF during commit and LF to
CRLF during checkout.  A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git.  For text
files this is the right thing to do: it corrects line endings
such that we have only LF line endings in the repository.
But for binary files that are accidentally classified as text the
conversion can corrupt data.

If you recognize such corruption early you can easily fix it by
setting the conversion type explicitly in .gitattributes.  Right
after committing you still have the original file in your work
tree and this file is not yet corrupted.  You can explicitly tell
git that this file is binary and git will handle the file
appropriately.

Unfortunately, the desired effect of cleaning up text files with
mixed line endings and the undesired effect of corrupting binary
files cannot be distinguished.  In both cases CRLFs are removed
in an irreversible way.  For text files this is the right thing
to do because CRLFs are line endings, while for binary files
converting CRLFs corrupts data.

This patch adds a mechanism that can either warn the user about
an irreversible conversion or can even refuse to convert.  The
mechanism is controlled by the variable core.safecrlf, with the
following values:

 - false: disable safecrlf mechanism
 - warn: warn about irreversible conversions
 - true: refuse irreversible conversions

The default is to warn.  Users are only affected by this default
if core.autocrlf is set.  But the current default of git is to
leave core.autocrlf unset, so users will not see warnings unless
they deliberately chose to activate the autocrlf mechanism.

The safecrlf mechanism's details depend on the git command.  The
general principles when safecrlf is active (not false) are:

 - we warn/error out if files in the work tree can modified in an
   irreversible way without giving the user a chance to backup the
   original file.

 - for read-only operations that do not modify files in the work tree
   we do not not print annoying warnings.

There are exceptions.  Even though...

 - "git add" itself does not touch the files in the work tree, the
   next checkout would, so the safety triggers;

 - "git apply" to update a text file with a patch does touch the files
   in the work tree, but the operation is about text files and CRLF
   conversion is about fixing the line ending inconsistencies, so the
   safety does not trigger;

 - "git diff" itself does not touch the files in the work tree, it is
   often run to inspect the changes you intend to next "git add".  To
   catch potential problems early, safety triggers.

The concept of a safety check was originally proposed in a similar
way by Linus Torvalds.  Thanks to Dimitry Potapov for insisting
on getting the naked LF/autocrlf=true case right.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
2008-02-06 13:07:28 -08:00
Junio C Hamano eadb583134 Avoid running lstat(2) on the same cache entry.
Aside from the lstat(2) done for work tree files, there are
quite many lstat(2) calls in refname dwimming codepath.  This
patch is not about reducing them.

 * It adds a new ce_flag, CE_UPTODATE, that is meant to mark the
   cache entries that record a regular file blob that is up to
   date in the work tree.  If somebody later walks the index and
   wants to see if the work tree has changes, they do not have
   to be checked with lstat(2) again.

 * fill_stat_cache_info() marks the cache entry it just added
   with CE_UPTODATE.  This has the effect of marking the paths
   we write out of the index and lstat(2) immediately as "no
   need to lstat -- we know it is up-to-date", from quite a lot
   fo callers:

    - git-apply --index
    - git-update-index
    - git-checkout-index
    - git-add (uses add_file_to_index())
    - git-commit (ditto)
    - git-mv (ditto)

 * refresh_cache_ent() also marks the cache entry that are clean
   with CE_UPTODATE.

 * write_index is changed not to write CE_UPTODATE out to the
   index file, because CE_UPTODATE is meant to be transient only
   in core.  For the same reason, CE_UPDATE is not written to
   prevent an accident from happening.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-21 12:44:31 -08:00
Jeff King 472ca78077 color unchanged lines as "plain" in "diff --color-words"
These were mistakenly being colored in "meta" color.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-18 01:29:38 -08:00
Bill Lear 9f6fe82233 Correct spelling in diff.c comment
Correct a spelling mistake in a comment.

Signed-off-by: Bill Lear <rael@zopyra.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-16 11:37:58 -08:00
Junio C Hamano 23707811c5 diff: do not chomp hunk-header in the middle of a character
We truncate hunk-header line at 80 bytes, but that 80th byte
could be in the middle of a character, which is bad.  This uses
pick_one_utf8_char() function to make sure we do not cut a character
in the middle.

This assumes that the internal representation of the text is
UTF-8.  This needs to be extended in the future but the optimal
direction has not been decided yet.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-06 22:44:44 -08:00
Jeff King cae6c25a7f diff: remove lazy config loading
There is no point to this. Either:

  1. The program has already loaded git_diff_ui_config, in
     which case this is a noop.
  2. The program didn't, which means it is plumbing that
     does not _want_ git_diff_ui_config to be loaded.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-04 16:22:07 -08:00
Jeff King e467193ff3 diff: load funcname patterns in "basic" config
The funcname patterns influence the "comment" on @@ lines of
the diff. They are safe to use with plumbing since they
don't fundamentally change the meaning of the diff in any
way.

Since all diff users call either diff_ui_config or
diff_basic_config, we can get rid of the lazy reading of the
config.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-04 16:22:01 -08:00
Jeff King 9a1805a872 add a "basic" diff config callback
The diff porcelain uses git_diff_ui_config to set
porcelain-ish config options, like automatically turning on
color. The plumbing specifically avoids calling this
function, since it doesn't want things like automatic color
or rename detection.

However, some diff options should be set for both plumbing
and porcelain. For example, one can still turn on color in
git-diff-files using the --color command line option. This
means we want the color config from color.diff.* (so that
once color is on, we use the user's preferred scheme), but
_not_ the color.diff variable.

We split the diff config into "ui" and "basic", where
"basic" is suitable for use by plumbing (so _most_ things
affecting the output should still go into the "ui" part).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-04 16:05:23 -08:00
Junio C Hamano d56250911f Fix rewrite_diff() name quoting.
This moves the logic to quote two paths (prefix + path) in
C-style introduced in the previous commit from the
dump_quoted_path() in combine-diff.c to quote.c, and uses it to
fix rewrite_diff() that never C-quoted the pathnames correctly.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-26 17:13:36 -08:00
Johannes Schindelin eab9a40b6d Teach diff machinery to display other prefixes than "a/" and "b/"
With the new options "--src-prefix=<prefix>", "--dst-prefix=<prefix>"
and "--no-prefix", you can now control the path prefixes of the diff
machinery.  These used to by hardwired to "a/" for the source prefix
and "b/" for the destination prefix.

Initial patch by Pascal Obry.  Sane option names suggested by Linus.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-20 01:10:39 -08:00
Johannes Schindelin cbe021004f Support config variable diff.external
We had the diff.external variable in the documentation of the config
file since its conception, but failed to respect it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-17 20:49:18 -08:00
Wincent Colaiuta 45e2a4b2b0 Make "diff --check" output match "git apply"
For consistency, make the two tools report whitespace errors in the
same way (the output of "diff --check" has been tweaked to match
that of "git apply").

Note that although the textual content is basically the same only
"git diff --check" provides a colorized version of the problematic
lines; making "git apply" do colorization will require more extensive
changes (figuring out the diff colorization preferences of the user)
and so that will be a subject for another commit.

Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-13 23:43:58 -08:00
Wincent Colaiuta c1795bb08a Unify whitespace checking
This commit unifies three separate places where whitespace checking was
performed:

 - the whitespace checking previously done in builtin-apply.c is
extracted into a function in ws.c

 - the equivalent logic in "git diff" is removed

 - the emit_line_with_ws() function is also removed because that also
rechecks the whitespace, and its functionality is rolled into ws.c

The new function is called check_and_emit_line() and it does two things:
checks a line for whitespace errors and optionally emits it. The checking
is based on lines of content rather than patch lines (in other words, the
caller must strip the leading "+" or "-"); this was suggested by Junio on
the mailing list to allow for a future extension to "git show" to display
whitespace errors in blobs.

At the same time we teach it to report all classes of whitespace errors
found for a given line rather than reporting only the first found error.

Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-13 23:43:58 -08:00
Junio C Hamano da31b358fb diff --check: minor fixups
There is no reason --exit-code and --check-diff must be mutually
exclusive, so assign different bits to different results and allow them
to be returned from the command.  Introduce diff_result_code() to factor
out the common code to decide final status code based on diffopt
settings and use it everywhere.

Update tests to match the above fix.

Turning pager off when "diff --check" is used is a regression.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-13 23:40:27 -08:00
Wincent Colaiuta 62c64895cf "diff --check" should affect exit status
"git diff" has a --check option that can be used to check for whitespace
problems but it only reported by printing warnings to the
console.

Now when the --check option is used we give a non-zero exit status,
making "git diff --check" nicer to use in scripts and hooks.

Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-13 23:05:42 -08:00
Junio C Hamano c279d7e986 xdl_diff: identify call sites.
This inserts a new function xdi_diff() that currently does not
do anything other than calling the underlying xdl_diff() to the
callchain of current callers of xdl_diff() function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-13 23:04:26 -08:00
Wincent Colaiuta 86f8c23685 Fix "diff --check" whitespace detection
"diff --check" would only detect spaces before tabs if a tab was the
last character in the leading indent. Fix that and add a test case to
make sure the bug doesn't regress in the future.

Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-12 11:24:33 -08:00
Junio C Hamano f604652e05 git-diff --numstat -z: make it machine readable
The "-z" format is all about machine parsability, but showing renamed
paths as "common/{a => b}/suffix" makes it impossible.  The scripts would
never have successfully parsed "--numstat -z -M" in the old format.

This fixes the output format in a (hopefully minimally) backward
incompatible way.

 * The output without -z is not changed.  This has given a good way for
   humans to view added and deleted lines separately, and showing the
   path in combined, shorter way would preserve readability.

 * The output with -z is unchanged for paths that do not involve renames.
   Existing scripts that do not pass -M/-C are not affected at all.

 * The output with -z for a renamed path is shown in a format that can
   easily be distinguished from an unrenamed path.

This is based on Jakub Narebski's patch.  Bugs and documentation typos
are mine.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-12 10:59:22 -08:00
Wincent Colaiuta 0ac7903ee3 Use "whitespace" consistently
For consistency, change "white space" and "whitespaces" to
"whitespace", fixing a couple of adjacent grammar problems in the
docs.

Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-12 10:59:22 -08:00
Junio C Hamano 4eb39e9bcc Merge branch 'jc/spht'
* jc/spht:
  Use gitattributes to define per-path whitespace rule
  core.whitespace: documentation updates.
  builtin-apply: teach whitespace_rules
  builtin-apply: rename "whitespace" variables and fix styles
  core.whitespace: add test for diff whitespace error highlighting
  git-diff: complain about >=8 consecutive spaces in initial indent
  War on whitespace: first, a bit of retreat.

Conflicts:

	cache.h
	config.c
	diff.c
2007-12-09 01:23:48 -08:00
Junio C Hamano cf1b7869f0 Use gitattributes to define per-path whitespace rule
The `core.whitespace` configuration variable allows you to define what
`diff` and `apply` should consider whitespace errors for all paths in
the project (See gitlink:git-config[1]).  This attribute gives you finer
control per path.

For example, if you have these in the .gitattributes:

    frotz   whitespace
    nitfol  -whitespace
    xyzzy   whitespace=-trailing

all types of whitespace problems known to git are noticed in path 'frotz'
(i.e. diff shows them in diff.whitespace color, and apply warns about
them), no whitespace problem is noticed in path 'nitfol', and the
default types of whitespace problems except "trailing whitespace" are
noticed for path 'xyzzy'.  A project with mixed Python and C might want
to have:

    *.c    whitespace
    *.py   whitespace=-indent-with-non-tab

in its toplevel .gitattributes file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-06 00:45:30 -08:00
Junio C Hamano 0f6f5a4022 git config --get-colorbool
This adds an option to help scripts find out color settings from
the configuration file.

    git config --get-colorbool color.diff

inspects color.diff variable, and exits with status 0 (i.e. success) if
color is to be used.  It exits with status 1 otherwise.

If a script wants "true"/"false" answer to the standard output of the
command, it can pass an additional boolean parameter to its command
line, telling if its standard output is a terminal, like this:

    git config --get-colorbool color.diff true

When called like this, the command outputs "true" to its standard output
if color is to be used (i.e. "color.diff" says "always", "auto", or
"true"), and "false" otherwise.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-05 17:57:11 -08:00
Junio C Hamano f8b6809d52 Fix "quote" misconversion for rewrite diff output.
663af3422a (Full rework of
quote_c_style and write_name_quoted.) mistakenly used puts()
when writing out a fixed string when it did not want to add a
terminating LF.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-21 23:06:44 -08:00
Pierre Habouzit d054680c7d Reorder diff_opt_parse options more logically per topics.
This is a line reordering patch _only_.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-11 16:54:15 -08:00
Pierre Habouzit 8f67f8aefb Make the diff_options bitfields be an unsigned with explicit masks.
reverse_diff was a bit-value in disguise, it's merged in the flags now.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-11 16:54:15 -08:00
Junio C Hamano 459fa6d0fe git-diff: complain about >=8 consecutive spaces in initial indent
This introduces a new whitespace error type, "indent-with-non-tab".
The error is about starting a line with 8 or more SP, instead of
indenting it with a HT.

This is not enabled by default, as some projects employ an
indenting policy to use only SPs and no HTs.

The kernel folks and git contributors may want to enable this
detection with:

	[core]
		whitespace = indent-with-non-tab

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-02 17:58:09 -07:00
Junio C Hamano a9cc857ada War on whitespace: first, a bit of retreat.
This introduces core.whitespace configuration variable that lets
you specify the definition of "whitespace error".

Currently there are two kinds of whitespace errors defined:

 * trailing-space: trailing whitespaces at the end of the line.

 * space-before-tab: a SP appears immediately before HT in the
   indent part of the line.

You can specify the desired types of errors to be detected by
listing their names (unique abbreviations are accepted)
separated by comma.  By default, these two errors are always
detected, as that is the traditional behaviour.  You can disable
detection of a particular type of error by prefixing a '-' in
front of the name of the error, like this:

	[core]
		whitespace = -trailing-space

This patch teaches the code to output colored diff with
DIFF_WHITESPACE color to highlight the detected whitespace
errors to honor the new configuration.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-02 17:58:08 -07:00
Junio C Hamano 4340a813d0 Merge branch 'js/forkexec'
* js/forkexec:
  Use the asyncronous function infrastructure to run the content filter.
  Avoid a dup2(2) in apply_filter() - start_command() can do it for us.
  t0021-conversion.sh: Test that the clean filter really cleans content.
  upload-pack: Run rev-list in an asynchronous function.
  upload-pack: Move the revision walker into a separate function.
  Use the asyncronous function infrastructure in builtin-fetch-pack.c.
  Add infrastructure to run a function asynchronously.
  upload-pack: Use start_command() to run pack-objects in create_pack_file().
  Have start_command() create a pipe to read the stderr of the child.
  Use start_comand() in builtin-fetch-pack.c instead of explicit fork/exec.
  Use run_command() to spawn external diff programs instead of fork/exec.
  Use start_command() to run content filters instead of explicit fork/exec.
  Use start_command() in git_connect() instead of explicit fork/exec.
  Change git_connect() to return a struct child_process instead of a pid_t.

Conflicts:

	builtin-fetch-pack.c
2007-11-01 13:47:47 -07:00
Linus Torvalds 644797119d copy vs rename detection: avoid unnecessary O(n*m) loops
The core rename detection had some rather stupid code to check if a
pathname was used by a later modification or rename, which basically
walked the whole pathname space for all renames for each rename, in
order to tell whether it was a pure rename (no remaining users) or
should be considered a copy (other users of the source file remaining).

That's really silly, since we can just keep a count of users around, and
replace all those complex and expensive loops with just testing that
simple counter (but this all depends on the previous commit that shared
the diff_filespec data structure by using a separate reference count).

Note that the reference count is not the same as the rename count: they
behave otherwise rather similarly, but the reference count is tied to
the allocation (and decremented at de-allocation, so that when it turns
zero we can get rid of the memory), while the rename count is tied to
the renames and is decremented when we find a rename (so that when it
turns zero we know that it was a rename, not a copy).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:06 -07:00
Linus Torvalds 9fb88419ba Ref-count the filespecs used by diffcore
Rather than copy the filespecs when introducing new versions of them
(for rename or copy detection), use a refcount and increment the count
when reusing the diff_filespec.

This avoids unnecessary allocations, but the real reason behind this is
a future enhancement: we will want to track shared data across the
copy/rename detection.  In order to efficiently notice when a filespec
is used by a rename, the rename machinery wants to keep track of a
rename usage count which is shared across all different users of the
filespec.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:05 -07:00
René Scharfe c32f749fec Correct some sizeof(size_t) != sizeof(unsigned long) typing errors
Fix size_t vs. unsigned long pointer mismatch warnings introduced
with the addition of strbuf_detach().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 00:00:40 -04:00
Johannes Sixt d5535ec75c Use run_command() to spawn external diff programs instead of fork/exec.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:40 -04:00
Junio C Hamano 66d4035e10 Merge branch 'ph/strbuf'
* ph/strbuf: (44 commits)
  Make read_patch_file work on a strbuf.
  strbuf_read_file enhancement, and use it.
  strbuf change: be sure ->buf is never ever NULL.
  double free in builtin-update-index.c
  Clean up stripspace a bit, use strbuf even more.
  Add strbuf_read_file().
  rerere: Fix use of an empty strbuf.buf
  Small cache_tree_write refactor.
  Make builtin-rerere use of strbuf nicer and more efficient.
  Add strbuf_cmp.
  strbuf_setlen(): do not barf on setting length of an empty buffer to 0
  sq_quote_argv and add_to_string rework with strbuf's.
  Full rework of quote_c_style and write_name_quoted.
  Rework unquote_c_style to work on a strbuf.
  strbuf API additions and enhancements.
  nfv?asprintf are broken without va_copy, workaround them.
  Fix the expansion pattern of the pseudo-static path buffer.
  builtin-for-each-ref.c::copy_name() - do not overstep the buffer.
  builtin-apply.c: fix a tiny leak introduced during xmemdupz() conversion.
  Use xmemdupz() in many places.
  ...
2007-10-03 03:06:02 -07:00
Junio C Hamano 8ae92e6389 rename diff_free_filespec_data_large() to diff_free_filespec_blob()
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-02 21:02:09 -07:00
Jeff King eede7b7d11 diffcore-rename: cache file deltas
We find rename candidates by computing a fingerprint hash of
each file, and then comparing those fingerprints. There are
inherently O(n^2) comparisons, so it pays in CPU time to
hoist the (rather expensive) computation of the fingerprint
out of that loop (or to cache it once we have computed it once).

Previously, we didn't keep the filespec information around
because then we had the potential to consume a great deal of
memory. However, instead of keeping all of the filespec
data, we can instead just keep the fingerprint.

This patch implements and uses diff_free_filespec_data_large
to accomplish that goal. We also have to change
estimate_similarity not to needlessly repopulate the
filespec data when we already have the hash.

Practical tests showed 4.5x speedup for a 10% memory usage
increase.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-02 21:02:03 -07:00
Pierre Habouzit b315c5c081 strbuf change: be sure ->buf is never ever NULL.
For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.

strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.

as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.

API changes:
  * strbuf_setlen now always works, so just make strbuf_reset a convenience
    macro.
  * strbuf_detatch takes a size_t* optional argument (meaning it can be
    NULL) to copy the buffer's len, as it was needed for this refactor to
    make the code more readable, and working like the callers.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-29 02:13:33 -07:00
Pierre Habouzit 663af3422a Full rework of quote_c_style and write_name_quoted.
* quote_c_style works on a strbuf instead of a wild buffer.
* quote_c_style is now clever enough to not add double quotes if not needed.

* write_name_quoted inherits those advantages, but also take a different
  set of arguments. Now instead of asking for quotes or not, you pass a
  "terminator". If it's \0 then we assume you don't want to escape, else C
  escaping is performed. In any case, the terminator is also appended to the
  stream. It also no longer takes the prefix/prefix_len arguments, as it's
  seldomly used, and makes some optimizations harder.

* write_name_quotedpfx is created to work like write_name_quoted and take
  the prefix/prefix_len arguments.

Thanks to those API changes, diff.c has somehow lost weight, thanks to the
removal of functions that were wrappers around the old write_name_quoted
trying to give it a semantics like the new one, but performing a lot of
allocations for this goal. Now we always write directly to the stream, no
intermediate allocation is performed.

As a side effect of the refactor in builtin-apply.c, the length of the bar
graphs in diffstats are not affected anymore by the fact that the path was
clipped.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
2007-09-20 23:45:49 -07:00
Pierre Habouzit 182af8343c Use xmemdupz() in many places.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-18 17:42:17 -07:00
Junio C Hamano 39bd2eb56a Merge branch 'master' into ph/strbuf
* master: (94 commits)
  Fixed update-hook example allow-users format.
  Documentation/git-svn: updated design philosophy notes
  t/t4014: test "am -3" with mode-only change.
  git-commit.sh: Shell script cleanup
  preserve executable bits in zip archives
  Fix lapsus in builtin-apply.c
  git-push: documentation and tests for pushing only branches
  git-svnimport: Use separate arguments in the pipe for git-rev-parse
  contrib/fast-import: add perl version of simple example
  contrib/fast-import: add simple shell example
  rev-list --bisect: Bisection "distance" clean up.
  rev-list --bisect: Move some bisection code into best_bisection.
  rev-list --bisect: Move finding bisection into do_find_bisection.
  Document ls-files --with-tree=<tree-ish>
  git-commit: partial commit of paths only removed from the index
  git-commit: Allow partial commit of file removal.
  send-email: make message-id generation a bit more robust
  git-apply: fix whitespace stripping
  git-gui: Disable native platform text selection in "lists"
  apply --index-info: fall back to current index for mode changes
  ...
2007-09-18 17:42:15 -07:00
Pierre Habouzit ba3ed09728 Now that cache.h needs strbuf.h, remove useless includes.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-16 17:30:03 -07:00
Pierre Habouzit 5ecd293d14 Rewrite convert_to_{git,working_tree} to use strbuf's.
* Now, those functions take an "out" strbuf argument, where they store their
  result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
  call them this way:
    convert_to_git(path, sb->buf, sb->len, sb);
  When doable, conversions are done in place for real, else the strbuf
  content is just replaced with the new one, transparentely for the caller.

If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:

    int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
    {
        int ret = 0;
        ret |= filter1(...., src, len, sb);
        if (ret) {
            src = sb->buf;
            len = sb->len;
        }
        ret |= filter2(...., src, len, sb);
        if (ret) {
            src = sb->buf;
            len = sb->len;
        }
        ....
        return ret | filtern(..., src, len, sb);
    }

That's why subfilters the convert_to_* functions called were also rewritten
to work this way.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-16 17:30:03 -07:00