Commit graph

213 commits

Author SHA1 Message Date
Jeff King c7ab0ba340 avoid sprintf and strcpy with flex arrays
When we are allocating a struct with a FLEX_ARRAY member, we
generally compute the size of the array and then sprintf or
strcpy into it. Normally we could improve a dynamic allocation
like this by using xstrfmt, but it doesn't work here; we
have to account for the size of the rest of the struct.

But we can improve things a bit by storing the length that
we use for the allocation, and then feeding it to xsnprintf
or memcpy, which makes it more obvious that we are not
writing more than the allocated number of bytes.

It would be nice if we had some kind of helper for
allocating generic flex arrays, but it doesn't work that
well:

 - the call signature is a little bit unwieldy:

      d = flex_struct(sizeof(*d), offsetof(d, path), fmt, ...);

   You need offsetof here instead of just writing to the
   end of the base size, because we don't know how the
   struct is packed (partially this is because FLEX_ARRAY
   might not be zero, though we can account for that; but
   the size of the struct may actually be rounded up for
   alignment, and we can't know that).

 - some sites do clever things, like over-allocating because
   they know they will write larger things into the buffer
   later (e.g., struct packed_git here).

So we're better off to just write out each allocation (or
add type-specific helpers, though many of these are one-off
allocations anyway).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Jeff King d59f765ac9 use sha1_to_hex_r() instead of strcpy
Before sha1_to_hex_r() existed, a simple way to get hex
sha1 into a buffer was with:

  strcpy(buf, sha1_to_hex(sha1));

This isn't wrong (assuming the buf is 41 characters), but it
makes auditing the code base for bad strcpy() calls harder,
as these become false positives.

Let's convert them to sha1_to_hex_r(), and likewise for
some calls to find_unique_abbrev(). While we're here, we'll
double-check that all of the buffers are correctly sized,
and use the more obvious GIT_SHA1_HEXSZ constant.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Junio C Hamano 3adc4ec7b9 Sync with v2.5.4 2015-09-28 19:16:54 -07:00
Junio C Hamano 11a458befc Sync with 2.4.10 2015-09-28 15:33:56 -07:00
Junio C Hamano 6343e2f6f2 Sync with 2.3.10 2015-09-28 15:28:31 -07:00
Jeff King 3efb988098 react to errors in xdi_diff
When we call into xdiff to perform a diff, we generally lose
the return code completely. Typically by ignoring the return
of our xdi_diff wrapper, but sometimes we even propagate
that return value up and then ignore it later.  This can
lead to us silently producing incorrect diffs (e.g., "git
log" might produce no output at all, not even a diff header,
for a content-level diff).

In practice this does not happen very often, because the
typical reason for xdiff to report failure is that it
malloc() failed (it uses straight malloc, and not our
xmalloc wrapper).  But it could also happen when xdiff
triggers one our callbacks, which returns an error (e.g.,
outf() in builtin/rerere.c tries to report a write failure
in this way). And the next patch also plans to add more
failure modes.

Let's notice an error return from xdiff and react
appropriately. In most of the diff.c code, we can simply
die(), which matches the surrounding code (e.g., that is
what we do if we fail to load a file for diffing in the
first place). This is not that elegant, but we are probably
better off dying to let the user know there was a problem,
rather than simply generating bogus output.

We could also just die() directly in xdi_diff, but the
callers typically have a bit more context, and can provide a
better message (and if we do later decide to pass errors up,
we're one step closer to doing so).

There is one interesting case, which is in diff_grep(). Here
if we cannot generate the diff, there is nothing to match,
and we silently return "no hits". This is actually what the
existing code does already, but we make it a little more
explicit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 14:57:10 -07:00
Jeff King 95a4fb0eac blame: handle --first-parent
The revision.c options-parser will parse "--first-parent"
for us, but the blame code does not actually respect it, as
we simply iterate over the whole list returned by
first_scapegoat(). We can fix this by returning a
truncated parent list.

Note that we could technically also do so by limiting the
return value of num_scapegoats(), but that is less robust.
We would rely on nobody ever looking at the "next" pointer
from the returned list.

Combining "--reverse" with "--first-parent" is more
complicated, and will probably involve cooperation from
revision.c. Since the desired semantics are not even clear,
let's punt on this for now, but explicitly disallow it to
avoid confusing users (this is not really a regression,
since it did something nonsensical before).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-16 09:59:05 -07:00
Jeff King add00ba2de date: make "local" orthogonal to date format
Most of our "--date" modes are about the format of the date:
which items we show and in what order. But "--date=local" is
a bit of an oddball. It means "show the date in the normal
format, but using the local timezone". The timezone we use
is orthogonal to the actual format, and there is no reason
we could not have "localized iso8601", etc.

This patch adds a "local" boolean field to "struct
date_mode", and drops the DATE_LOCAL element from the
date_mode_type enum (it's now just DATE_NORMAL plus
local=1). The new feature is accessible to users by adding
"-local" to any date mode (e.g., "iso-local"), and we retain
"local" as an alias for "default-local" for backwards
compatibility.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-03 15:45:26 -07:00
Nguyễn Thái Ngọc Duy a62bc310bf blame: remove obsolete comment
That "someday" in the comment happened two years later in
b65982b (Optimize "diff-index --cached" using cache-tree - 2009-05-20)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-25 09:41:44 -07:00
Jeff King f932729cc7 memoize common git-path "constant" files
One of the most common uses of git_path() is to pass a
constant, like git_path("MERGE_MSG"). This has two
drawbacks:

  1. The return value is a static buffer, and the lifetime
     is dependent on other calls to git_path, etc.

  2. There's no compile-time checking of the pathname. This
     is OK for a one-off (after all, we have to spell it
     correctly at least once), but many of these constant
     strings appear throughout the code.

This patch introduces a series of functions to "memoize"
these strings, which are essentially globals for the
lifetime of the program. We compute the value once, take
ownership of the buffer, and return the cached value for
subsequent calls.  cache.h provides a helper macro for
defining these functions as one-liners, and defines a few
common ones for global use.

Using a macro is a little bit gross, but it does nicely
document the purpose of the functions. If we need to touch
them all later (e.g., because we learned how to change the
git_dir variable at runtime, and need to invalidate all of
the stored values), it will be much easier to have the
complete list.

Note that the shared-global functions have separate, manual
declarations. We could do something clever with the macros
(e.g., expand it to a declaration in some places, and a
declaration _and_ a definition in path.c). But there aren't
that many, and it's probably better to stay away from
too-magical macros.

Likewise, if we abandon the C preprocessor in favor of
generating these with a script, we could get much fancier.
E.g., normalizing "FOO/BAR-BAZ" into "git_path_foo_bar_baz".
But the small amount of saved typing is probably not worth
the resulting confusion to readers who want to grep for the
function's definition.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 15:37:14 -07:00
Junio C Hamano d939af12bd Merge branch 'jk/date-mode-format'
Teach "git log" and friends a new "--date=format:..." option to
format timestamps using system's strftime(3).

* jk/date-mode-format:
  strbuf: make strbuf_addftime more robust
  introduce "format" date-mode
  convert "enum date_mode" into a struct
  show-branch: use DATE_RELATIVE instead of magic number
2015-08-03 11:01:27 -07:00
Junio C Hamano be9cb560e3 Merge branch 'mh/init-delete-refs-api'
Clean up refs API and make "git clone" less intimate with the
implementation detail.

* mh/init-delete-refs-api:
  delete_ref(): use the usual convention for old_sha1
  cmd_update_ref(): make logic more straightforward
  update_ref(): don't read old reference value before delete
  check_branch_commit(): make first parameter const
  refs.h: add some parameter names to function declarations
  refs: move the remaining ref module declarations to refs.h
  initial_ref_transaction_commit(): check for ref D/F conflicts
  initial_ref_transaction_commit(): check for duplicate refs
  refs: remove some functions from the module's public interface
  initial_ref_transaction_commit(): function for initial ref creation
  repack_without_refs(): make function private
  prune_refs(): use delete_refs()
  prune_remote(): use delete_refs()
  delete_refs(): bail early if the packed-refs file cannot be rewritten
  delete_refs(): make error message more generic
  delete_refs(): new function for the refs API
  delete_ref(): handle special case more explicitly
  remove_branches(): remove temporary
  delete_ref(): move declaration to refs.h
2015-08-03 11:01:17 -07:00
Jeff King aa1462cc3d introduce "format" date-mode
This feeds the format directly to strftime. Besides being a
little more flexible, the main advantage is that your system
strftime may know more about your locale's preferred format
(e.g., how to spell the days of the week).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-29 11:39:10 -07:00
Jeff King a5481a6c94 convert "enum date_mode" into a struct
In preparation for adding date modes that may carry extra
information beyond the mode itself, this patch converts the
date_mode enum into a struct.

Most of the conversion is fairly straightforward; we pass
the struct as a pointer and dereference the type field where
necessary. Locations that declare a date_mode can use a "{}"
constructor.  However, the tricky case is where we use the
enum labels as constants, like:

  show_date(t, tz, DATE_NORMAL);

Ideally we could say:

  show_date(t, tz, &{ DATE_NORMAL });

but of course C does not allow that. Likewise, we cannot
cast the constant to a struct, because we need to pass an
actual address. Our options are basically:

  1. Manually add a "struct date_mode d = { DATE_NORMAL }"
     definition to each caller, and pass "&d". This makes
     the callers uglier, because they sometimes do not even
     have their own scope (e.g., they are inside a switch
     statement).

  2. Provide a pre-made global "date_normal" struct that can
     be passed by address. We'd also need "date_rfc2822",
     "date_iso8601", and so forth. But at least the ugliness
     is defined in one place.

  3. Provide a wrapper that generates the correct struct on
     the fly. The big downside is that we end up pointing to
     a single global, which makes our wrapper non-reentrant.
     But show_date is already not reentrant, so it does not
     matter.

This patch implements 3, along with a minor macro to keep
the size of the callers sane.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-29 11:39:07 -07:00
Junio C Hamano 20d16da5ca Merge branch 'qn/blame-show-email'
"git blame" learned blame.showEmail configuration variable.

* qn/blame-show-email:
  blame: add blame.showEmail configuration
2015-06-24 12:21:41 -07:00
Michael Haggerty fb58c8d507 refs: move the remaining ref module declarations to refs.h
Some functions from the refs module were still declared in cache.h.
Move them to refs.h.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 13:17:12 -07:00
Junio C Hamano 6588f82ff6 Merge branch 'ah/usage-strings' into maint
A few usage string updates.

* ah/usage-strings:
  blame, log: format usage strings similarly to those in documentation
2015-06-16 14:33:50 -07:00
Junio C Hamano dfb67594e9 Merge branch 'rs/janitorial' into maint
Code clean-up.

* rs/janitorial:
  dir: remove unused variable sb
  clean: remove unused variable buf
  use file_exists() to check if a file exists in the worktree
2015-06-16 14:33:47 -07:00
Junio C Hamano 1d93ec9397 Merge branch 'tb/blame-resurrect-convert-to-git' into maint
Some time ago, "git blame" (incorrectly) lost the convert_to_git()
call when synthesizing a fake "tip" commit that represents the
state in the working tree, which broke folks who record the history
with LF line ending to make their project portabile across
platforms while terminating lines in their working tree files with
CRLF for their platform.

* tb/blame-resurrect-convert-to-git:
  blame: CRLF in the working tree and LF in the repo
2015-06-05 12:00:06 -07:00
Quentin Neill 8b504db309 blame: add blame.showEmail configuration
Complement existing --show-email option with fallback
configuration variable, with tests.

Signed-off-by: Quentin Neill <quentin.neill@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-01 15:50:43 -07:00
Junio C Hamano 4ba5bb5531 Merge branch 'rs/janitorial'
Code clean-up.

* rs/janitorial:
  dir: remove unused variable sb
  clean: remove unused variable buf
  use file_exists() to check if a file exists in the worktree
2015-06-01 12:45:15 -07:00
Junio C Hamano 6e0ac8e45f Merge branch 'ah/usage-strings'
A few usage string updates.

* ah/usage-strings:
  blame, log: format usage strings similarly to those in documentation
2015-06-01 12:45:10 -07:00
René Scharfe dbe44faadb use file_exists() to check if a file exists in the worktree
Call file_exists() instead of open-coding it.  That's shorter, simpler
and the intent becomes clearer.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 13:49:10 -07:00
Junio C Hamano 5fa9e4c4f1 Merge branch 'tb/blame-resurrect-convert-to-git'
Some time ago, "git blame" (incorrectly) lost the convert_to_git()
call when synthesizing a fake "tip" commit that represents the
state in the working tree, which broke folks who record the history
with LF line ending to make their project portabile across
platforms while terminating lines in their working tree files with
CRLF for their platform.

* tb/blame-resurrect-convert-to-git:
  blame: CRLF in the working tree and LF in the repo
2015-05-11 14:23:52 -07:00
Alex Henrie ce41720cad blame, log: format usage strings similarly to those in documentation
Earlier, 9c9b4f2f (standardize usage info string format, 2015-01-13)
tried to make usage-string in line with the documentation by

    - Placing angle brackets around fill-in-the-blank parameters
    - Putting dashes in multiword parameter names
    - Adding spaces to [-f|--foobar] to make [-f | --foobar]
    - Replacing <foobar>* with [<foobar>...]

but it missed a few places.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-03 16:55:26 -07:00
Torsten Bögershausen 4bf256d67a blame: CRLF in the working tree and LF in the repo
A typical setup under Windows is to set core.eol to CRLF, and text
files are marked as "text" in .gitattributes, or core.autocrlf is
set to true.

After 4d4813a5 "git blame" no longer works as expected for such a
set-up.  Every line is annotated as "Not Committed Yet", even though
the working directory is clean.  This is because the commit removed
the conversion in blame.c for all files, with or without CRLF in the
repo.

Having files with CRLF in the repo and core.autocrlf=input is a
temporary situation, and the files, if committed as is, will be
normalized in the repo, which _will_ be a notable change.  Blaming
them with "Not Committed Yet" is the right result.  Revert commit
4d4813a5 which was a misguided attempt to "solve" a non-problem.

Add two test cases in t8003 to verify the correct CRLF conversion.

Suggested-By: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-03 11:00:10 -07:00
Junio C Hamano 007f7f6e54 Merge branch 'es/blame-commit-info-fix' into maint
"git blame" died, trying to free an uninitialized piece of memory.

* es/blame-commit-info-fix:
  builtin/blame: destroy initialized commit_info only
2015-03-05 13:13:12 -08:00
Junio C Hamano faf723a631 Merge branch 'jk/blame-commit-label' into maint
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".

* jk/blame-commit-label:
  blame.c: fix garbled error message
  use xstrdup_or_null to replace ternary conditionals
  builtin/commit.c: use xstrdup_or_null instead of envdup
  builtin/apply.c: use xstrdup_or_null instead of null_strdup
  git-compat-util: add xstrdup_or_null helper
2015-02-24 22:09:54 -08:00
Junio C Hamano 073bb8ebb8 Merge branch 'es/blame-commit-info-fix'
"git blame" died, trying to free an uninitialized piece of memory.

* es/blame-commit-info-fix:
  builtin/blame: destroy initialized commit_info only
2015-02-22 12:28:24 -08:00
Junio C Hamano bb831db677 Merge branch 'ah/usage-strings'
* ah/usage-strings:
  standardize usage info string format
2015-02-11 13:44:20 -08:00
Junio C Hamano 092c4be7f5 Merge branch 'jk/blame-commit-label'
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".

* jk/blame-commit-label:
  blame.c: fix garbled error message
  use xstrdup_or_null to replace ternary conditionals
  builtin/commit.c: use xstrdup_or_null instead of envdup
  builtin/apply.c: use xstrdup_or_null instead of null_strdup
  git-compat-util: add xstrdup_or_null helper
2015-02-11 13:39:50 -08:00
Eric Sunshine e60059276b builtin/blame: destroy initialized commit_info only
Since ea02ffa3 (mailmap: simplify map_user() interface, 2013-01-05),
find_alignment() has been invoking commit_info_destroy() on an
uninitialized auto 'struct commit_info' (when METAINFO_SHOWN is not
set). commit_info_destroy() calls strbuf_release() for each
'commit_info' strbuf member, which randomly invokes free() on
whatever random stack value happens to reside in strbuf.buf, thus
leading to periodic crashes.

Reported-by: Dilyan Palauzov <dilyan.palauzov@aegee.org>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-10 10:31:48 -08:00
Alex Henrie 9c9b4f2f8b standardize usage info string format
This patch puts the usage info strings that were not already in docopt-
like format into docopt-like format, which will be a litle easier for
end users and a lot easier for translators. Changes include:

- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-14 09:32:04 -08:00
Lukas Fleischer a46442f167 blame.c: fix garbled error message
The helper functions prepare_final() and prepare_initial() return a
pointer to a string that is a member of an object in the revs->pending
array. This array is later rebuilt when running prepare_revision_walk()
which potentially transforms the pointer target into a bogus string. Fix
this by maintaining a copy of the original string.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-13 10:05:53 -08:00
Ronnie Sahlberg 7695d118e5 refs.c: change resolve_ref_unsafe reading argument to be a flags field
resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref
resolves successfully for writing but not for reading).  Change this to be
a flags field instead, and pass the new constant RESOLVE_REF_READING when
we want this behaviour.

While at it, swap two of the arguments in the function to put output
arguments at the end.  As a nice side effect, this ensures that we can
catch callers that were unaware of the new API so they can be audited.

Give the wrapper functions resolve_refdup and read_ref_full the same
treatment for consistency.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:24 -07:00
Junio C Hamano ceeacc501b Merge branch 'bb/date-iso-strict'
"log --date=iso" uses a slight variant of ISO 8601 format that is
made more human readable.  A new "--date=iso-strict" option gives
datetime output that is more strictly conformant.

* bb/date-iso-strict:
  pretty: provide a strict ISO 8601 date format
2014-09-19 11:38:32 -07:00
Junio C Hamano 929df991c2 Merge branch 'sb/blame-msg-i18n'
* sb/blame-msg-i18n:
  builtin/blame.c: add translation to warning about failed revision walk
2014-09-09 12:54:03 -07:00
Beat Bolli 466fb6742d pretty: provide a strict ISO 8601 date format
Git's "ISO" date format does not really conform to the ISO 8601
standard due to small differences, and it cannot be parsed by ISO
8601-only parsers, e.g. those of XML toolchains.

The output from "--date=iso" deviates from ISO 8601 in these ways:

  - a space instead of the `T` date/time delimiter
  - a space between time and time zone
  - no colon between hours and minutes of the time zone

Add a strict ISO 8601 date format for displaying committer and
author dates.  Use the '%aI' and '%cI' format specifiers and add
'--date=iso-strict' or '--date=iso8601-strict' date format names.

See http://thread.gmane.org/gmane.comp.version-control.git/255879 and
http://thread.gmane.org/gmane.comp.version-control.git/52414/focus=52585
for discussion.

Signed-off-by: Beat Bolli <bbolli@ewanet.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 12:37:02 -07:00
Stefan Beller 201087422d builtin/blame.c: add translation to warning about failed revision walk
Signed-off-by: Stefan Beller <stefanbeller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-12 11:01:44 -07:00
Jeff King fe24d396e1 move setting of object->type to alloc_* functions
The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-28 10:14:33 -07:00
Junio C Hamano 12621cb222 Merge branch 'rs/code-cleaning'
* rs/code-cleaning:
  remote-testsvn: use internal argv_array of struct child_process in cmd_import()
  bundle: use internal argv_array of struct child_process in create_bundle()
  fast-import: use hashcmp() for SHA1 hash comparison
  transport: simplify fetch_objs_via_rsync() using argv_array
  run-command: use internal argv_array of struct child_process in run_hook_ve()
  use commit_list_count() to count the members of commit_lists
  strbuf: use strbuf_addstr() for adding C strings
2014-07-22 10:59:37 -07:00
Junio C Hamano 10b944b37b Merge branch 'jk/alloc-commit-id'
Make sure all in-core commit objects are assigned a unique number
so that they can be annotated using the commit-slab API.

* jk/alloc-commit-id:
  diff-tree: avoid lookup_unknown_object
  object_as_type: set commit index
  alloc: factor out commit index
  add object_as_type helper for casting objects
  parse_object_buffer: do not set object type
  move setting of object->type to alloc_* functions
  alloc: write out allocator definitions
  alloc.c: remove the alloc_raw_commit_node() function
2014-07-22 10:59:25 -07:00
Junio C Hamano 9ab0882255 Merge branch 'maint'
* maint:
  use xmemdupz() to allocate copies of strings given by start and length
  use xcalloc() to allocate zero-initialized memory
2014-07-21 12:35:39 -07:00
René Scharfe 5c0b13f85a use xmemdupz() to allocate copies of strings given by start and length
Use xmemdupz() to allocate the memory, copy the data and make sure to
NUL-terminate the result, all in one step.  The resulting code is
shorter, doesn't contain the constants 1 and '\0', and avoids
duplicating function parameters.

For blame, the last copied byte (o->file.ptr[o->file.size]) is always
set to NUL by fake_working_tree_commit() or read_sha1_file(), so no
information is lost by the conversion to using xmemdupz().

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-21 10:37:02 -07:00
René Scharfe 4bbaa1eb6f use commit_list_count() to count the members of commit_lists
Call commit_list_count() instead of open-coding it repeatedly.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-17 13:36:25 -07:00
Junio C Hamano 788cef81d4 Merge branch 'nd/split-index'
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.

* nd/split-index: (32 commits)
  t1700: new tests for split-index mode
  t2104: make sure split index mode is off for the version test
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  read-tree: note about dropping split-index mode or index version
  read-tree: force split-index mode off on --index-output
  rev-parse: add --shared-index-path to get shared index path
  update-index --split-index: do not split if $GIT_DIR is read only
  update-index: new options to enable/disable split index mode
  split-index: strip pathname of on-disk replaced entries
  split-index: do not invalidate cache-tree at read time
  split-index: the reading part
  split-index: the writing part
  read-cache: mark updated entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark new entries for split index
  read-cache: split-index mode
  read-cache: save index SHA-1 after reading
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  cache-tree: mark istate->cache_changed on cache tree update
  ...
2014-07-16 11:25:40 -07:00
Junio C Hamano 5c18fde0d9 Merge branch 'jk/commit-buffer-length' into maint
A handful of code paths had to read the commit object more than
once when showing header fields that are usually not parsed.  The
internal data structure to keep track of the contents of the commit
object has been updated to reduce the need for this double-reading,
and to allow the caller find the length of the object.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-16 11:16:38 -07:00
Jeff King d36f51c13b move setting of object->type to alloc_* functions
The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 18:59:05 -07:00
Junio C Hamano 8061ae8b46 Merge branch 'jk/commit-buffer-length'
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths.  Use this to optimize the code paths to
validate GPG signatures in commit objects.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-02 12:53:02 -07:00
Junio C Hamano 8d87e35bab Merge branch 'rs/blame-refactor'
* rs/blame-refactor:
  blame: simplify prepare_lines()
  blame: factor out get_next_line()
2014-06-25 12:23:36 -07:00
Junio C Hamano 4d27d8cbc4 Merge branch 'bc/blame-crlf-test' into maint
"git blame" assigned the blame to the copy in the working-tree if
the repository is set to core.autocrlf=input and the file used CRLF
line endings.

* bc/blame-crlf-test:
  blame: correctly handle files regardless of autocrlf
2014-06-25 11:46:45 -07:00
René Scharfe 60d85e110b blame: simplify prepare_lines()
Changing get_next_line() to return the end pointer instead of NULL in
case no newline character is found treats allows us to treat complete
and incomplete lines the same, simplifying the code.  Switching to
counting lines instead of EOLs allows us to start counting at the
first character, instead of having to call get_next_line() first.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 14:52:50 -07:00
René Scharfe 29aa0b2061 blame: factor out get_next_line()
Move the code for finding the start of the next line into a helper
function in order to reduce duplication.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 14:52:16 -07:00
Jeff King 8597ea3afe commit: record buffer length in cache
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:09:38 -07:00
Jeff King b66103c3ba convert logmsg_reencode to get_commit_buffer
Like the callsites in the previous commit, logmsg_reencode
already falls back to read_sha1_file when necessary.
However, I split its conversion out into its own commit
because it's a bit more complex.

We return either:

  1. The original commit->buffer

  2. A newly allocated buffer from read_sha1_file

  3. A reencoded buffer (based on either 1 or 2 above).

while trying to do as few extra reads/allocations as
possible. Callers currently free the result with
logmsg_free, but we can simplify this by pointing them
straight to unuse_commit_buffer. This is a slight layering
violation, in that we may be passing a buffer from (3).
However, since the end result is to free() anything except
(1), which is unlikely to change, and because this makes the
interface much simpler, it's a reasonable bending of the
rules.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:08:17 -07:00
Jeff King 66c2827ea4 provide a helper to set the commit buffer
Right now this is just a one-liner, but abstracting it will
make it easier to change later.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:08:17 -07:00
Nguyễn Thái Ngọc Duy a5400efe29 cache-tree: mark istate->cache_changed on cache tree invalidation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Jeff King b000c59b0c logmsg_reencode: return const buffer
The return value from logmsg_reencode may be either a newly
allocated buffer or a pointer to the existing commit->buffer.
We would not want the caller to accidentally free() or
modify the latter, so let's mark it as const.  We can cast
away the constness in logmsg_free, but only once we have
determined that it is a free-able buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-12 10:29:43 -07:00
Jeff King 10322a0aaf do not create "struct commit" with xcalloc
In both blame and merge-recursive, we sometimes create a
"fake" commit struct for convenience (e.g., to represent the
HEAD state as if we would commit it). By allocating
ourselves rather than using alloc_commit_node, we do not
properly set the "index" field of the commit. This can
produce subtle bugs if we then use commit-slab on the
resulting commit, as we will share the "0" index with
another commit.

We can fix this by using alloc_commit_node() to allocate.
Note that we cannot free the result, as it is part of our
commit allocator. However, both cases were already leaking
the allocated commit anyway, so there's nothing to fix up.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-12 10:29:42 -07:00
Junio C Hamano e934c67b66 Merge branch 'bc/blame-crlf-test'
If a file contained CRLF line endings in a repository with
core.autocrlf=input, then blame always marked lines as "Not
Committed Yet", even if they were unmodified.

* bc/blame-crlf-test:
  blame: correctly handle files regardless of autocrlf
2014-06-06 11:26:50 -07:00
Junio C Hamano c7be99ea51 Merge branch 'dk/blame-reorg'
"git blame" has been optimized greatly by reorganising the data
structure that is used to keep track of the work to be done, thanks
to David Karstrup <dak@gnu.org>.

* dk/blame-reorg:
  blame: large-scale performance rewrite
2014-06-06 11:24:44 -07:00
brian m. carlson 4d4813a52f blame: correctly handle files regardless of autocrlf
If a file contained CRLF line endings in a repository with
core.autocrlf=input, then blame always marked lines as "Not
Committed Yet", even if they were unmodified.  Don't attempt to
convert the line endings when creating the fake commit so that blame
works correctly regardless of the autocrlf setting.

Reported-by: Ephrim Khong <dr.khong@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-08 14:43:49 -07:00
David Kastrup 7e6ac6e439 blame: large-scale performance rewrite
The previous implementation used a single sorted linear list of blame
entries for organizing all partial or completed work.  Every subtask had
to scan the whole list, with most entries not being relevant to the
task.  The resulting run-time was quadratic to the number of separate
chunks.

This change gives every subtask its own data to work with.  Subtasks are
organized into "struct origin" chains hanging off particular commits.
Commits are organized into a priority queue, processing them in commit
date order in order to keep most of the work affecting a particular blob
collated even in the presence of an extensive merge history.

For large files with a diversified history, a speedup by a factor of 3
or more is not unusual.

Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-28 14:38:15 -07:00
Jiang Xin dd75553b35 blame: dynamic blame_date_width for different locales
When show date in relative date format for git-blame, the max display
width of datetime is set as the length of the string "Thu Oct 19
16:00:04 2006 -0700" (30 characters long).  But actually the max width
for C locale is only 22 (the length of string "x years, xx months ago").
And for other locale, it maybe smaller.  E.g. For Chinese locale, only
needs a half (16-character width).

Set blame_date_width as the display width of _("4 years, 11 months
ago"), so that translators can make the choice.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-23 00:02:15 -07:00
Jiang Xin bccce0f809 blame: fix broken time_buf paddings in relative timestamp
Command `git blame --date relative` aligns the date field with a
fixed-width (defined by blame_date_width), and if time_str is shorter
than that, it adds spaces for padding.  But there are two bugs in the
following codes:

        time_len = strlen(time_str);
        ...
        memset(time_buf + time_len, ' ', blame_date_width - time_len);

 1. The type of blame_date_width is size_t, which is unsigned.  If
    time_len is greater than blame_date_width, the result of
    "blame_date_width - time_len" will never be a negative number, but a
    really big positive number, and will cause memory overwrite.

    This bug can be triggered if either l10n message for function
    show_date_relative() in date.c is longer than 30 characters, then
    `git blame --date relative` may exit abnormally.

 2. When show blame information with relative time, the UTF-8 characters
    in time_str will break the alignment of columns after the date field.
    This is because the time_buf padding with spaces should have a
    constant display width, not a fixed strlen size.  So we should call
    utf8_strwidth() instead of strlen() for width calibration.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-23 00:01:52 -07:00
Junio C Hamano b407d40933 Merge branch 'nd/log-show-linear-break'
Attempts to show where a single-strand-of-pearls break in "git log"
output.

* nd/log-show-linear-break:
  log: add --show-linear-break to help see non-linear history
  object.h: centralize object flag allocation
2014-04-03 12:38:11 -07:00
Nguyễn Thái Ngọc Duy 208acbfb82 object.h: centralize object flag allocation
While the field "flags" is mainly used by the revision walker, it is
also used in many other places. Centralize the whole flag allocation to
one place for a better overview (and easier to move flags if we have
too).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-25 15:09:24 -07:00
Junio C Hamano 6784fab0ac Merge branch 'dk/blame-janitorial'
Code clean-up.

* dk/blame-janitorial:
  builtin/blame.c::find_copy_in_blob: no need to scan for region end
  blame.c: prepare_lines should not call xrealloc for every line
  builtin/blame.c::prepare_lines: fix allocation size of sb->lineno
  builtin/blame.c: eliminate same_suspect()
  builtin/blame.c: struct blame_entry does not need a prev link
2014-02-27 14:01:46 -08:00
Junio C Hamano 043478308f Merge branch 'ep/varscope'
Shrink lifetime of variables by moving their definitions to an
inner scope where appropriate.

* ep/varscope:
  builtin/gc.c: reduce scope of variables
  builtin/fetch.c: reduce scope of variable
  builtin/commit.c: reduce scope of variables
  builtin/clean.c: reduce scope of variable
  builtin/blame.c: reduce scope of variables
  builtin/apply.c: reduce scope of variables
  bisect.c: reduce scope of variable
2014-02-27 14:01:30 -08:00
David Kastrup 3ee8944fa5 builtin/blame.c::find_copy_in_blob: no need to scan for region end
The region end can be looked up just like its beginning.

Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-25 09:51:24 -08:00
David Kastrup 352bbbd9f2 blame.c: prepare_lines should not call xrealloc for every line
Making a single preparation run for counting the lines will avoid memory
fragmentation.  Also, fix the allocated memory size which was wrong
when sizeof(int *) != sizeof(int), and would have been too small
for sizeof(int *) < sizeof(int), admittedly unlikely.

Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:32:41 -08:00
David Kastrup 62cf3ca95a builtin/blame.c::prepare_lines: fix allocation size of sb->lineno
If we are calling xrealloc on every single line, the least we can do
is get the right allocation size.

Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:32:41 -08:00
David Kastrup 0a88f08e28 builtin/blame.c: eliminate same_suspect()
Since the origin pointers are "interned" and reference-counted, comparing
the pointers rather than the content is enough.  The only uninterned
origins are cached values kept in commit->util, but same_suspect is not
called on them.

Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:32:21 -08:00
Elia Pinto ac39b27786 builtin/blame.c: reduce scope of variables
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-31 10:44:05 -08:00
David Kastrup a0f58c5830 builtin/blame.c: struct blame_entry does not need a prev link
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-22 11:28:01 -08:00
Junio C Hamano 3b9d69ec22 Merge branch 'js/lift-parent-count-limit'
There is no reason to have a hardcoded upper limit of the number of
parents for an octopus merge, created via the graft mechanism.

* js/lift-parent-count-limit:
  Remove the line length limit for graft files
2014-01-10 10:33:36 -08:00
Johannes Schindelin e228c1736f Remove the line length limit for graft files
Support for grafts predates Git's strbuf, and hence it is understandable
that there was a hard-coded line length limit of 1023 characters (which
was chosen a bit awkwardly, given that it is *exactly* one byte short of
aligning with the 41 bytes occupied by a commit name and the following
space or new-line character).

While regular commit histories hardly win comprehensibility in general
if they merge more than twenty-two branches in one go, it is not Git's
business to limit grafts in such a way.

In this particular developer's case, the use case that requires
substantially longer graft lines to be supported is the visualization of
the commits' order implied by their changes: commits are considered to
have an implicit relationship iff exchanging them in an interactive
rebase would result in merge conflicts.

Thusly implied branches tend to be very shallow in general, and the
resulting thicket of implied branches is usually very wide; It is
actually quite common that *most* of the commits in a topic branch have
not even one implied parent, so that a final merge commit has about as
many implied parents as there are commits in said branch.

[jc: squashed in tests by Jonathan]

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-27 16:46:25 -08:00
Junio C Hamano 5bb62059f2 Merge branch 'jk/robustify-parse-commit'
* jk/robustify-parse-commit:
  checkout: do not die when leaving broken detached HEAD
  use parse_commit_or_die instead of custom message
  use parse_commit_or_die instead of segfaulting
  assume parse_commit checks for NULL commit
  assume parse_commit checks commit->object.parsed
  log_tree_diff: die when we fail to parse a commit
2013-12-05 12:54:01 -08:00
Nguyễn Thái Ngọc Duy 4a2d5ae262 pathspec: stop --*-pathspecs impact on internal parse_pathspec() uses
Normally parse_pathspec() is used on command line arguments where it
can do fancy thing like parsing magic on each argument or adding magic
for all pathspecs based on --*-pathspecs options.

There's another use of parse_pathspec(), where pathspec is needed, but
the input is known to be pure paths. In this case we usually don't
want --*-pathspecs to interfere. And we definitely do not want to
parse magic in these paths, regardless of --literal-pathspecs.

Add new flag PATHSPEC_LITERAL_PATH for this purpose. When it's set,
--*-pathspecs are ignored, no magic is parsed. And if the caller
allows PATHSPEC_LITERAL (i.e. the next calls can take literal magic),
then PATHSPEC_LITERAL will be set.

This fixes cases where git chokes when GIT_*_PATHSPECS are set because
parse_pathspec() indicates it won't take any magic. But
GIT_*_PATHSPECS add them anyway. These are

   export GIT_LITERAL_PATHSPECS=1
   git blame -- something
   git log --follow something
   git log --merge

"git ls-files --with-tree=path" (aka parse_pathspec() in
overlay_tree_on_cache()) is safe because the input is empty, and
producing one pathspec due to PATHSPEC_PREFER_CWD does not take any
magic into account.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-28 09:57:36 -07:00
Jeff King 0064053bd7 assume parse_commit checks commit->object.parsed
The parse_commit function will check the "parsed" flag of
the object and do nothing if it is set. There is no need
for callers to check the flag themselves, and doing so only
clutters the code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 15:43:50 -07:00
Junio C Hamano b02f5aeda6 Merge branch 'jl/submodule-mv'
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-09-09 14:36:15 -07:00
Junio C Hamano de9a25354a Merge branch 'es/blame-L-twice'
Teaches "git blame" to take more than one -L ranges.

* es/blame-L-twice:
  line-range: reject -L line numbers less than 1
  t8001/t8002: blame: add tests of -L line numbers less than 1
  line-range: teach -L^:RE to search from start of file
  line-range: teach -L:RE to search from end of previous -L range
  line-range: teach -L^/RE/ to search from start of file
  line-range-format.txt: document -L/RE/ relative search
  log: teach -L/RE/ to search from end of previous -L range
  blame: teach -L/RE/ to search from end of previous -L range
  line-range: teach -L/RE/ to search relative to anchor point
  blame: document multiple -L support
  t8001/t8002: blame: add tests of multiple -L options
  blame: accept multiple -L ranges
  blame: inline one-line function into its lone caller
  range-set: publish API for re-use by git-blame -L
  line-range-format.txt: clarify -L:regex usage form
  git-log.txt: place each -L option variation on its own line
2013-09-09 14:35:11 -07:00
Junio C Hamano 118b9d5836 Merge branch 'es/blame-L-more'
More fixes to the code to parse the "-L" option in "log" and "blame".

* es/blame-L-more:
  blame: reject empty ranges -L,+0 and -L,-0
  t8001/t8002: blame: demonstrate acceptance of bogus -L,+0 and -L,-0
  blame: reject empty ranges -LX,+0 and -LX,-0
  t8001/t8002: blame: demonstrate acceptance of bogus -LX,+0 and -LX,-0
  log: fix -L bounds checking bug
  t4211: retire soon-to-be unimplementable tests
  t4211: log: demonstrate -L bounds checking bug
  blame: fix -L bounds checking bug
  t8001/t8002: blame: add empty file & partial-line tests
  t8001/t8002: blame: demonstrate -L bounds checking bug
  t8001/t8002: blame: decompose overly-large test
2013-09-09 14:32:45 -07:00
Eric Sunshine 52f4d12648 blame: teach -L/RE/ to search from end of previous -L range
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-06 14:44:25 -07:00
Eric Sunshine 815834e9aa line-range: teach -L/RE/ to search relative to anchor point
Range specification -L/RE/ for blame/log unconditionally begins
searching at line one. Mailing list discussion [1] suggests that, in the
presence of multiple -L options, -L/RE/ should search relative to the
endpoint of the previous -L range, if any.

Teach the parsing machinery underlying blame's and log's -L options to
accept a start point for -L/RE/ searches. Follow-up patches will upgrade
blame and log to take advantage of this ability.

[1]: http://thread.gmane.org/gmane.comp.version-control.git/229755/focus=229966

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-06 14:36:34 -07:00
Eric Sunshine 58dbfa2e59 blame: accept multiple -L ranges
git-blame accepts only a single -L option or none. Clients requiring
blame information for multiple disjoint ranges are therefore forced
either to invoke git-blame multiple times, once for each range, or only
once with no -L option to cover the entire file, both of which can be
costly.  Teach git-blame to accept multiple -L ranges.  Overlapping and
out-of-order ranges are accepted.

In this patch, the X in -LX,Y is absolute (for instance, /RE/ patterns
search from line 1), and Y is relative to X. Follow-up patches provide
more flexibility over how X is anchored.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-06 14:29:35 -07:00
Eric Sunshine 753935749f blame: inline one-line function into its lone caller
As of 25ed3412 (Refactor parse_loc; 2013-03-28),
blame.c:prepare_blame_range() became effectively a one-line function
which merely passes its arguments along to another function. This
indirection does not bring clarity to the code. Simplify by inlining
prepare_blame_range() into its lone caller.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-06 14:28:09 -07:00
Eric Sunshine 164a9cf430 blame: fix -L bounds checking bug
Since inception, -LX,Y has correctly reported an out-of-range error when
Y is beyond end of file, however, X was not checked, and an out-of-range
X would cause a crash.  92f9e273 (blame: prevent a segv when -L given
start > EOF; 2010-02-08) attempted to rectify this shortcoming but has
its own off-by-one error which allows X to extend one line past end of
file.  For example, given a file with 5 lines:

  git blame -L5 foo  # OK, blames line 5
  git blame -L6 foo  # accepted, no error, no output, huh?
  git blame -L7 foo  # error "fatal: file foo has only 5 lines"

Fix this bug.

In order to avoid regressing "blame foo" when foo is an empty file, the
fix is slightly more complicated than changing '<' to '<='.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05 11:54:31 -07:00
Stefan Beller d5d09d4754 Replace deprecated OPT_BOOLEAN by OPT_BOOL
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.

This patch does not change semantics of any command intentionally.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05 11:32:19 -07:00
Nguyễn Thái Ngọc Duy 9a08727443 remove init_pathspec() in favor of parse_pathspec()
While at there, move free_pathspec() to pathspec.c

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00
Nguyễn Thái Ngọc Duy bd1928df1d remove diff_tree_{setup,release}_paths
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00
Junio C Hamano ed73fe5642 Merge branch 'tr/line-log'
* tr/line-log:
  git-log(1): remove --full-line-diff description
  line-log: fix documentation formatting
  log -L: improve comments in process_all_files()
  log -L: store the path instead of a diff_filespec
  log -L: test merge of parallel modify/rename
  t4211: pass -M to 'git log -M -L...' test
  log -L: fix overlapping input ranges
  log -L: check range set invariants when we look it up
  Speed up log -L... -M
  log -L: :pattern:file syntax to find by funcname
  Implement line-history search (git log -L)
  Export rewrite_parents() for 'log -L'
  Refactor parse_loc
2013-06-02 16:00:44 -07:00
Junio C Hamano e52e6f79cc Merge branch 'nd/pretty-formats'
pretty-printing body of the commit that is stored in non UTF-8
encoding did not work well.  The early part of this series fixes
it.  And then it adds %C(auto) specifier that turns the coloring on
when we are emitting to the terminal, and adds column-aligning
format directives.

* nd/pretty-formats:
  pretty: support %>> that steal trailing spaces
  pretty: support truncating in %>, %< and %><
  pretty: support padding placeholders, %< %> and %><
  pretty: add %C(auto) for auto-coloring
  pretty: split color parsing into a separate function
  pretty: two phase conversion for non utf-8 commits
  utf8.c: add reencode_string_len() that can handle NULs in string
  utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences
  utf8.c: move display_mode_esc_sequence_len() for use by other functions
  pretty: share code between format_decoration and show_decorations
  pretty-formats.txt: wrap long lines
  pretty: get the correct encoding for --pretty:format=%e
  pretty: save commit encoding from logmsg_reencode if the caller needs it
2013-04-23 11:22:48 -07:00
Nguyễn Thái Ngọc Duy 5a10d23658 pretty: save commit encoding from logmsg_reencode if the caller needs it
The commit encoding is parsed by logmsg_reencode, there's no need for
the caller to re-parse it again. The reencoded message now has the new
encoding, not the original one. The caller would need to read commit
object again before parsing.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:27 -07:00
René Scharfe de5abe9fe9 blame: handle broken commit headers gracefully
split_ident_line() can leave us with the pointers date_begin, date_end,
tz_begin and tz_end all set to NULL.  Check them before use and supply
the same fallback values as in the case of a negative return code from
split_ident_line().

The "(unknown)" is not actually shown in the output, though, because it
will be converted to a number (zero) eventually.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 14:50:45 -07:00
Thomas Rast 13b8f68c1f log -L: :pattern:file syntax to find by funcname
This new syntax finds a funcname matching /pattern/, and then takes from there
up to (but not including) the next funcname.  So you can say

  git log -L:main:main.c

and it will dig up the main() function and show its line-log, provided
there are no other funcnames matching 'main'.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 10:30:04 -07:00
Bo Yang 25ed3412f8 Refactor parse_loc
We want to use the same style of -L n,m argument for 'git log -L' as
for git-blame.  Refactor the argument parsing of the range arguments
from builtin/blame.c to the (new) file that will hold the 'git log -L'
logic.

To accommodate different data structures in blame and log -L, the file
contents are abstracted away; parse_range_arg takes a callback that it
uses to get the contents of a line of the (notional) file.

The new test is for a case that made me pause during debugging: the
'blame -L with invalid end' test was the only one that noticed an
outright failure to parse the end *at all*.  So make a more explicit
test for that.

Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 10:28:41 -07:00
Jeff King be5c9fb904 logmsg_reencode: lazily load missing commit buffers
Usually a commit that makes it to logmsg_reencode will have
been parsed, and the commit->buffer struct member will be
valid. However, some code paths will free commit buffers
after having used them (for example, the log traversal
machinery will do so to keep memory usage down).

Most of the time this is fine; log should only show a commit
once, and then exits. However, there are some code paths
where this does not work. At least two are known:

  1. A commit may be shown as part of a regular ref, and
     then it may be shown again as part of a submodule diff
     (e.g., if a repo contains refs to both the superproject
     and subproject).

  2. A notes-cache commit may be shown during "log --all",
     and then later used to access a textconv cache during a
     diff.

Lazily loading in logmsg_reencode does not necessarily catch
all such cases, but it should catch most of them. Users of
the commit buffer tend to be either parsing for structure
(in which they will call parse_commit, and either we will
already have parsed, or we will load commit->buffer lazily
there), or outputting (either to the user, or fetching a
part of the commit message via format_commit_message). In
the latter case, we should always be using logmsg_reencode
anyway (and typically we do so via the pretty-print
machinery).

If there are any cases that this misses, we can fix them up
to use logmsg_reencode (or handle them on a case-by-case
basis if that is inappropriate).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-26 13:28:22 -08:00
Jeff King dd0d388c44 logmsg_reencode: never return NULL
The logmsg_reencode function will return the reencoded
commit buffer, or NULL if reencoding failed or no reencoding
was necessary. Since every caller then ends up checking for NULL
and just using the commit's original buffer, anyway, we can
be a bit more helpful and just return that buffer when we
would have returned NULL.

Since the resulting string may or may not need to be freed,
we introduce a logmsg_free, which checks whether the buffer
came from the commit object or not (callers either
implemented the same check already, or kept two separate
pointers, one to mark the buffer to be used, and one for the
to-be-freed string).

Pushing this logic into logmsg_* simplifies the callers, and
will let future patches lazily load the commit buffer in a
single place.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-26 13:28:21 -08:00
Junio C Hamano 577f63e781 Merge branch 'ap/log-mailmap'
Teach commands in the "log" family to optionally pay attention to
the mailmap.

* ap/log-mailmap:
  log --use-mailmap: optimize for cases without --author/--committer search
  log: add log.mailmap configuration option
  log: grep author/committer using mailmap
  test: add test for --use-mailmap option
  log: add --use-mailmap option
  pretty: use mailmap to display username and email
  mailmap: add mailmap structure to rev_info and pp
  mailmap: simplify map_user() interface
  mailmap: remove email copy and length limitation
  Use split_ident_line to parse author and committer
  string-list: allow case-insensitive string list
2013-01-20 17:06:53 -08:00