Commit graph

160 commits

Author SHA1 Message Date
Johan Herland ef8db638cc Refactor notes code to concatenate multiple notes annotating the same object
Currently, having multiple notes referring to the same commit from various
locations in the notes tree is strongly discouraged, since only one of those
notes will be parsed and shown.

This patch teaches the notes code to _concatenate_ multiple notes that
annotate the same commit. Notes are concatenated by creating a new blob
object containing the concatenation of the notes in question, and
replacing them with the concatenated note in the internal notes tree
structure.

Getting the concatenation right requires being more proactive in unpacking
subtree entries in the internal notes tree structure, so that we don't return
a note prematurely (i.e. before having found all other notes that annotate
the same object). As such, this patch may incur a small performance penalty.

Suggested-by: Sam Vilain <sam@vilain.net>
Re-suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-19 19:00:24 -07:00
Johan Herland 23123aecf8 Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
  objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
  no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
  only the subtree with the longest matching prefix is consulted. This
  means that if the given commit is e.g. "deadbeef", and the notes tree have
  subtrees "de" and "dead", then the following paths in the notes tree are
  searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
  from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
  acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
  recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.

This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.

The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).

In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).

As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.

However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.

Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-19 19:00:24 -07:00
Johan Herland 27d5756410 Teach notes code to free its internal data structures on request
There's no need to be rude to memory-concious callers...

This patch has been improved by the following contributions:
- Junio C Hamano: avoid old-style declaration

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-19 19:00:24 -07:00
Johan Herland c56fcc89b9 Add flags to get_commit_notes() to control the format of the note string
This patch adds the following flags to get_commit_notes() for adjusting the
format of the produced note string:
- NOTES_SHOW_HEADER: Print "Notes:" line before the notes contents
- NOTES_INDENT: Indent notes contents by 4 spaces

Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-19 19:00:24 -07:00
Johannes Schindelin fd53c9eb44 Speed up git notes lookup
To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.

The idea for the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.

Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object.  This
allows you to annotate arbitrary branches, or objects.

This patch has been improved by the following contributions:
- Junio C Hamano: fixed an obvious error in initialize_hash_map()

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-19 18:59:42 -07:00
Johannes Schindelin a97a74686d Introduce commit notes
Commit notes are blobs which are shown together with the commit
message.  These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.

The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).

The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.

This patch has been improved by the following contributions:
- Thomas Rast: fix core.notesRef documentation
- Tor Arne Vestbø: fix printing of multi-line notes
- Alex Riesen: Using char array instead of char pointer costs less BSS
- Johan Herland: Plug leak when msg is good, but msglen or type causes return

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

get_commit_notes(): Plug memory leak when 'if' triggers, but not because of read_sha1_file() failure
2009-10-19 18:59:42 -07:00
Junio C Hamano 954cfb5cfd Revert "Merge branch 'js/notes'"
This reverts commit 7b75b331f6, reversing
changes made to 5d680a67d7.
2009-02-10 21:32:10 -08:00
Tor Arne Vestbø 22a3d06093 git-notes: fix printing of multi-line notes
The line length was read from the same position every time,
causing mangled output when printing notes with multiple lines.

Also, adding new-line manually for each line ensures that we
get a new-line between commits, matching git-log for commits
without notes.

Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-14 14:54:41 -08:00
Johannes Schindelin 2dd625d022 Speed up git notes lookup
To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.

The idea fo the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.

Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object.  This
allows you to annotate arbitrary branches, or objects.

[jc: fixed an obvious error in initialize_hash_map()]

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-11 19:17:45 -08:00
Johannes Schindelin 879ef2485d Introduce commit notes
Commit notes are blobs which are shown together with the commit
message.  These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.

The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).

The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-21 02:47:21 -08:00