Commit graph

26 commits

Author SHA1 Message Date
SZEDER Gábor
4468d4435c object_as_type: initialize commit-graph-related fields of 'struct commit'
When the commit graph and generation numbers were introduced in
commits 177722b344 (commit: integrate commit graph with commit
parsing, 2018-04-10) and 83073cc994 (commit: add generation number to
struct commit, 2018-04-25), they tried to make sure that the
corresponding 'graph_pos' and 'generation' fields of 'struct commit'
are initialized conservatively, as if the commit were not included in
the commit-graph file.

Alas, initializing those fields only in alloc_commit_node() missed the
case when an object that happens to be a commit is first looked up via
lookup_unknown_object(), and is then later converted to a 'struct
commit' via the object_as_type() helper function (either calling it
directly, or as part of a subsequent lookup_commit() call).
Consequently, both of those fields incorrectly remain set to zero,
which means e.g. that the commit is present in and is the first entry
of the commit-graph file.  This will result in wrong timestamp, parent
and root tree hashes, if such a 'struct commit' instance is later
filled from the commit-graph.

Extract the initialization of 'struct commit's fields from
alloc_commit_node() into a helper function, and call it from
object_as_type() as well, to make sure that it properly initializes
the two commit-graph-related fields, too.  With this helper function
it is hopefully less likely that any new fields added to 'struct
commit' in the future would remain uninitialized.

With this change alloc_commit_index() won't have any remaining callers
outside of 'alloc.c', so mark it as static.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-27 16:55:57 -08:00
Elijah Newren
1731310745 alloc: make allocate_alloc_state and clear_alloc_state more consistent
Since both functions are using the same data type, they should either both
refer to it as void *, or both use the real type (struct alloc_state *).
Opt for the latter.

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-15 11:52:09 -07:00
Junio C Hamano
110240588d Merge branch 'sb/object-store-alloc'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-alloc:
  alloc: allow arbitrary repositories for alloc functions
  object: allow create_object to handle arbitrary repositories
  object: allow grow_object_hash to handle arbitrary repositories
  alloc: add repository argument to alloc_commit_index
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_blob_node
  object: add repository argument to grow_object_hash
  object: add repository argument to create_object
  repository: introduce parsed objects field
2018-06-25 13:22:38 -07:00
Derrick Stolee
83073cc994 commit: add generation number to struct commit
The generation number of a commit is defined recursively as follows:

* If a commit A has no parents, then the generation number of A is one.
* If a commit A has parents, then the generation number of A is one
  more than the maximum generation number among the parents of A.

Add a uint32_t generation field to struct commit so we can pass this
information to revision walks. We use three special values to signal
the generation number is invalid:

GENERATION_NUMBER_INFINITY 0xFFFFFFFF
GENERATION_NUMBER_MAX 0x3FFFFFFF
GENERATION_NUMBER_ZERO 0

The first (_INFINITY) means the generation number has not been loaded or
computed. The second (_MAX) means the generation number is too large to
store in the commit-graph file. The third (_ZERO) means the generation
number was loaded from a commit graph file that was written by a version
of git that did not support generation numbers.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22 12:36:25 +09:00
Stefan Beller
14ba97f81c alloc: allow arbitrary repositories for alloc functions
We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.

We record all memory allocation in alloc.c, and free them in
clear_alloc_state, which is called for all repositories except
the_repository.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:16:50 +09:00
Stefan Beller
dd5d9deb01 alloc: add repository argument to alloc_commit_index
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
17bfe87369 alloc: add repository argument to alloc_report
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
13e3fdcb76 alloc: add repository argument to alloc_object_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
a0bd9086bb alloc: add repository argument to alloc_tag_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
8ba0e5ec57 alloc: add repository argument to alloc_commit_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
cf7203bdc6 alloc: add repository argument to alloc_tree_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller
f0de1d62ae alloc: add repository argument to alloc_blob_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Derrick Stolee
177722b344 commit: integrate commit graph with commit parsing
Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:43:02 +09:00
Jeff King
94d5a22cf6 alloc: factor out commit index
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.

While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 18:59:05 -07:00
Jeff King
d36f51c13b move setting of object->type to alloc_* functions
The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 18:59:05 -07:00
Jeff King
600e2a69df alloc: write out allocator definitions
Because the allocator functions for tree, blobs, etc are all
very similar, we originally used a macro to avoid repeating
ourselves. Since the prior commit, though, the heavy lifting
is done by an inline helper function.  The macro does still
save us a few lines, but at some readability cost.  It
obfuscates the function definitions (and makes them hard to
find via grep).

Much worse, though, is the fact that it isn't used
consistently for all allocators. Somebody coming later may
be tempted to modify DEFINE_ALLOCATOR, but they would miss
alloc_commit_node, which is treated specially.

Let's just drop the macro and write everything out
explicitly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 18:59:04 -07:00
Ramsay Jones
225ea22046 alloc.c: remove the alloc_raw_commit_node() function
In order to encapsulate the setting of the unique commit index, commit
969eba63 ("commit: push commit_index update into alloc_commit_node",
10-06-2014) introduced a (logically private) intermediary allocator
function. However, this function (alloc_raw_commit_node()) was declared
as a public function, which undermines its entire purpose.

Introduce an inline function, alloc_node(), which implements the main
logic of the allocator used by DEFINE_ALLOCATOR, and redefine the macro
in terms of the new function. In addition, use the new function in the
implementation of the alloc_commit_node() allocator, rather than the
intermediary allocator, which can now be removed.

Noticed by sparse ("symbol 'alloc_raw_commit_node' was not declared.
Should it be static?").

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 18:59:04 -07:00
Jeff King
969eba6341 commit: push commit_index update into alloc_commit_node
Whenever we create a commit object via lookup_commit, we
give it a unique index to be used with the commit-slab API.
The theory is that any "struct commit" we create would
follow this code path, so any such struct would get an
index. However, callers could use alloc_commit_node()
directly (and get multiple commits with index 0).

Let's push the indexing into alloc_commit_node so that it's
hard for callers to get it wrong.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-12 10:29:42 -07:00
Jeff King
c335d74d34 alloc: include any-object allocations in alloc_report
When 2c1cbec (Use proper object allocators for unknown
object nodes too, 2007-04-16), added a special "any_object"
allocator, it never taught alloc_report to report on it. To
do so we need to add an extra type argument to the REPORT
macro, as that commit did for DEFINE_ALLOCATOR.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-12 10:29:42 -07:00
Junio C Hamano
ea6640ec3e alloc.c: have SP around arithmetic operators
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-16 10:27:26 -07:00
Jonathan Nieder
28bd70d811 unbreak and eliminate NO_C99_FORMAT
In the spirit of v1.5.0.2~21 (Check for PRIuMAX rather than
NO_C99_FORMAT in fast-import.c, 2007-02-20), use PRIuMAX from
git-compat-util.h on all platforms instead of C99-specific formats
like %zu with dangerous fallbacks to %u or %lu.

So now C99-challenged platforms can build git without provoking
warnings or errors from printf, even if pointers do not have the same
size as an int or long.

The need for a fallback PRIuMAX is detected in git-compat-util.h with
"#ifndef PRIuMAX".  So while at it, simplify the Makefile and configure
script by eliminating the NO_C99_FORMAT knob altogether.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-17 15:30:49 -07:00
Felipe Contreras
4b25d091ba Fix a bunch of pointer declarations (codestyle)
Essentially; s/type* /type */ as per the coding guidelines.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-01 15:17:31 -07:00
Linus Torvalds
100c5f3b0b Clean up object creation to use more common code
This replaces the fairly odd "created_object()" function that did _most_
of the object setup with a more complete "create_object()" function that
also has a more natural calling convention.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-16 23:36:16 -07:00
Linus Torvalds
2c1cbec1e2 Use proper object allocators for unknown object nodes too
We used to use a different allocator scheme for when we didn't know the
object type.  That meant that objects that were created without any
up-front knowledge of the type would not go through the same allocation
paths as normal object allocations, and would miss out on the statistics.

But perhaps more importantly than the statistics (that are useful when
looking at memory usage but not much else), if we want to make the
object hash tables use a denser object pointer representation, we need
to make sure that they all go through the same blocking allocator.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-16 23:36:11 -07:00
Ramsay Allan Jones
579d1fbfaf Add NO_C99_FORMAT to support older compilers.
The NO_C99_FORMAT macro allows compilers that lack support for the
ll,hh,j,z,t size specifiers (eg. gcc 2.95.2) to adapt the code to avoid
runtime errors in the formatted IO functions.

Signed-off-by: Ramsay Allan Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-02 00:27:18 -07:00
Linus Torvalds
855419f764 Add specialized object allocator
This creates a simple specialized object allocator for basic
objects.

This avoids wasting space with malloc overhead (metadata and
extra alignment), since the specialized allocator knows the
alignment, and that objects, once allocated, are never freed.

It also allows us to track some basic statistics about object
allocations. For example, for the mozilla import, it shows
object usage as follows:

     blobs:   627629 (14710 kB)
     trees:  1119035 (34969 kB)
   commits:   196423  (8440 kB)
      tags:     1336    (46 kB)

and the simpler allocator shaves off about 2.5% off the memory
footprint off a "git-rev-list --all --objects", and is a bit
faster too.

[ Side note: this concludes the series of "save memory in object storage".
  The thing is, there simply isn't much more to be saved on the objects.

  Doing "git-rev-list --all --objects" on the mozilla archive has a final
  total RSS of 131498 pages for me: that's about 513MB. Of that, the
  object overhead is now just 56MB, the rest is going somewhere else (put
  another way: the fact that this patch shaves off 2.5% of the total
  memory overhead, considering that objects are now not much more than 10%
  of the total shows how big the wasted space really was: this makes
  object allocations much more memory- and time-efficient).

  I haven't looked at where the rest is, but I suspect the bulk of it is
  just the pack-file loading. It may be that we should pack the tree
  objects separately from the blob objects: for git-rev-list --objects, we
  don't actually ever need to even look at the blobs, but since trees and
  blobs are interspersed in the pack-file, we end up not being dense in
  the tree accesses, so we end up looking at more pages than we strictly
  need to.

  So with a 535MB pack-file, it's entirely possible - even likely - that
  most of the remaining RSS is just the mmap of the pack-file itself. We
  don't need to map in _all_ of it, but we do end up mapping a fair
  amount. ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 18:42:21 -07:00