development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-30 14:03:28 +00:00

Author	SHA1	Message	Date
Junio C Hamano	ede63a195c	Merge branch 'mh/reflife' Define memory ownership and lifetime rules for what for-each-ref feeds to its callbacks (in short, "you do not own it, so make a copy if you want to keep it"). * mh/reflife: (25 commits) refs: document the lifetime of the args passed to each_ref_fn register_ref(): make a copy of the bad reference SHA-1 exclude_existing(): set existing_refs.strdup_strings string_list_add_refs_by_glob(): add a comment about memory management string_list_add_one_ref(): rename first parameter to "refname" show_head_ref(): rename first parameter to "refname" show_head_ref(): do not shadow name of argument add_existing(): do not retain a reference to sha1 do_fetch(): clean up existing_refs before exiting do_fetch(): reduce scope of peer_item object_array_entry: fix memory handling of the name field find_first_merges(): remove unnecessary code find_first_merges(): initialize merges variable using initializer fsck: don't put a void-shaped peg in a char-shaped hole object_array_remove_duplicates(): rewrite to reduce copying revision: use object_array_filter() in implementation of gc_boundary() object_array: add function object_array_filter() revision: split some overly-long lines cmd_diff(): make it obvious which cases are exclusive of each other cmd_diff(): rename local variable "list" -> "entry" ...	2013-06-14 08:46:14 -07:00
Michael Haggerty	31faeb2088	object_array_entry: fix memory handling of the name field Previously, the memory management of the object_array_entry::name field was inconsistent and undocumented. object_array_entries are ultimately created by a single function, add_object_array_with_mode(), which has an argument "const char name". This function used to simply set the name field to reference the string pointed to by the name parameter, and nobody on the object_array side ever freed the memory. Thus, it assumed that the memory for the name field would be managed by the caller, and that the lifetime of that string would be at least as long as the lifetime of the object_array_entry. But callers were inconsistent: Some passed pointers to constant strings or argv entries, which was OK. * Some passed pointers to newly-allocated memory, but didn't arrange for the memory ever to be freed. * Some passed the return value of sha1_to_hex(), which is a pointer to a statically-allocated buffer that can be overwritten at any time. * Some passed pointers to refnames that they received from a for_each_ref()-type iteration, but the lifetimes of such refnames is not guaranteed by the refs API. Bring consistency to this mess by changing object_array to make its own copy for the object_array_entry::name field and free this memory when an object_array_entry is deleted from the array. Many callers were passing the empty string as the name parameter, so as a performance optimization, treat the empty string specially. Instead of making a copy, store a pointer to a statically-allocated empty string to object_array_entry::name. When deleting such an entry, skip the free(). Change the callers that were already passing copies to add_object_array_with_mode() to either skip the copy, or (if the memory needed to be allocated anyway) freeing the memory itself. A part of this commit effectively reverts `70d26c6e76` read_revisions_from_stdin: make copies for handle_revision_arg because the copying introduced by that commit (which is still necessary) is now done at a deeper level. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-02 15:28:46 -07:00
Junio C Hamano	4818cfcdcc	Merge branch 'jk/lookup-object-prefer-latest' Optimizes object lookup when the object hashtable starts to become crowded. * jk/lookup-object-prefer-latest: lookup_object: prioritize recently found objects	2013-05-29 14:29:59 -07:00
Michael Haggerty	1506510c17	object_array_remove_duplicates(): rewrite to reduce copying The old version copied one entry to its destination position, then deleted any matching entries from the tail of the array. This required the tail of the array to be copied multiple times. It didn't affect the complexity of the algorithm because the whole tail has to be searched through anyway. But all the copying was unnecessary. Instead, check for the existence of an entry with the same name in the head of the list before copying an entry to its final position. This way each entry has to be copied at most one time. Extract a helper function contains_name() to do a bit of the work. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-05-28 09:25:01 -07:00
Michael Haggerty	aeb4a51ef8	object_array: add function object_array_filter() Add a function that allows unwanted entries in an object_array to be removed. This encapsulation is a step towards giving object_array ownership of its entries' name memory. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-05-28 09:25:01 -07:00
Michael J Gruber	afa15f3cd8	grep: honor --textconv for the case rev:path Make "grep" honor the "--textconv" option also for the object case, i.e. when used with an argument "rev:path". Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-05-10 10:27:34 -07:00
Jeff King	9a414486d9	lookup_object: prioritize recently found objects The lookup_object function is backed by a hash table of all objects we have seen in the program. We manage collisions with a linear walk over the colliding entries, checking each with hashcmp(). The main cost of lookup is in these hashcmp() calls; finding our item in the first slot is cheaper than finding it in the second slot, which is cheaper than the third, and so on. If we assume that there is some locality to the object lookups (e.g., if X and Y collide, and we have just looked up X, the next lookup is more likely to be for X than for Y), then we can improve our average lookup speed by checking X before Y. This patch does so by swapping a found item to the front of the collision chain. The p0001 perf test reveals that this does indeed exploit locality in the case of "rev-list --all --objects": Test origin this tree ------------------------------------------------------------------------- 0001.1: rev-list --all 0.40(0.38+0.02) 0.40(0.36+0.03) +0.0% 0001.2: rev-list --all --objects 2.24(2.17+0.05) 1.86(1.79+0.05) -17.0% This is not surprising, as the full object traversal will hit the same tree entries over and over (e.g., for every commit that doesn't change "Documentation/", we will have to look up the same sha1 just to find out that we already processed it). The reason why this technique works (and does not violate any properties of the hash table) is subtle and bears some explanation. Let's imagine we get a lookup for sha1 `X`, and it hashes to bucket `i` in our table. That stretch of the table may look like: index \| i-1 \| i \| i+1 \| i+2 \| ----------------------------------- entry ... \| A \| B \| C \| X \| ... ----------------------------------- We start our probe at i, see that B does not match, nor does C, and finally find X. There may be multiple C's in the middle, but we know that there are no empty slots (or else we would not find X at all). We do not know the original index of B; it may be `i`, or it may be less than i (e.g., if it were `i-1`, it would collide with A and spill over into the `i` bucket). So it is acceptable for us to move it to the right of a contiguous stretch of entries (because we will find it from a linear walk starting anywhere at `i` or before), but never to the left (if we moved it to `i-1`, we would miss it when starting our walk at `i`). We do know the original index of X; it is `i`, so it is safe to place it anywhere in the contiguous stretch between `i` and where we found it (`i+2` in the this case). This patch does a pure swap; after finding X in the situation above, we would end with: index \| i-1 \| i \| i+1 \| i+2 \| ----------------------------------- entry ... \| A \| X \| C \| B \| ... ----------------------------------- We could instead bump X into the `i` slot, and then shift the whole contiguous chain down by one, resulting in: index \| i-1 \| i \| i+1 \| i+2 \| ----------------------------------- entry ... \| A \| X \| B \| C \| ... ----------------------------------- That puts our chain in true most-recently-used order. However, experiments show that it is not any faster (and in fact, is slightly slower due to the extra manipulation). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-05-02 08:36:50 -07:00
Jeff King	75a9549047	avoid segfaults on parse_object failure Many call-sites of parse_object assume that they will get a non-NULL return value; this is not the case if we encounter an error while parsing the object. This patch adds a wrapper function around parse_object that handles dying automatically, and uses it anywhere we immediately try to access the return value as a non-NULL pointer (i.e., anywhere that we would currently segfault). This wrapper may also be useful in other places. The most obvious one is code like: o = parse_object(sha1); if (!o) die(...); However, these should not be mechanically converted to parse_object_or_die, as the die message is sometimes customized. Later patches can address these sites on a case-by-case basis. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-17 12:49:03 -07:00
Pete Wyckoff	82247e9bd5	remove superfluous newlines in error messages The error handling routines add a newline. Remove the duplicate ones in error messages. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-30 15:45:51 -07:00
Junio C Hamano	419f2ecf78	Merge branch 'hv/submodule-recurse-push' "git push --recurse-submodules" learns to optionally look into the histories of submodules bound to the superproject and push them out. By Heiko Voigt * hv/submodule-recurse-push: push: teach --recurse-submodules the on-demand option Refactor submodule push check to use string list instead of integer Teach revision walking machinery to walk multiple times sequencially	2012-04-24 14:40:20 -07:00
Heiko Voigt	bcc0a3ea38	Teach revision walking machinery to walk multiple times sequencially Previously it was not possible to iterate revisions twice using the revision walking api. We add a reset_revision_walk() which clears the used flags. This allows us to do multiple sequencial revision walks. We add the appropriate calls to the existing submodule machinery doing revision walks. This is done to avoid surprises if future code wants to call these functions more than once during the processes lifetime. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-03-30 08:57:49 -07:00
Nguyễn Thái Ngọc Duy	090ea12671	parse_object: avoid putting whole blob in core Traditionally, all the callers of check_sha1_signature() first called read_sha1_file() to prepare the whole object data in core, and called this function. The function is used to revalidate what we read from the object database actually matches the object name we used to ask for the data from the object database. Update the API to allow callers to pass NULL as the object data, and have the function read and hash the object data using streaming API to recompute the object name, without having to hold everything in core at the same time. This is most useful in parse_object() that parses a blob object, because this caller does not have to keep the actual blob data around in memory after a "struct blob" is returned. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-03-07 09:07:38 -08:00
Jeff King	ccdc6037fe	parse_object: try internal cache before reading object db When parse_object is called, we do the following: 1. read the object data into a buffer via read_sha1_file 2. call parse_object_buffer, which then: a. calls the appropriate lookup_{commit,tree,blob,tag} to either create a new "struct object", or to find an existing one. We know the appropriate type from the lookup in step 1. b. calls the appropriate parse_{commit,tree,blob,tag} to parse the buffer for the new (or existing) object In step 2b, all of the called functions are no-ops for object "X" if "X->object.parsed" is set. I.e., when we have already parsed an object, we end up going to a lot of work just to find out at a low level that there is nothing left for us to do (and we throw away the data from read_sha1_file unread). We can optimize this by moving the check for "do we have an in-memory object" from 2a before the expensive call to read_sha1_file in step 1. This might seem circular, since step 2a uses the type information determined in step 1 to call the appropriate lookup function. However, we can notice that all of the lookup_* functions are backed by lookup_object. In other words, all of the objects are kept in a master hash table, and we don't actually need the type to do the "do we have it" part of the lookup, only to do the "and create it if it doesn't exist" part. This can save time whenever we call parse_object on the same sha1 twice in a single program. Some code paths already perform this optimization manually, with either: if (!obj->parsed) obj = parse_object(obj->sha1); if you already have a "struct object", or: struct object *obj = lookup_unknown_object(sha1); if (!obj \|\| !obj->parsed) obj = parse_object(sha1); if you don't. This patch moves the optimization into parse_object itself. Most git operations won't notice any impact. Either they don't parse a lot of duplicate sha1s, or the calling code takes special care not to re-parse objects. I timed two code paths that do benefit (there may be more, but these two were immediately obvious and easy to time). The first is fast-export, which calls parse_object on each object it outputs, like this: object = parse_object(sha1); if (!object) die(...); if (object->flags & SHOWN) return; which means that just to realize we have already shown an object, we will read the whole object from disk! With this patch, my best-of-five time for "fast-export --all" on git.git dropped from 26.3s to 21.3s. The second case is upload-pack, which will call parse_object for each advertised ref (because it needs to peel tags to show "^{}" entries). This doesn't matter for most repositories, because they don't have a lot of refs pointing to the same objects. However, if you have a big alternates repository with a shared object db for a number of child repositories, then the alternates repository will have duplicated refs representing each of its children. For example, GitHub's alternates repository for git.git has ~120,000 refs, of which only ~3200 are unique. The time for upload-pack to print its list of advertised refs dropped from 3.4s to 0.76s. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-01-05 13:30:54 -08:00
Junio C Hamano	68be2fea50	receive-pack, fetch-pack: reject bogus pack that records objects twice When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-16 22:05:21 -08:00
Junio C Hamano	4bbf5a2615	read_sha1_file(): get rid of read_sha1_file_repl() madness Most callers want to silently get a replacement object, and they do not care what the real name of the replacement object is. Worse yet, no sane interface to return the underlying object without replacement is provided. Remove the function and make only the few callers that want the name of the replacement object find it themselves. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-15 15:23:33 -07:00
Junio C Hamano	4682693e9c	Merge branch 'maint' * maint: tag.c: whitespace breakages fix Fix whitespace issue in object.c t5505: add missing &&	2010-09-06 00:12:04 -07:00
Junio C Hamano	af24059fa2	Merge branch 'xx/trivial' into maint * xx/trivial: tag.c: whitespace breakages fix Fix whitespace issue in object.c t5505: add missing &&	2010-09-06 00:11:59 -07:00
Jared Hance	55b4e9e432	Fix whitespace issue in object.c Change some expanded tabs (spaces) to tabs in object.c. Signed-off-by: Jared Hance <jaredhance@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-09-05 22:12:29 -07:00
Junio C Hamano	f92d62ec4e	Merge branch 'nd/maint-fix-replace' * nd/maint-fix-replace: parse_object: pass on the original sha1, not the replaced one	2010-09-03 22:23:13 -07:00
Nguyễn Thái Ngọc Duy	2e3400c052	parse_object: pass on the original sha1, not the replaced one Commit `0e87c36` (object: call "check_sha1_signature" with the replacement sha1) changed the first argument passed to parse_object_buffer() from "sha1" to "repl". With that change, the returned obj pointer has the replacement SHA1 in obj->sha1, not the original one. But when using lookup_commit() and then parse_commit() on a commit, we get an object pointer with the original sha1, but the commit content comes from the replacement commit. So the result we get from using parse_object() is different from the we get from using lookup_commit() followed by parse_commit(). It looks much simpler and safer to fix this inconsistency by passing "sha1" to parse_object_bufer() instead of "repl". The commit comment should be used to tell the the replacement commit is replacing another commit and why. So it should be easy to see that we have a replacement commit instead of an original one. And it is not a problem if the content of the commit is not consistent with the sha1 as cat-file piped to hash-object can be used to see the difference. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-09-03 22:13:08 -07:00
Jonathan Nieder	97a20eea19	fix "bundle --stdin" segfault When passed an empty list, objects_array_remove_duplicates() corrupts it by changing the number of entries from 0 to 1. The problem lies in the condition of its main loop: for (ref = 0; ref < array->nr - 1; ref++) { The loop body manipulates the supplied object array. In the case of an empty array, it should not be doing anything at all. But array->nr is an unsigned quantity, so the code enters the loop, in particular increasing array->nr. Fix this by comparing (ref + 1 < array->nr) instead. This bug can be triggered by git bundle --stdin: $ echo HEAD \| git bundle create some.bundle --stdin’ Segmentation fault (core dumped) The list of commits to bundle appears to be empty because of another bug: by the time the revision-walking machinery gets to look at it, standard input has already been consumed by rev-list, so this function gets an empty list of revisions. After this patch, git bundle --stdin still does not work; it just doesn’t segfault any more. Reported-by: Joey Hess <joey@kitenet.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-04-19 22:16:35 -07:00
Junio C Hamano	c76189875b	object.c: remove unused functions object_list_append() and object_list_length}() are not used anywhere. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-01-17 22:49:36 -08:00
Christian Couder	0e87c36763	object: call "check_sha1_signature" with the replacement sha1 Otherwise we get a "sha1 mismatch" error for replaced objects. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-05-31 17:02:59 -07:00
Dan McGee	91fe2f9091	Unify signedness in hashing calls Our hash_obj and hashtable_index calls and functions were doing a lot of funny things with signedness. Unify all of it to 'unsigned int'. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-05-20 00:02:24 -07:00
Dan McGee	b867d324ce	Fix type-punning issues In these two places we are casting part of our unsigned char sha1 array into an unsigned int, which violates GCCs strict-aliasing rules (and probably other compilers). Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-05-16 22:41:18 -07:00
Junio C Hamano	b2a6d1c686	bundle: allow the same ref to be given more than once "git bundle create x master master" used to create a bundle that lists the same branch (master) twice. Cloning from such a bundle resulted in a needless warning "warning: Duplicated ref: refs/remotes/origin/master". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-17 23:00:31 -08:00
Martin Koegler	d0b8c9e561	parse_object_buffer: don't ignore errors from the object specific parsing functions In the case of an malformed object, the object specific parsing functions would return an error, which is currently ignored. The object can be partial initialized in this case. This patch make parse_object_buffer propagate such errors. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-02-03 16:04:57 -08:00
Jim Meyering	cc21682793	Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-12-22 11:15:38 -08:00
Sam Vilain	e2ac7cb5fb	Don't assume tree entries that are not dirs are blobs When scanning the trees in track_tree_refs() there is a "lazy" test that assumes that entries are either directories or files. Don't do that. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-06-06 15:43:18 -07:00
Junio C Hamano	76026200ee	Merge branch 'maint-1.5.1' into maint * maint-1.5.1: fix memory leak in parse_object when check_sha1_signature fails name-rev: tolerate clock skew in committer dates	2007-05-24 19:01:50 -07:00
Carlos Rica	0b1f113075	fix memory leak in parse_object when check_sha1_signature fails When check_sha1_signature fails, program is not terminated: it prints an error message and returns NULL, so the buffer returned by read_sha1_file should be freed before. Signed-off-by: Carlos Rica <jasampler@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-05-24 18:56:06 -07:00
Martin Koegler	e5709a4a68	add add_object_array_with_mode Each object in struct object_array is extended with the mode. If not specified, S_IFINVALID is used. An object with an mode value can be added with add_object_array_with_mode. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-24 00:08:49 -07:00
Linus Torvalds	100c5f3b0b	Clean up object creation to use more common code This replaces the fairly odd "created_object()" function that did _most_ of the object setup with a more complete "create_object()" function that also has a more natural calling convention. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-16 23:36:16 -07:00
Linus Torvalds	2c1cbec1e2	Use proper object allocators for unknown object nodes too We used to use a different allocator scheme for when we didn't know the object type. That meant that objects that were created without any up-front knowledge of the type would not go through the same allocation paths as normal object allocations, and would miss out on the statistics. But perhaps more importantly than the statistics (that are useful when looking at memory usage but not much else), if we want to make the object hash tables use a denser object pointer representation, we need to make sure that they all go through the same blocking allocator. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-16 23:36:11 -07:00
Linus Torvalds	acdeec62cb	Don't ever return corrupt objects from "parse_object()" Looking at the SHA1 validation code due to the corruption that Alexander Litvinov is seeing under Cygwin, I notice that one of the most central places where we read objects, we actually do end up verifying the SHA1 of the result, but then we happily parse it anyway. And using "printf" to write the error message means that it not only can get lost, but will actually mess up stdout, and cause other strange and hard-to-debug failures downstream. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-03-20 22:17:17 -07:00
Nicolas Pitre	0ab179504a	get rid of lookup_object_type() This function is called only once in the whole source tree. Let's move its code inline instead, which is also in the spirit of removing as much object type char arrays as possible (not that this patch does anything for that but at least it is now a local matter). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-27 01:34:21 -08:00
Nicolas Pitre	21666f1aae	convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-27 01:34:21 -08:00
Nicolas Pitre	df8436622f	formalize typename(), and add its reverse type_from_string() Sometime typename() is used, sometimes type_names[] is accessed directly. Let's enforce typename() all the time which allows for validating the type. Also let's add a function to go from a name to a type and use it instead of manual memcpy() when appropriate. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-27 01:34:21 -08:00
Junio C Hamano	9f613ddd21	Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-09-16 10:22:02 -07:00
Jonas Fonseca	b3c952f838	Use xcalloc instead of calloc Signed-off-by: Jonas Fonseca <fonseca@diku.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-08-27 20:49:43 -07:00
Shawn Pearce	e702496e43	Convert memcpy(a,b,20) to hashcpy(a,b). This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void. This is a reasonable tradeoff as most call sites already use unsigned char and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-08-23 13:53:10 -07:00
David Rientjes	a89fccd281	Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-08-17 14:23:53 -07:00
Linus Torvalds	1974632c66	Remove TYPE_* constant macros and use object_type enums consistently. This updates the type-enumeration constants introduced to reduce the memory footprint of "struct object" to match the type bits already used in the packfile format, by removing the former (i.e. TYPE_* constant macros) and using the latter (i.e. enum object_type) throughout the code for consistency. Eventually we can stop passing around the "type strings" entirely, and this will help - no confusion about two different integer enumeration. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-07-12 23:18:03 -07:00
Linus Torvalds	0556a11a0d	git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-07-01 18:28:15 -07:00
Linus Torvalds	fc046a75d5	Abstract out accesses to object hash array There are a few special places where some programs accessed the object hash array directly, which bothered me because I wanted to play with some simple re-organizations. So this patch makes the object hash array data structures all entirely local to object.c, and the few users who wanted to look at it now get to use a function to query how many object index entries there can be, and to actually access the array. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-06-29 23:48:31 -07:00
Linus Torvalds	1f1e895fcc	Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-06-19 18:45:48 -07:00
Linus Torvalds	3e4339e6f9	Remove "refs" field from "struct object" This shrinks "struct object" to the absolutely minimal size possible. It now contains /only/ the object flags and the SHA1 hash name of the object. The "refs" field, which is really needed only for fsck, is maintained in a separate hashed lookup-table, allowing all normal users to totally ignore it. This helps memory usage, although not as much as I hoped: it looks like the allocation overhead of malloc (and the alignment constraints in particular) means that while the structure size shrinks, the actual allocation overhead mostly does not. [ That said: memory usage is actually down, but not as much as it should be: I suspect just one of the object types actually ended up shrinking its effective allocation size. To get to the next level, we probably need specialized allocators that don't pad the allocation more than necessary. ] The separation makes for some code cleanup, though, and makes the ref tracking that fsck wants a clearly separate thing. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-06-18 13:51:27 -07:00
Linus Torvalds	885a86abe2	Shrink "struct object" a bit This shrinks "struct object" by a small amount, by getting rid of the "struct type *" pointer and replacing it with a 3-bit bitfield instead. In addition, we merge the bitfields and the "flags" field, which incidentally should also remove a useless 4-byte padding from the object when in 64-bit mode. Now, our "struct object" is still too damn large, but it's now less obviously bloated, and of the remaining fields, only the "util" (which is not used by most things) is clearly something that should be eventually discarded. This shrinks the "git-rev-list --all" memory use by about 2.5% on the kernel archive (and, perhaps more importantly, on the larger mozilla archive). That may not sound like much, but I suspect it's more on a 64-bit platform. There are other remaining inefficiencies (the parent lists, for example, probably have horrible malloc overhead), but this was pretty obvious. Most of the patch is just changing the comparison of the "type" pointer from one of the constant string pointers to the appropriate new TYPE_xxx small integer constant. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-06-17 18:49:18 -07:00
Linus Torvalds	3a7c352bd0	Make "tree_entry" have a SHA1 instead of a union of object pointers This is preparatory work for further cleanups, where we try to make tree_entry look more like the more efficient tree-walk descriptor. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-05-29 19:05:06 -07:00
Linus Torvalds	136f2e548a	Make "struct tree" contain the pointer to the tree buffer This allows us to avoid allocating information for names etc, because we can just use the information from the tree buffer directly. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-05-29 19:05:02 -07:00
Peter Eriksen	90321c106c	Replace xmalloc+memset(0) with xcalloc. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-04-04 00:11:19 -07:00
Peter Eriksen	8e44025925	Use blob_, commit_, tag_, and tree_type throughout. This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-04-04 00:11:19 -07:00
Linus Torvalds	d7ee090d0d	Fix object re-hashing The hashed object lookup had a subtle bug in re-hashing: it did for (i = 0; i < count; i++) if (objs[i]) { .. rehash .. where "count" was the old hash couny. Oon the face of it is obvious, since it clearly re-hashes all the old objects. However, it's wrong. If the last old hash entry before re-hashing was in use (or became in use by the re-hashing), then when re-hashing could have inserted an object into the hash entries with idx >= count due to overflow. When we then rehash the last old entry, that old entry might become empty, which means that the overflow entries should be re-hashed again. In other words, the loop has to be fixed to either traverse the whole array, rather than just the old count. (There's room for a slight optimization: instead of counting all the way up, we can break when we see the first empty slot that is above the old "count". At that point we know we don't have any collissions that we might have to fix up any more. This patch only does the trivial fix) [jc: with trivial fix on trivial fix] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-02-12 11:24:50 -08:00
Junio C Hamano	2b796360ac	hashtable-based objects: minimum fixups. Calling hashtable_index from find_object before objs is created would result in division by zero failure. Avoid it. Also the given object name may not be aligned suitably for unsigned int; avoid dereferencing casted pointer. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-02-12 05:12:39 -08:00
Johannes Schindelin	070879ca93	Use a hashtable for objects instead of a sorted list In a simple test, this brings down the CPU time from 47 sec to 22 sec. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-02-12 05:12:39 -08:00
Junio C Hamano	8f1d2e6f49	[PATCH] Compilation: zero-length array declaration. ISO C99 (and GCC 3.x or later) lets you write a flexible array at the end of a structure, like this: struct frotz { int xyzzy; char nitfol[]; /* more / }; GCC 2.95 and 2.96 let you to do this with "char nitfol[0]"; unfortunately this is not allowed by ISO C90. This declares such construct like this: struct frotz { int xyzzy; char nitfol[FLEX_ARRAY]; / more */ }; and git-compat-util.h defines FLEX_ARRAY to 0 for gcc 2.95 and empty for others. If you are using a C90 C compiler, you should be able to override this with CFLAGS=-DFLEX_ARRAY=1 from the command line of "make". Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-01-07 10:51:06 -08:00
Junio C Hamano	e23eff8be9	qsort() ptrdiff_t may be larger than int Morten Welinder <mwelinder@gmail.com> writes: > The code looks wrong. It assumes that pointers are no larger than ints. > If pointers are larger than ints, the code does not necessarily compute > a consistent ordering and qsort is allowed to do whatever it wants. > > Morten > > static int compare_object_pointers(const void a, const void b) > { > const struct object * const pa = a; > const struct object const pb = b; > return pa - *pb; > } Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-12-06 17:28:26 -08:00
Sergey Vlasov	4a4e6fd74f	Rework object refs tracking to reduce memory usage Store pointers to referenced objects in a variable sized array instead of linked list. This cuts down memory usage of utilities which use object references; e.g., git-fsck-objects --full on the git.git repository consumes about 2 MB of memory tracked by Massif instead of 7 MB before the change. Object refs are still the biggest consumer of memory (57%), but the malloc overhead for a single block instead of a linked list is substantially smaller. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-11-15 11:42:29 -08:00
Linus Torvalds	8805ccac40	[PATCH] Avoid building object ref lists when not needed The object parsing code builds a generic "this object references that object" because doing a full connectivity check for fsck requires it. However, nothing else really needs it, and it's quite expensive for git-rev-list that can have tons of objects in flight. So, exactly like the commit buffer save thing, add a global flag to disable it, and use it in git-rev-list. Before: $ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 12.28user 0.29system 0:12.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+26718minor)pagefaults 0swaps 59124 After this change: $ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 10.33user 0.18system 0:10.54elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+18509minor)pagefaults 0swaps 59124 and note how the number of pages touched by git-rev-list for this particular object list has shrunk from 26,718 (104 MB) to 18,509 (72 MB). Calculating the total object difference between two git revisions is still clearly the most expensive git operation (both in memory and CPU time), but it's now less than 40% of what it used to be. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-09-16 15:32:23 -07:00
Daniel Barkalow	680bab3d9a	[PATCH] Add function to append to an object_list. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-09-10 18:27:40 -07:00
barkalow@iabervon.org	66e481b007	[PATCH] Object library enhancements Add function to look up an object which is entirely unknown, so that it can be put in a list. Various other functions related to lists of objects. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-08-02 22:53:07 -07:00
Junio C Hamano	c4584ae3fd	[PATCH] Remove "delta" object representation. Packed delta files created by git-pack-objects seems to be the way to go, and existing "delta" object handling code has exposed the object representation details to too many places. Remove it while we refactor code to come up with a proper interface in sha1_file.c. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-27 15:27:51 -07:00
Daniel Barkalow	89e4202f98	[PATCH] Parse tags for absent objects Handle parsing a tag for a non-present object. This adds a function to lookup an object with lookup_* for * in a string, so that it can get the right storage based on the "type" line in the tag. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:29:12 -07:00
Jason McMullan	5d6ccf5ce7	[PATCH] Anal retentive 'const unsigned char *sha1' Make 'sha1' parameters const where possible Signed-off-by: Jason McMullan <jason.mcmullan@timesys.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-08 13:04:53 -07:00
Linus Torvalds	bd1e17e245	Make "parse_object()" also fill in commit message buffer data. And teach fsck to free it to save memory.	2005-05-25 19:26:28 -07:00
Linus Torvalds	6b0c312106	Include file cleanups.. Add <limits.h> to the include files handled by "cache.h", and remove extraneous #include directives from various .c files. The rule is that "cache.h" gets all the basic stuff, so that we'll have as few system dependencies as possible.	2005-05-22 11:54:17 -07:00
Nicolas Pitre	d1af002dc6	[PATCH] delta check This adds knowledge of delta objects to fsck-cache and various object parsing code. A new switch to git-fsck-cache is provided to display the maximum delta depth found in a repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-20 15:41:45 -07:00
Nicolas Pitre	bd2c39f58f	[PATCH] don't load and decompress objects twice with parse_object() It turns out that parse_object() is loading and decompressing given object to free it just before calling the specific object parsing function which does mmap and decompress the same object again. This patch introduces the ability to parse specific objects directly from a memory buffer. Without this patch, running git-fsck-cache on the kernel repositorytake: real 0m13.006s user 0m11.421s sys 0m1.218s With this patch applied: real 0m8.060s user 0m7.071s sys 0m0.710s The performance increase is significant, and this is kind of a prerequisite for sane delta object support with fsck. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-06 11:02:01 -07:00
Sergey Vlasov	13019d4136	[PATCH] Fix memory leaks in git-fsck-cache This patch fixes memory leaks in parse_object() and related functions; these leaks were very noticeable when running git-fsck-cache. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-04 10:58:15 -07:00
Daniel Barkalow	e9eefa6761	[PATCH] Add function to parse an object of unspecified type (take 2) This adds a function that parses an object from the database when we have to look up its actual type. It also checks the hash of the file, due to its heritage as part of fsck-cache. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-04-28 07:46:33 -07:00
Christopher Li	812666c8e6	[PATCH] introduce xmalloc and xrealloc Introduce xmalloc and xrealloc to die gracefully with a descriptive message when out of memory, rather than taking a SIGSEGV. Signed-off-by: Christopher Li<chrislgit@chrisli.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-04-26 12:00:58 -07:00
Daniel Barkalow	175785e5ff	[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-04-18 11:39:48 -07:00

1 2 3 4 5

222 commits