Update TOpic script to show how old they are.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Junio C Hamano 2006-02-16 01:32:23 -08:00
parent 8cbf8eaf63
commit 99bd27ebe7
3 changed files with 199 additions and 6 deletions

ClonePlus.txt (new file, 125 lines)

@@ -0,0 +1,125 @@
From: Junio C Hamano <junkio@cox.net>
Subject: Re: Make "git clone" less of a deathly quiet experience
Date: Sun, 12 Feb 2006 19:36:41 -0800
Message-ID: <7v4q3453qu.fsf@assigned-by-dhcp.cox.net>
References: <Pine.LNX.4.64.0602102018250.3691@g5.osdl.org>
<7vwtg2o37c.fsf@assigned-by-dhcp.cox.net>
<Pine.LNX.4.64.0602110943170.3691@g5.osdl.org>
<1139685031.4183.31.camel@evo.keithp.com> <43EEAEF3.7040202@op5.se>
<1139717510.4183.34.camel@evo.keithp.com>
<46a038f90602121806jfcaac41tb98b8b4cd4c07c23@mail.gmail.com>
Content-Type: text/plain; charset=us-ascii
Cc: Keith Packard <keithp@keithp.com>, Andreas Ericsson <ae@op5.se>,
Linus Torvalds <torvalds@osdl.org>,
Git Mailing List <git@vger.kernel.org>,
Petr Baudis <pasky@suse.cz>
Return-path: <git-owner@vger.kernel.org>
In-Reply-To: <46a038f90602121806jfcaac41tb98b8b4cd4c07c23@mail.gmail.com>
(Martin Langhoff's message of "Mon, 13 Feb 2006 15:06:42 +1300")

Martin Langhoff <martin.langhoff@gmail.com> writes:

> +1... there should be an easy-to-compute threshold trigger to say --
> hey, let's quit being smart and send this client the packs we got and
> get it over with. Or perhaps a client flag so large projects can
> recommend that users do their initial clone with --gimme-all-packs?

What upload-pack does boils down to:

 * find out the latest of what the client has and what the
   client asked for.

 * run "rev-list --objects ^client ours" to make a list of
   objects the client needs.  The actual command line has
   multiple "clients" to exclude what does not need to be sent,
   and multiple "ours" to include the refs asked for.  When you
   are doing a full clone, ^client is empty and ours is
   essentially --all.

 * feed that output to "pack-objects --stdout" and send out
   the result.
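
The steps above can be sketched as a small shell function. This is only an illustration under assumptions: `pack_for_client` is a made-up name, it handles a single "ours" ref and at most one "client" ref (the real command line takes several of each), and it uses the `git rev-list` spelling of current Git installs rather than `git-rev-list`.

```shell
#!/bin/sh
# Hypothetical helper, not part of git.
# $1: ref the client asked for ("ours").
# $2: optional ref the client already has ("client"); empty for a full clone.
# stdout: a pack holding the objects reachable from $1 but not from $2.
pack_for_client () {
	ours=$1
	client=${2:-}
	# List the needed objects (with path hints), then pack them.
	git rev-list --objects "$ours" ${client:+"^$client"} |
	git pack-objects --stdout
}
```

With the second argument omitted, nothing is excluded, which is the full-clone case described in the list.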

If you run this command:

    $ git-rev-list --objects --all |
      git-pack-objects --stdout >/dev/null

it reports its progress as it goes.  The phases of operation are:

    Generating pack...
    Counting objects XXXX...
    Done counting XXXX objects.
    Packing XXXXX objects.....

Phase (1).  Between the time it says "Generating pack..." and
"Done counting XXXX objects.", the time is spent by rev-list
listing all the objects to be sent out.

Phase (2).  After that, it tries to decide which object to
delta against which other object, while twenty or so dots are
printed after "Packing XXXXX objects." (see #git irc log a
couple of days ago; Linus describes how pack building works).

Phase (3).  After the dots stop, the program becomes silent.
That is where it actually does delta compression and writeout.

You would notice that quite a lot of time is spent in all
phases.

There is an internal hook to create a full repository pack inside
upload-pack (which is what runs on the other end when you run
fetch-pack or clone-pack), but it works slightly differently
from what you are suggesting, in that it still tries to do the
"correct" thing.  It still runs "rev-list --objects --all", so
"dangling objects" are never sent out.

We could cheat in all phases to speed things up, at the expense
of ending up sending excess objects.  So let's pretend we
decided to treat everything in .git/objects/pack/pack-* (and
the ones found in alternates as well) as having interesting
objects for the cloner.

(1) This part unfortunately cannot be totally eliminated.  By
    assuming all packs are interesting, we could use the object
    names from the pack index, which is a lot cheaper than
    rev-list object traversal.  To cover the rest, we still
    need to run rev-list --objects --all --unpacked to pick up
    loose objects, which we cannot find by looking at the pack
    index.

    This however needs to be done in conjunction with the second
    phase change.  pack-objects depends on the hint rev-list
    --objects output gives it to group the blobs and trees with
    the same pathnames together, and that greatly affects the
    packing efficiency.  Unfortunately pack index does not have
    that information -- it does not know type, nor pathnames.
    Type is relatively cheap to obtain but pathnames for blob
    objects are inherently unavailable.

(2) This part can be mostly eliminated for already packed
    objects, because we have already decided to cheat by sending
    everything, so we can just reuse how objects are deltified
    in existing packs.  It still needs to be done for loose
    objects we collected to fill the gap in (1).

(3) This also can be sped up by reusing what is already in
    packs.  Pack index records the starting (but not the end)
    offset of each object in the pack, so we can sort by offset
    to find out which part of the existing pack corresponds to
    what object, to reorder the objects in the final pack.  This
    needs to be done somewhat carefully to preserve the locality
    of objects (again, see #git log).  The deltifying and
    compressing for loose objects cannot be avoided.

    While we are writing things out in (3), we need to keep
    track of running SHA1 sum of what we write out so that we
    can fill out the correct checksum at the end, but I am
    guessing that is relatively cheap compared to the
    deltification and compression cost we are currently paying
    in this phase.
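
The "sort by offset" idea in (3) can be tried out with a few lines of shell. What follows is a rough sketch, not how pack-objects does it: `object_extents` is a hypothetical helper, it expects the "offset name" lines that `git show-index` prints for a pack index, and it assumes the pack ends with a 20-byte SHA-1 checksum unless told otherwise.

```shell
#!/bin/sh
# Hypothetical helper, not part of git.
# stdin:  "offset name ..." lines, the format printed by "git show-index".
# $1:     size of the pack file in bytes.
# $2:     size of the trailing checksum (assumed 20, i.e. SHA-1).
# stdout: "name offset length" for each object, in pack order.
object_extents () {
	limit=$(( $1 - ${2:-20} ))
	# An object ends where the next one (by offset) starts; the last
	# object ends where the trailing checksum starts.
	sort -n |
	awk -v limit="$limit" '
		NR > 1 { print name, start, $1 - start }
		       { start = $1; name = $2 }
		END    { if (NR > 0) print name, start, limit - start }
	'
}

# Possible use against a real pack (file names are placeholders):
#   git show-index <pack-XXXX.idx | object_extents "$(wc -c <pack-XXXX.pack)"
# Loose objects still have to come from the traversal named in (1):
#   git rev-list --objects --all --unpacked
```

The second column of the `git show-index` output is also the cheap list of object names that (1) talks about.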
NB. In the #git log, Linus made it sound like I am clueless
about how pack is generated, but if you check commit 9d5ab96,
the "recency of delta is inherited from base", one of the tricks
that have a big performance impact, was done by me ;-).

ResettingPaths.txt (new file, 64 lines)

@@ -0,0 +1,64 @@
From: Junio C Hamano <junkio@cox.net>
Subject: Resetting paths
Date: Thu, 09 Feb 2006 20:40:15 -0800
Message-ID: <7vlkwjzv0w.fsf@assigned-by-dhcp.cox.net>
Content-Type: text/plain; charset=us-ascii
Return-path: <git-owner@vger.kernel.org>

While working on "assume unchanged" git series, I found one
thing missing from the current set of tools.

While I worked on parts of the system that deal with the cached
lstat() information, I needed a way to debug that, so I hacked
ls-files -t option to show entries marked as "always matches the
index" with lowercase tag letters.  This was primarily a
debugging aid hack.

Then I committed the whole thing with "git commit -a" by
mistake.  In order to rewind the HEAD to pre-commit state, I can
say "git reset --soft HEAD^", but after doing that, now I want
to unupdate the index so that ls-files.c matches the pre-commit
HEAD.

"git reset --mixed" is a heavy-handed tool for that.  It reads
the entire index from the HEAD commit without touching the
working tree, so I would need to add the modified paths back
with "git update-index".

The low-level voodoo to do so for this particular case is this
one-liner:

    git ls-tree HEAD ls-files.c | git update-index --index-info

Have people found themselves in a similar situation?  This
could take different forms.

 * you did "git update-index" on a wrong path.  This is my
   example and the above voodoo is a recipe for recovery.

 * you did "git add" on a wrong path and you want to remove it.
   This is easier than the above:

    git update-index --force-remove path

 * you did the above recovery from "git add" on a wrong path,
   and you want to add it again.  The same voodoo would work in
   this case as well.

    git ls-tree HEAD path | git update-index --index-info

We could add "git reset path..." to reduce typing for the above,
but I am wondering if it is worth it.
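
To make the proposal concrete, here is one way such a helper might look, built only from the two plumbing commands quoted above. It is a sketch under assumptions: `reset_paths` is a made-up name, it is meant to be run from the top of the working tree, and it handles plain file paths only (a directory argument would hand a tree entry to `--index-info`). Later Git releases did grow a path-limited `git reset`, which is not what is shown here.

```shell
#!/bin/sh
# Hypothetical helper, not part of git.
# For each path given, make the index entry match HEAD again
# without touching the working tree file.
reset_paths () {
	for path
	do
		entry=$(git ls-tree HEAD "$path")
		if test -n "$entry"
		then
			# Path exists in HEAD: copy its mode, type and
			# object name back into the index.
			printf '%s\n' "$entry" | git update-index --index-info
		else
			# Path is not in HEAD (e.g. a mistaken "git add"):
			# drop it from the index.
			git update-index --force-remove -- "$path"
		fi
	done
}
```

Calling `reset_paths ls-files.c` would then cover the first case in the list, and `reset_paths path` on a freshly added file the second.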

BTW, this shows how "index centric" git is.  With other SCM that
has only the last commit and the working tree files, you do not
have to worry about any of these things, so it might appear that
index is just a nuisance.  But if you do not have any "registry
of paths to be committed", you cannot do a partial commit like
what I did above ("commit changes to all files other than
ls-files.c") without listing all the paths to be committed, or
fall back on CVS style "one path at a time", breaking an atomic
commit, so there is a drawback to not having an index as well.

TO (16 lines changed)

@@ -37,7 +37,7 @@ sed -n \
     -e '/^[^\/][^\/]\//p' |
 while read topic
 do
-	rebase= done= not_done= trouble=
+	rebase= done= not_done= trouble= date=

 	# (1)
 	only_next_1=`git-rev-list ^master "^$topic" ${next} | sort`
@@ -55,16 +55,14 @@ do

 	# (2)
 	not_in_master=`
-		git-rev-list --pretty=oneline ^master "$topic" |
-		sed -e 's/^[0-9a-f]* //'
+		git-rev-list ^master "$topic"
 	`
 	test -z "$not_in_master" &&
 	done="${LF}Fully merged -- delete."

 	# (3)
 	not_in_next=`
-		git-rev-list --pretty=oneline ^${next} "$topic" |
-		sed -e 's/^[0-9a-f]* / - /'
+		git-rev-list --pretty=oneline ^${next} "$topic"
 	`
 	if test -n "$not_in_next"
 	then
@@ -72,6 +70,12 @@ do
 		then
 			trouble="${LF}### MODIFIED AFTER COOKED ###"
 		fi
+		last=`expr "$not_in_next" : '\([0-9a-f]*\) '`
+		date=`
+			git-rev-list -1 --pretty "$last" |
+			sed -ne 's/^Date: *\(.*\)/ (\1)/p'
+		`
+		not_in_next=`echo "$not_in_next" | sed -e 's/^[0-9a-f]* / - /'`
 		not_done="${LF}Still not merged in ${next}$rebase.$LF$not_in_next"
 	elif test -n "$done"
 	then
@@ -80,7 +84,7 @@ do
 		not_done="${LF}Up to date."
 	fi

-	echo "*** $topic ***$trouble$done$not_done"
+	echo "*** $topic ***$date$trouble$done$not_done"

 	if test -z "$trouble$not_done" &&
 	   test -n "$done" &&
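
The string handling added in the third hunk can be exercised without a repository. The sketch below is illustrative only: the function names (`newest_name`, `date_suffix`, `as_bullets`) and the sample object names are made up, while the `expr` and `sed` expressions are the ones used in the script.

```shell
#!/bin/sh
# Made-up sample of "git-rev-list --pretty=oneline" output, newest first.
sample='aaaaaaaaaabbbbbbbbbbccccccccccdddddddddd Second change
1111111111222222222233333333334444444444 First change'

# Object name of the newest commit: expr anchors the pattern at the start
# of the string, so only the first line's leading hex run is captured.
newest_name () {
	expr "$1" : '\([0-9a-f]*\) '
}

# Turn the "Date:" header of "git-rev-list -1 --pretty" output (stdin)
# into the " (date)" suffix printed next to the topic name.
date_suffix () {
	sed -ne 's/^Date: *\(.*\)/ (\1)/p'
}

# Replace each leading object name with a " - " bullet for the report.
as_bullets () {
	sed -e 's/^[0-9a-f]* / - /'
}

newest_name "$sample"
printf '%s\n' "$sample" | as_bullets
```

Because the object names are now needed to look up the date, the hunk moves the bullet formatting to after the `expr` call, which is why `not_in_next` keeps its raw `--pretty=oneline` form until then.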