Merge branch 'ja/doc-markup-cleanup'

Doc cleanup.

* ja/doc-markup-cleanup:
  doc: indent multi-line items in list
  doc: remove non pure ASCII characters
Junio C Hamano 2019-12-25 11:22:00 -08:00
commit dfee504bee
3 changed files with 131 additions and 122 deletions

@@ -61,7 +61,7 @@ Possible status letters are:
- R: renaming of a file
- T: change in the type of the file
- U: file is unmerged (you must complete the merge before it can
be committed)
be committed)
- X: "unknown" change type (most probably a bug, please report it)
Status letters C and R are always followed by a score (denoting the

@@ -268,9 +268,9 @@ or `--mirror` is given)
All submodules which are cloned will be shallow with a depth of 1.
--[no-]remote-submodules::
All submodules which are cloned will use the status of the submodules
All submodules which are cloned will use the status of the submodule's
remote-tracking branch to update the submodule, rather than the
superprojects recorded SHA-1. Equivalent to passing `--remote` to
superproject's recorded SHA-1. Equivalent to passing `--remote` to
`git submodule update`.
--separate-git-dir=<git dir>::

@@ -466,13 +466,13 @@ The performance of git-filter-branch is glacially slow; its design makes it
impossible for a backward-compatible implementation to ever be fast:
* In editing files, git-filter-branch by design checks out each and
every commit as it existed in the original repo. If your repo has 10\^5
files and 10\^5 commits, but each commit only modifies 5 files, then
git-filter-branch will make you do 10\^10 modifications, despite only
having (at most) 5*10^5 unique blobs.
every commit as it existed in the original repo. If your repo has
10\^5 files and 10\^5 commits, but each commit only modifies 5
files, then git-filter-branch will make you do 10\^10 modifications,
despite only having (at most) 5*10^5 unique blobs.
* If you try and cheat and try to make git-filter-branch only work on
files modified in a commit, then two things happen
files modified in a commit, then two things happen
** you run into problems with deletions whenever the user is simply
trying to rename files (because attempting to delete files that
@@ -481,39 +481,41 @@ files modified in a commit, then two things happen
user-provided shell)
** even if you succeed at the map-deletes-for-renames chicanery, you
still technically violate backward compatibility because users are
allowed to filter files in ways that depend upon topology of
commits instead of filtering solely based on file contents or names
(though this has not been observed in the wild).
still technically violate backward compatibility because users
are allowed to filter files in ways that depend upon topology of
commits instead of filtering solely based on file contents or
names (though this has not been observed in the wild).
* Even if you don't need to edit files but only want to e.g. rename or
remove some and thus can avoid checking out each file (i.e. you can use
--index-filter), you still are passing shell snippets for your filters.
This means that for every commit, you have to have a prepared git repo
where those filters can be run. That's a significant setup.
remove some and thus can avoid checking out each file (i.e. you can
use --index-filter), you still are passing shell snippets for your
filters. This means that for every commit, you have to have a
prepared git repo where those filters can be run. That's a
significant setup.
* Further, several additional files are created or updated per commit by
git-filter-branch. Some of these are for supporting the convenience
functions provided by git-filter-branch (such as map()), while others
are for keeping track of internal state (but could have also been
accessed by user filters; one of git-filter-branch's regression tests
does so). This essentially amounts to using the filesystem as an IPC
mechanism between git-filter-branch and the user-provided filters.
Disks tend to be a slow IPC mechanism, and writing these files also
effectively represents a forced synchronization point between separate
processes that we hit with every commit.
* Further, several additional files are created or updated per commit
by git-filter-branch. Some of these are for supporting the
convenience functions provided by git-filter-branch (such as map()),
while others are for keeping track of internal state (but could have
also been accessed by user filters; one of git-filter-branch's
regression tests does so). This essentially amounts to using the
filesystem as an IPC mechanism between git-filter-branch and the
user-provided filters. Disks tend to be a slow IPC mechanism, and
writing these files also effectively represents a forced
synchronization point between separate processes that we hit with
every commit.
* The user-provided shell commands will likely involve a pipeline of
commands, resulting in the creation of many processes per commit.
Creating and running another process takes a widely varying amount of
time between operating systems, but on any platform it is very slow
relative to invoking a function.
commands, resulting in the creation of many processes per commit.
Creating and running another process takes a widely varying amount
of time between operating systems, but on any platform it is very
slow relative to invoking a function.
* git-filter-branch itself is written in shell, which is kind of slow.
This is the one performance issue that could be backward-compatibly
fixed, but compared to the above problems that are intrinsic to the
design of git-filter-branch, the language of the tool itself is a
relatively minor issue.
This is the one performance issue that could be backward-compatibly
fixed, but compared to the above problems that are intrinsic to the
design of git-filter-branch, the language of the tool itself is a
relatively minor issue.
** Side note: Unfortunately, people tend to fixate on the
written-in-shell aspect and periodically ask if git-filter-branch
@@ -546,51 +548,55 @@ easily corrupt repos or end up with a mess worse than what you started
with:
* Someone can have a set of "working and tested filters" which they
document or provide to a coworker, who then runs them on a different OS
where the same commands are not working/tested (some examples in the
git-filter-branch manpage are also affected by this). BSD vs. GNU
userland differences can really bite. If lucky, error messages are
spewed. But just as likely, the commands either don't do the filtering
requested, or silently corrupt by making some unwanted change. The
unwanted change may only affect a few commits, so it's not necessarily
obvious either. (The fact that problems won't necessarily be obvious
means they are likely to go unnoticed until the rewritten history is in
use for quite a while, at which point it's really hard to justify
another flag-day for another rewrite.)
document or provide to a coworker, who then runs them on a different
OS where the same commands are not working/tested (some examples in
the git-filter-branch manpage are also affected by this).
BSD vs. GNU userland differences can really bite. If lucky, error
messages are spewed. But just as likely, the commands either don't
do the filtering requested, or silently corrupt by making some
unwanted change. The unwanted change may only affect a few commits,
so it's not necessarily obvious either. (The fact that problems
won't necessarily be obvious means they are likely to go unnoticed
until the rewritten history is in use for quite a while, at which
point it's really hard to justify another flag-day for another
rewrite.)
* Filenames with spaces are often mishandled by shell snippets since
they cause problems for shell pipelines. Not everyone is familiar with
find -print0, xargs -0, git-ls-files -z, etc. Even people who are
familiar with these may assume such flags are not relevant because
someone else renamed any such files in their repo back before the person
doing the filtering joined the project. And often, even those familiar
with handling arguments with spaces may not do so just because they
aren't in the mindset of thinking about everything that could possibly
go wrong.
they cause problems for shell pipelines. Not everyone is familiar
with find -print0, xargs -0, git-ls-files -z, etc. Even people who
are familiar with these may assume such flags are not relevant
because someone else renamed any such files in their repo back
before the person doing the filtering joined the project. And
often, even those familiar with handling arguments with spaces may
not do so just because they aren't in the mindset of thinking about
everything that could possibly go wrong.
* Non-ascii filenames can be silently removed despite being in a desired
directory. Keeping only wanted paths is often done using pipelines like
`git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`. ls-files will
only quote filenames if needed, so folks may not notice that one of the
files didn't match the regex (at least not until it's much too late).
Yes, someone who knows about core.quotePath can avoid this (unless they
have other special characters like \t, \n, or "), and people who use
ls-files -z with something other than grep can avoid this, but that
doesn't mean they will.
* Non-ascii filenames can be silently removed despite being in a
desired directory. Keeping only wanted paths is often done using
pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
ls-files will only quote filenames if needed, so folks may not
notice that one of the files didn't match the regex (at least not
until it's much too late). Yes, someone who knows about
core.quotePath can avoid this (unless they have other special
characters like \t, \n, or "), and people who use ls-files -z with
something other than grep can avoid this, but that doesn't mean they
will.
* Similarly, when moving files around, one can find that filenames with
non-ascii or special characters end up in a different directory, one
that includes a double quote character. (This is technically the same
issue as above with quoting, but perhaps an interesting different way
that it can and has manifested as a problem.)
* Similarly, when moving files around, one can find that filenames
with non-ascii or special characters end up in a different
directory, one that includes a double quote character. (This is
technically the same issue as above with quoting, but perhaps an
interesting different way that it can and has manifested as a
problem.)
* It's far too easy to accidentally mix up old and new history. It's
still possible with any tool, but git-filter-branch almost invites it.
If lucky, the only downside is users getting frustrated that they don't
know how to shrink their repo and remove the old stuff. If unlucky,
they merge old and new history and end up with multiple "copies" of each
commit, some of which have unwanted or sensitive files and others which
don't. This comes about in multiple different ways:
still possible with any tool, but git-filter-branch almost
invites it. If lucky, the only downside is users getting frustrated
that they don't know how to shrink their repo and remove the old
stuff. If unlucky, they merge old and new history and end up with
multiple "copies" of each commit, some of which have unwanted or
sensitive files and others which don't. This comes about in
multiple different ways:
** the default to only doing a partial history rewrite ('--all' is not
the default and few examples show it)
@@ -609,8 +615,8 @@ don't. This comes about in multiple different ways:
"DISCUSSION" section of the git filter-repo manual page for more
details.
* Annotated tags can be accidentally converted to lightweight tags, due
to either of two issues:
* Annotated tags can be accidentally converted to lightweight tags,
due to either of two issues:
** Someone can do a history rewrite, realize they messed up, restore
from the backups in refs/original/, and then redo their
@@ -623,71 +629,74 @@ to either of two issues:
restored from refs/original/ in a previously botched rewrite).
* Any commit messages that specify an encoding will become corrupted
by the rewrite; git-filter-branch ignores the encoding, takes the original
bytes, and feeds it to commit-tree without telling it the proper
encoding. (This happens whether or not --msg-filter is used.)
by the rewrite; git-filter-branch ignores the encoding, takes the
original bytes, and feeds it to commit-tree without telling it the
proper encoding. (This happens whether or not --msg-filter is
used.)
* Commit messages (even if they are all UTF-8) by default become
corrupted due to not being updated -- any references to other commit
hashes in commit messages will now refer to no-longer-extant commits.
corrupted due to not being updated -- any references to other commit
hashes in commit messages will now refer to no-longer-extant
commits.
* There are no facilities for helping users find what unwanted crud they
should delete, which means they are much more likely to have incomplete
or partial cleanups that sometimes result in confusion and people
wasting time trying to understand. (For example, folks tend to just
look for big files to delete instead of big directories or extensions,
and once they do so, then sometime later folks using the new repository
who are going through history will notice a build artifact directory
that has some files but not others, or a cache of dependencies
(node_modules or similar) which couldn't have ever been functional since
it's missing some files.)
* There are no facilities for helping users find what unwanted crud
they should delete, which means they are much more likely to have
incomplete or partial cleanups that sometimes result in confusion
and people wasting time trying to understand. (For example, folks
tend to just look for big files to delete instead of big directories
or extensions, and once they do so, then sometime later folks using
the new repository who are going through history will notice a build
artifact directory that has some files but not others, or a cache of
dependencies (node_modules or similar) which couldn't have ever been
functional since it's missing some files.)
* If --prune-empty isn't specified, then the filtering process can
create hoards of confusing empty commits
create hoards of confusing empty commits
* If --prune-empty is specified, then intentionally placed empty
commits from before the filtering operation are also pruned instead of
just pruning commits that became empty due to filtering rules.
commits from before the filtering operation are also pruned instead
of just pruning commits that became empty due to filtering rules.
* If --prune-empty is specified, sometimes empty commits are missed
and left around anyway (a somewhat rare bug, but it happens...)
and left around anyway (a somewhat rare bug, but it happens...)
* A minor issue, but users who have a goal to update all names and
emails in a repository may be led to --env-filter which will only update
authors and committers, missing taggers.
emails in a repository may be led to --env-filter which will only
update authors and committers, missing taggers.
* If the user provides a --tag-name-filter that maps multiple tags to
the same name, no warning or error is provided; git-filter-branch simply
overwrites each tag in some undocumented pre-defined order resulting in
only one tag at the end. (A git-filter-branch regression test requires
this surprising behavior.)
the same name, no warning or error is provided; git-filter-branch
simply overwrites each tag in some undocumented pre-defined order
resulting in only one tag at the end. (A git-filter-branch
regression test requires this surprising behavior.)
Also, the poor performance of git-filter-branch often leads to safety
issues:
* Coming up with the correct shell snippet to do the filtering you want
is sometimes difficult unless you're just doing a trivial modification
such as deleting a couple files. Unfortunately, people often learn if
the snippet is right or wrong by trying it out, but the rightness or
wrongness can vary depending on special circumstances (spaces in
filenames, non-ascii filenames, funny author names or emails, invalid
timezones, presence of grafts or replace objects, etc.), meaning they
may have to wait a long time, hit an error, then restart. The
performance of git-filter-branch is so bad that this cycle is painful,
reducing the time available to carefully re-check (to say nothing about
what it does to the patience of the person doing the rewrite even if
they do technically have more time available). This problem is extra
compounded because errors from broken filters may not be shown for a
long time and/or get lost in a sea of output. Even worse, broken
filters often just result in silent incorrect rewrites.
* Coming up with the correct shell snippet to do the filtering you
want is sometimes difficult unless you're just doing a trivial
modification such as deleting a couple files. Unfortunately, people
often learn if the snippet is right or wrong by trying it out, but
the rightness or wrongness can vary depending on special
circumstances (spaces in filenames, non-ascii filenames, funny
author names or emails, invalid timezones, presence of grafts or
replace objects, etc.), meaning they may have to wait a long time,
hit an error, then restart. The performance of git-filter-branch is
so bad that this cycle is painful, reducing the time available to
carefully re-check (to say nothing about what it does to the
patience of the person doing the rewrite even if they do technically
have more time available). This problem is extra compounded because
errors from broken filters may not be shown for a long time and/or
get lost in a sea of output. Even worse, broken filters often just
result in silent incorrect rewrites.
* To top it all off, even when users finally find working commands, they
naturally want to share them. But they may be unaware that their repo
didn't have some special cases that someone else's does. So, when
someone else with a different repository runs the same commands, they
get hit by the problems above. Or, the user just runs commands that
really were vetted for special cases, but they run it on a different OS
where it doesn't work, as noted above.
* To top it all off, even when users finally find working commands,
they naturally want to share them. But they may be unaware that
their repo didn't have some special cases that someone else's does.
So, when someone else with a different repository runs the same
commands, they get hit by the problems above. Or, the user just
runs commands that really were vetted for special cases, but they
run it on a different OS where it doesn't work, as noted above.
GIT
---
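
The filename-handling hazard described in the documentation above (names with spaces or newlines being split apart in a pipeline, versus the NUL-delimited style of `find -print0`, `xargs -0`, and `git ls-files -z`) can be sketched without git at all. The following is only an illustration under stated assumptions: the file names are made up, and the whitespace split is a rough stand-in for a naive consumer, not a model of any particular tool's exact parsing rules.

```python
# Hypothetical file list; none of these names come from the documentation.
paths = ["docs/read me.txt", "src/main.c", "notes/line\nbreak.md"]


def roundtrip_whitespace(names):
    """Emit one name per line, then split on any whitespace.

    This approximates a naive pipeline stage that treats blanks and
    newlines as separators between file names.
    """
    stream = "\n".join(names) + "\n"
    return stream.split()


def roundtrip_nul(names):
    """Emit NUL-terminated names, then split on NUL.

    This mirrors the idea behind -print0 / -0 / -z: NUL cannot occur in
    a path, so it is an unambiguous terminator.
    """
    stream = "".join(name + "\0" for name in names)
    # The trailing terminator leaves one empty field at the end; drop it.
    return stream.split("\0")[:-1]


naive = roundtrip_whitespace(paths)
safe = roundtrip_nul(paths)

print("original names:      ", len(paths))
print("whitespace-delimited:", len(naive), naive)
print("NUL-delimited:       ", len(safe), safe)
```

In this sketch the whitespace-delimited round trip yields more entries than there were files, none of which is the original `docs/read me.txt`, while the NUL-delimited round trip returns the list unchanged.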