doc hash-function-transition: use SHA-1 and SHA-256 consistently

Use SHA-1 and SHA-256 instead of sha1 and sha256 when referring
to the hash type.

Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thomas Ackermann 2021-02-05 18:22:25 +00:00 committed by Junio C Hamano
parent de82095a95
commit af9b1e9aba


@@ -107,7 +107,7 @@ mapping to allow naming objects using either their SHA-1 and SHA-256 names
interchangeably.
"git cat-file" and "git hash-object" gain options to display an object
in its sha1 form and write an object given its sha1 form. This
in its SHA-1 form and write an object given its SHA-1 form. This
requires all objects referenced by that object to be present in the
object database so that they can be named using the appropriate name
(using the bidirectional hash mapping).
@@ -115,7 +115,7 @@ object database so that they can be named using the appropriate name
Fetches from a SHA-1 based server convert the fetched objects into
SHA-256 form and record the mapping in the bidirectional mapping table
(see below for details). Pushes to a SHA-1 based server convert the
objects being pushed into sha1 form so the server does not have to be
objects being pushed into SHA-1 form so the server does not have to be
aware of the hash function the client is using.
Detailed Design
@@ -151,38 +151,38 @@ repository extensions.
Object names
~~~~~~~~~~~~
Objects can be named by their 40 hexadecimal digit sha1-name or 64
hexadecimal digit sha256-name, plus names derived from those (see
Objects can be named by their 40 hexadecimal digit SHA-1 name or 64
hexadecimal digit SHA-256 name, plus names derived from those (see
gitrevisions(7)).
The sha1-name of an object is the SHA-1 of the concatenation of its
type, length, a nul byte, and the object's sha1-content. This is the
The SHA-1 name of an object is the SHA-1 of the concatenation of its
type, length, a nul byte, and the object's SHA-1 content. This is the
traditional <sha1> used in Git to name objects.
The sha256-name of an object is the SHA-256 of the concatenation of its
type, length, a nul byte, and the object's sha256-content.
The SHA-256 name of an object is the SHA-256 of the concatenation of its
type, length, a nul byte, and the object's SHA-256 content.
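As an illustration only (this is not part of the commit), the two naming
schemes can be sketched in Python. The sketch assumes the usual loose-object
header layout of "<type> <length>" followed by a nul byte; the helper name
object_name is hypothetical. A blob is used because it refers to no other
object, so the same content feeds both hashes.

```python
import hashlib


def object_name(obj_type: str, content: bytes, algo: str = "sha1") -> str:
    """Hash the header plus content; header layout is an assumption of this sketch."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


blob = b"hello\n"
sha1_name = object_name("blob", blob, "sha1")        # 40 hexadecimal digits
sha256_name = object_name("blob", blob, "sha256")    # 64 hexadecimal digits
print(len(sha1_name), len(sha256_name))
```

For tags, commits, and trees the two calls would need different input bytes,
since the content itself embeds the names of other objects.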
Object format
~~~~~~~~~~~~~
The content as a byte sequence of a tag, commit, or tree object named
by sha1 and sha256 differ because an object named by sha256-name refers to
other objects by their sha256-names and an object named by sha1-name
refers to other objects by their sha1-names.
by SHA-1 and SHA-256 differ because an object named by SHA-256 name refers to
other objects by their SHA-256 names and an object named by SHA-1 name
refers to other objects by their SHA-1 names.
The sha256-content of an object is the same as its sha1-content, except
that objects referenced by the object are named using their sha256-names
instead of sha1-names. Because a blob object does not refer to any
other object, its sha1-content and sha256-content are the same.
The SHA-256 content of an object is the same as its SHA-1 content, except
that objects referenced by the object are named using their SHA-256 names
instead of SHA-1 names. Because a blob object does not refer to any
other object, its SHA-1 content and SHA-256 content are the same.
The format allows round-trip conversion between sha256-content and
sha1-content.
The format allows round-trip conversion between SHA-256 content and
SHA-1 content.
Object storage
~~~~~~~~~~~~~~
Loose objects use zlib compression and packed objects use the packed
format described in Documentation/technical/pack-format.txt, just like
today. The content that is compressed and stored uses sha256-content
instead of sha1-content.
today. The content that is compressed and stored uses SHA-256 content
instead of SHA-1 content.
Pack index
~~~~~~~~~~
@@ -287,18 +287,18 @@ To remove entries (e.g. in "git pack-refs" or "git-prune"):
Translation table
~~~~~~~~~~~~~~~~~
The index files support a bidirectional mapping between sha1-names
and sha256-names. The lookup proceeds similarly to ordinary object
lookups. For example, to convert a sha1-name to a sha256-name:
The index files support a bidirectional mapping between SHA-1 names
and SHA-256 names. The lookup proceeds similarly to ordinary object
lookups. For example, to convert a SHA-1 name to a SHA-256 name:
1. Look for the object in idx files. If a match is present in the
idx's sorted list of truncated sha1-names, then:
a. Read the corresponding entry in the sha1-name order to pack
idx's sorted list of truncated SHA-1 names, then:
a. Read the corresponding entry in the SHA-1 name order to pack
name order mapping.
b. Read the corresponding entry in the full sha1-name table to
b. Read the corresponding entry in the full SHA-1 name table to
verify we found the right object. If it is, then
c. Read the corresponding entry in the full sha256-name table.
That is the object's sha256-name.
c. Read the corresponding entry in the full SHA-256 name table.
That is the object's SHA-256 name.
2. Check for a loose object. Read lines from loose-object-idx until
we find a match.
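The lookup steps above can be sketched roughly as follows. This is a toy
model, not Git's on-disk format: the idx is represented as a hypothetical
dict of parallel Python lists, and each loose-object-idx line is assumed to
hold a SHA-256 name, a space, and a SHA-1 name.

```python
from bisect import bisect_left


def sha1_to_sha256(sha1_name, idx, loose_lines):
    """Translate a SHA-1 name to a SHA-256 name using toy in-memory tables."""
    # Step 1: binary-search the sorted list of truncated SHA-1 names.
    prefix = sha1_name[: idx["abbrev_len"]]
    short = idx["short_sha1"]
    i = bisect_left(short, prefix)
    while i < len(short) and short[i] == prefix:
        pack_pos = idx["sha1_order_to_pack_order"][i]   # step 1a
        if idx["full_sha1"][pack_pos] == sha1_name:     # step 1b: verify
            return idx["full_sha256"][pack_pos]         # step 1c
        i += 1
    # Step 2: fall back to a linear scan of the loose-object index.
    for line in loose_lines:
        sha256_name, _, loose_sha1 = line.strip().partition(" ")
        if loose_sha1 == sha1_name:
            return sha256_name
    return None


# Toy data: two packed objects (tables in pack order) and one loose object.
idx = {
    "abbrev_len": 8,
    "short_sha1": ["a" * 8, "b" * 8],
    "sha1_order_to_pack_order": [1, 0],
    "full_sha1": ["b" * 40, "a" * 40],
    "full_sha256": ["2" * 64, "1" * 64],
}
loose_lines = ["3" * 64 + " " + "c" * 40 + "\n"]
print(sha1_to_sha256("a" * 40, idx, loose_lines) == "1" * 64)
```

The verification in step 1b matters because several objects can share the
same truncated name.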
@@ -312,10 +312,10 @@ Since all operations that make new objects (e.g., "git commit") add
the new objects to the corresponding index, this mapping is possible
for all objects in the object store.
Reading an object's sha1-content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The sha1-content of an object can be read by converting all sha256-names
its sha256-content references to sha1-names using the translation table.
Reading an object's SHA-1 content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The SHA-1 content of an object can be read by converting all SHA-256 names
its SHA-256 content references to SHA-1 names using the translation table.
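A minimal sketch of that conversion for a commit object, assuming the
translation table is available as a plain dict from SHA-256 names to SHA-1
names. The function name is hypothetical, and only the "tree" and "parent"
headers are rewritten; fields such as mergetag or signatures would need
extra handling that is omitted here.

```python
def commit_to_sha1_content(sha256_content: bytes, to_sha1: dict) -> bytes:
    """Rewrite tree/parent references of a commit using a name mapping."""
    # Headers end at the first blank line; the message body is left untouched.
    headers, sep, body = sha256_content.partition(b"\n\n")
    out = []
    for line in headers.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key in (b"tree", b"parent"):
            value = to_sha1[value.decode("ascii")].encode("ascii")
            line = key + b" " + value
        out.append(line)
    return b"\n".join(out) + sep + body


# Toy mapping and commit; the names are placeholders, not real object names.
mapping = {"1" * 64: "a" * 40, "2" * 64: "b" * 40}
commit = (
    b"tree " + b"1" * 64 + b"\n"
    b"parent " + b"2" * 64 + b"\n"
    b"author A <a@example.com> 0 +0000\n"
    b"\n"
    b"msg\n"
)
converted = commit_to_sha1_content(commit, mapping)
```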
Fetch
~~~~~
@@ -338,7 +338,7 @@ the following steps:
1. index-pack: inflate each object in the packfile and compute its
SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
objects the client has locally. These objects can be looked up
using the translation table and their sha1-content read as
using the translation table and their SHA-1 content read as
described above to resolve the deltas.
2. topological sort: starting at the "want"s from the negotiation
phase, walk through objects in the pack and emit a list of them,
@@ -347,12 +347,12 @@ the following steps:
(This list only contains objects reachable from the "wants". If the
pack from the server contained additional extraneous objects, then
they will be discarded.)
3. convert to sha256: open a new (sha256) packfile. Read the topologically
3. convert to SHA-256: open a new SHA-256 packfile. Read the topologically
sorted list just generated. For each object, inflate its
sha1-content, convert to sha256-content, and write it to the sha256
pack. Record the new sha1<-->sha256 mapping entry for use in the idx.
SHA-1 content, convert to SHA-256 content, and write it to the SHA-256
pack. Record the new SHA-1<-->SHA-256 mapping entry for use in the idx.
4. sort: reorder entries in the new pack to match the order of objects
in the pack the server generated and include blobs. Write a sha256 idx
in the pack the server generated and include blobs. Write a SHA-256 idx
file
5. clean up: remove the SHA-1 based pack file, index, and
topologically sorted list obtained from the server in steps 1
@@ -377,16 +377,16 @@ experimenting to get this to perform well.
Push
~~~~
Push is simpler than fetch because the objects referenced by the
pushed objects are already in the translation table. The sha1-content
pushed objects are already in the translation table. The SHA-1 content
of each object being pushed can be read as described in the "Reading
an object's sha1-content" section to generate the pack written by git
an object's SHA-1 content" section to generate the pack written by git
send-pack.
Signed Commits
~~~~~~~~~~~~~~
We add a new field "gpgsig-sha256" to the commit object format to allow
signing commits without relying on SHA-1. It is similar to the
existing "gpgsig" field. Its signed payload is the sha256-content of the
existing "gpgsig" field. Its signed payload is the SHA-256 content of the
commit object with any "gpgsig" and "gpgsig-sha256" fields removed.
This means commits can be signed
@@ -404,7 +404,7 @@ Signed Tags
~~~~~~~~~~~
We add a new field "gpgsig-sha256" to the tag object format to allow
signing tags without relying on SHA-1. Its signed payload is the
sha256-content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
SIGNATURE-----" delimited in-body signature removed.
This means tags can be signed
@@ -416,11 +416,11 @@ This means tags can be signed
Mergetag embedding
~~~~~~~~~~~~~~~~~~
The mergetag field in the sha1-content of a commit contains the
sha1-content of a tag that was merged by that commit.
The mergetag field in the SHA-1 content of a commit contains the
SHA-1 content of a tag that was merged by that commit.
The mergetag field in the sha256-content of the same commit contains the
sha256-content of the same tag.
The mergetag field in the SHA-256 content of the same commit contains the
SHA-256 content of the same tag.
Submodules
~~~~~~~~~~
@@ -495,7 +495,7 @@ Caveats
-------
Invalid objects
~~~~~~~~~~~~~~~
The conversion from sha1-content to sha256-content retains any
The conversion from SHA-1 content to SHA-256 content retains any
brokenness in the original object (e.g., tree entry modes encoded with
leading 0, tree objects whose paths are not sorted correctly, and
commit objects without an author or committer). This is a deliberate
@@ -514,15 +514,15 @@ allow lifting this restriction.
Alternates
~~~~~~~~~~
For the same reason, a sha256 repository cannot borrow objects from a
sha1 repository using objects/info/alternates or
For the same reason, a SHA-256 repository cannot borrow objects from a
SHA-1 repository using objects/info/alternates or
$GIT_ALTERNATE_OBJECT_REPOSITORIES.
git notes
~~~~~~~~~
The "git notes" tool annotates objects using their sha1-name as key.
The "git notes" tool annotates objects using their SHA-1 name as key.
This design does not describe a way to migrate notes trees to use
sha256-names. That migration is expected to happen separately (for
SHA-256 names. That migration is expected to happen separately (for
example using a file at the root of the notes tree to describe which
hash it uses).
@@ -556,7 +556,7 @@ unclear:
Git 2.12
Does this mean Git v2.12.0 is the commit with sha1-name
Does this mean Git v2.12.0 is the commit with SHA-1 name
e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with
new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7?
@@ -676,7 +676,7 @@ The next step is supporting fetches and pushes to SHA-1 repositories:
- allow pushes to a repository using the compat format
- generate a topologically sorted list of the SHA-1 names of fetched
objects
- convert the fetched packfile to sha256 format and generate an idx
- convert the fetched packfile to SHA-256 format and generate an idx
file
- re-sort to match the order of objects in the fetched packfile
@@ -748,38 +748,38 @@ using the old hash function.
Signed objects with multiple hashes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Instead of introducing the gpgsig-sha256 field in commit and tag objects
for sha256-content based signatures, an earlier version of this design
added "hash sha256 <sha256-name>" fields to strengthen the existing
sha1-content based signatures.
for SHA-256 content based signatures, an earlier version of this design
added "hash sha256 <SHA-256 name>" fields to strengthen the existing
SHA-1 content based signatures.
In other words, a single signature was used to attest to the object
content using both hash functions. This had some advantages:
* Using one signature instead of two speeds up the signing process.
* Having one signed payload with both hashes allows the signer to
attest to the sha1-name and sha256-name referring to the same object.
attest to the SHA-1 name and SHA-256 name referring to the same object.
* All users consume the same signature. Broken signatures are likely
to be detected quickly using current versions of git.
However, it also came with disadvantages:
* Verifying a signed object requires access to the sha1-names of all
* Verifying a signed object requires access to the SHA-1 names of all
objects it references, even after the transition is complete and
translation table is no longer needed for anything else. To support
this, the design added fields such as "hash sha1 tree <sha1-name>"
and "hash sha1 parent <sha1-name>" to the sha256-content of a signed
this, the design added fields such as "hash sha1 tree <SHA-1 name>"
and "hash sha1 parent <SHA-1 name>" to the SHA-256 content of a signed
commit, complicating the conversion process.
* Allowing signed objects without a sha1 (for after the transition is
* Allowing signed objects without a SHA-1 (for after the transition is
complete) complicated the design further, requiring a "nohash sha1"
field to suppress including "hash sha1" fields in the sha256-content
field to suppress including "hash sha1" fields in the SHA-256 content
and signed payload.
Lazily populated translation table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some of the work of building the translation table could be deferred to
push time, but that would significantly complicate and slow down pushes.
Calculating the sha1-name at object creation time at the same time it is
being streamed to disk and having its sha256-name calculated should be
Calculating the SHA-1 name at object creation time at the same time it is
being streamed to disk and having its SHA-256 name calculated should be
an acceptable cost.
Document History
@@ -801,7 +801,7 @@ Incorporated suggestions from jonathantanmy and sbeller:
2017-03-06 jrnieder@gmail.com
* Use SHA3-256 instead of SHA2 (thanks, Linus and brian m. carlson).[1][2]
* Make sha3-based signatures a separate field, avoiding the need for
* Make SHA3-based signatures a separate field, avoiding the need for
"hash" and "nohash" fields (thanks to peff[3]).
* Add a sorting phase to fetch (thanks to Junio for noticing the need
for this).