---
obj: concept
repo: https://github.com/ulid/spec
aliases: ["Universally Unique Lexicographically Sortable Identifier"]
---

# ULID (Universally Unique Lexicographically Sortable Identifier)
UUID can be suboptimal for many use cases because:

- It isn't the most character-efficient way of encoding 128 bits of randomness
- UUID v1/v2 is impractical in many environments, as it requires access to a unique, stable MAC address
- UUID v3/v5 requires a unique seed and produces randomly distributed IDs, which can cause fragmentation in many data structures
- UUID v4 provides no information other than randomness, which can cause fragmentation in many data structures

Instead, herein is proposed ULID:

```javascript
ulid() // 01ARZ3NDEKTSV4RRFFQ69G5FAV
```

- 128-bit compatibility with UUID
- 1.21e+24 unique ULIDs per millisecond
- Lexicographically sortable!
- Canonically encoded as a 26-character string, as opposed to the 36-character UUID
- Uses Crockford's base32 for better efficiency and readability (5 bits per character)
- Case insensitive
- No special characters (URL safe)
- Monotonic sort order (correctly detects and handles the same millisecond)

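The figures in the list above follow from the bit layout. As a quick sanity check, here is a small sketch using only built-in JavaScript; the variable names are chosen here for illustration and are not part of any library:

```javascript
// Number of distinct values the 80-bit random component can take,
// i.e. the upper bound on unique ULIDs per millisecond.
const randomValuesPerMs = 2n ** 80n;

// Characters needed to hold 128 bits at 5 bits per Base32 character.
const charsNeeded = Math.ceil(128 / 5);

console.log(randomValuesPerMs.toString()); // 1208925819614629174706176 (about 1.21e+24)
console.log(charsNeeded); // 26
```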
## Specification
Below is the current specification of ULID as implemented in [ulid/javascript](https://github.com/ulid/javascript).

*Note: the binary format has not yet been implemented in JavaScript.*

```
 01AN4Z07BY      79KA1307SR9X4MV3

|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits
```

### Components

**Timestamp**
- 48-bit integer
- UNIX time in milliseconds
- Won't run out of space until the year 10889 AD.

**Randomness**
- 80 bits
- Cryptographically secure source of randomness, if possible

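The year 10889 limit can be checked directly, since a 48-bit millisecond counter still fits within the range of a JavaScript `Date`. A minimal sketch (the constant name `MAX_TIMESTAMP` is an assumption made for this example):

```javascript
// Largest value a 48-bit unsigned timestamp can hold (milliseconds since the UNIX epoch).
const MAX_TIMESTAMP = 2 ** 48 - 1;

// Interpret that value as a point in time.
const lastMoment = new Date(MAX_TIMESTAMP);

console.log(MAX_TIMESTAMP); // 281474976710655
console.log(lastMoment.getUTCFullYear()); // 10889
```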
### Sorting
The left-most character must be sorted first, and the right-most character sorted last (lexical order). The default ASCII character set must be used. Within the same millisecond, sort order is not guaranteed.

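As an illustration of the rule above, a plain string sort is enough to order ULIDs by creation time. This sketch reuses the sample identifiers from this document and normalizes to upper case first, on the assumption that mixed-case input could otherwise break ASCII ordering:

```javascript
// Sample ULIDs taken from the examples in this document, in shuffled order.
const ids = [
  '01BX5ZZKBKACTAV9WEVGEMMVRZ',
  '01AN4Z07BY79KA1307SR9X4MV3',
  '01ARZ3NDEKTSV4RRFFQ69G5FAV',
];

// Array.prototype.sort without a comparator compares strings by UTF-16 code unit,
// which is the same as ASCII order for the Crockford Base32 alphabet.
const sorted = ids.map((id) => id.toUpperCase()).sort();

console.log(sorted);
// [ '01AN4Z07BY79KA1307SR9X4MV3', '01ARZ3NDEKTSV4RRFFQ69G5FAV', '01BX5ZZKBKACTAV9WEVGEMMVRZ' ]
```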
### Canonical String Representation

```
ttttttttttrrrrrrrrrrrrrrrr

where
t is Timestamp (10 characters)
r is Randomness (16 characters)
```

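Because the two components have fixed widths, splitting a canonical string is a matter of slicing. A small sketch, where `splitUlid` is a hypothetical helper written for this note and not part of the `ulid` package:

```javascript
// Split a 26-character ULID into its timestamp and randomness parts.
// Hypothetical helper; it only checks the length, not the alphabet.
function splitUlid(ulid) {
  if (typeof ulid !== 'string' || ulid.length !== 26) {
    throw new Error('a canonical ULID is exactly 26 characters long');
  }
  return {
    timestamp: ulid.slice(0, 10), // first 10 characters (48 bits)
    randomness: ulid.slice(10),   // remaining 16 characters (80 bits)
  };
}

const parts = splitUlid('01AN4Z07BY79KA1307SR9X4MV3');
console.log(parts.timestamp);  // 01AN4Z07BY
console.log(parts.randomness); // 79KA1307SR9X4MV3
```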
#### Encoding
Crockford's Base32 is used as shown. This alphabet excludes the letters I, L, O, and U to avoid confusion and abuse.

```
0123456789ABCDEFGHJKMNPQRSTVWXYZ
```

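To make the encoding concrete, the following sketch converts a millisecond timestamp to its 10-character form and back using the alphabet above. The function names `encodeTime` and `decodeTime` and the sample timestamp are assumptions chosen for this example; real libraries may differ in naming and error handling.

```javascript
const ENCODING = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Encode a 48-bit millisecond timestamp as 10 Crockford Base32 characters.
function encodeTime(ms) {
  if (!Number.isInteger(ms) || ms < 0 || ms > 2 ** 48 - 1) {
    throw new RangeError('timestamp must be an integer between 0 and 2^48 - 1');
  }
  let out = '';
  for (let i = 0; i < 10; i++) {
    out = ENCODING[ms % 32] + out; // take the lowest 5 bits, prepend the digit
    ms = Math.floor(ms / 32);
  }
  return out;
}

// Decode a 10-character timestamp component back to milliseconds.
function decodeTime(str) {
  let ms = 0;
  for (const ch of str.toUpperCase()) {
    const value = ENCODING.indexOf(ch);
    if (value === -1) throw new Error(`invalid character: ${ch}`);
    ms = ms * 32 + value;
  }
  return ms;
}

console.log(encodeTime(1469918176385)); // 01ARYZ6S41
console.log(decodeTime('01ARYZ6S41'));  // 1469918176385
```

Plain `Number` arithmetic is sufficient here because 48-bit values stay well below `Number.MAX_SAFE_INTEGER`; the 80-bit randomness component would need `BigInt` or per-character handling instead.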
### Monotonicity
When generating a ULID within the same millisecond, we can provide some guarantees regarding sort order. Namely, if the same millisecond is detected, the `random` component is incremented by 1 bit in the least significant bit position (with carrying). For example:

```javascript
import { monotonicFactory } from 'ulid'

const ulid = monotonicFactory()

// Assume that these calls occur within the same millisecond
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVRZ
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVS0
```

In the extremely unlikely event that you manage to generate more than $2^{80}$ ULIDs within the same millisecond, or cause the random component to overflow with fewer, the generation will fail.

```javascript
import { monotonicFactory } from 'ulid'

const ulid = monotonicFactory()

// Assume that these calls occur within the same millisecond
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVRY
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVRZ
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVS0
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVS1
// ...
ulid() // 01BX5ZZKBKZZZZZZZZZZZZZZZX
ulid() // 01BX5ZZKBKZZZZZZZZZZZZZZZY
ulid() // 01BX5ZZKBKZZZZZZZZZZZZZZZZ
ulid() // throw new Error()!
```

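The increment-with-carry step can be sketched directly on the Base32 string. This is an illustrative version under the assumption that the randomness is held as its 16-character encoded form; `incrementRandom` is a hypothetical name and not necessarily how any particular library implements it.

```javascript
const ENCODING = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Add 1 to a Base32-encoded randomness component, carrying leftwards.
// Throws when every character is already 'Z' (the component would overflow).
function incrementRandom(random) {
  const chars = random.toUpperCase().split('');
  for (let i = chars.length - 1; i >= 0; i--) {
    const value = ENCODING.indexOf(chars[i]);
    if (value === -1) throw new Error(`invalid character: ${chars[i]}`);
    if (value < 31) {
      chars[i] = ENCODING[value + 1]; // no carry needed, done
      return chars.join('');
    }
    chars[i] = '0'; // 'Z' wraps to '0' and the carry moves one position left
  }
  throw new Error('randomness component overflowed');
}

console.log(incrementRandom('ACTAV9WEVGEMMVRZ')); // ACTAV9WEVGEMMVS0
```

A monotonic generator would call such a function only when the current millisecond equals the one used for the previous ULID, and draw fresh randomness otherwise.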
#### Overflow Errors when Parsing Base32 Strings
Technically, a 26-character Base32 encoded string can contain 130 bits of information, whereas a ULID may contain only 128 bits. Therefore, the largest valid ULID encoded in Base32 is `7ZZZZZZZZZZZZZZZZZZZZZZZZZ`, which corresponds to an epoch time of `281474976710655` or $2^{48}-1$.

Any attempt to decode or encode a ULID larger than this should be rejected by all implementations, to prevent overflow bugs.

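Since the two surplus bits sit in the first character, the overflow check reduces to requiring that character to be `7` or lower. A minimal validation sketch, where `isValidUlid` is a hypothetical helper name:

```javascript
// First character 0-7 keeps the value within 128 bits; the remaining 25
// characters may be any symbol of the Crockford alphabet (no I, L, O, U).
const ULID_PATTERN = /^[0-7][0-9A-HJKMNP-TV-Z]{25}$/i;

function isValidUlid(value) {
  return typeof value === 'string' && ULID_PATTERN.test(value);
}

const largest = '7' + 'Z'.repeat(25);   // largest valid ULID
const tooLarge = '8' + '0'.repeat(25);  // needs a 129th bit

console.log(isValidUlid(largest));  // true
console.log(isValidUlid(tooLarge)); // false
```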
### Binary Layout and Byte Order
The components are encoded as 16 octets. Each component is encoded with the Most Significant Byte first (network byte order).

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      32_bit_uint_time_high                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     16_bit_uint_time_low      |       16_bit_uint_random      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
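One way to produce this layout from a canonical string is to decode the whole ULID as a single 128-bit integer and write it out big-endian. The sketch below uses `BigInt` and a hypothetical `ulidToBytes` helper; it is an illustration of the layout, not the reference implementation (which, as noted above, does not yet provide a binary format).

```javascript
const ENCODING = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Convert a 26-character ULID into 16 bytes, most significant byte first.
function ulidToBytes(ulid) {
  if (typeof ulid !== 'string' || ulid.length !== 26) {
    throw new Error('a canonical ULID is exactly 26 characters long');
  }
  let value = 0n;
  for (const ch of ulid.toUpperCase()) {
    const digit = ENCODING.indexOf(ch);
    if (digit === -1) throw new Error(`invalid character: ${ch}`);
    value = value * 32n + BigInt(digit);
  }
  // 26 Base32 characters carry 130 bits; anything above 128 bits is an overflow.
  if (value >> 128n !== 0n) throw new RangeError('ULID exceeds 128 bits');

  const bytes = new Uint8Array(16);
  for (let i = 15; i >= 0; i--) {
    bytes[i] = Number(value & 0xffn); // fill from the least significant byte
    value >>= 8n;
  }
  return bytes; // bytes[0..5] hold the timestamp, bytes[6..15] the randomness
}

const bytes = ulidToBytes('01ARZ3NDEKTSV4RRFFQ69G5FAV');
console.log(bytes.length); // 16
```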