Add 'formal' spec for typeid (#52)
Having a more formal spec is helpful for third-party authors of
additional libraries.
I also add tests to make sure our `go` implementation complies with the
spec.
loreto committed Jun 30, 2023
1 parent 883132d commit f7514c8
Showing 6 changed files with 248 additions and 5 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -11,7 +11,7 @@ in Stripe's APIs.
TypeIDs are canonically encoded as lowercase strings consisting of three parts:
1. A type prefix (at most 63 characters in all lowercase ASCII [a-z])
2. An underscore '_' separator
3. A 128-bit UUIDv7 encoded as a 26-character string in base32 (using [Crockford's alphabet](https://www.crockford.com/base32.html) in lowercase).
3. A 128-bit UUIDv7 encoded as a 26-character string using a modified base32 encoding.

Here's an example of a TypeID of type `user`:

@@ -21,6 +21,8 @@ Here's an example of a TypeID of type `user`:
type uuid suffix (base32)
```

A [formal specification](./spec) defines the encoding in more detail.

## Benefits
+ **Type-safe:** you can't accidentally use a `user` ID where a `post` ID is expected. When debugging, you can
immediately understand what type of entity a TypeID refers to thanks to the type prefix.
@@ -31,13 +33,13 @@ Here's an example of a TypeID of type `user`:
selected for copy-pasting by double-clicking, and is a more compact encoding than the traditional hex encoding used by UUIDs (26 characters vs 36 characters).

## Implementations
Implementations should adhere to the formal [specification](./spec).

### Official Implementations by `jetpack.io`
| Language | Status |
| -------- | ------ |
| [Go](https://github.com/jetpack-io/typeid-go) | ✓ Implemented |
| Python | ... Coming Soon |
| Rust | ... Coming Soon |
| [SQL](https://github.com/jetpack-io/typeid-sql) | ✓ Implemented |
| [TypeScript](https://github.com/jetpack-io/typeid-ts) | ✓ Implemented |

2 changes: 1 addition & 1 deletion go.mod
@@ -4,7 +4,7 @@ go 1.20

require (
github.com/spf13/cobra v1.7.0
go.jetpack.io/typeid v0.0.0-20230614212614-fe4f463275f1
go.jetpack.io/typeid v0.0.0-20230629192725-341e2b135e06
)

require (
4 changes: 2 additions & 2 deletions go.sum
@@ -11,8 +11,8 @@ github.com/spf13/cobra v1.7.0/go.mod h1:uLxZILRyS/50WlhOIKD7W6V5bgeIt+4sICxh6uRM
github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=
github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk=
go.jetpack.io/typeid v0.0.0-20230614212614-fe4f463275f1 h1:JZqIj87LLFIExe9E9SWQlQmiaBK7HfVk1Wse2UxD29I=
go.jetpack.io/typeid v0.0.0-20230614212614-fe4f463275f1/go.mod h1:5BRM/qwZR4JjQ6X0q2Ihcc3H7DHXkuMl5XLAJa2SF6k=
go.jetpack.io/typeid v0.0.0-20230629192725-341e2b135e06 h1:Kn+6UV9ARBlIxAHjz7nXNyIKsY+qNfP6H6+a8s7nIEo=
go.jetpack.io/typeid v0.0.0-20230629192725-341e2b135e06/go.mod h1:R2tsu0u8ZSmdJdnzsTZ5YsbE9fFlBeYj/oIcRfgdL7k=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
97 changes: 97 additions & 0 deletions spec/README.md
@@ -0,0 +1,97 @@
# TypeID Specification (Version 0.1.0)

## Overview
TypeIDs are a type-safe extension of UUIDv7: they encode UUIDs in base32 and add a type prefix.

Here's an example of a TypeID of type `user`:

```
user_2x4y6z8a0b1c2d3e4f5g6h7j8k
└──┘ └────────────────────────┘
type    uuid suffix (base32)
```

This document formalizes the specification for TypeIDs.

## Specification

A typeid consists of three parts:
1. A **type prefix**: a string denoting the type of the ID. The prefix should be
at most 63 characters in all lowercase ASCII [a-z].
1. A **separator**: an underscore '_' character.
1. A **UUID suffix**: a 128-bit UUIDv7 encoded as a 26-character string in base32.

### Type Prefix
A type prefix is a string denoting the type of the ID. The prefix should be at most
63 characters in all lowercase ASCII [a-z]. Valid prefixes should match the following
regex in full: `^[a-z]{0,63}$`.
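
As a rough illustration of the prefix rule, the check can be sketched in Go with the standard `regexp` package. The helper name `validPrefix` is hypothetical and not taken from the reference implementation; the pattern is anchored with `^` and `$` so that the entire string has to match.

```go
package main

import (
	"fmt"
	"regexp"
)

// prefixPattern is the spec's regex, anchored so the whole string must match.
var prefixPattern = regexp.MustCompile(`^[a-z]{0,63}$`)

// validPrefix is a hypothetical helper (not part of any official library)
// that reports whether s is an acceptable type prefix.
func validPrefix(s string) bool {
	return prefixPattern.MatchString(s)
}

func main() {
	// A few prefixes, some of which also appear in the invalid test data.
	for _, p := range []string{"user", "", "PREFIX", "pre_fix", "12345"} {
		fmt.Printf("%q valid: %v\n", p, validPrefix(p))
	}
}
```

Note that this only enforces the hard limits; the recommendation to use at least 3 characters is advice to applications and is deliberately not checked here.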

The empty string is a valid prefix. It exists for very specific use cases in which
applications need to encode a typeid but elide the type information. In general, though,
applications should use a prefix that is at least 3 characters long.

> Note: [There's a proposal](https://github.com/jetpack-io/typeid/issues/7) to add `_` as
> an allowed separator within type prefixes.
### Separator
The separator is a single underscore character `_`. If the prefix is empty, the separator
is omitted.
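
One way to apply the separator rule when parsing is sketched below in Go. The function `splitTypeID` is a hypothetical helper, not the reference implementation's API, and it only separates the two parts; validating the prefix and decoding the suffix are assumed to happen afterwards.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitTypeID is a hypothetical helper that splits a TypeID string into its
// prefix and suffix. It does not validate either part.
func splitTypeID(s string) (prefix, suffix string, err error) {
	i := strings.Index(s, "_")
	switch {
	case i < 0:
		// No separator: the prefix is empty and the whole string is the suffix.
		return "", s, nil
	case i == 0:
		// A leading separator would mean an empty prefix, which must omit it.
		return "", "", errors.New("typeid: separator present but prefix is empty")
	default:
		return s[:i], s[i+1:], nil
	}
}

func main() {
	for _, id := range []string{
		"user_2x4y6z8a0b1c2d3e4f5g6h7j8k",
		"00000000000000000000000000",
		"_00000000000000000000000000",
	} {
		prefix, suffix, err := splitTypeID(id)
		fmt.Printf("%q -> prefix=%q suffix=%q err=%v\n", id, prefix, suffix, err)
	}
}
```

Splitting on the first underscore is a choice made for this sketch. Because underscores are not allowed in prefixes or in the base32 alphabet, a string such as `pre_fix_...` is still rejected once the suffix is checked.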

### UUID Suffix
The UUID suffix encodes exactly 128 bits of data in 26 characters. It uses the base32
encoding described below.

#### Base32 Encoding
Bytes from the UUID are encoded from left to right. Two zeroed bits are prepended
to the 128 bits of the UUID, resulting in 130 bits of data. The 130 bits are then
split into 5-bit chunks, and each chunk is encoded as a single character in the
base32 alphabet, resulting in a total of 26 characters.

In practice this is most often done by using bit-shifting and a lookup table. See
the [reference implementation encoding](https://github.com/jetpack-io/typeid-go/blob/main/base32/base32.go)
for an example.

Note that this is different from the standard base32 encoding which encodes in
groups of 5 bytes (40 bits) and appends any padding at the end of the data.

The encoding uses the following alphabet `0123456789abcdefghjkmnpqrstvwxyz` as
specified by the following table:

| Value | Symbol | Value | Symbol | Value | Symbol | Value | Symbol |
|-------|--------|-------|--------|-------|--------|-------|--------|
| 0 | 0 | 8 | 8 | 16 | g | 24 | r |
| 1 | 1 | 9 | 9 | 17 | h | 25 | s |
| 2 | 2 | 10 | a | 18 | j | 26 | t |
| 3 | 3 | 11 | b | 19 | k | 27 | v |
| 4 | 4 | 12 | c | 20 | m | 28 | w |
| 5 | 5 | 13 | d | 21 | n | 29 | x |
| 6 | 6 | 14 | e | 22 | p | 30 | y |
| 7 | 7 | 15 | f | 23 | q | 31 | z |

This is the same alphabet used by [Crockford's base32 encoding](https://www.crockford.com/base32.html),
but in our case the alphabet encoding is strict: always in lowercase, no hyphens allowed,
and we never decode multiple ambiguous characters to the same value.
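
The encoding described in this section can be sketched in Go as follows. This is a minimal illustration rather than the reference implementation: the names `encodeSuffix` and `decodeSuffix` are hypothetical, and the UUID is treated as a 128-bit big-endian number held in two `uint64` halves instead of using an unrolled lookup. Rejecting a first character above `7` is an inference from the two prepended zero bits, not a rule stated in the text above.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

// alphabet is the strict, lowercase alphabet from the table above.
const alphabet = "0123456789abcdefghjkmnpqrstvwxyz"

// encodeSuffix (hypothetical name) encodes 16 bytes as 26 base32 characters.
func encodeSuffix(uuid [16]byte) string {
	hi := binary.BigEndian.Uint64(uuid[:8])
	lo := binary.BigEndian.Uint64(uuid[8:])
	var out [26]byte
	// Emit characters right to left, consuming the lowest 5 bits each time.
	// The leftmost character ends up with only 3 data bits, which is the
	// same as prepending two zero bits to the 128-bit value.
	for i := 25; i >= 0; i-- {
		out[i] = alphabet[lo&0x1f]
		lo = lo>>5 | hi<<59 // shift the 128-bit value right by 5
		hi >>= 5
	}
	return string(out[:])
}

// decodeSuffix (hypothetical name) reverses encodeSuffix and rejects any
// character outside the strict alphabet.
func decodeSuffix(s string) ([16]byte, error) {
	var uuid [16]byte
	if len(s) != 26 {
		return uuid, errors.New("suffix must be exactly 26 characters")
	}
	var hi, lo uint64
	for i := 0; i < len(s); i++ {
		v := strings.IndexByte(alphabet, s[i])
		if v < 0 {
			return uuid, fmt.Errorf("invalid character %q in suffix", s[i])
		}
		if i == 0 && v > 7 {
			// Assumption: the two prepended bits must be zero.
			return uuid, errors.New("suffix overflows 128 bits")
		}
		hi = hi<<5 | lo>>59 // shift the 128-bit value left by 5
		lo = lo<<5 | uint64(v)
	}
	binary.BigEndian.PutUint64(uuid[:8], hi)
	binary.BigEndian.PutUint64(uuid[8:], lo)
	return uuid, nil
}

func main() {
	// The "valid-uuidv7" vector: 01890a5d-ac96-774b-bcce-b302099a8057.
	u := [16]byte{0x01, 0x89, 0x0a, 0x5d, 0xac, 0x96, 0x77, 0x4b,
		0xbc, 0xce, 0xb3, 0x02, 0x09, 0x9a, 0x80, 0x57}
	s := encodeSuffix(u)
	back, err := decodeSuffix(s)
	fmt.Println(s, back == u, err)
}
```

Working right to left keeps the loop short; an implementation tuned for speed would more likely unroll the bit-shifting per byte.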

#### Compatibility with UUID
When generating a new TypeID, the generated UUID suffix MUST decode to a valid UUIDv7.

Implementations MAY allow encoding/decoding of other UUID variants when the
bits are provided by end users. This makes it possible for applications to encode
other UUID variants like UUIDv1 or UUIDv4 at their discretion.
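
To make the distinction concrete, here is a small Go sketch that inspects the version and variant bits of a decoded UUID. It relies on the standard UUID layout, where the version is the high nibble of octet 6 and the variant is carried in the top two bits of octet 8. The helper names are hypothetical, and how a generator should use such a check is an assumption of this example.

```go
package main

import "fmt"

// uuidVersion returns the version number stored in the high nibble of octet 6.
func uuidVersion(uuid [16]byte) int {
	return int(uuid[6] >> 4)
}

// isUUIDv7 is a hypothetical helper: it reports whether the bytes carry
// version 7 together with the "10" variant bits used by standard UUIDs.
func isUUIDv7(uuid [16]byte) bool {
	return uuidVersion(uuid) == 7 && uuid[8]&0xc0 == 0x80
}

func main() {
	// 01890a5d-ac96-774b-bcce-b302099a8057 (the "valid-uuidv7" test vector).
	v7 := [16]byte{0x01, 0x89, 0x0a, 0x5d, 0xac, 0x96, 0x77, 0x4b,
		0xbc, 0xce, 0xb3, 0x02, 0x09, 0x9a, 0x80, 0x57}
	// 0110c853-1d09-52d8-d73e-1194e95b5f19 (the "valid-alphabet" test vector).
	other := [16]byte{0x01, 0x10, 0xc8, 0x53, 0x1d, 0x09, 0x52, 0xd8,
		0xd7, 0x3e, 0x11, 0x94, 0xe9, 0x5b, 0x5f, 0x19}

	fmt.Println(uuidVersion(v7), isUUIDv7(v7))
	fmt.Println(uuidVersion(other), isUUIDv7(other))
}
```

A generator could assert `isUUIDv7` on its own output, while a parser that accepts user-supplied UUIDs would skip the check, in line with the MAY clause above.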

## Versioning
This spec uses semantic versioning: `MAJOR.MINOR.PATCH`. The version is incremented
when the spec changes in a way that is not backwards compatible.

Libraries that implement this spec should also use semantic versioning, and their
MAJOR and MINOR versions should match the version of the spec they implement.
The PATCH version is up to the discretion of the library author.

## Validating Implementations
To assist library authors in validating their implementations, we provide:
+ A reference implementation in [Go](https://github.com/jetpack-io/typeid-go)
with extensive testing.
+ A [valid.yml](valid.yml) file containing a list of valid typeids along
with their corresponding decoded UUIDs.
+ An [invalid.yml](invalid.yml) file containing a list of strings that are
invalid typeids and should fail to parse/decode.
83 changes: 83 additions & 0 deletions spec/invalid.yml
@@ -0,0 +1,83 @@
# This file contains test data that should be treated as *invalid* TypeIDs by
# conforming implementations.
#
# Each example contains an invalid TypeID string. Implementations are expected
# to throw an error when attempting to parse/validate these strings.
#
# Last updated: 2023-06-29

- name: prefix-uppercase
typeid: "PREFIX_00000000000000000000000000"
description: "The prefix should be lowercase with no uppercase letters"

- name: prefix-numeric
typeid: "12345_00000000000000000000000000"
description: "The prefix can't have numbers, it needs to be alphabetic"

- name: prefix-period
typeid: "pre.fix_00000000000000000000000000"
description: "The prefix can't have symbols, it needs to be alphabetic"

- name: prefix-underscore
typeid: "pre_fix_00000000000000000000000000"
description: "The prefix can't have symbols, it needs to be alphabetic"

- name: prefix-non-ascii
typeid: "préfix_00000000000000000000000000"
description: "The prefix can only have ascii letters"

- name: prefix-spaces
typeid: " prefix_00000000000000000000000000"
description: "The prefix can't have any spaces"

- name: prefix-64-chars
# 123456789 123456789 123456789 123456789 123456789 123456789 1234
typeid: "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijkl_00000000000000000000000000"
description: "The prefix can't be 64 characters, it needs to be 63 characters or less"

- name: separator-empty-prefix
typeid: "_00000000000000000000000000"
description: "If the prefix is empty, the separator should not be there"

- name: separator-empty
typeid: "_"
description: "A separator by itself should not be treated as the empty string"

- name: suffix-short
typeid: "prefix_1234567890123456789012345"
description: "The suffix can't be 25 characters, it needs to be exactly 26 characters"

- name: suffix-long
typeid: "prefix_123456789012345678901234567"
description: "The suffix can't be 27 characters, it needs to be exactly 26 characters"

- name: suffix-spaces
# This example has the right length, so that the failure is caused by the space
# and not the suffix length
typeid: "prefix_1234567890123456789012345 "
description: "The suffix can't have any spaces"

- name: suffix-uppercase
# This example is picked because it would be valid in lowercase
typeid: "prefix_0123456789ABCDEFGHJKMNPQRS"
description: "The suffix should be lowercase with no uppercase letters"

- name: suffix-hyphens
# This example has the right length, so that the failure is caused by the hyphens
# and not the suffix length
typeid: "prefix_123456789-123456789-123456"
description: "The suffix can't have any hyphens"

- name: suffix-wrong-alphabet
typeid: "prefix_ooooooiiiiiiuuuuuuulllllll"
description: "The suffix should only have letters from the spec's alphabet"

- name: suffix-ambiguous-crockford
# This example would be valid if we were using the crockford disambiguation rules
typeid: "prefix_i23456789ol23456789oi23456"
description: "The suffix should not have any ambiguous characters from the crockford encoding"

- name: suffix-hyphens-crockford
# This example would be valid if we were using the crockford hyphenation rules
typeid: "prefix_123456789-0123456789-0123456"
description: "The suffix can't ignore hyphens as in the crockford encoding"
61 changes: 61 additions & 0 deletions spec/valid.yml
@@ -0,0 +1,61 @@
# This file contains test data that should parse as valid TypeIDs by conforming
# implementations.
#
# Each example contains:
# - The TypeID in its canonical string representation.
# - The prefix
# - The decoded UUID as a hex string
#
# Implementations should verify that they can encode/decode the data
# in both directions:
# 1. If the TypeID is decoded, it should result in the given prefix and UUID.
# 2. If the UUID is encoded as a TypeID with the given prefix, it should
# result in the given TypeID.
#
# In addition to using these examples, it's recommended that implementations
# generate thousands of random ids during testing, and verify that after
# decoding and re-encoding the id, the result is the same as the original.
#
# In other words, the following property should always hold:
# random_typeid == encode(decode(random_typeid))
#
# Finally, while implementations should be able to decode the values below,
# note that not all of them are UUIDv7s. When *generating* new random typeids,
# implementations should always use UUIDv7s.
#
# Last updated: 2023-06-29

- name: nil
typeid: "00000000000000000000000000"
prefix: ""
uuid: "00000000-0000-0000-0000-000000000000"

- name: one
typeid: "00000000000000000000000001"
prefix: ""
uuid: "00000000-0000-0000-0000-000000000001"

- name: ten
typeid: "0000000000000000000000000a"
prefix: ""
uuid: "00000000-0000-0000-0000-00000000000a"

- name: sixteen
typeid: "0000000000000000000000000g"
prefix: ""
uuid: "00000000-0000-0000-0000-000000000010"

- name: thirty-two
typeid: "00000000000000000000000010"
prefix: ""
uuid: "00000000-0000-0000-0000-000000000020"

- name: valid-alphabet
typeid: "prefix_0123456789abcdefghjkmnpqrs"
prefix: "prefix"
uuid: "0110c853-1d09-52d8-d73e-1194e95b5f19"

- name: valid-uuidv7
typeid: "prefix_01h455vb4pex5vsknk084sn02q"
prefix: "prefix"
uuid: "01890a5d-ac96-774b-bcce-b302099a8057"
