From f7514c868ed0b901fd384124ed490f7c68c05a22 Mon Sep 17 00:00:00 2001 From: Daniel Loreto <279789+loreto@users.noreply.github.com> Date: Fri, 30 Jun 2023 07:52:27 -0500 Subject: [PATCH] Add 'formal' spec for typeid (#52) Having a more formal spec is helpful for third-party authors of additional libraries. I also add tests to make sure our `go` implementation complies with the spec. --- README.md | 6 ++- go.mod | 2 +- go.sum | 4 +- spec/README.md | 97 ++++++++++++++++++++++++++++++++++++++++++++++++ spec/invalid.yml | 83 +++++++++++++++++++++++++++++++++++++++++ spec/valid.yml | 61 ++++++++++++++++++++++++++++++ 6 files changed, 248 insertions(+), 5 deletions(-) create mode 100644 spec/README.md create mode 100644 spec/invalid.yml create mode 100644 spec/valid.yml diff --git a/README.md b/README.md index 4278f9b..e8a562e 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ in Stripe's APIs. TypeIDs are canonically encoded as lowercase strings consisting of three parts: 1. A type prefix (at most 63 characters in all lowercase ASCII [a-z]) 2. An underscore '_' separator -3. A 128-bit UUIDv7 encoded as a 26-character string in base32 (using [Crockford's alphabet](https://www.crockford.com/base32.html) in lowercase). +3. A 128-bit UUIDv7 encoded as a 26-character string using a modified base32 encoding. Here's an example of a TypeID of type `user`: @@ -21,6 +21,8 @@ Here's an example of a TypeID of type `user`: type uuid suffix (base32) ``` +A [formal specification](./spec/README.md) defines the encoding in more detail. + ## Benefits + **Type-safe:** you can't accidentally use a `user` ID where a `post` ID is expected. When debugging, you can immediately understand what type of entity a TypeID refers to thanks to the type prefix. @@ -31,13 +33,13 @@ Here's an example of a TypeID of type `user`: selected for copy-pasting by double-clicking, and is a more compact encoding than the traditional hex encoding used by UUIDs (26 characters vs 36 characters). 
## Implementations +Implementations should adhere to the formal [specification](./spec/README.md). ### Official Implementations by `jetpack.io` | Language | Status | | -------- | ------ | | [Go](https://github.com/jetpack-io/typeid-go) | ✓ Implemented | | Python | ... Coming Soon | -| Rust | ... Coming Soon | | [SQL](https://github.com/jetpack-io/typeid-sql) | ✓ Implemented | | [TypeScript](https://github.com/jetpack-io/typeid-ts) | ✓ Implemented | diff --git a/go.mod b/go.mod index c500937..1b3c9b4 100644 --- a/go.mod +++ b/go.mod @@ -4,7 +4,7 @@ go 1.20 require ( github.com/spf13/cobra v1.7.0 - go.jetpack.io/typeid v0.0.0-20230614212614-fe4f463275f1 + go.jetpack.io/typeid v0.0.0-20230629192725-341e2b135e06 ) require ( diff --git a/go.sum b/go.sum index a5dec63..a5122e8 100644 --- a/go.sum +++ b/go.sum @@ -11,8 +11,8 @@ github.com/spf13/cobra v1.7.0/go.mod h1:uLxZILRyS/50WlhOIKD7W6V5bgeIt+4sICxh6uRM github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA= github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk= -go.jetpack.io/typeid v0.0.0-20230614212614-fe4f463275f1 h1:JZqIj87LLFIExe9E9SWQlQmiaBK7HfVk1Wse2UxD29I= -go.jetpack.io/typeid v0.0.0-20230614212614-fe4f463275f1/go.mod h1:5BRM/qwZR4JjQ6X0q2Ihcc3H7DHXkuMl5XLAJa2SF6k= +go.jetpack.io/typeid v0.0.0-20230629192725-341e2b135e06 h1:Kn+6UV9ARBlIxAHjz7nXNyIKsY+qNfP6H6+a8s7nIEo= +go.jetpack.io/typeid v0.0.0-20230629192725-341e2b135e06/go.mod h1:R2tsu0u8ZSmdJdnzsTZ5YsbE9fFlBeYj/oIcRfgdL7k= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/spec/README.md b/spec/README.md new file mode 100644 index 0000000..1bb4295 --- /dev/null +++ b/spec/README.md @@ -0,0 +1,97 @@ +# TypeID 
Specification (Version 0.1.0) + +## Overview +TypeIDs are a type-safe extension of UUIDv7: they encode UUIDs in base32 and add a type prefix. + +Here's an example of a TypeID of type `user`: + +``` + user_2x4y6z8a0b1c2d3e4f5g6h7j8k + └──┘ └────────────────────────┘ + type uuid suffix (base32) +``` + +This document formalizes the specification for TypeIDs. + +## Specification + +A typeid consists of three parts: +1. A **type prefix**: a string denoting the type of the ID. The prefix should be + at most 63 characters in all lowercase ASCII [a-z]. +1. A **separator**: an underscore '_' character. +1. A **UUID suffix**: a 128-bit UUIDv7 encoded as a 26-character string in base32. + +### Type Prefix +A type prefix is a string denoting the type of the ID. The prefix should be at most +63 characters in all lowercase ASCII [a-z]. Valid prefixes should match the following +regex: `[a-z]{0,63}`. + +The empty string is a valid prefix; it exists for very specific use cases in which +applications need to encode a typeid but elide the type information. In general, though, +applications should use a prefix that is at least 3 characters long. + +> Note: [There's a proposal](https://github.com/jetpack-io/typeid/issues/7) to add `_` as +> an allowed separator within type prefixes. + +### Separator +The separator is a single underscore character `_`. If the prefix is empty, the separator +is omitted. + +### UUID Suffix +The UUID suffix encodes exactly 128 bits of data in 26 characters. It uses the base32 +encoding described below. + +#### Base32 Encoding +Bytes from the UUID are encoded from left to right. Two zeroed bits are prepended +to the 128 bits of the UUID, resulting in 130 bits of data. The 130 bits are then +split into 5-bit chunks, and each chunk is encoded as a single character in the +base32 alphabet, resulting in a total of 26 characters. + +In practice this is most often done by using bit-shifting and a lookup table. 
See +the [reference implementation encoding](https://github.com/jetpack-io/typeid-go/blob/main/base32/base32.go) +for an example. + +Note that this is different from the standard base32 encoding which encodes in +groups of 5 bytes (40 bits) and appends any padding at the end of the data. + +The encoding uses the following alphabet `0123456789abcdefghjkmnpqrstvwxyz` as +specified by the following table: + +| Value | Symbol | Value | Symbol | Value | Symbol | Value | Symbol | +|-------|--------|-------|--------|-------|--------|-------|--------| +| 0 | 0 | 8 | 8 | 16 | g | 24 | r | +| 1 | 1 | 9 | 9 | 17 | h | 25 | s | +| 2 | 2 | 10 | a | 18 | j | 26 | t | +| 3 | 3 | 11 | b | 19 | k | 27 | v | +| 4 | 4 | 12 | c | 20 | m | 28 | w | +| 5 | 5 | 13 | d | 21 | n | 29 | x | +| 6 | 6 | 14 | e | 22 | p | 30 | y | +| 7 | 7 | 15 | f | 23 | q | 31 | z | + +This is the same alphabet used by [Crockford's base32 encoding](https://www.crockford.com/base32.html), +but in our case the alphabet encoding is strict: always in lowercase, no hyphens allowed, +and we never decode multiple ambiguous characters to the same value. + +#### Compatibility with UUID +When generating a new TypeID, the generated UUID suffix MUST decode to a valid UUIDv7. + +Implementations MAY allow encoding/decoding of other UUID variants when the +bits are provided by end users. This makes it possible for applications to encode +other UUID variants like UUIDv1 or UUIDv4 at their discretion. + +## Versioning +This spec uses semantic versioning: `MAJOR.MINOR.PATCH`. The version is incremented +when the spec changes in a way that is not backwards compatible. + +Libraries that implement this spec should also use semantic versioning, and their +MAJOR and MINOR versions should match the version of the spec they implement. +The PATCH version is up to the discretion of the library author. 
+ +## Validating Implementations +To assist library authors in validating their implementations, we provide: ++ A reference implementation in [Go](https://github.com/jetpack-io/typeid-go) + with extensive testing. ++ A [valid.yml](valid.yml) file containing a list of valid typeids along + with their corresponding decoded UUIDs. ++ An [invalid.yml](invalid.yml) file containing a list of strings that are + invalid typeids and should fail to parse/decode. \ No newline at end of file diff --git a/spec/invalid.yml b/spec/invalid.yml new file mode 100644 index 0000000..c1470a2 --- /dev/null +++ b/spec/invalid.yml @@ -0,0 +1,83 @@ +# This file contains test data that should be treated as *invalid* TypeIDs by +# conforming implementations. +# +# Each example contains an invalid TypeID string. Implementations are expected +# to throw an error when attempting to parse/validate these strings. +# +# Last updated: 2023-06-29 + +- name: prefix-uppercase + typeid: "PREFIX_00000000000000000000000000" + description: "The prefix should be lowercase with no uppercase letters" + +- name: prefix-numeric + typeid: "12345_00000000000000000000000000" + description: "The prefix can't have numbers, it needs to be alphabetic" + +- name: prefix-period + typeid: "pre.fix_00000000000000000000000000" + description: "The prefix can't have symbols, it needs to be alphabetic" + +- name: prefix-underscore + typeid: "pre_fix_00000000000000000000000000" + description: "The prefix can't have symbols, it needs to be alphabetic" + +- name: prefix-non-ascii + typeid: "préfix_00000000000000000000000000" + description: "The prefix can only have ascii letters" + +- name: prefix-spaces + typeid: " prefix_00000000000000000000000000" + description: "The prefix can't have any spaces" + +- name: prefix-64-chars + # 123456789 123456789 123456789 123456789 123456789 123456789 1234 + typeid: "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijkl_00000000000000000000000000" + description: "The prefix 
can't be 64 characters, it needs to be 63 characters or less" + +- name: separator-empty-prefix + typeid: "_00000000000000000000000000" + description: "If the prefix is empty, the separator should not be there" + +- name: separator-empty + typeid: "_" + description: "A separator by itself should not be treated as the empty string" + +- name: suffix-short + typeid: "prefix_1234567890123456789012345" + description: "The suffix can't be 25 characters, it needs to be exactly 26 characters" + +- name: suffix-long + typeid: "prefix_123456789012345678901234567" + description: "The suffix can't be 27 characters, it needs to be exactly 26 characters" + +- name: suffix-spaces + # This example has the right length, so that the failure is caused by the space + # and not the suffix length + typeid: "prefix_1234567890123456789012345 " + description: "The suffix can't have any spaces" + +- name: suffix-uppercase + # This example is picked because it would be valid in lowercase + typeid: "prefix_0123456789ABCDEFGHJKMNPQRS" + description: "The suffix should be lowercase with no uppercase letters" + +- name: suffix-hyphens + # This example has the right length, so that the failure is caused by the hyphens + # and not the suffix length + typeid: "prefix_123456789-123456789-123456" + description: "The suffix can't have any hyphens" + +- name: suffix-wrong-alphabet + typeid: "prefix_ooooooiiiiiiuuuuuuulllllll" + description: "The suffix should only have letters from the spec's alphabet" + +- name: suffix-ambiguous-crockford + # This example would be valid if we were using the crockford disambiguation rules + typeid: "prefix_i23456789ol23456789oi23456" + description: "The suffix should not have any ambiguous characters from the crockford encoding" + +- name: suffix-hyphens-crockford + # This example would be valid if we were using the crockford hyphenation rules + typeid: "prefix_123456789-0123456789-0123456" + description: "The suffix can't ignore hyphens as in 
the crockford encoding" diff --git a/spec/valid.yml b/spec/valid.yml new file mode 100644 index 0000000..964c96f --- /dev/null +++ b/spec/valid.yml @@ -0,0 +1,61 @@ +# This file contains test data that should parse as valid TypeIDs by conforming +# implementations. +# +# Each example contains: +# - The TypeID in its canonical string representation. +# - The prefix +# - The decoded UUID as a hex string +# +# Implementations should verify that they can encode/decode the data +# in both directions: +# 1. If the TypeID is decoded, it should result in the given prefix and UUID. +# 2. If the UUID is encoded as a TypeID with the given prefix, it should +# result in the given TypeID. +# +# In addition to using these examples, it's recommended that implementations +# generate thousands of random ids during testing, and verify that after +# decoding and re-encoding the id, the result is the same as the original. +# +# In other words, the following property should always hold: +# random_typeid == encode(decode(random_typeid)) +# +# Finally, while implementations should be able to decode the values below, +# note that not all of them are UUIDv7s. When *generating* new random typeids, +# implementations should always use UUIDv7s. 
+# +# Last updated: 2023-06-29 + +- name: nil + typeid: "00000000000000000000000000" + prefix: "" + uuid: "00000000-0000-0000-0000-000000000000" + +- name: one + typeid: "00000000000000000000000001" + prefix: "" + uuid: "00000000-0000-0000-0000-000000000001" + +- name: ten + typeid: "0000000000000000000000000a" + prefix: "" + uuid: "00000000-0000-0000-0000-00000000000a" + +- name: sixteen + typeid: "0000000000000000000000000g" + prefix: "" + uuid: "00000000-0000-0000-0000-000000000010" + +- name: thirty-two + typeid: "00000000000000000000000010" + prefix: "" + uuid: "00000000-0000-0000-0000-000000000020" + +- name: valid-alphabet + typeid: "prefix_0123456789abcdefghjkmnpqrs" + prefix: "prefix" + uuid: "0110c853-1d09-52d8-d73e-1194e95b5f19" + +- name: valid-uuidv7 + typeid: "prefix_01h455vb4pex5vsknk084sn02q" + prefix: "prefix" + uuid: "01890a5d-ac96-774b-bcce-b302099a8057"
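
Reviewer note: the Base32 Encoding section of `spec/README.md` (two zero bits prepended, then 26 five-bit chunks mapped through the alphabet) can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the reference implementation in `typeid-go`; the names `encodeSuffix`, `formatTypeID`, and `parseUUID` are hypothetical helpers introduced only for this example, and prefix validation is deliberately left out.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// alphabet is the 32-symbol table given in the spec.
const alphabet = "0123456789abcdefghjkmnpqrstvwxyz"

// encodeSuffix (hypothetical name) turns 16 UUID bytes into the
// 26-character suffix: 2 zero bits + 128 UUID bits = 130 bits,
// consumed left to right in 5-bit chunks.
func encodeSuffix(uuid [16]byte) string {
	out := make([]byte, 0, 26)
	var acc uint32 // bit accumulator; never holds more than 12 live bits
	bits := 2      // the two prepended zero bits are already "in" acc
	for _, b := range uuid {
		acc = acc<<8 | uint32(b)
		bits += 8
		for bits >= 5 {
			bits -= 5
			out = append(out, alphabet[(acc>>uint(bits))&0x1f])
		}
		acc &= (1 << uint(bits)) - 1 // keep only the bits not yet emitted
	}
	return string(out)
}

// formatTypeID (hypothetical helper) joins prefix and suffix.
// Per the spec, the separator is omitted when the prefix is empty.
// Assumption: the caller has already validated the prefix.
func formatTypeID(prefix string, uuid [16]byte) string {
	if prefix == "" {
		return encodeSuffix(uuid)
	}
	return prefix + "_" + encodeSuffix(uuid)
}

// parseUUID (hypothetical helper) reads the hyphenated hex form
// used in valid.yml into 16 raw bytes.
func parseUUID(s string) ([16]byte, error) {
	var out [16]byte
	raw, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil {
		return out, err
	}
	if len(raw) != 16 {
		return out, fmt.Errorf("expected 16 bytes, got %d", len(raw))
	}
	copy(out[:], raw)
	return out, nil
}

func main() {
	// The valid-uuidv7 case from spec/valid.yml.
	id, err := parseUUID("01890a5d-ac96-774b-bcce-b302099a8057")
	if err != nil {
		panic(err)
	}
	fmt.Println(formatTypeID("prefix", id)) // prefix_01h455vb4pex5vsknk084sn02q
}
```

The accumulator approach avoids big-integer arithmetic: because 130 is a multiple of 5, no bits are left over after the last byte. A library author could run the same function over every entry in `spec/valid.yml` to check the encode direction.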