Snowflake is a PostgreSQL extension providing an int8
and sequence
based unique ID solution to optionally replace
the PostgreSQL built-in bigserial
data type.
Internally Snowflakes are 64 bit integers represented externally as bigint
values. The 64 bits are divided into bit fields
bit 63 - unused (sign of int8)
bits 22-62 - timestamp with millisecond precision
bits 10-21 - counter within one millisecond
bits 0-9 - unique PostgreSQL node number set in postgresql.conf
-
The timestamp is a 41-bit unsigned value representing millisecond precision and an epoch of 2023-01-01.
-
The counter is a 12-bit unsigned value that increments per ID allocation. This provides for 4096 unique IDs per millisecond, or 4 million IDs per second. Even the most aggressive allocation of sequences cannot reach one million per second and a modified pgbench with a Snowflake based primary key on the
history
table can only generate IDs in the double digit per millisecond range.Should it be possible in the future to generate more than 4096 snowflakes per millisecond the algorithm is going to bump the timestamp one millisecond into the future to keep snowflakes unique.
-
The
node number
is a 10-bit unique identifier of the PostgreSQL instance inside a global cluster. This value must be set with the GUCsnowflake.node
in thepostgresql.conf
file.
With this design a Snowflake ID is unique within one sequence
across multiple PostgreSQL instances in a distributed cluster.
Go to pgeEdge Github and install the pgEdge CLI.
./pgedge install pg16 --start : install snowflake
For installation from source code it is assumed that the user is familiar with how to build standard PostgreSQL extensions from a source.
cd contrib
git clone https://github.com/pgEdge/snowflake-sequences.git
cd snowflake-sequences
USE_PGXS=1 make
USE_PGXS=1 make install
The Snowflake extension uses a custom GUC snowflake.node
.
This configuration variable determines the node
part of every
snowflake, generated by this PostgreSQL instance. The
permitted values are 1 thru 1023.
This configuration option has an invalid default value (on purpose).
If not set, the Snowflake extension will throw an exception on
a call to snowflake.nextval()
. This (hopefully) prevents you from
accidentally missing this GUC in your postgresql.conf file.
If you ever intend to use Snowflakes in a multi-node, distributed or replicated setup, it is important to set the GUC to a unique value for each PostgreSQL instance. There is nothing in place to prevent you from shooting yourself in the foot by assigning multiple PostgreSQL instances in a multi-master cluster the same node number.
After creating the extension
via
CREATE EXTENSION snowflake;
the following functions become available:
-
snowflake.nextval([sequence regclass])
Generates the next snowflake for the given sequence. If no sequence is specified the internal, database-wide sequencesnowflake.id_seq
will be used.NOTE: snowflakes are only unique per database, per sequence. If you need snowflakes to be unique across all snowflake columns within a database, you need to use the built-in sequence
snowflake.id_seq
. However, that may interfere with the usage ofcurrval()
of your application logic. -
snowflake.currval([sequence regclass])
Returns the current value of the given sequence (or the default, internal sequence). Like for PostgreSQL sequences this value is undefined until the functionsnowflake.nextval()
has been called for the sequence in the current session. -
snowflake.get_epoch(snowflake int8)
Returns the timestamp part of the given snowflake as EPOCH (seconds since 2023-01-01) as a NUMERIC value with precision of three digits. One can useto_timestamp(snowflake.get_epoch(<value>))
to convert this into an actual timestamp. -
snowflake.get_count(snowflake int8)
Returns the count part of the given snowflake as int4 value. This is a unique value within the milliseconds of the snowflake's timestamp. -
snowflake.get_node(snowflake int8)
Returns the setting of GUCsnowflake.node
in postgresql.conf at the time, this snowflake was allocated. -
snowflake.format(snowflake int8)
Returns ajsonb
object of the given snowflake like:
{"node": 1, "ts": "2023-10-16 17:57:26.361+00", "count": 0}
-
snowflake.convert_sequence_to_snowflake(p_relid regclass)
Converts an existing sequence into a snowflake. This is done by changing any column that uses the given sequence with aDEFAULT pg_catalog.nextval(relid)
expression. The column is forced to be typeint8
and theDEFAULT
expression is altered to use thesnowflake.nextval()
function. Then all columns that eventually reference such column in a foreign key are forced toint8
as well. Finally the sequence's MAXVALUE is adjusted to the currentlast_value + 1
, which prevents the accidental use ofpg_catalog.nextval()
from user code.
This SQL code shows how to use snowflake in a newly created table.
-- Assuming extension snowflake has been installed
CREATE TABLE table1 (
id bigint PRIMARY KEY DEFAULT snowflake.nextval(),
some_data text
);
INSERT INTO table1 (some_data) VALUES ('first row');
INSERT INTO table1 (some_data) VALUES ('second row');
SELECT id, snowflake.format(id), some_data FROM table1;
Result:
id | format | some_data
-------------------+-----------------------------------------------------------+------------
18014518154714241 | {"node": 1, "ts": "2023-10-16 18:47:12.257+00", "count": 0} | first row
18014518154714242 | {"node": 1, "ts": "2023-10-16 18:47:12.258+00", "count": 0} | second row
(2 rows)
This SQL code demonstrates how to convert an existing bigserial
or
serial8
column into a snowflake.
CREATE TABLE table2 (
id bigserial PRIMARY KEY,
some_data text
);
INSERT INTO table2 (some_data) VALUES ('first row');
INSERT INTO table2 (some_data) VALUES ('second row');
ALTER TABLE table2 ALTER COLUMN id SET DEFAULT snowflake.nextval();
-- Alternatively, if you need to retain the ability to retrieve
-- the snowflake.currval('table2_id_seq') of the sequence individually, use
-- ALTER TABLE table2 ALTER COLUMN id SET DEFAULT snowflake.nextval('table2_id_seq');
INSERT INTO table2 (some_data) VALUES ('third row');
INSERT INTO table2 (some_data) VALUES ('fourth row');
SELECT id, snowflake.format(id), some_data FROM table2;
Result:
id | format | some_data
-------------------+-----------------------------------------------------------+------------
1 | {"node": 0, "ts": "2023-01-01 00:00:00.001+00", "count": 0} | first row
2 | {"node": 0, "ts": "2023-01-01 00:00:00.002+00", "count": 0} | second row
18014518155600128 | {"node": 1, "ts": "2023-10-16 19:01:58.144+00", "count": 0} | third row
18014518155600129 | {"node": 1, "ts": "2023-10-16 19:01:58.145+00", "count": 0} | fourth row
(4 rows)
NOTE: The int8
value of ID remains unique, although it does
jump ahead quite a bit (which is in compliance with standard PostgreSQL
sequences that may have gaps). Because the snowflake EPOCH is
2023-01-01; your existing database would have had to use over 18 trillion
sequence numbers before this conversion to cause any trouble with
possible duplicate key values.