pg_bm25
is a PostgreSQL extension that enables full text search over SQL tables using the BM25 algorithm, the state-of-the-art ranking function for full text search. It is built on top of Tantivy, the Rust-based alternative to Apache Lucene, using pgrx
.
pg_bm25
is supported on all versions supported by the PostgreSQL Global Development Group, which includes PostgreSQL 12+.
Check out the pg_bm25
benchmarks here.
- BM25 scoring
- Highlighting
- Boosted queries
- Filtering
- Bucket and metrics aggregations
- Autocomplete
- Fuzzy search
- Custom tokenizers
- JSON field search
- Datetime aggregations
- Facet fields
The easiest way to use the extension is to run the ParadeDB Dockerfile:
docker run \
-e POSTGRES_USER=<user> \
-e POSTGRES_PASSWORD=<password> \
-e POSTGRES_DB=<dbname> \
-p 5432:5432 \
-d \
paradedb/paradedb:latest
This will spin up a Postgres instance with pg_bm25
preinstalled.
If you are self-hosting Postgres and would like to use the extension within your existing Postgres, follow these steps:
We provide pre-built binaries for Debian-based Linux for PostgreSQL 15 (more versions coming soon). You can download the latest version for your architecture from the releases page.
ParadeDB collects anonymous telemetry to help us understand how many people are using the project. You can opt-out of telemetry by setting export TELEMETRY=false
(or unsetting the variable) in your shell or in your ~/.bashrc
file before running the extension.
We don't suggest running production workloads on macOS or Windows. As a result, we don't provide prebuilt binaries for these platforms. If you are running Postgres on macOS or Windows and want to install pg_bm25
, please follow the development instructions, but do cargo pgrx install --release
instead of cargo pgrx run
. This will build the extension from source and install it in your Postgres instance.
You can then create the extension in your database by running:
CREATE EXTENSION pg_bm25;
Note: If you are using a managed Postgres service like Amazon RDS, you will not be able to install pg_bm25
until the Postgres service explicitly supports it.
pg_bm25
comes with a helper function that creates a test table that you can use for quick experimentation.
SELECT paradedb.create_bm25_test_table();
CREATE TABLE mock_items AS SELECT * FROM paradedb.bm25_test_table;
To index the table, use the following SQL command:
CREATE INDEX idx_mock_items
ON mock_items
USING bm25 ((mock_items.*))
WITH (text_fields='{"description": {}, "category": {}}');
Once the indexing is complete, you can run various search functions on it.
Execute a search query on your indexed table:
SELECT description, rating, category
FROM mock_items
WHERE mock_items @@@ 'description:keyboard OR category:electronics'
LIMIT 5;
This will return:
description | rating | category
-----------------------------+--------+-------------
Plastic Keyboard | 4 | Electronics
Ergonomic metal keyboard | 4 | Electronics
Innovative wireless earbuds | 5 | Electronics
Fast charging power bank | 4 | Electronics
Bluetooth-enabled speaker | 3 | Electronics
(5 rows)
Scoring and highlighting are supported:
SELECT description, rating, category, paradedb.rank_bm25(ctid), paradedb.highlight_bm25(ctid, 'idx_mock_items', 'description')
FROM mock_items
WHERE mock_items @@@ 'description:keyboard OR category:electronics'
LIMIT 5;
This will return:
id | description | rating | category | rank_bm25 | highlight_bm25
----+-----------------------------+--------+-------------+-----------+---------------------------------
1 | Ergonomic metal keyboard | 4 | Electronics | 4.9403534 | Ergonomic metal <b>keyboard</b>
2 | Very plasticy keyboard | 4 | Electronics | 4.9403534 | Very plasticy <b>keyboard</b>
12 | Innovative wireless earbuds | 5 | Electronics | 2.1096356 |
22 | Fast charging power bank | 4 | Electronics | 2.1096356 |
32 | Bluetooth-enabled speaker | 3 | Electronics | 2.1096356 |
(5 rows)
Scores can be tuned via boosted queries:
SELECT description, rating, category
FROM mock_items
WHERE mock_items @@@ 'description:keyboard^2 OR category:electronics';
New data that arrives or rows that are changed are automatically reindexed and searchable. For instance, let's create and search for a new row in our table:
INSERT INTO mock_items (description, rating, category) VALUES ('New keyboard', 5, 'Electronics');
SELECT description, rating, category
FROM mock_items
WHERE mock_items @@@ 'description:keyboard OR category:electronics'
LIMIT 5;
This will return:
description | rating | category
-----------------------------+--------+-------------
New keyboard | 5 | Electronics
Plastic Keyboard | 4 | Electronics
Ergonomic metal keyboard | 4 | Electronics
Innovative wireless earbuds | 5 | Electronics
Fast charging power bank | 4 | Electronics
(5 rows)
Please refer to the documentation for a more thorough overview of pg_bm25
's query support.
Before developing the extension, ensure that you have Rust installed
(version >1.70), ideally via rustup
(we've observed issues with installing Rust via Homebrew on macOS).
If you are on macOS and using Postgres.app, you'll first need to add the pg_config
binary to your path:
export PATH="$PATH:/Applications/Postgres.app/Contents/Versions/latest/bin"
Then, install and initialize pgrx:
# Note: Replace --pg15 with your version of Postgres, if different (i.e. --pg16, --pg14, etc.)
cargo install --locked cargo-pgrx --version 0.11.0
cargo pgrx init --pg15=`which pg_config`
First, start pgrx:
cargo pgrx run
This will launch an interactive connection to Postgres. Inside Postgres, create the extension by running:
CREATE EXTENSION pg_bm25;
Now, you have access to all the extension functions.
If you make changes to the extension code, follow these steps to update it:
- Recompile the extension:
cargo pgrx run
- Recreate the extension to load the latest changes:
DROP EXTENSION pg_bm25;
CREATE EXTENSION pg_bm25;
To run the unit test suite, use the following command:
cargo pgrx test
This will run all unit tests defined in /src
. To add a new unit test, simply add tests inline in the relevant files, using the #[cfg(test)]
attribute.
To run the integration test suite, simply run:
./test/runtests.sh -p threaded
This will create a temporary database, initialize it with the SQL commands defined in fixtures.sql
, and run the tests in /test/sql
against it. To add a new test, simply add a new .sql
file to /test/sql
and a corresponding .out
file to /test/expected
for the expected output, and it will automatically get picked up by the test suite.
Note: the bash script takes arguments and allows you to run tests either sequentially or in parallel. For more info run ./test/runtests.sh -h
pg_bm25
is licensed under the GNU Affero General Public License v3.0.