This project is a "procedural language" extension of PostgreSQL allowing the execution of code in Haskell within SQL code. Despite the name, Haskell is, of course, not a procedural language as it is a functional language. However, "procedural language" is the term that PostgreSQL uses to describe languages that can be embedded in its SQL code.
The extension allows users, even unprivileged ones, to write, install, and run functions written in Haskell.
Copyright (C) 2024 Edward F. Behn, Jr.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
I wrote the project mostly for my own amusement and as a challenge. If you have any questions, comments, or issues, or if you find the project interesting or useful, please don't hesitate to contact me at [email protected].
Bug reports and feature requests should be submitted in the Github issues page.
This document assumes that the reader is familiar with PostgreSQL (specifically procedural languages) and with the Haskell language.
The easiest way to install the project is to use the RPM package repository for Fedora or the apt package repository for Ubuntu.
Add the repository:
$>
sudo dnf config-manager --add-repo https://ed-o-saurus.github.io/repos/plhaskell/fedora/plhaskell.repo
Update the repository information:
$>
sudo dnf update
Install the package:
$>
sudo dnf install plhaskell
Add the signing key to the apt keys:
$>
wget --quiet -O- https://ed-o-saurus.github.io/keys/A9DD4516.asc | gpg --dearmor | sudo tee /usr/share/keyrings/plhaskell-keyring.gpg > /dev/null
Add the repository:
$>
echo deb \[signed-by=/usr/share/keyrings/plhaskell-keyring.gpg\] https://ed-o-saurus.github.io/repos/plhaskell/ubuntu/$(lsb_release -cs)/apt-repo stable main | sudo tee /etc/apt/sources.list.d/plhaskell.list > /dev/null
Update the repository information:
$>
sudo apt update
Install the package:
$>
sudo apt install plhaskell
This extension is intended to be built and installed in a Linux environment. It has only been tested on the x86-64 architecture.
The following should be done as a non-root user.
Install git, rpmdevtools, make, checkpolicy, policycoreutils, postgresql-server-devel, ghc-compiler, ghc-bytestring-devel, ghc-text-devel, and ghc-hint-devel:
$>
sudo dnf install git rpmdevtools make selinux-policy-devel postgresql-server-devel ghc-compiler ghc-bytestring-devel ghc-text-devel ghc-hint-devel
Create the directories necessary to build the .rpm:
$>
rpmdev-setuptree
Download the project code:
$>
git clone https://github.com/ed-o-saurus/PLHaskell
Copy the .spec
file to the ~/rpmbuild/SPECS
directory:
$>
cp PLHaskell/spec/plhaskell.spec ~/rpmbuild/SPECS
Copy the source to the ~/rpmbuild/SOURCES
directory:
$>
tar --exclude-vcs -czf ~/rpmbuild/SOURCES/PLHaskell.tar.gz PLHaskell
Build the .rpm
package:
$>
rpmbuild -bb ~/rpmbuild/SPECS/plhaskell.spec
Install the .rpm
package written to the ~/rpmbuild/RPMS/
<arch>
directory on the target machine.
The following should be done as a non-root user.
Install git, devscripts, debhelper-compat, postgresql-server-dev-all ghc, libghc-hint-dev:
$>
sudo apt install git devscripts debhelper-compat postgresql-server-dev-all ghc libghc-hint-dev
Download the project code:
$>
git clone https://github.com/ed-o-saurus/PLHaskell
Build the .deb
package:
$>
cd PLHaskell
$>
debuild --no-tgz-check -j1
Install the .deb
package written to the parent directory on the target machine.
The alternative to building an .rpm
or .deb
package is to build and install the project manually.
The following are needed to build and install PL/Haskell:
- PostgreSQL server
- PostgreSQL development files
- Glasgow Haskell Compiler (GHC)
- The GHC Hint development module
- libpq-devel
Ensure that pg_config
is available in the search path.
From the repository's root directory, build all the files needed for the extension:
$>
make
Install the extension:
$>
sudo make install
To uninstall the extension:
$>
sudo make uninstall
Systems that use Security Enhanced Linux (SELinux) may encounter problems running the extension. This manifests as the ability to create functions and the inability to call them. Appropriate policies must be implemented. Full details are beyond the scope of this document.
A policy file designed to accommodate Red Hat based systems can be built by running make SELINUX=1
. Upon installation, this file is saved to /usr/share/selinux/packages/plhaskell.pp
. It is the user's responsibility to install the policy.
Test batteries are provided in the tests_trusted.sql
and tests_untrusted.sql
files in the root directory. They can be run with the commands
$>
psql -f tests_trusted.sql
$>
psql -f tests_untrusted.sql
If successful, they will terminate with the notice All tests passed
.
The package provides two language variants. plhaskell
is a "trusted" language. The PostgreSQL manual explains:
The optional key word TRUSTED specifies that the language does not grant access to data that the user would not otherwise have. Trusted languages are designed for ordinary database users (those without superuser privilege) and allows them to safely create functions and procedures.
As such, unprivileged users are permitted to write and execute functions without the possibility that they will be able to access information or resources that are not allowed to access. This is accomplished by enforcing Haskell's strong type system.
plhaskellu
is an "untrusted" language. It affords the user more flexibility but lacks the safeguards of the trusted variant. As such, only users with PostgreSQL superuser privilege can create functions with this variant.
To install the language in a database, run the SQL command CREATE EXTENSION plhaskell;
.
Functions in PL/Haskell are created the same manner as other PostgreSQL functions. The code must be valid Haskell.
The code must contain a function with the same name as the PostgreSQL function. Each of its arguments must be of the type Maybe
arg
where arg
is the Haskell type as determined by the PostgreSQL type as indicated by the following table.
PostgreSQL Type | Module | Haskell Type |
---|---|---|
bytea |
Data.ByteString |
ByteString |
text |
Data.Text |
Text |
char |
Prelude |
Char |
bool |
Prelude |
Bool |
smallint |
Data.Int |
Int16 |
integer |
Data.Int |
Int32 |
bigint |
Data.Int |
Int64 |
real |
Prelude |
Float |
float |
Prelude |
Double |
Trusted functions must return type PGm (Maybe
result
)
where result
is the appropriate Haskell type as determined by the return type of function while untrusted functions must return type IO (Maybe
result
)
. The PGm
monad type can be imported from the PGutils
module.
If a function returns a void (no value), the Haskell function should return type PGm ()
or IO ()
.
In addition, functions can use composite types as arguments or return values provided that the composite types consist of elements that are listed in the table above or are themselves composite types. Composite values are represented as Haskell tuples.
Arrays can be passed to and returned from functions. They are represented in Haskell by the Array a
type which can be imported from the PGutils
module. The constructors are the following:
Array a |
---|
ArrayEmpty |
Array1D Int32 [a] |
Array2D (Int32, Int32) [[a]] |
Array3D (Int32, Int32, Int32) [[[a]]] |
Array4D (Int32, Int32, Int32, Int32) [[[[a]]]] |
Array5D (Int32, Int32, Int32, Int32, Int32) [[[[[a]]]]] |
Array6D (Int32, Int32, Int32, Int32, Int32, Int32) [[[[[[a]]]]]] |
The Int32
values represent the lower bounds of the array indexes and the lists represent the contents of the array
The function arrayMap :: (a -> b) -> Array a -> Array b
applies the provided function to all elements in an array.
The function arrayMapM :: Monad m => (a -> m b) -> Array a -> m (Array b)
applies a monadic action to the elements in an array in order.
The extension supports anonymous code blocks with the use of the DO
keyword. Code is written the same way as in functions. The Haskell function to be run must be named _'
and have signature _' :: PGm ()
or _' :: IO ()
depending on the language variant used.
Functions can return sets of values by returning type PGm [Maybe
result
]
or IO [Maybe
result
]
where result
is the appropriate Haskell type as determined by the return type of function.
Set returning functions can return sets of voids by the Haskell code returing type PGm [()]
or IO [()]
.
To report a message or raise an error, use the function report :: ErrorLevel -> Text -> PGm ()
. ErrorLevel
is any of the following:
Level |
---|
exception |
warning |
notice |
info |
log |
debug1 |
debug2 |
debug3 |
debug4 |
debug5 |
In addition, the function raiseError :: Text -> a
stops execution and raises an error.
Queries can be performed from inside PL/Haskell with the function query :: Text -> [QueryParam] -> PGm QueryResults
. The first argument is the query itself. The second argument is a list of values to be substituted into the query using the expressions $
n
. This should be used rather than constructing a query string directly to prevent SQL injection attacks.
Queries in stable and immutable functions are executed in read-only mode, while volatile functions execute queries in read-write mode.
Any type that can be passed to PL/Haskell functions can be used as a parameter by using the appropriate constructor for the type QueryParam
. The constructors are the following:
QueryParam |
---|
QueryParamByteA (Maybe ByteString) |
QueryParamText (Maybe Text) |
QueryParamChar (Maybe Char) |
QueryParamBool (Maybe Bool) |
QueryParamInt2 (Maybe Int16) |
QueryParamInt4 (Maybe Int32) |
QueryParamInt8 (Maybe Int64) |
QueryParamFloat4 (Maybe Float) |
QueryParamFloat8 (Maybe Double) |
QueryParamComposite (Maybe Text, Text) (Maybe [QueryParam]) |
QueryParamArray (Maybe Text, Text) (Maybe (Array QueryParam)) |
The (Maybe Text, Text)
tuple in the QueryParamComposite
constructor is the schema and type name of the composite type. The (Maybe Text, Text)
tuple in the QueryParamArray
constructor is the schema and type name of the element type. If Nothing
is used as the schema name, the first matching type in the search path is used.
The constructors for QueryResults
are the following:
QueryResults |
---|
SelectResults Word64 [Text] [[QueryResultValue]] |
SelectIntoResults Word64 |
InsertResults Word64 |
DeleteResults Word64 |
UpdateResults Word64 |
InsertReturningResults Word64 [Text] [[QueryResultValue]] |
DeleteReturningResults Word64 [Text] [[QueryResultValue]] |
UpdateReturningResults Word64 [Text] [[QueryResultValue]] |
UtilityResults Word64 |
RewrittenResults Word64 |
The constructor indicates the type of query run. The Word64
field is the number of rows processed. The [Text]
field is the names of the columns returned and the [[QueryResultValue]]
is the data returned.
The constructors for QueryResultValue
are the following:
QueryResultValue |
---|
QueryResultValueByteA (Maybe ByteString) |
QueryResultValueText (Maybe Text) |
QueryResultValueChar (Maybe Char) |
QueryResultValueBool (Maybe Bool) |
QueryResultValueInt2 (Maybe Int16) |
QueryResultValueInt4 (Maybe Int32) |
QueryResultValueInt8 (Maybe Int64) |
QueryResultValueFloat4 (Maybe Float) |
QueryResultValueFloat8 (Maybe Double) |
QueryResultValueComposite (Text, Text) (Maybe [QueryResultValue]) |
QueryResultValueArray (Text, Text) (Maybe (Array QueryResultValue)) |
The (Text, Text)
tuple in the QueryResultValueComposite
constructor is the schema and type name of the composite type. The (Text, Text)
tuple in the QueryResultValueArray
constructor is the schema and type name of the element type.
The functions commit :: Bool -> PGm ()
and rollback :: Bool -> PGm ()
can be used to commit or rollback transactions in a non-atomic connection.
An argument of True
causes a new transaction to be started with the same characteristics as the one just completed. This is the equivalent of COMMIT AND CHAIN
or ROLLBACK AND CHAIN
.
An argument of False
causes a new transaction to be started with default characteristics. This is the equivalent of COMMIT
or ROLLBACK
.
The function unPGm :: PGm a -> IO a
from the PGutils
module can be used to convert a PGm
action to an IO
action.
By default, the maximum memory that can be used by the Haskell runtime system is 128 MB. This can be changed by setting the plhaskell.max_memory
variable in the postgresql.conf
file. Setting the value to zero disables the limit.
The function ghc_version
takes no arguments and returns the version of the underlying GHC API. Values are:
- 804 for GHC 8.4.x
- 806 for GHC 8.6.x
- etc...
It is known that the extension does not function properly if lc_messages
runtime parameter is set to a value other than 'C'
or 'en_US.utf8'
. The workaround is to use the SET
clause in function declarations.
CREATE FUNCTION forty_two() RETURNS int IMMUTABLE AS
$$
import PGutils (PGm, report, notice)
import Data.Int (Int32)
forty_two :: PGm (Maybe Int32)
forty_two = do
report notice "Don't Panic"
return (Just 42)
$$
LANGUAGE plhaskell
SET lc_messages TO 'C';
Attempting to make multiple files simultaneously can cause unpredictable errors. Do not attempt to use the -j
option with the make
command.
DO LANGUAGE plhaskell $$
import PGutils (PGm, report, notice)
_' :: PGm ()
_' = report notice "Hello, world!"
$$;
DO LANGUAGE plhaskellu $$
import PGutils (unPGm, report, notice)
_' :: IO ()
_' = unPGm $ report notice "Hello, world!"
$$;
CREATE FUNCTION add(int, int) RETURNS int IMMUTABLE AS
$$
import PGutils (PGm)
import Data.Int (Int32)
add :: Maybe Int32 -> Maybe Int32 -> PGm (Maybe Int32)
add Nothing Nothing = return Nothing
add (Just a) (Just b) = return $ Just $ a+b
add a Nothing = return a
add Nothing b = return b
$$
LANGUAGE plhaskell;
CREATE FUNCTION add(int, int) RETURNS int IMMUTABLE AS
$$
import Data.Int (Int32)
add :: Maybe Int32 -> Maybe Int32 -> IO (Maybe Int32)
add Nothing Nothing = return Nothing
add (Just a) (Just b) = return $ Just $ a+b
add a Nothing = return a
add Nothing b = return b
$$
LANGUAGE plhaskellu;
CREATE FUNCTION fibonacci(int) RETURNS int IMMUTABLE AS
$$
import PGutils (PGm)
import Data.Int (Int32)
fibonacci' :: Int32 -> Int32
fibonacci' 0 = 0
fibonacci' 1 = 1
fibonacci' n = fibonacci' (n-2) + fibonacci' (n-1)
fibonacci :: Maybe Int32 -> PGm (Maybe Int32)
fibonacci = return . fmap fibonacci'
$$
LANGUAGE plhaskell;
CREATE FUNCTION fibonacci(int) RETURNS int IMMUTABLE AS
$$
import Data.Int (Int32)
fibonacci' :: Int32 -> Int32
fibonacci' 0 = 0
fibonacci' 1 = 1
fibonacci' n = fibonacci' (n-2) + fibonacci' (n-1)
fibonacci :: Maybe Int32 -> IO (Maybe Int32)
fibonacci = return . fmap fibonacci'
$$
LANGUAGE plhaskellu;
This section shows how to return a set of results. The functions listed produce lists of prime numbers using the Sieve of Eratosthenes.
The functions in this section return a set of composite results.
CREATE TYPE n_p AS (n int, p int);
CREATE FUNCTION primes(int) RETURNS SETOF n_p IMMUTABLE AS
$$
import PGutils (PGm, raiseError)
import Data.Int (Int32)
sieve :: [Int32] -> [Int32]
sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
sieve [] = []
primes :: Maybe Int32 -> PGm [Maybe (Maybe Int32, Maybe Int32)]
primes Nothing = raiseError "Invalid Null"
primes (Just n) = return $ map Just $ zip [Just i | i <- [1..n]] $ map Just $ sieve [2..]
$$
LANGUAGE plhaskell;
CREATE FUNCTION primes(int) RETURNS SETOF n_p IMMUTABLE AS
$$
import PGutils (raiseError)
import Data.Int (Int32)
sieve :: [Int32] -> [Int32]
sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
sieve [] = []
primes :: Maybe Int32 -> IO [Maybe (Maybe Int32, Maybe Int32)]
primes Nothing = raiseError "Invalid Null"
primes (Just n) = return $ map Just $ zip [Just i | i <- [1..n]] $ map Just $ sieve [2..]
$$
LANGUAGE plhaskellu;
By running the query
SELECT *
FROM primes(10)
The following is produced
n | p |
---|---|
1 | 2 |
2 | 3 |
3 | 5 |
4 | 7 |
5 | 11 |
6 | 13 |
7 | 17 |
8 | 19 |
9 | 23 |
10 | 29 |
The following functions return a infinite list of prime numbers
CREATE FUNCTION primes() RETURNS SETOF int IMMUTABLE AS
$$
import PGutils (PGm)
import Data.Int (Int32)
sieve :: [Int32] -> [Int32]
sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
primes :: PGm [Maybe Int32]
primes = return $ map Just $ sieve [2..]
$$
LANGUAGE plhaskell;
CREATE FUNCTION primes() RETURNS SETOF int IMMUTABLE AS
$$
import Data.Int (Int32)
sieve :: [Int32] -> [Int32]
sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
primes :: IO [Maybe Int32]
primes = return $ map Just $ sieve [2..]
$$
LANGUAGE plhaskellu;
To generate the first twenty-five prime numbers, run:
SELECT primes()
LIMIT 25
primes |
---|
2 |
3 |
5 |
7 |
11 |
13 |
17 |
19 |
23 |
29 |
31 |
37 |
41 |
43 |
47 |
53 |
59 |
61 |
67 |
71 |
73 |
79 |
83 |
89 |
97 |
The following add one to all non-null elements of an array.
CREATE FUNCTION increment_array(int[]) RETURNS int[] IMMUTABLE AS
$$
import PGutils (PGm, Array, arrayMap)
import Data.Int (Int32)
increment_array :: Maybe (Array (Maybe Int32)) -> PGm (Maybe (Array (Maybe Int32)))
increment_array = return . (fmap $ arrayMap $ fmap succ)
$$
LANGUAGE plhaskell;
CREATE FUNCTION increment_array(int[]) RETURNS int[] IMMUTABLE AS
$$
import PGutils (Array, arrayMap)
import Data.Int (Int32)
increment_array :: Maybe (Array (Maybe Int32)) -> IO (Maybe (Array (Maybe Int32)))
increment_array = return . (fmap $ arrayMap $ fmap succ)
$$
LANGUAGE plhaskellu;
The following demonstrate how to show a notice from within a function.
CREATE FUNCTION forty_two() RETURNS int IMMUTABLE AS
$$
import PGutils (PGm, report, notice)
import Data.Int (Int32)
forty_two :: PGm (Maybe Int32)
forty_two = do
report notice "Don't Panic"
return $ Just 42
$$
LANGUAGE plhaskell;
CREATE FUNCTION forty_two() RETURNS int IMMUTABLE AS
$$
import PGutils (unPGm, report, notice)
import Data.Int (Int32)
forty_two :: IO (Maybe Int32)
forty_two = do
unPGm $ report notice "Don't Panic"
return $ Just 42
$$
LANGUAGE plhaskellu;
The following delete all elements of table t
and return the number of rows removed.
CREATE FUNCTION remove_all() RETURNS bigint VOLATILE AS
$$
import PGutils (PGm, query, QueryResults (DeleteResults))
import Data.Int (Int64)
remove_all :: PGm (Maybe Int64)
remove_all = do
DeleteResults processed <- query "DELETE FROM t" []
return $ Just $ fromIntegral processed
$$
LANGUAGE plhaskell;
CREATE FUNCTION remove_all() RETURNS bigint VOLATILE AS
$$
import PGutils (unPGm, query, QueryResults (DeleteResults))
import Data.Int (Int64)
remove_all :: IO (Maybe Int64)
remove_all = do
DeleteResults processed <- unPGm $ query "DELETE FROM t" []
return $ Just $ fromIntegral processed
$$
LANGUAGE plhaskellu;
The following return the last names of students with the passed first name.
CREATE FUNCTION last_names(text) RETURNS SETOF text IMMUTABLE AS
$$
import PGutils (PGm, query, QueryResults (SelectResults), QueryParam (QueryParamText), QueryResultValue (QueryResultValueText))
import Data.Text (Text)
extract_text :: [QueryResultValue] -> Maybe Text
extract_text [QueryResultValueText name] = name
last_names :: Maybe Text -> PGm [Maybe Text]
last_names first_name = do
SelectResults _processed _header results <- query "SELECT last_name FROM students WHERE first_name = $1" [QueryParamText first_name]
return $ map extract_text results
$$
LANGUAGE plhaskell;
CREATE FUNCTION last_names(text) RETURNS SETOF text IMMUTABLE AS
$$
import PGutils (unPGm, query, QueryResults (SelectResults), QueryParam (QueryParamText), QueryResultValue (QueryResultValueText))
import Data.Text (Text)
extract_text :: [QueryResultValue] -> Maybe Text
extract_text [QueryResultValueText name] = name
last_names :: Maybe Text -> IO [Maybe Text]
last_names first_name = do
SelectResults _processed _header results <- unPGm $ query "SELECT last_name FROM students WHERE first_name = $1" [QueryParamText first_name]
return $ map extract_text results
$$
LANGUAGE plhaskellu;