An educational C-like toy programming language that compiles to x64 binary.
The compiler is a self-contained Python program that weighs about 1000 LoC. It's part of an online tutorial on compilers and interpreters.
The hello world looks like this:
; the write() syscall:
; ssize_t write(int fd, const void *buf, size_t count);
(syscall 1 1 "Hello world!\n" 13)
0
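For comparison, the same call can be sketched in Python. This only illustrates the argument layout (on x64 Linux, syscall number 1 is write and file descriptor 1 is stdout); it is not something the compiler generates, and the names `MESSAGE` and `hello` are made up for this sketch.

```python
import os

MESSAGE = b"Hello world!\n"


def hello() -> int:
    """Write MESSAGE to stdout (fd 1) and return the number of bytes written."""
    # os.write(fd, data) wraps the write() syscall. The explicit count
    # argument in the sample above (13) corresponds to len(MESSAGE) here.
    return os.write(1, MESSAGE)


if __name__ == "__main__":
    hello()
```

The count passed to the syscall has to match the byte length of the string, which is why the sample passes 13.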
Compile and run the program:
$ ./pl_comp.py ./samples/hello.txt -o ./hello
$ ./hello
Hello world!
The output is a tiny freestanding x64 Linux ELF binary.
$ file hello
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, no section header
$ wc -c hello
288 hello
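The fields that file reports can also be read directly from the first bytes of the binary. The following is a minimal sketch, assuming a little-endian file and using the standard ELF constants (class 2 for 64-bit, type 2 for ET_EXEC, machine 62 for EM_X86_64); `describe_elf` is a hypothetical helper, not part of this repo.

```python
import struct


def describe_elf(header: bytes) -> dict:
    """Decode a few identification fields from the start of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    # e_type and e_machine are two 16-bit fields at offset 16.
    # "<HH" assumes little-endian encoding, which is what ei_data == 1 means.
    e_type, e_machine = struct.unpack_from("<HH", header, 16)
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),
        "little_endian": ei_data == 1,
        "executable": e_type == 2,    # ET_EXEC
        "x86_64": e_machine == 62,    # EM_X86_64
    }
```

It could be used as `describe_elf(open("hello", "rb").read(20))` on the compiled output.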
The syntax is just S-expressions, because parsing strings is too boring for me.
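To show how little parsing S-expressions need, here is a minimal sketch of a reader in Python. It is not the parser from pl_comp.py; the function names and the choice to keep every atom as a plain string are assumptions made for illustration.

```python
def tokenize(src: str) -> list[str]:
    """Split source text into parentheses, string literals, and bare atoms."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch == ";":                  # a comment runs to the end of the line
            while i < len(src) and src[i] != "\n":
                i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':                  # string literal, kept with its quotes
            j = i + 1
            while j < len(src) and src[j] != '"':
                j += 2 if src[j] == "\\" else 1   # skip escaped characters
            if j >= len(src):
                raise SyntaxError("unterminated string")
            tokens.append(src[i:j + 1])
            i = j + 1
        else:                            # bare atom: a name or a number
            j = i
            while j < len(src) and not src[j].isspace() and src[j] not in '();"':
                j += 1
            tokens.append(src[i:j])
            i = j
    return tokens


def parse(src: str) -> list:
    """Parse all top-level expressions into nested Python lists."""
    stack = [[]]
    for tok in tokenize(src):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unbalanced )")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("unbalanced (")
    return stack[0]
```

Feeding it the hello world source yields one nested list for the `syscall` expression followed by the atom `0`.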
The semantics are C-like. The only data types are integers and pointers. This should be enough to write any program in.
The peek command reads data from a pointer and the poke command writes to a pointer.
; copy data byte by byte
(def (memcpy void) ((dst ptr byte) (src ptr byte) (n int)) (do
(loop n (do
(poke dst (peek src))
(set dst (+ 1 dst))
(set src (+ 1 src))
(set n (- n 1))
))
))
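The behavior of the sample can be modeled in Python by treating memory as a bytearray and pointers as indexes into it. This is only a sketch of the semantics described above, under the assumption that `(loop n ...)` keeps running while `n` is nonzero; the Python helpers mirror the names in the sample but are otherwise made up.

```python
def peek(mem: bytearray, addr: int) -> int:
    """Read one byte from the simulated memory."""
    return mem[addr]


def poke(mem: bytearray, addr: int, value: int) -> None:
    """Write one byte to the simulated memory."""
    mem[addr] = value


def memcpy(mem: bytearray, dst: int, src: int, n: int) -> None:
    """Copy n bytes from src to dst, one byte at a time."""
    while n:                      # models (loop n ...): run while n is nonzero
        poke(mem, dst, peek(mem, src))
        dst += 1
        src += 1
        n -= 1
```

For example, copying 4 bytes from offset 0 to offset 4 of `bytearray(b"abcd\0\0\0\0")` leaves `b"abcdabcd"` in the buffer.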
List of control flow structures:
(? cond yes no)
(if cond (then yes blah blah) (else no no no))
(do a b c...)
(loop cond body)
(break)
(continue)
(call f a b c...)
(return val)
Some examples:
(def (fib int) ((n int))
(if (le n 0) (then 0) (else (+ n (call fib (- n 1))))))
(def (fib int) ((n int)) (do
(var r 0)
(loop (gt n 0) (do
(set r (+ r n))
(set n (- n 1))
))
(return r)
))
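A literal Python transcription of the two samples makes it easy to check one against the other. Despite the name, both add up the integers from 1 to n rather than computing Fibonacci numbers. The Python function names are hypothetical.

```python
def fib_recursive(n: int) -> int:
    """Transcription of the recursive sample: n + (n-1) + ... + 1."""
    return 0 if n <= 0 else n + fib_recursive(n - 1)


def fib_loop(n: int) -> int:
    """Transcription of the loop sample: accumulate n while counting down."""
    r = 0
    while n > 0:
        r += n
        n -= 1
    return r
```

Both return 15 for an input of 5, and they agree for every other input as well.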
The only data types are:
- byte: unsigned 8-bit integer.
- int: signed 64-bit integer.
- ptr elem_type: pointer to elem_type.
Variable types are automatically inferred:
(var a 123) ; int
(var b 45u8) ; byte
(var p (ptr int)) ; a null pointer to int
(var s "asdf") ; ptr byte
The types of a function's return value and arguments must be specified explicitly:
(def (memcpy void) ((dst ptr byte) (src ptr byte) (n int)) (do
; ...
))
int can be cast to any pointer type and vice versa.
(var i 0x1234) ; int
(var p (cast (ptr int) i)) ; ptr int
(var a (cast (int) (+ 1 p))) ; int
Memory management is very simple at this point, because it doesn't exist at all.
However, the language doesn't prevent you from building your own memory management routines. This usually starts with the mmap syscall.
(var heap (ptr byte))
; a fake malloc
(def (malloc ptr byte) ((n int)) (do
(if (not heap) (do
; create the heap via mmap()
(var heapsz 1048576) ; 1M
(var prot 3) ; PROT_READ|PROT_WRITE
(var flags 0x22) ; MAP_PRIVATE|MAP_ANONYMOUS
(var fd -1)
(var offset 0)
(var r (syscall 9 0 heapsz prot flags fd offset))
(set heap (cast (ptr byte) r))
))
; just move the heap pointer forward
(var r heap)
(set heap (+ n heap))
(return r)
))
; TODO: figure out how to recycle the memory
(def (free void) ((p ptr byte)) (do))
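The fake malloc above is a bump allocator: it hands out consecutive slices of one big region and never takes them back. A minimal Python model of that idea follows, with a bytearray standing in for the mmap'd region and offsets standing in for pointers. The class name and the out-of-memory check are additions for this sketch; the sample itself has no such check.

```python
HEAP_SIZE = 1048576  # 1M, the same value as heapsz in the sample


class BumpHeap:
    """A bump allocator over a fixed-size byte buffer."""

    def __init__(self, size: int = HEAP_SIZE):
        self.mem = bytearray(size)   # stands in for the mmap'd region
        self.top = 0                 # offset of the next free byte

    def malloc(self, n: int) -> int:
        """Return the offset of a fresh n-byte block."""
        if self.top + n > len(self.mem):
            raise MemoryError("heap exhausted")  # not checked in the sample
        addr = self.top
        self.top += n                # just move the heap pointer forward
        return addr

    def free(self, addr: int) -> None:
        """Do nothing, like the sample: memory is never recycled."""
```

Because free does nothing, a block allocated after a free still lands past all earlier blocks.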
The Pretty Laughable Language comes with the world's smallest standard library: no standard library, not even a builtin print function.
But with the ability to make arbitrary syscalls and peek-poke the memory, you can build your own stdlibs. Let's add the print function:
(def (strlen int) ((s ptr byte)) (do
(var start s)
(loop (peek s) (set s (+ 1 s)))
(return (- s start))
))
(def (print void) ((s ptr byte)) (do
(syscall 1 1 s (call strlen s))
))
(call print "Yes!\n")
0
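The strlen above walks the string until it reads a zero byte, so it relies on string literals ending with a NUL terminator. Here is a small Python model of the same two routines, with memory as a bytes object and pointers as offsets. The NUL terminator, the name `print_str`, and the `demo` helper are assumptions of this sketch.

```python
import io


def strlen(mem: bytes, s: int) -> int:
    """Count the bytes from offset s up to, but not including, the first zero byte."""
    start = s
    while mem[s] != 0:
        s += 1
    return s - start


def print_str(mem: bytes, s: int, out) -> int:
    """Write the NUL-terminated string at offset s to a binary stream."""
    n = strlen(mem, s)
    out.write(mem[s:s + n])
    return n


def demo() -> bytes:
    """Print "Yes!\\n" into an in-memory stream and return what was written."""
    out = io.BytesIO()
    print_str(b"Yes!\n\x00", 0, out)
    return out.getvalue()
```

Passing `sys.stdout.buffer` as `out` would send the text to the terminal instead.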
Here is a more sophisticated program you can play with.
Language features:
- int, byte
- pointer
- syscall
- if-then-else, loop
- function
- nested function, nonlocal variable
- array
- struct, class
- function pointer
Explorations:
- module or include directive
- macro?
- alternative syntax?
- Windows
- ARM64
- WASM
Optimizations:
- register allocation
- constants
- tail call
To be added.
But you can learn how to do it by reading the source code.
Or you might like the book From Source Code To Machine Code, which this repo is based on.