#Yarn
A small embeddable VM with a custom instruction set and statically allocated heap.
##Overview
- Simple instruction set.
- Sequential (non-pipelined) execution.
- Single memory space: the stack and heap occupy the same memory.
- 16 registers total, 11 multi-purpose.
- ~25 instructions total.
- Name inspired by cats.
##Usage
Compile yarn:

    ./build.sh

Assemble your code:

    ./tools/assemble.py code.asm code.o

Run it:

    ./bin/yarn code.o

To enable debug features that help you debug your code, compile yarn with `-DYARN_DEBUG`. You can pass this flag directly to `./build.sh`:

    ./build.sh -DYARN_DEBUG

##Embedding and Extending
Embedding is designed to be simple. Here is a minimal example:

    /* buffer and bufsize hold the assembled object code */
    yarn_state *Y = yarn_init(256*sizeof(yarn_int));
    yarn_loadCode(Y, buffer, bufsize);
    yarn_execute(Y, -1);  /* -1 runs the whole program */
    yarn_destroy(Y);

`yarn_init` takes one argument, the number of bytes to allocate. Typically you will want to pass a word count multiplied by the size of the basic int type that yarn uses (`yarn_int`). `yarn_loadCode` copies the object code into memory so it can be executed. `yarn_execute` executes the specified number of instructions, or the whole program if -1 is specified.

System calls are the main way of extending Yarn. After creating the `yarn_state`, you can register system calls on it with the `yarn_registerSysCall` function. Here is an example of a system call:

    /* screenHeight is a variable owned by the host application */
    static void vyarn_getheight(yarn_state *Y) {
        yarn_setRegister(Y, YARN_REG_RETURN, &screenHeight);
    }

    /* register the handler after creating the state */
    Y = yarn_init(256*sizeof(yarn_int));
    yarn_registerSysCall(Y, 0xA0, vyarn_getheight);

Note that you set the ID for the system call explicitly. If a system call with the same ID is already registered, the new one replaces it.
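
The replace-on-collision behaviour can be pictured as a table of function pointers indexed by ID. The sketch below is a standalone illustration, not Yarn's actual implementation: `vm_state`, `syscall_fn`, `register_syscall`, and `dispatch_syscall` are hypothetical names, and the 256-entry table is an assumption based on the one-byte IDs used in this README.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not Yarn's real types. */
#define MAX_SYSCALLS 256u /* assumed table size */

typedef struct vm_state vm_state;
typedef void (*syscall_fn)(vm_state *vm);

struct vm_state {
    syscall_fn syscalls[MAX_SYSCALLS]; /* NULL means "not registered" */
    int ret;                           /* stand-in for the return register */
};

/* Store fn under id. An existing handler with the same id is replaced. */
static void register_syscall(vm_state *vm, unsigned id, syscall_fn fn) {
    assert(id < MAX_SYSCALLS);
    vm->syscalls[id] = fn;
}

/* Invoke the handler for id; returns 0 on success, -1 if none is registered. */
static int dispatch_syscall(vm_state *vm, unsigned id) {
    if (id >= MAX_SYSCALLS || vm->syscalls[id] == NULL) {
        return -1;
    }
    vm->syscalls[id](vm);
    return 0;
}

/* Two example handlers that write different values to the return slot. */
static void first_handler(vm_state *vm)  { vm->ret = 1; }
static void second_handler(vm_state *vm) { vm->ret = 2; }
```

Registering `first_handler` and then `second_handler` under ID 0xA0 leaves only `second_handler` reachable through `dispatch_syscall`.
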
##Memory Layout
All of the program memory is in one chunk. The amount of available memory is set by the environment; with 0x400 bytes allocated, it could be visualized like this:

Offset | Use |
---|---|
0x0 | Program memory |
0x4 | ... |
... | ... |
0x3B8 | Base of stack |
0x3BC | Register 0xF (%null) |
0x3C0 | Register 0xE (%s5) |
... | ... |
0x3F4 | Register 0x1 (%bse) |
0x3F8 | Register 0x0 (%stk) |
0x3FC | Program status and flags |
0x400 | Not allocated |
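
The offsets in the table follow a simple pattern: the status word is the last word of memory, the registers sit below it in ascending register number, and the stack base comes next. The sketch below recomputes them for any memory size. It assumes 4-byte words and 16 registers, as the table implies; the function names are hypothetical and not part of Yarn's API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions read off the table: 4-byte words, 16 registers. */
#define WORD_SIZE 4u
#define NUM_REGISTERS 16u

/* Offset of the status/flags word: the last word of memory. */
static size_t status_offset(size_t mem_size) {
    assert(mem_size >= WORD_SIZE * (NUM_REGISTERS + 2u));
    return mem_size - WORD_SIZE;
}

/* Offset of register `reg`; register 0x0 sits just below the status word. */
static size_t register_offset(size_t mem_size, unsigned reg) {
    assert(reg < NUM_REGISTERS);
    return status_offset(mem_size) - WORD_SIZE * (reg + 1u);
}

/* Offset of the stack base: the word just below register 0xF. */
static size_t stack_base_offset(size_t mem_size) {
    return register_offset(mem_size, NUM_REGISTERS - 1u) - WORD_SIZE;
}
```

With a memory size of 0x400, `register_offset(0x400, 0x0)` gives 0x3F8 and `stack_base_offset(0x400)` gives 0x3B8, matching the table.
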
##Registers
Here is a list of registers:
- %ins - Instruction pointer
- %stk - Stack pointer
- %bse - Base pointer
- %ret - Return register
- %c1 - %c6 - Callee save registers
- %s1 - %s5 - Caller save registers (also known as scratch registers)
- %null - Null register, used to indicate no register used
##Instructions
Each instruction type has a specific encoding format. Each format is shown below with the byte offset of every field. The final offset in each header marks where the next instruction begins and is given as context. If two values are 4 bits each, they are separated by a `:` and together occupy one byte.

| 0x0 | 0x1 |
|---|---|
| icode:ifun | |

These one-byte instructions control the program as a whole.
- halt ( stops the program execution )
- pause ( pauses the execution; depending on the environment, it may never be resumed )
- nop ( does nothing for one instruction )

| 0x0 | 0x1 | 0x2 | 0x6 |
|---|---|---|---|
| icode:ifun | rA:rB | d | |

If rA is %null, d is used in place of the value in rA.
Here are the available instructions:
- add
- sub
- mul
- div
- divs (signed)
- lsh (left shift)
- rsh (right shift)
- rshs (right shift, signed)
- and
- or
- xor
- not
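
To make the byte layout of this format concrete, here is a minimal C sketch that unpacks one 6-byte instruction. It is an illustration, not code from Yarn: `decoded_insn` and `decode_insn` are hypothetical names, and it assumes that the first-named field of each pair (icode, rA) sits in the high nibble and that d is stored little-endian. Neither assumption is stated in this README, so check the VM source before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the 6-byte format; field names follow the table. */
typedef struct {
    uint8_t icode, ifun; /* byte 0x0 */
    uint8_t rA, rB;      /* byte 0x1 */
    uint32_t d;          /* bytes 0x2..0x5 */
} decoded_insn;

/* Assumes the first-named field is the high nibble and d is little-endian. */
static decoded_insn decode_insn(const uint8_t *p) {
    decoded_insn out;
    assert(p != NULL);
    out.icode = (uint8_t)(p[0] >> 4);
    out.ifun  = (uint8_t)(p[0] & 0x0F);
    out.rA    = (uint8_t)(p[1] >> 4);
    out.rB    = (uint8_t)(p[1] & 0x0F);
    out.d     = (uint32_t)p[2]
              | ((uint32_t)p[3] << 8)
              | ((uint32_t)p[4] << 16)
              | ((uint32_t)p[5] << 24);
    return out;
}
```

Under these assumptions, an rA nibble of 0xF (%null) signals that the operation should read `d` in place of a register value.
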

| 0x0 | 0x1 | 0x2 | 0x6 |
|---|---|---|---|
| icode:ifun | rA:rB | d | |

If rA is %null, d is used in place of the value in rA. The assembler can work out which move type is meant, so assembly code needs only one move instruction; the VM cannot infer it, so the encoded instruction must be one of the explicit variants below.
- irmov ( immediate to register )
- mrmov ( memory to register )
- rrmov ( register to register )
- rmmov ( register to memory )

| 0x0 | 0x1 | 0x2 |
|---|---|---|
| icode:ifun | rA | |

These instructions help manipulate the stack pointer.
- push
- pop

| 0x0 | 0x1 | 0x5 |
|---|---|---|
| icode:ifun | d | |

These instructions will jump to a new instruction location.
- call
- ret
- jmp
- jif (jump if; jumps only when the conditional flag is true)
- syscall (acts the same as call, d will specify which call to make)

| 0x0 | 0x1 | 0x2 |
|---|---|---|
| icode:ifun | rA:rB | |

These instructions will compare the two registers and set the conditional flag based on the result. Signed comparisons will interpret the numbers as 2's complement signed integers.
- lt ( < )
- lts ( < signed comparison)
- lte ( <= )
- ltes ( <= signed comparison)
- eq ( == )
- neq ( != )
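
The difference between the plain and signed comparisons only shows up when the top bit of an operand is set. The following C sketch illustrates this with 32-bit values. The word width is an assumption, and `cmp_lt`, `cmp_lts`, and `as_signed` are hypothetical helper names rather than Yarn functions.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit pattern as a two's complement signed value, portably. */
static int32_t as_signed(uint32_t x) {
    if (x & 0x80000000u) {
        return -(int32_t)(uint32_t)~x - 1;
    }
    return (int32_t)x;
}

/* lt: compares the raw bit patterns as unsigned integers. */
static int cmp_lt(uint32_t a, uint32_t b) { return a < b; }

/* lts: compares the same patterns as two's complement signed integers. */
static int cmp_lts(uint32_t a, uint32_t b) { return as_signed(a) < as_signed(b); }

/* 0xFFFFFFFF is the largest unsigned value, but it is -1 when read as signed. */
static void signed_vs_unsigned_example(void) {
    assert(!cmp_lt(0xFFFFFFFFu, 1u)); /* 4294967295 < 1 is false */
    assert(cmp_lts(0xFFFFFFFFu, 1u)); /* -1 < 1 is true */
}
```
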
##Assembler Syntax Primer
The assembler as it stands is a very crude tool that gets the job done. The syntax outlined here is subject to change when I get around to improving the assembler. Use `;` to start a comment.

The basic format is `instruction arg1,arg2`, where `instruction` is a mnemonic given above and `arg1,arg2` are expressions that usually consist of registers. There are several basic types of symbols you can use:

Registers are prefixed with `%`. There are 16 of them (though you typically use only 15), and each generally has a specific purpose, but this is only by convention. Example: `%ret`

Literals are written in base 10 or base 16 and are prefixed with `$` or `0x` respectively. For a negative number, the minus sign goes before the `0x` and after the `$`. Examples: `$255`, `0xFF`, `$-255`, `-0xFF`
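
The placement of the minus sign is easy to get wrong, so here is a small C sketch of a parser for the four literal forms listed. It is not the assembler's code (the assembler is a Python script); `parse_literal` is a hypothetical name, and treating everything outside these four forms as an error is an assumption.

```c
#include <assert.h>
#include <string.h>

/*
 * Parse "$255", "$-255", "0xFF", or "-0xFF".
 * Returns 0 and stores the value in *out on success, -1 on malformed input.
 * No overflow checking is done; this is only a sketch.
 */
static int parse_literal(const char *s, long *out) {
    int negative = 0;
    int base;
    long v = 0;

    assert(s != NULL && out != NULL);
    if (s[0] == '$') {                 /* decimal: sign comes after the $ */
        s++;
        if (*s == '-') { negative = 1; s++; }
        base = 10;
    } else {                           /* hex: sign comes before the 0x */
        if (*s == '-') { negative = 1; s++; }
        if (strncmp(s, "0x", 2) != 0) return -1;
        s += 2;
        base = 16;
    }
    if (*s == '\0') return -1;         /* need at least one digit */
    for (; *s != '\0'; s++) {
        int digit;
        if (*s >= '0' && *s <= '9') digit = *s - '0';
        else if (base == 16 && *s >= 'a' && *s <= 'f') digit = *s - 'a' + 10;
        else if (base == 16 && *s >= 'A' && *s <= 'F') digit = *s - 'A' + 10;
        else return -1;
        v = v * base + digit;
    }
    *out = negative ? -v : v;
    return 0;
}
```

Both `parse_literal("$-255", &v)` and `parse_literal("-0xFF", &v)` store -255 in `v`.
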

Locations are represented by a symbol in the format `:Name` and get replaced with the location specified elsewhere in the assembly. They get specified in the code by `Name:` and get set to whatever address the next instruction is at. Example:

    :Callee
    ret
    :Caller
    call :Callee


Memory addresses are given in the format `*(%reg+$offset)`, where `%reg` is the specified register and `$offset` is a literal specifying the offset in memory. Either the register or the offset can be omitted. Examples: `*(%bse)`, `*(%bse+$8)`
##System Calls
System calls are used just like `call` instructions, except that instead of a memory address you give an ID that specifies which function to call. For example, to get the available memory and store it at memory address 0x0:

    syscall 0x02 ; stores available memory in %ret
    mov %ret, *(0x0) ; moves the value of %ret into address 0x0

ID | C equivalent declaration | Description |
---|---|---|
0x00 | uint time() | Returns the current time of the computer |
0x01 | uint cycles() | Returns the number of cycles executed by the VM so far |
0x02 | uint availablememory() | Returns the number of bytes available to the VM |
##Roadmap
- Create a proper assembler (with proper errors)
- Allow for some sort of linking/combining asm files
- Debugging tools