- Instructor: Daniel McCarthy
-
Introduction
-
Overview of the course
- Module 1
- We create a lexer takes a C source file then converts it into tokens
- We create a parser that creates many nodes forming an abstgract syntax tree
- Module 2
- We create a resolver which is responsible for taking an expression and create a series of rules
- We create a code generator that take in our abstract syntax rule and create NASM x86 32bit assembly language
- Module 3
- We create a semantic validator that ensures valid works in the source file
- We create preprocessor/macro systems so that #include, #ifdef, #define are supported
- Installation and Setup
- Ubuntu with gcc, gdb, make
- Preparing our project
- Helper functions from: https://github.com/nibblebits/DragonCompiler/tree/master/helpers
- Get buffer.c/h and vector.c/h
- main.c
#include <stdio.h>
#include "compiler.h"
int main()
{
int res = compile_file("./test.c", "./test",0);
if (res == COMPILER_FILE_COMPILED_OK)
{
printf("everything compiled file\n");
} else if (res == COMPILER_FAILED_WITH_ERRORS)
{
printf("compile failed\n");
} else
{
printf("Unknown response for compile file\n");
}
return 0;
}
- cprocess.c
#include <stdio.h>
#include <stdlib.h>
#include "compiler.h"
struct compile_process* compile_process_create(const char* filename, const char*
filename_out, int flags)
{
FILE * file = fopen(filename, "r");
if (!file)
{
return NULL;
}
FILE* out_file = NULL;
if (filename_out)
{
out_file = fopen(filename_out, "w");
if (!out_file)
{
return NULL;
}
}
struct compile_process* process = calloc(1, sizeof(struct compile_process));
process-> flags = flags;
process-> cfile.fp = file;
process-> ofile = out_file;
return process;
}
- compiler.c
#include "compiler.h"
int compile_file(const char* filename, const char* out_filename, int flags)
{
struct compile_process* process = compile_process_create(filename, out_filenam
e, flags);
if (!process)
return COMPILER_FAILED_WITH_ERRORS;
// Perform lexical analysis
// Perform parsing
// Perform code generation
return COMPILER_FILE_COMPILED_OK;
}
- compiler.h
#ifndef PEACHCOMPILER_H
#define PEACHCOMPILER_H
#include <stdio.h>
enum {
COMPILER_FILE_COMPILED_OK,
COMPILER_FAILED_WITH_ERRORS
};
struct compile_process
{
// The flags in regards this file should be compiled
int flags;
struct compile_process_input_file
{
FILE* fp;
const char* abs_path;
} cfile;
FILE * ofile;
};
int compile_file(const char* filename, const char* out_filename, int flags);
struct compile_process* compile_process_create(const char* filename, const char*
filename_out, int flags);
#endif
- Makefile
OBJECTS= ./build/compiler.o ./build/cprocess.o
INCLUDES= -I./
all: ${OBJECTS}
gcc main.c ${INCLUDES} -g -o ./main ${OBJECTS}
./build/compiler.o: ./compiler.c
gcc ./compiler.c ${INCLUDES} -o ./build/compiler.o -g -c
./build/cprocess.o: ./cprocess.c
gcc ./cprocess.c ${INCLUDES} -o ./build/cprocess.o -g -c
clean:
rm ./main
rm -rf ${OBJECTS}
- What is Lexical analysis
- Turning strings into tokens
- A token has a type and a value
- Lexer does lexical analysis
- Token types
- Identifier: variable name, function name, structure name
- Keyword: unsigned, int, short, char, break, continue, ...
- Operator: +, *, & { )
- Symbol: "", ;, :
- Number
- String
- Comment
- Newline
- Creating out token structures
struct token
{
int type;
int flags;
union
{
char cval;
const char* sval;
unsigned int lnum;
unsigned long long llnum;
void * any;
};
bool whitespace;
const char* between_brackets;
};
- Preparing our lexer
struct lex_process
{
struct pos pos;
struct vector* token_vec;
struct compile_process* compiler;
};
-
Creating a number token
-
Creating a string token
- What is parsing?
- Why parsers?
- Provides structure for an input file
- Nodes can branch off from each other providing stability in logic
- Makes it eaasier to validate the code
- Makes it easier to compile the input file
- Creating our parser structures
- NODE_TYPE_EXPRESSION
- NODE_TYPE_EXPRESSION_PARENTHESES
- NODE_TYPE_NUMBER
- NODE_TYPE_IDENITFIER
- NODE_TYPE_STRING
- NODE_TYPE_VARIABLE
- NODE_TYPE_VARIABLE_LIST
- NODE_TYPE_FUNCTION
- NODE_TYPE_BODY
- NODE_TYPE_STATEMENT_RETURN
- NODE_TYPE_STATEMENT_IF
- NODE_TYPE_STATEMENT_ELSE
- NODE_TYPE_STATEMENT_WHILE
- Module 1 Summary
- Module 2
- The code generator
- Takes abstract syntax tree and converts it to assembly language or machine code
- Process
- Loops through every root node in the abstract syntax tree
- Generates code for a node. Loops through child nodes where applicable
- Keeps the track of th eocd generation state such as if we are in a structure expression or not
-
Building the fundamentals
-
Understanding the label systems
- Assembly Labels
- Allows you to declare a reference to a particular part of an assembly file. You can JUMP to this label and execute the code or read data from this label
- When we generate labels
- for loop
- if statement
- while loop
- do while loop
- all other statements
- all global variables
- all strings
- Two stacks to be used
- Entry point stack
- Exit point stack
- Conclustion
- State tracking is very important in compiler development
- By tracking labels continue/break labels will know where to jump and we can easily exit a subroutine because the tracking functionality knows which label to jump to
- Q: is JMP expensive?
-
Implementing numerical values for global variables
-
Stackframes
- A technical term tha describes a part of a stack whilst a function is running
- Holds subroutine return addresses
- Holds function arguments
- Holds local variables
- Sample C function:
int sum_and_one(int x, int y) {
int one = 1;
return (x*y) + one;
}
- Corresponding assembly code:
0000000000000000 <sum_and_one>:
0: 55 push rbp ; save old base pointer
1: 48 89 e5 mov rbp,rsp ; move stack pointer to the base pointer
4: 89 7d ec mov DWORD PTR [rbp-0x14],edi ; x
7: 89 75 e8 mov DWORD PTR [rbp-0x18],esi ; y
a: c7 45 fc 01 00 00 00 mov DWORD PTR [rbp-0x4],0x1 ; one
11: 8b 45 ec mov eax,DWORD PTR [rbp-0x14]
14: 0f af 45 e8 imul eax,DWORD PTR [rbp-0x18] ; x*y
18: 89 c2 mov edx,eax
1a: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
1d: 01 d0 add eax,edx ; x*y + one
1f: 5d pop rbp
20: c3 ret
- Using a stack frame mechanism in the compiler will ensure that when we are about to generate a "POP" assembly instruction we know exactly what element we are popping
- The resolver explained