scc

Simple C99 Compiler

commit 36fbb9b46ef9beed84eb0f14588c8cebc1c92c81
parent 8182a159c71c03a6cd28b9c8651c26db5e1743c4
Author: Roberto E. Vargas Caballero <k0ga@shike2.com>
Date:   Mon, 24 Aug 2015 15:24:42 +0200

Update documentation about intermediate representation

This new documentation explains the language better and is
updated with all the changes made since the original documentation
was written. It also incorporates some of the changes that must be
made to the language as soon as possible.

Diffstat:
cc1/ir.md | 403+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
cc1/opcode.txt | 133-------------------------------------------------------------------------------
2 files changed, 403 insertions(+), 133 deletions(-)

diff --git a/cc1/ir.md b/cc1/ir.md
@@ -0,0 +1,403 @@
+# Scc intermediate representation #
+
+Scc IR tries to be a simple and easily parseable intermediate
+representation, which makes it a bit terse and cryptic. The main
+characteristic of the IR is that all the types and operations are
+represented with only one letter, so parsing tables can be used
+to parse it.
+
+The language is composed of lines, which represent statements,
+and fields in statements are separated by tabulators. Declaration
+statements begin in column 0, while expressions and control
+flow statements begin with a tabulator. When the front end detects
+an error it emits ???? and stops emitting anything else.
+
+## Types ##
+
+Types are represented using upper case letters:
+* C -- char
+* I -- int
+* W -- long
+* Q -- long long
+* M -- unsigned char
+* N -- unsigned int
+* Z -- unsigned long
+* Q -- unsigned long long
+* O -- void
+* P -- pointer
+* F -- function
+* V -- vector
+* U -- union
+* S -- struct
+* B -- bool
+* J -- float
+* D -- Double
+* H -- double
+
+This list was built for the original Z80 backend, where 'int'
+had the same size as 'short'. Several types need an identifier
+after the type letter, mainly S, F, V and U, to be able to
+differentiate between different structs, functions, vectors and
+unions (S1, V12 ...).
+
+## Storage class ##
+
+Storage class is represented using upper case letters:
+* A -- automatic
+* R -- register
+* G -- public (global variable declared in the module)
+* X -- extern (global variable declared in another module)
+* Y -- private (file scoped variable)
+* T -- local (function scoped static variable)
+* M -- member (struct/union member)
+* L -- label
+
+## Declarations/definitions ##
+
+Variable names are composed of a storage class and an identifier:
+A1, R2 or T3.
+Declarations/definitions are composed of a variable
+name, a type and the name of the variable in the source:
+
+	A1 I i
+	W2 C c
+	A3 S4 str
+
+### Type declarations ###
+
+Some declarations need a previous declaration of the types involved
+in the variable declaration. In the case of structs and unions,
+a '(' begins the list of members of the last type declaration
+(the front end must ensure that '(' can only follow struct/union
+declarations).
+
+For example the following code:
+
+> struct foo {
+> 	int i;
+> 	long c;
+> } var1;
+
+will generate the following output:
+
+> S2 foo
+> (
+> M3 I i
+> M4 W c
+> )
+> G5 S2 var1
+
+
+## Functions ##
+
+A function prototype like
+
+> int printf(char *cmd);
+
+will generate a type declaration and a variable declaration:
+
+> F3 P
+> X6 F3 printf
+
+The type specification of the function (F and an identifier) is
+followed by the types of all the parameters of the function.
+A '{' in the first column begins the body of the previously
+declared function. For example:
+
+> int printf(char *cmd) {}
+
+will generate
+
+> G6 F3 printf
+> {
+> A7 P cmd
+> -
+> }
+
+Again, the front end must ensure that '{' appears only after the
+declaration of a function. The character '-' marks the separation
+between parameters and local variables:
+
+> int printf(register char *cmd) {int i;}
+
+will generate
+
+> G6 F3 printf
+> {
+> R7 P cmd
+> -
+> A8 I i
+> }
+
+
+### Expressions ###
+
+Expressions are emitted in postorder, which makes it very easy
+to parse them and convert them to a tree representation.
+
+#### Operators ####
+
+Operators allowed in expressions are:
+
+* + -- addition
+* - -- subtraction
+* * -- multiplication
+* % -- modulo
+* / -- division
+* l -- left shift
+* r -- right shift
+* < -- less than
+* > -- greater than
+* ] -- greater than or equal to
+* [ -- less than or equal to
+* = -- equal to
+* ! -- not equal to
+* & -- bitwise and
+* | -- bitwise or
+* ^ -- bitwise xor
+* ~ -- bitwise complement
+* : -- assignment
+* _ -- unary negation
+* c -- function call
+* p -- parameter
+* . -- field
+* , -- comma operator
+* ? -- ternary operator
+* ' -- take address
+* a -- logical shortcut and
+* o -- logical shortcut or
+* @ -- content of pointer
+
+Assignment has some suboperators:
+
+* :/ -- divide and assign
+* :% -- modulo and assign
+* :+ -- addition and assign
+* :- -- subtraction and assign
+* :l -- left shift and assign
+* :r -- right shift and assign
+* :& -- bitwise and and assign
+* :^ -- bitwise xor and assign
+* :| -- bitwise or and assign
+* ;+ -- post increment
+* ;- -- post decrement
+
+Every operator in an expression has a type descriptor. Example:
+
+> int
+> main(void)
+> {
+> 	int i, j;
+> 	i = j+2*3;
+> }
+
+generates:
+
+> F1
+> G1 F1 main
+> {
+> -
+> A2 I i
+> A3 I j
+> 	A2 A3 #I6 +I :I
+> }
+
+A special case of expressions are casts, which are indicated using
+two type descriptors together. For example a cast from char to int
+is indicated with CI.
+
+#### Constants ####
+
+Constants are introduced by the character '#'. For example 10 is
+translated to #IA (all the constants are emitted in hexadecimal),
+where I indicates that it is an integer constant. Strings are
+a special case because they are introduced by the " character.
+The constant "hello" is emitted as "68656C6C6F.
+
+### Statements ###
+#### Jumps ####
+
+Jumps have the following form:
+
+* j L? [expression]
+
+The optional expression field indicates a condition which
+must be satisfied to jump. Example:
+
+> int
+> main(void)
+> {
+> 	int i;
+> 	goto label;
+> label:	i -= i;
+> }
+
+generates:
+
+> F1
+> G1 F1 main
+> {
+> -
+> A2 I i
+> 	j L3
+> L3
+> 	A2 A2 :-
+> }
+
+Another form of jump is the return statement, which uses the
+letter 'y' and an optional expression.
+For example:
+
+> int
+> main(void)
+> {
+> 	return 16;
+> }
+
+produces:
+
+> F1
+> G1 F1 main
+> {
+> -
+> 	yI #I10
+> }
+
+
+#### Loops ####
+
+There are two special characters that are used to indicate
+to the backend that the following statements are part of the body
+of a loop:
+
+* b -- begin of loop
+* d -- end of loop
+
+#### Switch statement ####
+
+Switches are represented using a table that indicates
+the label to jump to for every case. Regular cases are represented
+by 'v', while default is represented by 'f'. The switch
+statement itself is represented by 's' followed
+by the label where the jump table is located and the expression
+of the switch. For example:
+
+> int
+> func(int n)
+> {
+> 	switch (n+1) {
+> 	case 1:
+> 	case 2:
+> 	case 3:
+> 	default:
+> 		++n;
+> 	}
+> }
+
+generates:
+
+> F2 I
+> G1 F2 func
+> {
+> A1 I n
+> -
+> 	s L4 A1 #I1 +
+> L5
+> L6
+> L7
+> L8
+> 	A1 #I1 :+I
+> 	j L3
+> L4
+> 	t #4
+> 	v L7 #I3
+> 	v L6 #I2
+> 	v L5 #I1
+> 	f L8
+> L3
+> }
+
+
+The beginning of the jump table is indicated by the letter t,
+followed by the number of cases (including the default case) of the
+switch.
+
+## Summary ##
+
+* C -- char
+* I -- int
+* W -- long
+* Q -- long long
+* M -- unsigned char
+* N -- unsigned int
+* Z -- unsigned long
+* Q -- unsigned long long
+* O -- void
+* P -- pointer
+* F -- function
+* V -- vector
+* U -- union
+* S -- struct
+* B -- bool
+* J -- float
+* D -- Double
+* H -- double
+* A -- automatic
+* R -- register
+* G -- public (global variable declared in the module)
+* X -- extern (global variable declared in another module)
+* Y -- private (file scoped variable)
+* T -- local (function scoped static variable)
+* M -- member (struct/union member)
+* L -- label
+* ( -- begin of struct/union definition
+* ) -- end of struct/union definition
+* { -- begin of function body
+* } -- end of function body
+* - -- end of function parameters
+* + -- addition
+* - -- subtraction
+* * -- multiplication
+* % -- modulo
+* / -- division
+* l -- left shift
+* r -- right shift
+* < -- less than
+* > -- greater than
+* ] -- greater than or equal to
+* [ -- less than or equal to
+* = -- equal to
+* ! -- not equal to
+* & -- bitwise and
+* | -- bitwise or
+* ^ -- bitwise xor
+* ~ -- bitwise complement
+* : -- assignment
+* _ -- unary negation
+* c -- function call
+* p -- parameter
+* . -- field
+* , -- comma operator
+* ? -- ternary operator
+* ' -- take address
+* a -- logical shortcut and
+* o -- logical shortcut or
+* @ -- content of pointer
+* :/ -- divide and assign
+* :% -- modulo and assign
+* :+ -- addition and assign
+* :- -- subtraction and assign
+* :l -- left shift and assign
+* :r -- right shift and assign
+* :& -- bitwise and and assign
+* :^ -- bitwise xor and assign
+* :| -- bitwise or and assign
+* ;+ -- post increment
+* ;- -- post decrement
+* j -- jump
+* y -- return
+* b -- begin of loop
+* d -- end of loop
+* s -- switch statement
+* t -- switch table
+* v -- case entry in switch table
+* f -- default entry in switch table
+* ???? -- front end error
diff --git a/cc1/opcode.txt b/cc1/opcode.txt
@@ -1,133 +0,0 @@
-Ax -> automatic variable number x
-Tx -> static variable number x
-Gx -> global variable with name x
-Xx -> global function with name x
-Yx -> static function with name x
-Lx -> label number x
-Px -> parameter number x
-Mx -> member number x
-
-@ -> content of variable
-#x -> size of type x (R, I, C, W, N, Z, Q, H, Sx, Ux)
-#x -> integer constant of value x
-##x -> float constant of value x
-#%x -> long constant of value x
-" -> begin of string, each element is coded in decimal and comma (better in hexa!!)
-a -> take address
-:x -> assign of type x
-+x -> sum of type x
--x -> substraction of type x
-*x -> multiplication of type x
-/x -> division of type x
-; -> used in post operators (A2++ -> A2 #1 ;+I)
-: -> used in pre operators (++A2 -> A2 #1 :+I)
-. -> struct field
-_x -> sign negation of type x
-~x -> logic negation of type x
-%x -> modulo of type x
-lx -> left shift of type x
-rx -> right shift of type x
->x -> greater than of type x
-<x -> less than of type x
-[x -> less or equal of type x
-]x -> greater or equal of type x
-=x -> equal of type x
-!x -> not equal of type x
-&x -> logical and of type x
-|x -> logical or of type x
-^x -> logical exor of type x
-m -> and (last two stack values)
-s -> or (last two stack values)
-` -> index array (*(val1 + val2))
-:yx -> operation y and assign of type x
-:x -> assign of type x
-?x -> ternary of type x (cond val1 val2)
-cx -> call to function of type x
-px -> pass parameter to function of type x
-Fx -> x number of parameters passed in a function call
-, -> comma operator (last two stack values)
-d -> begin of loop
-b -> end of loop
-j -> jump (next stack value is the label)
-yx -> return of type x
-ex -> switch of type x
-w -> case (next stack value is label and next is value)
-n -> no operation (it is used in ! and in for conditions)
-E -> no operation (it is used to mark end of function declaration)
-
-Conversions:
-----------
-Sx -> struct number x
-Ux -> union number x
-Vx -> vector number x
-R -> pointer
-I -> integer
-C -> char
-W -> long
-N -> unsigned
-Z -> unsigned long
-Q -> float
-H -> double
-F -> function
-
-
-xy -> conversion from y to x
-
-
-Examples:
----------
-
-int
-main()
-{
-	int i;
-	float v = 3.0;
-
-	i = i != v;
-}
-
-->
-
-Xmain I F E
-{
-A1 I
-A2 I
-A3 Q
-	A3 ##4130000 QH :Q
-	A1 A1 QI A3 !Q #1 #0 ?I :I
-}
-
-------------------------------------------
-
-struct pepe {
-	int i, j;
-	struct p2 {
-		int k;
-	} k;
-};
-
-int
-main()
-{
-	int i;
-	struct pepe p;
-
-	i += p.i + p.k.k;
-}
-
-->
-
-S4 (
-M5 I
-)
-S1 (
-M2 I
-M3 I
-M6 S4
-)
-Xmain I F E
-{
-A7 I
-A8 S1
-	A7 A8 M2 . A8 M6 . M5 . +I :+I
-}
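
The new cc1/ir.md says that expressions are emitted in postorder so that they are easy to convert to a tree. A minimal sketch of that conversion, in Python and not part of scc, could look like the following. The names `parse_expr`, `BINARY`, `UNARY` and `OPERAND` are hypothetical, and the split into unary and binary operators is an assumption based on the operator list. Casts (such as CI), calls, parameters, the ternary operator and the `;+`/`;-` post operators are left out.

```python
import re

# Assumed arity, derived from the operator list in cc1/ir.md.
# Only the first character of a token selects the operator; the
# remaining characters are the suboperator and type descriptor.
BINARY = set("+-*%/lr<>][=!&|^:,.")
UNARY = set("~_'@")

# Operands: a storage class letter plus a number (A1, R2, L3 ...),
# a '#' constant, or a '"' string.
OPERAND = re.compile(r'^(?:[ARGXYTML]\d+|#.+|".*)$')


def parse_expr(line):
    """Turn one postorder expression line into a nested tuple tree."""
    stack = []
    for tok in line.split():
        if OPERAND.match(tok):
            stack.append(tok)
        elif tok[0] in UNARY:
            stack.append((tok, stack.pop()))
        elif tok[0] in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append((tok, left, right))
        else:
            raise ValueError("token not handled by this sketch: " + tok)
    if len(stack) != 1:
        raise ValueError("malformed expression: " + line)
    return stack[0]


# The expression generated for 'i = j+2*3' in the example of ir.md.
tree = parse_expr("A2 A3 #I6 +I :I")
print(tree)
```

Each operator node keeps its full token, so the type descriptor (`I` in `+I`) stays available to whatever consumes the tree.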