Skip to content
chengda300 edited this page Apr 9, 2025 · 5 revisions

Welcome to the go-slang wiki!

This wiki serves to host the documentation of the implementation of Go.

Go Programming Language Related Matters

The Go Programming Language Specification can be found here. A Tour of Go can be found here. For the full language, you can install Go here.

Architecture

This implementation of Go uses a virtual machine architecture. The initial implementation is done by Teow Hua Jun and Lim An Jun, who were students of CS4215 in AY2023/24 Semester 2 and it is open source. It can be found in this page. This implementation aims to extend from the initial implementation to implement more elements of the language.

In this virtual machine architecture, programs are written in Go, which will then be parsed by a parser written in PeggyJS to obtain an Abstract Syntax Tree (AST), which comprises of tokens. Afterwards, these tokens are compiled into Virtual Machine Instructions (VMI), in which the runtime system will then execute these instructions from top to bottom. The image below shows the tombstone diagram of the parsing and compilation steps on the left, and the execution steps on the right. tombstone

With this architecture, we do not need to install the language and we can still run programs written in that language! However, it means that we have to write our own parser, compiler and executor.

Automatic Semicolon Insertion

Before the code is actually parsed, there will be an automatic semicolon insertion operation. This is in line with how Go handles programs. In the Go Programming Language Specification, it states that:

Go programs may omit most of these semicolons using the following two rules:

  1. When the input is broken into tokens, a semicolon is automatically inserted into the token stream immediately after a line's final token if that token is
  1. To allow complex statements to occupy a single line, a semicolon may be omitted before a closing ")" or "}".

Parser

The parser is written in PeggyJS, which uses recursive parsing. After automatic semicolon insertion is done, the code will be parsed into an AST which comprises of tokens.

Compilation

With the tokens in the AST, the compiler will start to construct virtual machine instructions and also perform the related checks, such as type checking and namespace checking.

Execution

When the instructions are constructed, the runtime system will start executing these instructions from start to end.

Currently Supported Features

The current implementation supports the following features.

Functions returning multiple values at once

Typically, in most programming languages, they only allow you to return 1 value from a function. If you wanted to have it return multiple values, you would need to wrap it in a data structure and return that data structure. However, in Go, there is no need for that. You can simply assign the same number of variables at the left hand side of the assignment statement as the number of return values by the function call on the right hand side of the statement. The following code will be permissible in Go:

package main
import "fmt"

func help() (int, int) {
    return 2, 6
}

func main() {
    a, b := help()
    fmt.Println(a, b) // prints 2, 6
}

Return values in this implementation is implemented by simply pushing them onto the operand stack (OS). After which, there will be a LoadVariableInstruction to load the variable which is to be assigned that value, then StoreInstruction must be the next instruction to pop the variable and the value to store the value in that variable.

Goroutines

Goroutines are just like threads but they are much more lightweight. They can be used in this way:

package main
import "fmt"

func help() {
    fmt.Println("Hello World")
}

func main() {
    go help()
}

Of course, just like threads in other programming languages, this program will most likely not result in any output, as the main context exits before the goroutine can be executed. For the goroutine to be executed, we have to let the runtime system perform context switching into the goroutine's context before the main context exits. This can be achieved either by forcing the main context to perform busy waiting, or through the use of sync.WaitGroup.

package main
import "fmt"
import "sync"

var wg sync.WaitGroup

func help() {
    fmt.Println("Hello World")
    wg.Done()
}

func main() {
    wg.Add(1)
    go help()
    wg.Wait()
}

sync.WaitGroup has 3 functions implemented: Add(), Wait() and Done().

  • Wait(): Forces a context to block (wait) until the WaitGroup's variable counter is 0
  • Add(): Increments the WaitGroup's variable counter by the specified integer, it cannot be negative
  • Done(): Decrements the WaitGroup's variable counter by 1. Cannot decrement a WaitGroup's variable counter of 0.

In the case of accessing "critical sections" (sections where it is important that only 1 context can access it at any time), we can use sync.Mutex.

package main
import "fmt"
import "sync"

var dollar int
var wg sync.WaitGroup
var mu sync.Mutex

func steal() {
    mu.Lock()
    dollar = dollar - 10
    mu.Unlock()
    wg.Done()
}

func main() {
    dollar = 100
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go steal()
    }
    wg.Wait()
}

sync.Mutex has 2 functions implemented: Lock() and Unlock().

  • Lock(): Locks an unlocked mutex. If the mutex is already locked, the context will be blocked until it is unlocked.
  • Unlock(): Unlocks a locked mutex. If the mutex is already unlocked, it will panic (runtime error).

Defer statements

Defer statements basically defer the operation until the function returns. It is useful and convenient for writing code that will execute after the function ends. Defer statements must be function calls.

Structs

Structs is a data structure found in common programming languages. A struct can be declared either as a declared type or as an anonymous struct.

package main
import "fmt"

type Person struct {
    Name string
    Age int
}

func main() {
    a := Person{Name: "John", Age: 30} // declared type with key
    b := Person{"Tom", 33} // declared type without key, fields must be in the same sequence as the declared type
    c := struct{
        Name string
        Age int
    }{"Jack", 40} // anonymous struct
    fmt.Println(a) // returns {John 30}
    fmt.Println(b) // returns {Tom 33}
    fmt.Println(c) // returns {Jack 40}
}

To access a field in the struct, we use the dot operator. Using the above example:

fmt.Println(a.Name) // returns John
fmt.Println(b.Age) // returns 33
fmt.Println(c.Name) // returns Jack
fmt.Println(c.Age) // returns 40

In the memory system, structs are represented as a StructNode with pointers pointing to its fields. structnode

Pointers

Pointers are variables that store the address of other variables. In Go, we can obtain addresses of either variables or pointers, and indirect (also known as dereference in C) a pointer to get the value of the variable.

To get the pointer, we use the ampersand (&) unary operator. To indirect a pointer, we use the star (*) unary operator.

a := 3
b := &a // pointer to a which stores address of a
c := *b // indirect pointer b to get the value of a

In the case of arrays and structs, we can pass them into functions if functions require arguments of the pointer types instead of the array or struct. If the functions use pointer types as arguments, the underlying array or struct will be mutable (meaning it will also affect the original copy in the caller function). If it is not using a pointer type, then the underlying array or struct will be immutable (meaning it will only affect the copy in the callee function). When we use the dot operator on pointers to arrays or structs, Go will automatically indirect them, so there is no need to use the * unary operator to access the field or array element.

Pointers are represented as a ReferenceNode, where it points to the variable it is pointing to. pointer

Memory Management

The memory system uses a Heap object and uses a buddy memory allocation system. This means that the memory block will be recursively halved until it cannot be halved further (halving causes it to be less than the required memory), then the whole chunk will be allocated.

It uses Nodes (not to be confused with NodeJS, thus we use BaseNode in the codebase), which are basically pointers by themselves so that the runtime system can recognise which data type the variable is. As such, arrays and structs have their own ArrayNode and StructNode to be able to tell the difference between an array or struct and a primitive. (An ArrayNode or StructNode basically stores pointers to its children, which are the elements or fields).

It uses a mark-and-sweep garbage collection system, where unused variables will be marked and swept (freeing the memory allocated to unused variables). The garbage collector is executed when a memory allocation fails.

In the case of structs and arrays, the memory for all the fields or elements must be allocated all at once. It cannot be allocated bit by bit due to the buddy memory allocation system, which will most likely result in fragmentation. Thus, in those case, we will have to get the total size, allocate all at once, and manually tweak each memory location. The diagram on the top shows the incorrect allocation, while the diagram at the bottom shows the correct allocation. image

In the case of structs, this should be the memory allocation:

struct memory