1.2.2 Applications generation

Stages of compilation

Compilation is split into several stages:

Lexical analysis

The source code is scanned, and unnecessary whitespace and comments are removed.
The remaining code is then tokenized, producing a token stream. Information about keywords and identifiers is collected into a symbol table.
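
As a rough illustration of this stage, the sketch below tokenizes a small C-like snippet. The keyword set, the token kinds, the `//` comment style and the `tokenize` function are all assumptions made for the example, not part of any particular compiler.

```python
import re

KEYWORDS = {"int", "if", "while", "return"}  # hypothetical keyword set

# Each pair is (token kind, regular expression); earlier entries win.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("SYMBOL", r"[=+\-*/();{}<>]"),
    ("SPACE", r"\s+"),
    ("UNKNOWN", r"."),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    tokens = []
    symbol_table = {}
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("COMMENT", "SPACE"):
            continue  # whitespace and comments are discarded
        if kind == "UNKNOWN":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "NAME":
            kind = "KEYWORD" if text in KEYWORDS else "IDENTIFIER"
            # Record each keyword and identifier once in the symbol table.
            symbol_table.setdefault(text, {"kind": kind.lower()})
        tokens.append((kind, text))
    return tokens, symbol_table
```

Calling `tokenize("int x = 42; // answer")` yields five tokens (the comment is dropped) and a symbol table with entries for `int` and `x`.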

Syntax analysis

The token stream is first checked against the syntax rules of the language to ensure that there are no syntax errors.
If there are syntax errors, they are reported with diagnostic information.

Once the token stream is confirmed to be free of syntax errors, it is parsed to produce an AST (Abstract Syntax Tree).
The AST represents the structure of the program in a tree-like form. This process adds further information to the symbol table, such as data types and scoping rules.
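
A minimal sketch of syntax analysis for arithmetic expressions is shown below. It assumes a tiny made-up grammar (given in the comments) and a nested-tuple AST format chosen for the example. Unlike the two-step description above, it checks syntax and builds the AST in a single pass, which is a common simplification, and its one-line tokenizer silently ignores unknown characters.

```python
import re


def parse(source):
    """Parse an arithmetic expression into a nested-tuple AST."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expression():  # expression := term (("+" | "-") term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():  # term := factor (("*" | "/") factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor := NUMBER | NAME | "(" expression ")"
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return ("num", int(token))
        if token == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token[0].isalpha() or token[0] == "_":
            return ("var", token)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

For example, `parse("1 + 2 * x")` places the multiplication deeper in the tree than the addition, reflecting operator precedence, while `parse("(1 + 2")` raises a `SyntaxError`.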

Semantic analysis

The AST is checked for semantic errors. These are errors in code that is syntactically valid but breaks the language's rules about meaning, such as:

Incorrect types
Multiple variable declarations
Undeclared variables
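
The three kinds of error listed above can be sketched with a simple checker. The statement format (tuples of `"declare"` and `"assign"`), the type names and the `check` function are hypothetical, and a flat list of statements stands in for a real AST to keep the example short.

```python
def check(statements):
    """Return a list of semantic error messages for a list of statements.

    Each statement is a tuple (hypothetical format):
      ("declare", name, type_name)
      ("assign", name, value)   # value is a Python int or str literal
    """
    type_of_literal = {int: "int", str: "string"}
    symbol_table = {}  # variable name -> declared type
    errors = []
    for statement in statements:
        if statement[0] == "declare":
            _, name, type_name = statement
            if name in symbol_table:
                errors.append(f"'{name}' is declared more than once")
            else:
                symbol_table[name] = type_name
        elif statement[0] == "assign":
            _, name, value = statement
            value_type = type_of_literal[type(value)]
            if name not in symbol_table:
                errors.append(f"'{name}' is used before it is declared")
            elif symbol_table[name] != value_type:
                declared = symbol_table[name]
                errors.append(f"cannot assign {value_type} to {declared} '{name}'")
    return errors
```

Every statement here would pass syntax analysis; the errors only appear once the declarations recorded in the symbol table are taken into account.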

Code generation

The AST is translated into object code that implements the program's functionality.
At this point, the object code is not yet executable, because it has not been linked.
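
To give a feel for this translation, the sketch below walks an AST and emits instructions for a made-up stack machine. Real object code would be machine code for the target processor; the nested-tuple AST format, such as `("+", ("num", 1), ("var", "x"))`, and the instruction names (`PUSH`, `LOAD`, `ADD`, and so on) are assumptions for illustration only.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}  # hypothetical instruction names


def generate(node):
    """Translate a nested-tuple AST into a list of stack-machine instructions."""
    kind = node[0]
    if kind == "num":
        return [("PUSH", node[1])]  # push a constant onto the stack
    if kind == "var":
        return [("LOAD", node[1])]  # push the value of a variable
    # Binary operator: evaluate both operands first, then apply the operator.
    _, left, right = node
    return generate(left) + generate(right) + [(OPCODES[kind],)]
```

The operands are emitted before the operator, so the tree structure of the AST is flattened into a linear sequence of instructions.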

Optimization

The generated object code is optimized. This can result in:

Improved runtime performance
Reduced memory usage

This is achieved by:

Removing redundant or unreachable code
Optimizing loops
Replacing expressions and function calls that always evaluate to the same constant with that constant
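
The last of these techniques, usually called constant folding, can be sketched as follows. For simplicity the example works on a hypothetical nested-tuple AST, such as `("*", ("num", 60), ("num", 60))`, rather than on object code, and only handles `+`, `-` and `*`.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace sub-expressions made only of constants with their value."""
    if node[0] in ("num", "var"):
        return node  # leaves cannot be simplified further
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        # Both operands are known at compile time, so compute the result now.
        return ("num", OPERATORS[op](left[1], right[1]))
    return (op, left, right)
```

With this sketch, an expression such as `x + 60 * 60` keeps the addition (because `x` is only known at runtime) but has the multiplication replaced by the constant `3600`.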

Linking

Compiled programs often depend on third-party code in libraries, which must be available to the program before it can run.
There are two types of linking: static and dynamic.

Static Linking

Static linking is where the third-party libraries are embedded inside the executable.
This is done by the linker (often invoked by the compiler), which packages the object code from the libraries alongside the program's own object code at link time, producing a single executable that contains its own dependencies.

Pros	Cons
The executable is ‘portable’ and can be copied to other systems on its own	Larger binary size
The executable is standalone and requires no other third party libraries	Longer overall compile times
Updates to third-party libraries don’t affect the executable, since it uses its own copies	Wastes space if the same versions of the same libraries are already present on the system
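
The idea of static linking can be modelled with a toy linker. The module format (dictionaries with `"defines"` and `"needs"` keys) and the `link_static` function are invented for this sketch; real linkers work on object files and also handle addresses and relocation, which are left out here.

```python
def link_static(program, libraries):
    """Combine a program's object module with the library code it needs.

    Each module is a dict (hypothetical format):
      {"defines": {symbol: code}, "needs": [symbol, ...]}
    """
    executable = dict(program["defines"])  # start with the program's own code
    pending = list(program["needs"])       # symbols still to be resolved
    while pending:
        symbol = pending.pop()
        if symbol in executable:
            continue  # already copied in
        for library in libraries:
            if symbol in library["defines"]:
                # Copy the library's code into the executable itself.
                executable[symbol] = library["defines"][symbol]
                pending.extend(library["needs"])
                break
        else:
            raise LookupError(f"undefined symbol: {symbol}")
    return executable
```

Only the symbols that are actually needed are copied, and a missing symbol is reported at link time rather than when the program runs.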

Dynamic Linking

Dynamic linking is the opposite: the application does not package its own libraries and instead relies on them being present in the runtime environment.
This is done by providing the compiled executable with the library’s symbol names, metadata, and references, without copying any of the library’s code. The program can therefore call code from the library, provided that the library is present at runtime.

To ensure that the code is present at runtime, the operating system can use a dynamic linker to load the third party shared libraries into memory, then bind the symbols to the correct address in memory where the library’s code is now present.

Dynamic linking can also make use of lazy loading, where the functions of the library are only loaded into memory when requested by the program. This can reduce the initial load time and memory usage. However, the first call to each function is slower, because the loading work has been moved from program start-up to the point of that first call.
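
Lazy loading can be modelled in a few lines. The `LazyLibrary` class below is a hypothetical stand-in for a shared library and its dynamic linker: a dictionary plays the role of the library file, and "binding" a symbol simply means copying its entry into a second dictionary on first use.

```python
class LazyLibrary:
    """Stand-in for a shared library whose functions are bound on first call."""

    def __init__(self, available):
        self._available = available  # symbol name -> function ("on disk")
        self.bound = {}              # symbol name -> function ("in memory")

    def call(self, symbol, *args):
        if symbol not in self.bound:
            # First call: resolve the symbol now (the slow path).
            if symbol not in self._available:
                raise LookupError(f"symbol not found at runtime: {symbol}")
            self.bound[symbol] = self._available[symbol]
        # Later calls skip the lookup and use the bound function directly.
        return self.bound[symbol](*args)
```

Nothing is bound when the object is created, which mirrors the faster start-up, and a missing symbol is only discovered when it is first called, which mirrors the reliance on the runtime environment.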

Pros	Cons
Multiple applications that depend on the same version of a library can share the same library file	The library version present at runtime needs to be compatible with the version that the program was compiled against
Saves overall disk space if multiple applications use the same library	Dynamic linking can result in slower runtime performance, since symbols have to be resolved at load time or at first call
Memory can be shared if multiple running applications use the same library
Easier to update libraries, since one update applies to every application that uses them

OCR H446 Textbook