1.2.2 Applications generation
Stages of compilation
Compilation is split into several stages:
Lexical analysis
The source code is scanned, and unnecessary whitespace and comments are removed.
The remaining code is then tokenized, producing a token stream, and information about keywords and identifiers is collected into a symbol table.
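The sketch below is a minimal lexer for a made-up toy language. The token rules, the `KEYWORDS` set and the `tokenize` function are hypothetical names chosen for illustration. It strips whitespace and comments and produces a stream of `(kind, text)` tokens. It also records identifiers in a dictionary that stands in for the symbol table. For simplicity, this sketch records only identifiers, not keywords.

```python
import re

# Hypothetical token rules for a tiny toy language (an assumption, not any real language's grammar).
KEYWORDS = {"if", "while", "print"}
TOKEN_RULES = [
    ("COMMENT", r"#[^\n]*"),  # comments: discarded
    ("SKIP", r"\s+"),         # whitespace: discarded
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()<>]"),
]
MASTER_PATTERN = re.compile("|".join(f"(?P<{name}>{rule})" for name, rule in TOKEN_RULES))


def tokenize(source):
    """Return (token_stream, symbol_table) for a piece of source code."""
    tokens = []
    symbol_table = {}
    pos = 0
    while pos < len(source):
        match = MASTER_PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"Unrecognised character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("COMMENT", "SKIP"):
            continue  # strip comments and whitespace
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        elif kind == "IDENT":
            # Record each identifier once; later stages would add type and scope details.
            symbol_table.setdefault(text, {"first_seen_at": len(tokens)})
        tokens.append((kind, text))
    return tokens, symbol_table


tokens, symbols = tokenize("total = total + 42  # running sum\n")
print(tokens)   # [('IDENT', 'total'), ('OP', '='), ('IDENT', 'total'), ('OP', '+'), ('NUMBER', '42')]
print(symbols)  # {'total': {'first_seen_at': 0}}
```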
Syntax analysis
The token stream is analysed and checked against the syntax (grammar) of the language to ensure that there are no syntax errors.
If any syntax errors are found, they are reported with diagnostic information.
Once the token stream is confirmed to be free of syntax errors, it is parsed to produce an AST (Abstract Syntax Tree).
The AST represents the structure of the program as a tree. This stage also adds further information to the symbol table, such as data types and scope.
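The parsing step can be sketched as a small recursive-descent parser for arithmetic expressions. It works on `(kind, text)` tokens. The grammar and the `Num`, `Var`, `BinOp` and `Parser` names are assumptions made for illustration. Unlike the two-step description above, this sketch checks the syntax and builds the AST in the same pass, which is a common approach. When it finds a syntax error, it raises `SyntaxError` and includes the position of the offending token.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


class Parser:
    """Recursive-descent parser for the (hypothetical) grammar:
        expr   -> term (("+" | "-") term)*
        term   -> factor (("*" | "/") factor)*
        factor -> NUMBER | IDENT | "(" expr ")"
    """

    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) pairs from the lexer
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected {self.peek()[1]!r} at token {self.pos}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            node = BinOp(op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):
        index = self.pos
        kind, text = self.advance()
        if kind == "NUMBER":
            return Num(int(text))
        if kind == "IDENT":
            return Var(text)
        if (kind, text) == ("OP", "("):
            node = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError(f"expected ')' at token {self.pos - 1}")
            return node
        raise SyntaxError(f"unexpected {text!r} at token {index}")


tokens = [("IDENT", "x"), ("OP", "*"), ("OP", "("), ("NUMBER", "2"),
          ("OP", "+"), ("IDENT", "y"), ("OP", ")")]
print(Parser(tokens).parse())
# BinOp(op='*', left=Var(name='x'), right=BinOp(op='+', left=Num(value=2), right=Var(name='y')))
```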
Semantic analysis
The AST is checked for semantic errors. These are errors in code that is syntactically valid but does not make sense, such as:
- Incorrect types
- Declaring the same variable more than once
- Undeclared variables
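The checks listed above can be sketched as a single pass over a simplified program, with a dictionary acting as the symbol table. The tuple-based statement format, `TYPES` and `check_semantics` are hypothetical. A real compiler would walk the AST instead.

```python
# Hypothetical statement format (an assumption for this sketch):
#   ("declare", name, type_name)   e.g. ("declare", "count", "int")
#   ("assign", name, literal)      e.g. ("assign", "count", 5)
TYPES = {"int": int, "str": str, "bool": bool}


def check_semantics(statements):
    """Return a list of semantic error messages; an empty list means none were found."""
    symbol_table = {}  # name -> declared type name
    errors = []
    for line, (kind, name, arg) in enumerate(statements, start=1):
        if kind == "declare":
            if name in symbol_table:
                errors.append(f"line {line}: '{name}' is already declared")
            else:
                symbol_table[name] = arg
        elif kind == "assign":
            if name not in symbol_table:
                errors.append(f"line {line}: '{name}' is not declared")
            elif type(arg) is not TYPES[symbol_table[name]]:
                # Exact type check, so True is not accepted as an int.
                errors.append(
                    f"line {line}: cannot assign {type(arg).__name__} "
                    f"to '{name}' of type {symbol_table[name]}"
                )
    return errors


program = [
    ("declare", "count", "int"),
    ("assign", "count", "five"),  # incorrect type
    ("declare", "count", "int"),  # same variable declared twice
    ("assign", "total", 0),       # undeclared variable
]
for message in check_semantics(program):
    print(message)
```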
Code generation
The AST is used to generate object code that implements the program's functionality.
At this point the object code is not yet executable, because it has not been linked.
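As a rough illustration of code generation, the sketch below walks an AST in post-order and emits instructions for an imaginary stack machine. The tuple AST format, the instruction names (`PUSH`, `LOAD`, `ADD`, ...) and `generate` are all assumptions. Real compilers emit machine code for a specific processor.

```python
# Hypothetical AST format (nested tuples):
#   ("num", 2)   ("var", "x")   (op, left, right) where op is one of + - * /
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(node, code=None):
    """Emit stack-machine instructions for an AST using a post-order traversal."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(f"PUSH {node[1]}")   # push a constant onto the stack
    elif kind == "var":
        code.append(f"LOAD {node[1]}")   # push a variable's value onto the stack
    else:
        generate(node[1], code)          # code for the left operand
        generate(node[2], code)          # code for the right operand
        code.append(OPCODES[kind])       # then the operator, which pops both operands
    return code


# x * (2 + y)
print(generate(("*", ("var", "x"), ("+", ("num", 2), ("var", "y")))))
# ['LOAD x', 'PUSH 2', 'LOAD y', 'ADD', 'MUL']
```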
Optimization
The generated object code is optimized. This can result in:
- Improved runtime performance
- Reduced memory usage
This is achieved by:
- Removing redundant or unreachable code
- Optimizing loops
- Replacing calls to functions that always evaluate to a constant with the constant value itself
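Two of these optimizations can be sketched as passes over a list of instructions for an imaginary stack machine. The first is constant folding, which computes `2 + 3` at compile time instead of at runtime. The second removes unreachable code after a `RETURN`. The instruction format and the function names are hypothetical, and real optimizers are far more thorough.

```python
import operator

# Hypothetical stack-machine instructions, e.g. ["PUSH 2", "PUSH 3", "ADD"].
# DIV is deliberately left out so the sketch never has to handle division by zero.
FOLDABLE = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def fold_constants(code):
    """Peephole pass: replace 'PUSH a, PUSH b, OP' with a single 'PUSH result'."""
    out = []
    for instr in code:
        if (instr in FOLDABLE and len(out) >= 2
                and out[-1].startswith("PUSH ") and out[-2].startswith("PUSH ")):
            b = int(out.pop().split()[1])
            a = int(out.pop().split()[1])
            out.append(f"PUSH {FOLDABLE[instr](a, b)}")  # result computed at compile time
        else:
            out.append(instr)
    return out


def remove_unreachable(code):
    """Drop instructions after an unconditional RETURN (this toy code has no jump targets)."""
    if "RETURN" in code:
        return code[:code.index("RETURN") + 1]
    return code


print(fold_constants(["PUSH 2", "PUSH 3", "ADD", "PUSH 4", "MUL"]))  # ['PUSH 20']
print(remove_unreachable(["LOAD x", "RETURN", "PUSH 1", "ADD"]))     # ['LOAD x', 'RETURN']
```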
Linking
Compiled programs often depend on third-party code in libraries, which must be located and connected to the program before it can run. Linking is the process of resolving these references.
There are two types of linking: static and dynamic.
Static Linking
Static linking is where third-party libraries are embedded inside the executable.
This is done by the linker at link time. The object code from the libraries is combined with the program's own object code, producing a single executable that contains its own dependencies.
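The sketch below is a toy model of static linking. It assumes object files are represented as dictionaries of defined and referenced symbols, with strings standing in for machine code. `static_link` and the data layout are hypothetical. The point is that the library code is copied into the output, so the result needs nothing else to run.

```python
# Hypothetical object-file model (an assumption for illustration):
#   "defines":    symbol -> compiled code (strings stand in for machine code)
#   "references": symbols used by the program but defined elsewhere


def static_link(program, libraries):
    """Copy the code for every referenced library symbol into one standalone 'executable'.

    Simplifications: library functions are assumed not to call other library
    functions, and code is copied per symbol (real linkers typically pull in
    whole object files from a library archive).
    """
    executable = dict(program["defines"])  # start with the program's own object code
    for symbol in program["references"]:
        for library in libraries:
            if symbol in library["defines"]:
                executable[symbol] = library["defines"][symbol]  # embed a copy
                break
        else:
            # No library defined the symbol, so linking fails.
            raise LookupError(f"undefined reference to '{symbol}'")
    return executable


libmath = {"defines": {"sqrt": "<sqrt machine code>", "cos": "<cos machine code>"}}
program = {"defines": {"main": "<main machine code>"}, "references": ["sqrt"]}
print(static_link(program, [libmath]))
# {'main': '<main machine code>', 'sqrt': '<sqrt machine code>'}
```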
| Pros | Cons |
|---|---|
| The executable is ‘portable’ and can be copied to other systems on its own | Larger binary size |
| The executable is standalone and requires no other third party libraries | Longer overall compile times |
| Third party libraries updating doesn’t affect the executable, since it uses its own versions | Could potentially be redundant if the same libraries and versions of those libraries are present on the system |
Dynamic Linking
Dynamic linking is the opposite. The application does not package its own libraries and instead relies on them being present in the runtime environment.
This is done by providing the compiled executable with the library's symbol names, metadata, and references, without copying any of the library's code. The program can then call code from the library, provided that code is present at runtime.
To ensure that the code is present at runtime, the operating system can use a dynamic linker to load the third-party shared libraries into memory. It then binds the symbols to the correct addresses in memory where the library's code is now located.
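The load-time binding described above can be modelled as follows. In this sketch, `registry` stands in for the shared library files on disk and `LOADED_LIBRARIES` for libraries already loaded into memory. The `load_library` and `dynamic_link` functions are hypothetical names. Real dynamic linkers (such as `ld.so` on Linux) work with memory addresses rather than Python objects.

```python
import math

LOADED_LIBRARIES = {}  # library name -> {symbol: function}; shared by every "program"


def load_library(name, registry):
    """Load a shared library into 'memory' once; later programs reuse the same copy."""
    if name not in LOADED_LIBRARIES:
        LOADED_LIBRARIES[name] = registry[name]
    return LOADED_LIBRARIES[name]


def dynamic_link(executable, registry):
    """Bind each symbol the executable references to the loaded library's function."""
    bindings = {}
    for lib_name, symbols in executable["needs"].items():
        library = load_library(lib_name, registry)
        for symbol in symbols:
            if symbol not in library:
                # e.g. the installed library version is incompatible with the program
                raise LookupError(f"symbol '{symbol}' not found in {lib_name}")
            bindings[symbol] = library[symbol]
    return bindings


registry = {"libmath": {"sqrt": math.sqrt, "cos": math.cos}}  # stands in for library files on disk
app_a = {"needs": {"libmath": ["sqrt"]}}          # the executable stores only symbol names
app_b = {"needs": {"libmath": ["sqrt", "cos"]}}
bindings_a = dynamic_link(app_a, registry)
bindings_b = dynamic_link(app_b, registry)
print(bindings_a["sqrt"](16.0))                   # 4.0
print(bindings_a["sqrt"] is bindings_b["sqrt"])   # True: both use the same loaded copy
```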
Dynamic linking can also make use of lazy loading, where library functions are only loaded and bound when the program first calls them. This can reduce start-up time and initial memory usage. However, the first call to each function is slower, because the loading work has been moved from program start-up to that first call.
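Lazy loading can be sketched with a stub object that resolves the real function on its first call and caches it for later calls. `LazySymbol` and `resolve` are hypothetical names. In this sketch, `resolve` uses Python's `math` module as a stand-in for a shared library and records each lookup, which makes the deferred cost visible.

```python
import math


class LazySymbol:
    """Stand-in for a library function that is only resolved the first time it is called."""

    def __init__(self, name, resolver):
        self.name = name
        self.resolver = resolver  # callable that finds the real function (hypothetical)
        self.target = None        # cached "address" once resolved

    def __call__(self, *args):
        if self.target is None:
            # First call: pay the lookup cost now instead of at program start-up.
            self.target = self.resolver(self.name)
        return self.target(*args)


lookups = []


def resolve(name):
    """Hypothetical resolver: records each lookup, then finds the function in the math module."""
    lookups.append(name)
    return getattr(math, name)


lazy_sqrt = LazySymbol("sqrt", resolve)
print(lookups)          # [] : nothing resolved at start-up
print(lazy_sqrt(9.0))   # 3.0 : first call triggers the lookup
print(lazy_sqrt(16.0))  # 4.0 : later calls reuse the cached function
print(lookups)          # ['sqrt']
```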
| Pros | Cons |
|---|---|
| Multiple applications depending on the same version of the same library can use the same library file | The library version present at runtime needs to be compatible with the version that the program was compiled with |
| Saves overall disk space if multiple applications use the same library | Can result in slightly slower runtime performance, since symbols must be resolved when the program loads and library calls go through an extra level of indirection |
| Memory usage can be shared if multiple running applications use the same library | |
| Easier to update libraries, since updating one shared library updates it for every application that uses it | |