1.1.2 Types of processor
We have many different types of processor, all designed to complete different tasks. Some processors are designed to accelerate specific tasks, while others are designed to be general purpose.
CPUs can be separated into two families: CISC and RISC.
Processor architectures
Arising from different design philosophies, modern processors fall into two groups: RISC and CISC. Each has different properties, advantages, and drawbacks that may make one architecture more suitable than the other.
RISC
RISC (Reduced Instruction Set Computer) is one of the two main CPU architectures in use. It is present in most mobile devices, embedded computers, and microcontrollers, and is growing in use in laptops and servers, particularly through Apple's M-series and Qualcomm's Snapdragon X lineup.
RISC processors maintain the philosophy of one operation per one instruction within one clock cycle.
This results in RISC processors having far fewer and simpler instructions than CISC, since each instruction performs less work and fewer operations need to be given a dedicated instruction.
Something simple such as addition can take several instructions: loading each value from memory into a register, calling the add instruction, and then storing the result back to memory.
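As a minimal sketch of this idea, here is a tiny hypothetical RISC-style machine written in Rust (the `Instr` enum, `run` function, and register/memory layout are all illustrative, not any real instruction set). Even a simple addition decomposes into four separate instructions, each doing exactly one operation:

```rust
// A hypothetical RISC-style machine, for illustration only.
enum Instr {
    Load { reg: usize, addr: usize },       // reg <- memory[addr]
    Add { dst: usize, a: usize, b: usize }, // dst <- a + b
    Store { reg: usize, addr: usize },      // memory[addr] <- reg
}

fn run(program: &[Instr], memory: &mut [i64]) {
    let mut regs = [0i64; 8]; // 8 general-purpose registers
    for instr in program {
        match instr {
            Instr::Load { reg, addr } => regs[*reg] = memory[*addr],
            Instr::Add { dst, a, b } => regs[*dst] = regs[*a] + regs[*b],
            Instr::Store { reg, addr } => memory[*addr] = regs[*reg],
        }
    }
}

fn main() {
    let mut memory = vec![0i64; 4];
    memory[0] = 2;
    memory[1] = 3;
    // memory[2] = memory[0] + memory[1] takes four instructions:
    let program = [
        Instr::Load { reg: 0, addr: 0 },
        Instr::Load { reg: 1, addr: 1 },
        Instr::Add { dst: 2, a: 0, b: 1 },
        Instr::Store { reg: 2, addr: 2 },
    ];
    run(&program, &mut memory);
    println!("{}", memory[2]); // 5
}
```

Because each instruction is this small and predictable, a real RISC core can pipeline them easily, which is the first trait listed below.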
Additional traits of RISC:
- Easier to pipeline since each instruction is predictable and simple
- RISC programs require more RAM to store the additional instructions
- RISC processors are generally more power and heat efficient
- Fewer abstractions
CISC
CISC (Complex Instruction Set Computer) is the CPU architecture most commonly used in desktops and laptops.
Note
For additional reading on CISC instructions, see this video on strange x86 CPU instructions
CISC processors are significantly more complex (hence the name) compared to RISC processors, as these processors are capable of having single instructions that can perform multiple tasks.
Generally, these instructions wrap around the simpler instructions.
For example, the instruction DPPS (Dot Product of Packed Single Precision Floating-Point Values) from the SSE4.1 instruction set performs significantly more than just one operation.
CISC processors have larger instruction sets than RISC, and a single instruction may be allocated multiple clock cycles to complete its work.
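To give a feel for what an instruction like DPPS bundles together, here is a sketch in Rust (the `dot4` function is illustrative, not the actual hardware operation): one conceptual "dot product" step hides four multiplications and three additions, which a RISC-style program would issue as separate instructions.

```rust
// A DPPS-style dot product of two 4-element vectors: one conceptual
// operation that internally performs four multiplies and three adds.
fn dot4(a: [f32; 4], b: [f32; 4]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    // (1*4) + (2*3) + (3*2) + (4*1) = 20
    println!("{}", dot4([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]));
}
```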
Additional traits of CISC:
- Simplifies compilation because the instructions can more closely resemble higher level statements
- Could also make the optimization stage more difficult
- Less heat/power efficient
- Physically larger
- Lower memory usage
Parallel processing
Parallel processing is where multiple tasks are executed simultaneously on separate cores or processors, so that all of the tasks complete in a shorter time than if they were executed sequentially.
Systems with more cores are capable of executing more tasks in parallel.
For systems without multiple cores, a single core can make use of threading, which is a form of concurrency on a single core.
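A minimal sketch of this in Rust (the `parallel_sum` function and its chunking scheme are illustrative assumptions, not from the specification): the range is split into chunks, one thread per chunk. On a multi-core system the chunks genuinely run in parallel; on a single core the operating system interleaves the threads instead, which is concurrency.

```rust
use std::thread;

// Sum 1..=upper by splitting the range across `workers` threads.
fn parallel_sum(upper: u64, workers: u64) -> u64 {
    let chunk = upper / workers;
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let start = w * chunk + 1;
            let end = if w == workers - 1 { upper } else { (w + 1) * chunk };
            // Each thread sums its own chunk independently of the others.
            thread::spawn(move || (start..=end).sum::<u64>())
        })
        .collect();
    // Joining waits for every thread, then the partial sums are combined.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    // 1 + 2 + ... + 1_000_000 = 500_000_500_000
    println!("{}", parallel_sum(1_000_000, 4));
}
```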
Caution
Concurrency and parallelisation ARE DIFFERENT.
See this video (2:04-3:07) for the explanation, though I highly recommend watching the whole video.
Feel free to ignore the Rust-specific parts. From what I know, this is on the specification.
The negatives of parallel processing
Parallel processing does not guarantee a speed increase. It can often introduce bugs, and occasionally it even causes slowdowns.
When performing parallel processing, the task needs to be allocated to the different cores/processors, which induces a small overhead.
In the cases of small tasks being parallelised, the overhead of allocating many different tasks could overshadow the speed benefit of parallelising the task in the first place.
Parallel processing is also significantly harder to program and utilise.
The Rust Programming Language Book phrases this very well:
Splitting the computation in your program into multiple threads to run multiple tasks at the same time can improve performance, but it also adds complexity. Because threads can run simultaneously, there’s no inherent guarantee about the order in which parts of your code on different threads will run. This can lead to problems, such as:
- Race conditions, in which threads are accessing data or resources in an inconsistent order
- Deadlocks, in which two threads are waiting for each other, preventing both threads from continuing
- Bugs that only happen in certain situations and are hard to reproduce and fix reliably
These possible circumstances make development and testing with parallelism very difficult in comparison to single threaded programming. It’s up to the programmer to determine whether it is worth it or not to implement parallelism/concurrency.
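As a small sketch of avoiding the race condition described above (the `locked_count` function is illustrative, not taken from the Rust Book): ten threads all increment one shared counter. Without synchronisation the read-modify-write steps could interleave and lose updates; wrapping the counter in a `Mutex` makes each increment exclusive, so the final total is always correct.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// `threads` threads each add `increments` to one shared counter.
// The Mutex ensures only one thread updates the counter at a time.
fn locked_count(threads: u32, increments: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1; // lock held for one increment
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap(); // wait for every thread to finish
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", locked_count(10, 1000)); // always 10000
}
```

Rust makes this explicit: shared mutable state must go through a synchronisation type like `Mutex`, which is one reason the Rust Book discusses these hazards so directly.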
Coprocessors and Accelerators
Along with our primary processor (generally the CPU), there can also be additional processors that tasks can be delegated to.
Co-processors are specialised processors that are capable of doing specific tasks much faster than the CPU can. Things like floating point arithmetic, cryptography, matrix multiplication, and other easily parallelised tasks are frequently offloaded to such coprocessors.
Hardware acceleration will often use coprocessors such as a GPU or NPU/TPU, offloading computationally intensive tasks onto the additional device. Mathematically intensive tasks like rendering and ML workloads will almost always be offloaded to the GPU, as it is able to perform these tasks orders of magnitude faster than the CPU.
Here are some common coprocessors:
Note
You don’t need to know any of these except for the GPU.
| Acronym | Full name | Task |
|---|---|---|
| GPU | Graphics Processing Unit | Rendering graphics, Parallelised arithmetic |
| NPU | Neural Processing Unit | AI and machine learning |
| TPU | Tensor Processing Unit | Variant of an NPU by Google for neural networks |
| QPU | Quantum Processing Unit | Quantum computing |
GPUs
The GPU (Graphics Processing Unit) is a specialised processor originally designed to accelerate the computation of graphics and 3D scenes.
In comparison to CPUs, GPUs instead pack the die with thousands of simpler cores to maximise their parallel processing capability.
Since the only thing the GPU will be doing is completing tasks sent from the CPU, it does not need nearly as much silicon space for administrative parts.
| GPU | CPU |
|---|---|
| More cores | Fewer cores |
| Simpler cores | More complicated cores |
| Specialised for parallel processing | Specialised for sequential processing |
| Computes more specialised tasks | Computes more general tasks |