How Does Multi-threaded Code Run in Assembly Language?

In the traditional model of computing, programmers write their code in C or another high-level (i.e. human-readable) language. A compiler (e.g. gcc) then converts that code into assembly and finally into machine (byte) instructions. This is because the microprocessor understands only 0s and 1s, whereas humans tend to go crazy trying to make sense of such code. Assembly language is a happy compromise between the two: it presents the machine instructions in a human-readable format.

A few days back I was trying to understand what happens to this model when multiple cores come into the picture. These days, bioinformatics programmers use multiple threads to take advantage of multiple cores/processors. What happens to such high-level code when it is compiled? In other words, how does code compiled to assembly language deal with multiple processors or cores?

Such intricate knowledge of computer hardware is not necessary unless one wants to make bioinformatics programs highly efficient. The k-mer counting program Jellyfish is an example that takes advantage of the compare-and-swap instruction provided by the processor. Also, Heng Li mentioned in his blog that Intel was helping him improve the performance of BWA-MEM using specialized understanding of the hardware. A third example is DALIGN, written by Gene Myers for long noisy reads, which takes advantage of cache coherence. Apart from those three examples, several others are accelerating bioinformatics code on GPUs or FPGAs.
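To make the compare-and-swap idea concrete, here is a minimal sketch in C11. It is not taken from Jellyfish; the helper names `cas_add` and `claim_slot` are hypothetical and exist only for illustration. The sketch assumes a compiler that supports `<stdatomic.h>` (e.g. a recent gcc). On x86-64, gcc typically turns these calls into a `lock cmpxchg` instruction, which is the hardware feature in question.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: atomically add `delta` to *counter with a
 * compare-and-swap retry loop and return the new value. */
long cas_add(_Atomic long *counter, long delta)
{
    long seen = atomic_load(counter);

    /* The swap succeeds only if *counter still equals `seen`.
     * On failure the current value is written back into `seen`,
     * so the next attempt uses fresh data. */
    while (!atomic_compare_exchange_weak(counter, &seen, seen + delta)) {
        /* another thread changed the counter first; retry */
    }
    return seen + delta;
}

/* Hypothetical helper: claim an empty slot (value 0) for `key`,
 * the way a lock-free hash table might. Returns true if the slot
 * holds `key` afterwards, false if a different key owns it. */
bool claim_slot(_Atomic long *slot, long key)
{
    long expected = 0;

    if (atomic_compare_exchange_strong(slot, &expected, key))
        return true;            /* we installed the key */
    return expected == key;     /* someone else installed the same key */
}
```

No lock is taken anywhere: a thread that loses the race simply observes the new value and tries again, which is why this primitive suits counters that many threads update at once.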

Getting back to multi-threaded code in assembly language, the short answer is that the traditional model of computing described above is not the whole story for multi-threaded code. User-level assembly has no instruction for starting a thread on another core. A program has to ask the operating system, through system calls, to create and schedule threads, and only the operating system can let programs take advantage of many cores. Curious readers can check the following three Stack Overflow threads for more details.

  1. What does multicore assembly language look like

“As I understand it, each “core” is a complete processor, with its own register set. Basically, the BIOS starts you off with one core running, and then the operating system can “start” other cores by initializing them and pointing them at the code to run, etc.

Synchronization is done by the OS. Generally, each processor is running a different process for the OS, so the multi-threading functionality of the operating system is in charge of deciding which process gets to touch which memory, and what to do in the case of a memory collision.”

  2. How is thread synchronization implemented at the assembly language level?

  3. Is it possible to create threads without system calls in linux x86 gas assembly?
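The role of the operating system discussed in these threads can be seen from C as well. The sketch below is only an illustration under stated assumptions: it uses POSIX threads, and the names `parallel_sum`, `sum_slice`, `struct slice` and `MAX_THREADS` are made up for this example. Each `pthread_create` call asks the OS for a new thread (on Linux with glibc this goes through the clone family of system calls), and the OS scheduler then decides which core runs it. Nothing in the program itself selects a core.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

/* Work description for one thread: sum data[begin..end). */
struct slice {
    const long *data;
    size_t begin, end;
    long partial;        /* written only by the thread that owns the slice */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long acc = 0;

    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Hypothetical helper: sum n values using up to nthreads OS threads. */
long parallel_sum(const long *data, size_t n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct slice work[MAX_THREADS];
    int created[MAX_THREADS];
    long total = 0;

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    for (int t = 0; t < nthreads; t++) {
        work[t].data = data;
        work[t].begin = n * t / nthreads;
        work[t].end = n * (t + 1) / nthreads;
        work[t].partial = 0;

        /* Ask the OS for a thread; it may refuse (e.g. resource limits). */
        created[t] = (pthread_create(&tid[t], NULL, sum_slice, &work[t]) == 0);
        if (!created[t])
            sum_slice(&work[t]);   /* fall back to the calling thread */
    }

    for (int t = 0; t < nthreads; t++) {
        if (created[t])
            pthread_join(tid[t], NULL);   /* wait before reading partial */
        total += work[t].partial;
    }
    return total;
}
```

A caller would use it as `parallel_sum(values, count, 4)`, compiled with `gcc -pthread`. Because each thread writes only its own `partial` field and the main thread reads it after `pthread_join`, this particular example needs no atomic instructions.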

Written by M.