Assembler crash course (80x86)

What can an assembler do?

An assembler can actually do very little, namely only what the processor understands directly. All the convenient constructs of higher-level programming languages, which let programmers turn their algorithms into understandable and (fairly) error-free programs, are missing:
  • no complex instructions
  • no comfortable for, while, or repeat-until loops, but almost only gotos
  • no structured data types
  • no subroutines with parameter transfer
  • ...
Examples:
  • The C statement sum = a + b + c + d; is too complicated for an assembler and must therefore be split into several instructions. The 80x86 can only add two numbers at a time and stores the result in one of the two "variables" used (here the accumulator register eax). The following C program therefore corresponds more closely to an assembler program:

        sum = a;
        sum = sum + b;
        sum = sum + c;
        sum = sum + d;

    With the 80x86 assembler it would look like this:

        mov eax, [a]
        add eax, [b]
        add eax, [c]
        add eax, [d]
  • Even a simple construct such as if (a == 4711) {...} else {...} is already too difficult for the assembler and must therefore be expressed using gotos:

        if (a != 4711) goto unequal;
        equal:   ...
                 goto further;
        unequal: ...
        further: ...

    In the 80x86 assembler it looks like this (assuming the value of a is already in eax):

                 cmp eax, 4711
                 jne unequal
        equal:   ...
                 jmp further
        unequal: ...
        further: ...
  • Simple counting loops are better supported by the 80x86 processor. The following C program

        for (i = 0; i < 100; i++) { sum = sum + a; }

    looks something like this in the 80x86 assembler:

                 mov ecx, 100
        again:   add eax, [a]
                 loop again

    The loop command implicitly decrements the ecx register and only executes the jump if the content of ecx is not 0 afterwards. The label is called again here because loop itself is an instruction name and cannot be used as a label.
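The three examples above can also be replayed in plain C, which makes the restrictions easy to try out. This is only a minimal sketch: the function names add_four, classify, and repeat_add are hypothetical, and the local variables eax and ecx merely stand in for the registers of the same name.

```c
#include <stdint.h>

/* sum = a + b + c + d, split into two-operand steps like mov/add. */
int32_t add_four(int32_t a, int32_t b, int32_t c, int32_t d)
{
    int32_t eax = a;   /* mov eax, [a] */
    eax += b;          /* add eax, [b] */
    eax += c;          /* add eax, [c] */
    eax += d;          /* add eax, [d] */
    return eax;
}

/* if/else written with gotos only; returns 1 for the "equal" branch. */
int32_t classify(int32_t a)
{
    int32_t result;
    if (a != 4711) goto unequal;   /* cmp eax, 4711 / jne unequal */
    result = 1;                    /* "equal" branch */
    goto further;                  /* jmp further */
unequal:
    result = 0;                    /* "unequal" branch */
further:
    return result;
}

/* Counted loop in the style of the loop instruction:
 * the body runs first, then the counter is decremented and tested.
 * count must be at least 1, otherwise the counter wraps around. */
int32_t repeat_add(int32_t a, uint32_t count)
{
    int32_t eax = 0;
    uint32_t ecx = count;          /* mov ecx, count */
again:
    eax += a;                      /* add eax, [a] */
    if (--ecx != 0) goto again;    /* loop again */
    return eax;
}
```

For example, repeat_add(3, 100) walks through the loop body 100 times, just as the assembler version with ecx = 100 would.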

What is a register?

In the examples so far, register names were always used in place of the variable names of the C program. A register is a tiny piece of hardware inside the processor that, on the 80386 and later, can store up to 32 bits, i.e. 32 binary digits that are each 0 or 1.

The 80386 has the following registers:

General registers

    Name    Comment
    EAX     generally usable, special meaning for arithmetic commands
    EBX     generally usable
    ECX     generally usable, special meaning for loops
    EDX     generally usable
    EBP     base pointer
    ESI     source for string operations
    EDI     destination for string operations
    ESP     stack pointer

Segment registers

    Name    Comment
    CS      code segment
    DS      data segment
    SS      stack segment
    ES      any segment
    FS      any segment
    GS      any segment

Other registers

    Name    Comment
    EIP     instruction pointer
    EFLAGS  flags

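The lower two bytes of the registers EAX, EBX, ECX, and EDX have their own names. For the EAX register, the lower 16 bits form AX, which in turn consists of the high byte AH (bits 15 to 8) and the low byte AL (bits 7 to 0). The upper 16 bits have no name of their own.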
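How the named parts of a 32-bit register such as EAX overlap can be sketched with shifts and masks in C. The helper names reg_ax, reg_ah, reg_al, and set_al are hypothetical; they only model the layout (AX = bits 15 to 0, AH = bits 15 to 8, AL = bits 7 to 0) and are not how the processor is actually programmed.

```c
#include <stdint.h>

/* AX: the lower 16 bits of EAX. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH: the upper byte of AX (bits 15..8 of EAX). */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL: the lower byte of AX (bits 7..0 of EAX). */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing AL replaces only the lowest byte; the other 24 bits stay as they are. */
uint32_t set_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

With eax = 0x12345678, the sketch yields AX = 0x5678, AH = 0x56, and AL = 0x78.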


What is memory?

Most of the time, there are not enough registers to solve a problem. In this case, the main memory of the computer must be accessed, which can store considerably more information. To the assembler programmer, main memory looks like a huge array of registers that are 8, 16, or 32 bits "wide", as desired. The smallest addressable unit is a byte (= 8 bits), which is why the size of the memory is also measured in bytes. To access a particular entry of the "main memory" array, the programmer must specify its index, i.e. the address of the entry. The first byte of main memory has the address 0, the second the address 1, and so on.

Variables can be created in an assembler program by assigning a label to a memory address and reserving memory space of the desired size.

    [SECTION .data]
    greeting:   db 'hello, world'
    misfortune: dw 13
    million:    dd 1000000

    [SECTION .text]
            mov eax, [million]   ; million is a 32-bit value (dd), so it is loaded into a 32-bit register
            ...
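The picture of main memory as a byte array that can be read 8, 16, or 32 bits at a time can be sketched in C as follows. The names memory, store32, load8, load16, and load32 are hypothetical, the array is deliberately tiny, and no bounds checks are made. The sketch assumes the little-endian byte order of the 80x86, where the least significant byte is stored at the lowest address.

```c
#include <stdint.h>

#define MEM_SIZE 16u
static uint8_t memory[MEM_SIZE];   /* toy main memory, addresses 0..15 */

/* Store a 32-bit value at addr, least significant byte first. */
void store32(uint32_t addr, uint32_t value)
{
    memory[addr]     = (uint8_t)(value & 0xFFu);
    memory[addr + 1] = (uint8_t)((value >> 8) & 0xFFu);
    memory[addr + 2] = (uint8_t)((value >> 16) & 0xFFu);
    memory[addr + 3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Read 8 bits from addr. */
uint8_t load8(uint32_t addr) { return memory[addr]; }

/* Read 16 bits starting at addr. */
uint16_t load16(uint32_t addr)
{
    return (uint16_t)(memory[addr] | (memory[addr + 1] << 8));
}

/* Read 32 bits starting at addr. */
uint32_t load32(uint32_t addr)
{
    return (uint32_t)memory[addr]
         | ((uint32_t)memory[addr + 1] << 8)
         | ((uint32_t)memory[addr + 2] << 16)
         | ((uint32_t)memory[addr + 3] << 24);
}
```

After store32(4, 1000000), a 32-bit read from address 4 returns the full value, while a 16-bit read from the same address only sees the lower half (16960). This is why the width of the register should match the size of the variable.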

What is a stack?

You don't always want to come up with a new label just to save the value of a register for a short time, for example because you need the register for a certain instruction but don't want to lose its old value. In this case, you want something like a scrap of paper, and that is what the stack provides. The stack is nothing more than a piece of main memory, except that no fixed addresses are used there: the data to be saved is simply written on top (push) or fetched from the top again (pop). Access is therefore very easy, provided you remember the order in which the data was put on the stack. A special register, the stack pointer (ESP), always points to the top element of the stack. Since push and pop transfer 32 bits at a time here, the stack can be pictured as four bytes wide.
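The push and pop behaviour described above can be sketched in a few lines of C. The names stack_mem, sp, push32, and pop32 are hypothetical, and sp is a word index instead of a byte address. The sketch follows the 80x86 convention that the stack grows towards lower addresses, and it omits overflow and underflow checks for brevity.

```c
#include <stdint.h>

#define STACK_WORDS 8u
static uint32_t stack_mem[STACK_WORDS];   /* the piece of memory used as stack */
static uint32_t sp = STACK_WORDS;         /* index of the top element; STACK_WORDS means empty */

/* push: move the stack pointer down one entry, then store the value there. */
void push32(uint32_t value)
{
    sp--;
    stack_mem[sp] = value;
}

/* pop: fetch the top value, then move the stack pointer back up. */
uint32_t pop32(void)
{
    uint32_t value = stack_mem[sp];
    sp++;
    return value;
}
```

Values come back in the reverse order in which they were saved: after push32(1) and push32(2), the first pop32() returns 2 and the second returns 1.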

Addressing types

Most commands of the 80x86 can take their operands either from registers, from memory, or directly from a constant. For the mov command (among others), the following forms are possible. The first operand always specifies the destination and the second always the source of the copy action:

  • Register addressing: The value of one register is transferred to another, e.g. mov ebx, edi
  • Immediate addressing: The constant is transferred to the register, e.g. mov ebx, 1000
  • Direct addressing: The value in the specified memory location is transferred to the register, e.g. mov ebx, [1000]
  • Register-indirect addressing: The value in the memory location identified by the second register is transferred to the first register, e.g. mov ebx, [eax]
  • Base register addressing: The value in the memory location given by the sum of the contents of the second register and the constant is transferred to the first register, e.g. mov eax, [esi + 10]
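For readers who think in C, the five forms above correspond roughly to plain assignments, array accesses, and pointer arithmetic. The following is a minimal sketch under simplifying assumptions: ram, Loaded, and addressing_demo are hypothetical names, the parameters edi, eax, and esi only stand in for the registers, and ram is indexed in 32-bit words rather than bytes to keep the example short.

```c
#include <stdint.h>

static uint32_t ram[2048];   /* toy memory; ram[i] stands for the value at address i */

/* One result per addressing form, so each can be inspected separately. */
typedef struct {
    uint32_t reg;        /* register addressing          */
    uint32_t imm;        /* immediate addressing         */
    uint32_t direct;     /* direct addressing            */
    uint32_t indirect;   /* register-indirect addressing */
    uint32_t based;      /* base register addressing     */
} Loaded;

Loaded addressing_demo(uint32_t edi, uint32_t eax, uint32_t esi)
{
    Loaded out;
    out.reg      = edi;             /* copy from another register       */
    out.imm      = 1000;            /* the constant itself              */
    out.direct   = ram[1000];       /* memory at a constant address     */
    out.indirect = ram[eax];        /* memory at the address in eax     */
    out.based    = ram[esi + 10];   /* memory at address esi + constant */
    return out;
}
```

The important distinction is between the constant 1000 itself (immediate) and the content of memory location 1000 (direct); in assembler source the square brackets make that difference visible.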

Note: If the 80x86 processor is operated in real mode (e.g. when working with the MS-DOS operating system), memory addresses are specified by a segment register and an offset. In the operating systems course this is not necessary (it is even wrong), because OOStuBS runs in protected mode and we have already initialized the segment registers for you.


Procedures

The concept of a function or procedure is known from higher-level programming languages. Its advantage over a goto is that the procedure can be called from any point in the program, and the program then continues at exactly the point that follows the procedure call. The procedure itself does not need to know where it was called from or where execution continues afterwards. Somehow this happens automatically. But how?

The solution is that not only the data of the program but also the program itself resides in main memory, so each machine code instruction has its own address. For the processor to execute a program, its instruction pointer must point to the beginning of the program, i.e. the address of the first machine code instruction is loaded into the special instruction pointer register. The processor then executes the instruction found there and, in the normal case, increases the instruction pointer by the length of that instruction in memory, so that it points to the next machine instruction. For a jump command, the instruction pointer is not advanced by the length of the command but is increased or decreased by the specified relative offset to the target.

Calling a procedure or function (the two are the same in assembler) initially works like a jump command, except that the old value of the instruction pointer (plus the length of the command) is first written to the stack. At the end of the function, a jump to the address stored on the stack is sufficient to return to the calling program.

With the 80x86, the return address is stored on the stack implicitly by the call command. In the same way, the ret command implicitly performs a jump to the address on top of the stack:

    ; ----- main program -----
    main:   ...
            call f1
    xy:     ...

    ; ----- function f1 -----
    f1:     ...
            ret

If the function is to receive parameters, these are usually written to the stack as well, before the call command of course. Afterwards they have to be removed again, either with pop or by adjusting the stack pointer directly:

            push eax        ; second parameter for f1
            push ebx        ; first parameter for f1
            call f1
            add esp, 8      ; remove the parameters from the stack

To access the parameters within the function, the base pointer ebp is used. If it is saved right at the beginning of the function and then assigned the value of the stack pointer, the first parameter can always be reached via [ebp + 8] and the second parameter via [ebp + 12], regardless of how many push and pop operations have been executed since the start of the function.

    f1:     push ebp
            mov ebp, esp
            ...
            mov ebx, [ebp + 8]    ; load 1st parameter into ebx
            mov eax, [ebp + 12]   ; load 2nd parameter into eax
            ...
            pop ebp
            ret
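Why the parameters end up at the offsets 8 and 12 relative to ebp can be traced with a small C simulation of the stack. This is only a sketch under stated assumptions: frame_stack, esp_idx, ebp_idx, push_word, pop_word, f1_body, and call_f1 are hypothetical names, the stack is indexed in 32-bit words (so a byte offset of 8 becomes an index offset of 2), and the return address is a placeholder value because a C program cannot push a real one.

```c
#include <stdint.h>

#define FRAME_WORDS 16u
static uint32_t frame_stack[FRAME_WORDS];
static uint32_t esp_idx = FRAME_WORDS;   /* stack pointer as word index, grows downwards */
static uint32_t ebp_idx;                 /* base pointer as word index */

static void push_word(uint32_t v) { frame_stack[--esp_idx] = v; }
static uint32_t pop_word(void)    { return frame_stack[esp_idx++]; }

/* Body of f1: returns first - second so that the parameter order is visible. */
static uint32_t f1_body(void)
{
    push_word(ebp_idx);                            /* push ebp     */
    ebp_idx = esp_idx;                             /* mov ebp, esp */
    push_word(0xDEADu);                            /* extra push: moves esp, not ebp */
    uint32_t first  = frame_stack[ebp_idx + 2];    /* [ebp + 8]    */
    uint32_t second = frame_stack[ebp_idx + 3];    /* [ebp + 12]   */
    (void)pop_word();                              /* undo the extra push */
    ebp_idx = pop_word();                          /* pop ebp      */
    return first - second;
}

/* Caller side: push the parameters, "call", then clean up the stack. */
uint32_t call_f1(uint32_t first, uint32_t second)
{
    push_word(second);              /* second parameter is pushed first  */
    push_word(first);               /* first parameter is pushed last    */
    push_word(0u);                  /* call: placeholder return address  */
    uint32_t result = f1_body();
    (void)pop_word();               /* ret: removes the return address   */
    esp_idx += 2;                   /* add esp, 8: remove the parameters */
    return result;
}
```

Seen from ebp, the stack holds the saved ebp at offset 0, the return address at offset 4, the first parameter at offset 8, and the second at offset 12. The extra push inside f1_body changes esp but not ebp, which is exactly why the parameters are addressed relative to ebp.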


Volatile and non-volatile registers / connection to C

So that functions can be called from different places in the assembler program, it is important to determine which register contents can be changed by the function and which must still (or again) have the old value when the function is exited. It is of course safest to save all registers that the function needs to perform its task on the stack at the beginning of the function and to reload them immediately before exiting the function.

The assembler programs generated by the GNU C compiler, however, follow a slightly different strategy: they assume that many registers are only used for a short time anyway, for example as counting variables in small loops or to write the parameters for a function to the stack. Here it would be a waste to laboriously save the already outdated values at the beginning of a function and restore them at the end. Since a register does not reveal whether its content is still valuable, the developers of the GNU C compiler simply specified that the registers eax, ecx, and edx are always to be regarded as volatile registers whose content may simply be overwritten. The register eax also has a special role: it delivers the return value of the function (if there is one). The values of the other registers, on the other hand, must be saved before a function may overwrite them. They are therefore called non-volatile registers.