Calling conventions

Author: Jean-David Gadina <macmade(at)eosgarden.com>
Copyright (C) Jean-David Gadina.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.
 
 
Programming languages let us write human-readable code, with concepts like variables, functions, objects, methods, etc.
Those concepts don't exist for a computer, and the human-readable code needs to be converted into machine code (compilation), so the CPU can execute it.
This article explains how functions are called, from a machine code perspective.
Let's take a simple C program. We define an add function, that adds two numbers, and returns the result:
int add( int x, int y );
int add( int x, int y )
{
    return x + y;
}

int main( void )
{
    int x = add( 1, 2 );

    return 0;
}
To see which instructions will be executed by the CPU, we can ask GCC to produce assembly output from the C file:
gcc -Wall -S filename.c
This will generate a filename.s file, with AT&T assembly code.
Let's look at what's in this file:
    .text
    .globl _add
_add:
    pushq %rbp
    movq %rsp, %rbp
    movl %edi, -4(%rbp)
    movl %esi, -8(%rbp)
    movl -8(%rbp), %eax
    addl -4(%rbp), %eax
    leave
    ret
    .globl _main
_main:
    pushq %rbp
    movq %rsp, %rbp
    subq $16, %rsp
    movl $2, %esi
    movl $1, %edi
    call _add
    movl %eax, -4(%rbp)
    movl $0, %eax
    leave
    ret
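We can see our function names, prefixed by an underscore (_add and _main). Those are our functions' symbols (the entry points). The underscore prefix is how Mac OS names C symbols; on Linux, the symbols would simply be add and main.
The program will start by jumping to the _main symbol. The first three lines set up the stack frame and reserve stack space for the local variables.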
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
Then, the decimal values 1 and 2 are moved into the EDI and ESI registers. These values are the arguments we passed to the add function in our C code.
movl $2, %esi
movl $1, %edi
The next instruction, 'call', will jump to the _add symbol, and execute its code.
call _add
Here again, a new stack frame is set up (so the stack from main isn't corrupted).
Then, the values from the EDI and ESI registers (our arguments) are copied to the stack:
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
The program then moves one of the arguments into the EAX register, and adds the other argument to it. The result will stay in EAX.
movl -8(%rbp), %eax
addl -4(%rbp), %eax
Then, the function returns, and execution resumes in _main, right after the 'call' instruction:
leave
ret
Back in _main, the result found in EAX is stored in the local variable x. Then the main function returns, placing its return code (0) in the EAX register.
What we've seen here is a perfect example of what's called a calling convention.
We've seen that the arguments are passed through the EDI and ESI registers, and that the return value is stored in the EAX register.
This calling convention is the one defined by the System V ABI for x86-64. If you're on a 64-bit Mac OS or Linux system, this is almost certainly the calling convention that's used.
But other calling conventions exist. You may have heard of cdecl, fastcall or thiscall.
What's the difference between them? Let's take our previous example, in which we used two arguments. What would happen if we used three arguments?
We've seen that argument one is passed in the EDI register, and that argument two through ESI. What about a third argument?
The System V calling convention specifies that the first six integer arguments are passed, in order, through the RDI, RSI, RDX, RCX, R8 and R9 registers. For 32-bit values such as int, the lower halves of those registers are used: EDI, ESI, EDX, ECX, R8D and R9D.
Ok, now we can pass 6 arguments. But what if we have seven, or more?
The System V calling convention specifies that additional arguments are passed on the stack, from right to left, typically using the 'push' instruction.
There's an exception for floating point arguments (float or double). Those arguments are passed in dedicated registers, xmm0 to xmm7. Those are SSE registers.
That's only for the System V calling convention. What about the others?
Let's take cdecl, which is used on 32-bit x86. With that calling convention, no registers are used to pass arguments. It means they will all be passed on the stack.
The return value is also placed in the EAX register.
Let's take our main function, and let's try to write it using the cdecl calling convention.
The code to call our add function will be:
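push $2          # arguments are pushed from right to left: y first
push $1          # then x
call _add
addl $8, %esp    # with cdecl, the caller removes the arguments from the stack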
It may seem simpler, but remember that stack operations are slower than operations on registers, which is one reason the other calling conventions exist.
I won't cover the others here, but if you're interested in other calling conventions, take a look at the Wikipedia page.