Linking is the final stage of the gcc compilation process.
During linking, object files are combined, all references to external symbols are resolved, and final addresses are assigned to function calls.
In this article we will mainly focus on the following aspects of the gcc linking process:
- Object files and how they are linked together
- Code relocations
Before you read this article, make sure you understand all the 4 stages that a C program has to go through before becoming an executable (pre-processing, compilation, assembly and linking).
LINKING OBJECT FILES
Let's understand this first step through an example. First create the following main.c program.
$ vi main.c
#include <stdio.h>

extern void func(void);

int main(void)
{
    printf("\n Inside main()\n");
    func();
    return 0;
}
Next create the following func.c program. In main.c we declared the function func() using the ‘extern’ keyword; the function itself is defined in the separate file func.c. Note that func.c includes stdio.h because it calls printf().
$ vi func.c
#include <stdio.h>

void func(void)
{
    printf("\n Inside func()\n");
}
Create the object file for func.c as shown below. This will create the file func.o in the current directory.
$ gcc -c func.c
Similarly create the object file for main.c as shown below. This will create the file main.o in the current directory.
$ gcc -c main.c
Now execute the following command to link these two object files to produce a final executable. This will create the file ‘main’ in the current directory.
$ gcc func.o main.o -o main
When you execute this ‘main’ program you’ll see the following output.
$ ./main

 Inside main()

 Inside func()
From the above output, it is clear that we were able to link the two object files successfully into a final executable.
What did we achieve when we separated function func() from main.c and wrote it in func.c?
Here it would not have mattered much if we had written func() in the same file, but think of very large programs with thousands of lines of code. If everything lived in one file, a change to a single line would force recompilation of the whole source code, which is not acceptable in most cases. So, very large programs are divided into small pieces which are finally linked together to produce the executable.
The make utility, which works on makefiles, comes into play in most of these situations because it knows which source files have changed and which object files need to be recompiled. The object files whose corresponding source files have not been altered are linked as is. This makes the compilation process easy and manageable.
So, now we understand that when we link the two object files func.o and main.o, the gcc linker resolves the call to func(), and when the final executable main is run, we see the printf() inside func() being executed.
Where did the linker find the definition of the function printf()? Since the linker did not report any error, it must have found the definition. printf() is declared in stdio.h and defined as a part of the standard ‘C’ shared library (libc.so).
We did not explicitly link this shared object file to our program. So, how did this work? gcc passes the standard C library to the linker by default. To see this, use the ldd tool, which prints the shared libraries required by each program or shared library specified on the command line.
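To make the idea of "resolving" a call concrete, here is a deliberately tiny model of a symbol lookup. It is only a sketch: the struct toy_symbol and the function toy_resolve() are hypothetical names invented for this illustration, and a real linker such as ld uses far more elaborate data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: one entry of a hypothetical combined symbol table. */
struct toy_symbol {
    const char *name;    /* symbol name, e.g. "func" */
    int defined;         /* 1 if one of the object files defines it */
    unsigned long addr;  /* made-up address assigned to the definition */
};

/* Return the entry that defines `name`, or NULL if nothing defines it. */
const struct toy_symbol *toy_resolve(const struct toy_symbol *table,
                                     size_t count, const char *name)
{
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i].defined && strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL; /* the real linker would report "undefined reference" */
}
```

In this model, main.o contributes an undefined entry for func and func.o contributes the defined one, so the lookup succeeds. If func.o were left off the command line, the lookup would fail, which corresponds to the "undefined reference to `func'" error that gcc reports in that case.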
Execute ldd on the ‘main’ executable, which will display the following output.
$ ldd main
        linux-vdso.so.1 => (0x00007fff1c1ff000)
        libc.so.6 => /lib/libc.so.6 (0x00007f32fa6ad000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f32faa4f000)
The above output indicates that the main executable depends on three libraries. The second line in the above output is ‘libc.so.6’ (the standard ‘C’ library). This is how the gcc linker is able to resolve the function call to printf().
The first entry, linux-vdso.so.1, is a virtual shared object that the kernel maps into the process to speed up certain system calls, while the third is the dynamic loader, which loads all the other shared libraries required by the executable. The loader will be present for every executable that depends on shared libraries for its execution.
During linking, the command that gcc uses internally is very long, but from the user's perspective we just have to write:
$ gcc <object files> -o <output file name>
CODE RELOCATION
Relocations are entries within a binary that are left to be filled in at link time or run time. A typical relocation entry says: find the value of ‘z’ and put that value into the final executable at offset ‘x’.
Create the following reloc.c for this example.
$ vi reloc.c
extern void func(void);

void func1(void)
{
    func();
}
In the above reloc.c we declared a function func() whose definition is still not provided, but we are calling that function in func1().
Create an object file reloc.o from reloc.c as shown below.
$ gcc -c reloc.c -o reloc.o
Use the readelf utility to see the relocations in this object file as shown below.
$ readelf --relocs reloc.o

Relocation section '.rela.text' at offset 0x510 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000005  000900000002 R_X86_64_PC32     0000000000000000 func - 4
...
The address of func() is not known at the time we make reloc.o, so the compiler leaves a relocation of type R_X86_64_PC32. This relocation says, in effect: “fill in the address of the function func() at offset 000000000005 of the .text section”.
The above relocation corresponds to the .text section in the object file reloc.o (again, one needs to understand the structure of ELF files to understand the various sections), so let's disassemble the .text section using the objdump utility:
$ objdump --disassemble reloc.o

reloc.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <func1>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   e8 00 00 00 00          callq  9 <func1+0x9>
   9:   c9                      leaveq
   a:   c3                      retq
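The Info column in the readelf output packs two numbers into one 64-bit value. The sketch below unpacks it, assuming the ELF64 layout in which the upper 32 bits hold the symbol-table index and the lower 32 bits hold the relocation type. The struct rela_entry and the two helper functions are hypothetical names for this illustration; on Linux, <elf.h> provides the Elf64_Rela type and the ELF64_R_SYM and ELF64_R_TYPE macros for the same purpose.

```c
#include <stdint.h>

/* Mirrors the three stored fields that readelf prints as Offset, Info and Addend. */
struct rela_entry {
    uint64_t offset;  /* where in the section the value must be written */
    uint64_t info;    /* symbol index (high 32 bits) + type (low 32 bits) */
    int64_t  addend;  /* constant added to the symbol value */
};

/* Index of the referenced symbol in the object's symbol table. */
uint32_t rela_symbol_index(const struct rela_entry *r)
{
    return (uint32_t)(r->info >> 32);
}

/* Relocation type; 2 is R_X86_64_PC32 on x86-64. */
uint32_t rela_type(const struct rela_entry *r)
{
    return (uint32_t)(r->info & 0xffffffffu);
}
```

For the entry shown above (Offset 5, Info 000900000002, Addend -4), this gives symbol index 9, which is the symbol-table slot for func, and type 2, which readelf prints as R_X86_64_PC32.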
In the above output, the call instruction starts at offset 4 (relative to the starting address 0000000000000000). Its opcode byte e8 is followed, at offset ‘5’, by 4 zero bytes waiting to be written with the address of function func().
So, there is a relocation pending for the function func() which will get resolved when we link reloc.o with the object file or library that contains the definition of function func().
Let's try and see whether this relocation gets resolved or not. Here is another file main.c that provides the definition of func(). It also declares func1(), which is defined in reloc.c, so that the call in main() compiles cleanly:
$ vi main.c
#include <stdio.h>

extern void func1(void); // Defined in reloc.c

void func(void) // Provides the definition
{
    printf("\n Inside func()\n");
}

int main(void)
{
    printf("\n Inside main()\n");
    func1();
    return 0;
}
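What "resolving" this relocation amounts to can be sketched in a few lines. This is an illustration under stated assumptions, not the linker's actual code: it applies the R_X86_64_PC32 rule from the x86-64 ABI, value = S + A - P, where S is the symbol's address, A is the addend and P is the address of the bytes being patched. The function name apply_pc32() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Patch a 32-bit PC-relative field inside a copy of a section.
 *   section_addr : address at which the section will be loaded
 *   offset       : relocation offset within the section
 *   sym_addr     : final address of the referenced symbol (S)
 *   addend       : relocation addend (A)
 */
void apply_pc32(uint8_t *section, size_t size, uint64_t section_addr,
                uint64_t offset, uint64_t sym_addr, int64_t addend)
{
    assert(section != NULL && offset + 4 <= size);

    uint64_t place = section_addr + offset;                           /* P */
    uint32_t value = (uint32_t)(sym_addr + (uint64_t)addend - place); /* S + A - P */

    /* x86-64 is little-endian: store the least significant byte first. */
    for (int i = 0; i < 4; i++)
        section[offset + i] = (uint8_t)(value >> (8 * i));
}
```

As a made-up example, if the 11 bytes of func1() were placed at address 0x1000 and func() ended up at 0x2000, calling apply_pc32(text, 11, 0x1000, 5, 0x2000, -4) would write the bytes f7 0f 00 00 at offset 5. The addend of -4 is there because the processor adds the displacement to the address of the next instruction, which lies 4 bytes past the start of the patched field.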
Create main.o object file from main.c as shown below.
$ gcc -c main.c -o main.o
Link reloc.o with main.o and try to produce an executable as shown below.
$ gcc reloc.o main.o -o reloc
Execute objdump again and see whether the relocation has been resolved or not:
$ objdump --disassemble reloc > output.txt
We redirected the output because an executable contains lots and lots of information and we do not want to get lost on stdout.
View the content of the output.txt file.
$ vi output.txt
...
0000000000400524 <func1>:
  400524:       55                      push   %rbp
  400525:       48 89 e5                mov    %rsp,%rbp
  400528:       e8 03 00 00 00          callq  400530 <func>
  40052d:       c9                      leaveq
  40052e:       c3                      retq
  40052f:       90                      nop
...
In the 4th line of the func1 listing, we can clearly see that the empty bytes we saw earlier are now filled in (03 00 00 00), so the call now resolves to the address of function func() (400530).
To conclude, gcc linking is such a vast sea to dive into that it cannot be covered in one article. Still, this article made an attempt to peel off the first layer of the linking process to give you an idea of what happens beneath the gcc command that promises to link different object files to produce an executable.
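The bytes 03 00 00 00 are not the absolute address of func() but a displacement relative to the next instruction. The following sketch shows how the target of such a 5-byte call instruction is computed from its operand bytes; call_target() is a hypothetical helper written for this article, not part of objdump.

```c
#include <stdint.h>

/*
 * Target of a 5-byte "call rel32" instruction (opcode e8) located at
 * insn_addr, given its four little-endian operand bytes.
 */
uint64_t call_target(uint64_t insn_addr, const uint8_t operand[4])
{
    /* Reassemble the 32-bit displacement, least significant byte first. */
    uint32_t raw = (uint32_t)operand[0]
                 | (uint32_t)operand[1] << 8
                 | (uint32_t)operand[2] << 16
                 | (uint32_t)operand[3] << 24;
    int32_t disp = (int32_t)raw; /* the displacement is signed */

    /* The displacement is added to the address of the next instruction. */
    return insn_addr + 5 + (uint64_t)(int64_t)disp;
}
```

With the instruction at 400528 and operand bytes 03 00 00 00, this gives 0x400528 + 5 + 3 = 0x400530, the address objdump prints as <func>. The same value follows from the relocation rule S + A - P: 0x400530 - 4 - 0x400529 = 3. Applied to the unlinked reloc.o (instruction at offset 4, operand bytes all zero), the helper returns 9, which is why objdump showed "callq 9 <func1+0x9>" there.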
Comments on this entry are closed.
Very Good. Thanks
Hi,
Very nice and usable article
thanks again for another quality article,
it will be great if you mention at the end a few sources or references that you would recommend for the people who want to know more, with a short comment on each.
@behzad.
Sure I’ll take care of this from now on and will definitely add some references at the end of my articles.
In the last example’s program you are calling func1() but the defined function name is func(). Please correct it.
@Viren
I am calling func1() which is defined in reloc.c.
Consider including <stdio.h> in func.c so that it does not throw an implicit declaration warning.
very good material
What's the difference between
$ gcc <object files> -o <output file name>
and
$ gcc -c reloc.c -o reloc.o
“-o” operates differently in both lines.
Could you explain it?
Thanks,