C Programming/Compiling
Having covered the basic concepts of C programming, we can now briefly discuss the process of compilation.
Like source code in any high-level programming language, C source code cannot be executed directly by a microprocessor. Its purpose is to give humans an intuitive way to write instructions that can be readily converted into machine code, which the microprocessor can execute. The compiler is the program that takes the source code and translates it into machine code.
To those new to programming, this seems fairly simple. A naive compiler might read in every source file, translate everything into machine code, and write out an executable. This could work, but has two serious problems. First, for a large project, the computer may not have enough memory to read all of the source code at once. Second, if you make a change to a single source file, you would rather not have to recompile the entire application.
To deal with these problems, compilers break their job down into steps. For each source file (each .c file), the compiler reads the file, reads the files it references with #include, and translates it to machine code. The result of this is an "object file" (.o). Once every object file is made, a "linker" collects all of the object files and writes the actual program. This way, if you change one source file, only that file needs to be recompiled and then the application needs to be re-linked.
Without going into the painful details, it can be beneficial to have a superficial understanding of the compilation process.
Preprocessor
The preprocessor handles the inclusion of header files, macro expansion, conditional compilation, and line control. You will often need to give special instructions to your compiler; this is done by inserting preprocessor directives into your code. When you begin compiling your code, the preprocessor scans the source code first and performs textual substitution of tokens according to predefined rules. Preprocessor directives are not C statements and follow their own syntax, although the C standard defines them.
In C, all preprocessor directives begin with the hash character (#). You can see one preprocessor directive in the Hello world program introduced in A taste of C:
Example:
#include <stdio.h>
This directive causes the contents of the header stdio.h to be included in your program. Other directives do other jobs: #define creates macros, and #pragma passes implementation-specific settings to the compiler. The result of the preprocessing stage is plain text, which is handed on to the compiler proper. You can think of the preprocessor as a non-interactive text editor that prepares your code for the compilation step.
The language of preprocessor directives is agnostic to the grammar of C, so the C preprocessor can also be used independently to process other kinds of text files.
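The directives described above can be illustrated with a small sketch. The macro names, the function names, and the VERBOSE_BUILD flag are hypothetical and chosen only for this example; the block has no main function, so it is meant to be compiled as part of a larger program.

```c
/* Object-like macro: every later BUFFER_SIZE token is replaced by 64. */
#define BUFFER_SIZE 64

/* Function-like macro: the parentheses keep the expansion correct when
   the argument is itself an expression such as n + 1. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one of the two definitions below ever
   reaches the compiler proper. VERBOSE_BUILD is a hypothetical flag that
   could be defined on the gcc command line with -DVERBOSE_BUILD. */
#ifdef VERBOSE_BUILD
#define MODE_NAME "verbose"
#else
#define MODE_NAME "quiet"
#endif

/* After preprocessing, this body reads: return ((64) * (64)); */
int buffer_area(void)
{
    return SQUARE(BUFFER_SIZE);
}

/* After preprocessing, this body reads: return ((n + 1) * (n + 1)); */
int square_plus_one(int n)
{
    return SQUARE(n + 1);
}

/* Returns whichever string the #ifdef above selected. */
const char *mode_name(void)
{
    return MODE_NAME;
}
```

With gcc, the -E switch stops after the preprocessing stage and prints the resulting text, which is a convenient way to see these substitutions.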
Syntax Checking
This step ensures that the code is valid C that can be translated into an executable program. Most compilers also issue warnings about potential problems in your program (such as a condition that is always true or always false).
When an error is detected in the program, the compiler will normally report the file name and line number at which compilation failed.
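As a rough illustration of the difference between an error and a warning, the sketch below shows a valid function, with comments describing two hypothetical slips a programmer might make in it. The function name and the slips are assumptions made for this example.

```c
/* Counts how many of the first `length` elements are greater than zero. */
int count_positive(const int *values, int length)
{
    int count = 0;

    for (int i = 0; i < length; i++) {
        /* Warning case: writing `if (values[i] = 0)` here is still valid C,
           so it compiles, but gcc warns about it when -Wall is given,
           because an assignment used as a condition is usually a typo
           for `==`. */
        if (values[i] > 0) {
            count++;
        }
    }

    /* Error case: dropping the semicolon from the next line makes the file
       invalid C, and the compiler reports the file name and a line number
       at or just after this point. */
    return count;
}
```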
Object Code
The compiler produces a machine code equivalent of the source code that can then be linked into the final program. The code itself can't be executed yet, as it has to complete the linking stage.
Compilation is a "one-way street". That is, compiling a C source file into machine code is easy, but "decompiling" (turning machine code back into the C source that created it) is not. Decompilers for C do exist, but they rarely produce useful code.
Linking
Linking combines the separate object files and any libraries they use into one complete unit, producing either an executable program or a library. Linking is performed by a linker, which is usually run for you by the compiler.
Common errors during this stage are missing functions (a function is called but never defined in any object file or library) and duplicate functions (the same function is defined more than once).
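The two link errors above both come down to the difference between declaring a function and defining it. The following sketch, with hypothetical function names, marks where each error would arise; it has no main function and is meant to be linked into a larger program.

```c
/* Declaration: tells the compiler that scale() exists somewhere.
   This alone is enough to compile the calls below into object code. */
int scale(int value, int factor);

int scale_twice(int value, int factor)
{
    return scale(scale(value, factor), factor);
}

/* Definition: what the linker looks for when it resolves the calls.
   If no object file or library defined scale(), linking would fail with
   a missing-function error (gcc's linker calls it an "undefined
   reference"). If a second object file also defined a non-static
   scale(), linking would fail with a duplicate-function error. */
int scale(int value, int factor)
{
    return value * factor;
}

/* `static` gives a function internal linkage: it is private to this
   source file, so another file may reuse the name without a clash. */
static int clamp_to_zero(int value)
{
    return value < 0 ? 0 : value;
}

int scale_non_negative(int value, int factor)
{
    return clamp_to_zero(scale(value, factor));
}
```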
Automation
For large C projects, many programmers choose to automate compilation, both in order to reduce user interaction requirements and to speed up the process by only recompiling modified files.
Most integrated development environments have some kind of project management, which makes such automation very easy. On UNIX-like systems, make and Makefiles are often used to accomplish the same.
Once gcc is installed, it can be called with a list of C source files that have been written but not yet compiled. For example, if a main.c file uses some functions declared in myfun.h and implemented in myfun_a.c and myfun_b.c, then it is enough to write
gcc main.c myfun_a.c myfun_b.c
myfun.h is included in main.c, so it does not appear on the command line. If it is in a separate directory for header files, that directory can be listed after a -I switch.
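To make the file layout concrete, here is a sketch of what the files named above might contain, shown in one listing with comments marking where each file would begin. The function names myfun_a and myfun_b and their bodies are invented for illustration; the original project's contents are not given.

```c
/* ---- myfun.h (hypothetical contents) ---- */
#ifndef MYFUN_H
#define MYFUN_H

int myfun_a(int x);   /* implemented in myfun_a.c */
int myfun_b(int x);   /* implemented in myfun_b.c */

#endif /* MYFUN_H */

/* ---- myfun_a.c (would begin with: #include "myfun.h") ---- */
int myfun_a(int x)
{
    return x + 1;
}

/* ---- myfun_b.c (would begin with: #include "myfun.h") ---- */
int myfun_b(int x)
{
    /* Calls a function from the other source file; the linker connects
       the two object files. */
    return myfun_a(x) * 2;
}

/* ---- main.c would then contain something like:
 *
 *   #include <stdio.h>
 *   #include "myfun.h"
 *
 *   int main(void)
 *   {
 *       printf("%d\n", myfun_b(4));
 *       return 0;
 *   }
 */
```

The include guard in the header (the #ifndef, #define, and #endif lines) keeps the declarations from being processed twice if the header is included more than once in the same source file.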
In larger programs, Makefiles and the GNU make program can compile C files into intermediate object files ending with the suffix .o, which gcc can then link. How to build each object file is usually described in the Makefile as a rule. The object file is the target: it is written at the start of a line and followed by a colon and a list of the files it depends on, e.g. .c files and .o files built by another rule. The next line gives the invocation of gcc that is required, and it must begin with a tab character; spaces in its place are a common cause of errors.
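A minimal Makefile for the main.c example might look like the sketch below. The program name myprog is an assumption made for this example, and each gcc line must start with a single tab character, not spaces.

```make
# Hypothetical Makefile for the main.c / myfun_a.c / myfun_b.c example.
# Every command line under a target must begin with a tab.

myprog: main.o myfun_a.o myfun_b.o
	gcc -o myprog main.o myfun_a.o myfun_b.o

main.o: main.c myfun.h
	gcc -c main.c

myfun_a.o: myfun_a.c myfun.h
	gcc -c myfun_a.c

myfun_b.o: myfun_b.c myfun.h
	gcc -c myfun_b.c
```

Running make with no arguments builds the first target, myprog. If only myfun_b.c has changed since the last build, make recompiles myfun_b.o and re-links, leaving the other object files alone.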
Typing man make or info make usually gives enough information to jog the memory on how to use make, and the same goes for gcc. gcc has a great many option switches. One of the main ones is -g, which generates debugging information so that gdb can show the source code while stepping through the compiled program. gdb has an h (help) command that shows what it can do, and it is usually started with gdb a.out, where a.out is the default name gcc gives the executable when no output name is specified.
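As a small target for trying out -g and gdb, a function with a loop works well, because there is a variable to watch change on each step. The file name sum_to.c and the function below are hypothetical; a main function that calls sum_to would need to be added before the program can be run under gdb.

```c
/* sum_to.c (hypothetical file name)
 *
 * One possible session, once a main() that calls sum_to(4) is added:
 *   gcc -g sum_to.c      compile with debugging information
 *   gdb a.out            start the debugger on the result
 *   break sum_to         stop when sum_to is entered
 *   run                  start the program
 *   next                 execute one source line
 *   print total          show the current value of total
 */
int sum_to(int n)
{
    int total = 0;

    for (int i = 1; i <= n; i++) {
        total += i;   /* watch total grow here while stepping with `next` */
    }

    return total;
}
```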