What is building code?¶
Building is transforming code from the input format to the final format.
This can mean different things in different contexts. For example:
the course website is built from markdown files into html output using myst
the courseutils docs are built from the python docstrings and markdown with sphinx
a C program is built from C source code into an executable
We sometimes say that compiling takes code from source to executable, but this process is actually multiple stages and compiling is one of those steps.
We will focus on what has to happen more than how it all happens.
CSC301, 402, 501, 502 go into greater detail on how languages work.
Our goal here is to:
(where applicable) give you a preview
get enough understanding of what happens to know where to look when debugging
Using SSH Keys¶
We are going to work on seawulf so that we all have the same compiler.
ssh seawulf
Last login: Tue Oct 28 12:58:26 2025 from 172.20.24.214
Using an interactive session¶
Last class we worked on the login node, but that is not best practice.
Today we will use an interactive session using the salloc program.
salloc
salloc: Granted job allocation 28290
salloc: Waiting for resource configuration
salloc: Nodes n005 are ready for job
We will make an empty directory to work in for today.
mkdir compilec
cd compilec/
ls
an empty folder!
Overall Build process¶
A simple program¶
We will use nano to write a very small program:
nano hello.c
#include <stdio.h>
void main () {
printf("Hello world\n");
}
and again, see what is in our file
ls
hello.c
Preprocessing with gcc¶
First we handle the preprocessing, which pulls in the headers that are included. We will use the compiler gcc.
We will use gcc for many steps, and use its options to have it do subsets of what it can possibly do:
-E stops after preprocessing
-o makes it write the .i file and passes the file name for it
gcc -E hello.c -o hello.i
If it succeeds, we see no output, but we can check the folder
ls
hello.c hello.i
now we have a new file
We can inspect what it does using wc
wc -l hello.*
  6 hello.c
842 hello.i
848 total
we started with just 6 lines of code and the preprocessing added a lot of lines
Since it is long, we will first look at the top
head hello.i
# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 375 "/usr/include/features.h" 3 4
and the end
tail hello.i
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 943 "/usr/include/stdio.h" 3 4
# 2 "hello.c" 2
void main () {
printf("Hello world\n");
}
cat hello.c
#include <stdio.h>
void main () {
printf("Hello world\n");
}
we see that our original program is at the end of the file, and everything before it is where the include line has been expanded.
Compiling¶
Next we take our preprocessed file and compile it to get assembly code.
Again, we use gcc:
-S tells it to produce assembly
we will use the preprocessed file as input
gcc -S hello.i
Again we see no output, but we can check what it wrote:
ls
hello.c hello.i hello.s
we have a new file as well with the .s extension.
Again, let’s inspect
wc -l hello.*
  6 hello.c
842 hello.i
 25 hello.s
873 total
this is longer than the source, but not as long as the header. The header contains lots of information that we might need, but the assembly contains only what our program actually does.
And it’s manageable, so we inspect it directly:
cat hello.s
    .file "hello.c"
    .section .rodata
.LC0:
    .string "Hello world"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl $.LC0, %edi
    call puts
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)"
    .section .note.GNU-stack,"",@progbits
There are many more steps and they are lower level operations, but it is still human readable text stored in the file.
Learning more about assembly languages is a good explore badge topic
Assembling¶
Assembling takes the assembly code and produces object code. Assembly is relatively broad, with whole families of assembly languages, and it is still written for humans to read. It is more complex than source code because it is closer to the hardware. The object code, however, consists of specific instructions for your machine and is not human readable.
Again, with gcc:
-c tells it to stop at the object file
-o again gives it the name of the file to write
gcc -c hello.s -o hello.o
Again, check what it does by looking at files
ls
hello.c hello.i hello.o hello.s
now we see a new file, the .o
and again check its length
wc -l hello.*
  6 hello.c
842 hello.i
  5 hello.o
 25 hello.s
878 total
this is very short
wc hello.o
5 17 1496 hello.o
it is not even too many characters
cat hello.o
ELF>?@@
UH???]?Hello worldGCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)zRx
K A?C
?? hello.cmainputs
???????? .symtab.strtab.shstrtab.rela.text.data.bss.rodata.comment.note.GNU-stack.rela.eh_frame @?0
&PP1P
90\.B?W?R@
?
?0a
This is not human readable, though
Linking¶
Now we can link it all together; in this program there are not a lot of other dependencies, but this fills in anything from libraries and outputs an executable
once again with gcc:
-o flag specifies the name for output
-lm tells it to link against the math library (libm); the .o file named on the command line is the input being linked
gcc -o hello hello.o -lm
ls
hello hello.c hello.i hello.o hello.s
If we look at the permissions
ls -l
total 44
-rwxr-xr-x. 1 brownsarahm spring2022-csc392 8360 Oct 30 12:59 hello
-rw-r--r--. 1 brownsarahm spring2022-csc392 64 Oct 30 12:40 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 30 12:45 hello.i
-rw-r--r--. 1 brownsarahm spring2022-csc392 1496 Oct 30 12:55 hello.o
-rw-r--r--. 1 brownsarahm spring2022-csc392 433 Oct 30 12:49 hello.s
we can see that the executable file was automatically given x permissions for everyone.
the executable is not readable though
cat hello
ELF>P@@(@8 @@@@@@?88@8@@@ ``48 ``?TT@T@DDP?td??@?@44Q?tdR?td``/lib64/ld-linux-x86-64.so.2GNU GNU?hm|Y??!w\??د5P?S$)
libm.so.6__gmon_start__libc.so.6puts__libc_start_mainGLIBC_2.2.5ui ;?`` `(`H?H?
H??t?CH???5?
?%?
@?%?
h??????%?
h??????%?
h?????1?I??^H??H???PTI???@H??P@H??=@?????fD??`UH-8`H??H??w]øH??t?]?8`????8`UH-8`H??H??H??H???H?H??u]úH??t?]H?ƿ8`????==
uUH???~???]?*
??@H?= t?H??tU?`H????]?{????s???UH???@?????]?AWA??AVI??AUI??ATL?% UH?- SL)?1?H??H??e???H??t?L??L??D??A??H??H9?u?H?[]A\A]A^A_Ðf.???H?H??Hello world0$???|d???LQ????d????????
zRx
???*zRx
$????@FJ
K ??;*3$"D????A?C
Dd????eB?E?E ?E(?H0?H8?M@l8A0A(B BB?????@?@
???o?@@?@ ?@
G
`H?@?@ ???oh@???o???o`@`&@6@F@GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)8@T@t@?@?@@`h@ ?@
@8#T@T 1t@t$D???o?@N
?@ ?@?V@G^???o`@k???oh@hz?@?B?@???@??@@?P@Pr??@? ??@??@?4?`???0`04`4?04-h?.`4`?? P
Now we can run the program
./hello
Hello world
success!!
Putting it all together¶
We can repeat with a different name for the executable and work directly from source to executable:
gcc -o demohello hello.c -lm
check what it looks like
ls -l
total 56
-rwxr-xr-x. 1 brownsarahm spring2022-csc392 8360 Oct 30 13:03 demohello
-rwxr-xr-x. 1 brownsarahm spring2022-csc392 8360 Oct 30 12:59 hello
-rw-r--r--. 1 brownsarahm spring2022-csc392 64 Oct 30 12:40 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 30 12:45 hello.i
-rw-r--r--. 1 brownsarahm spring2022-csc392 1496 Oct 30 12:55 hello.o
-rw-r--r--. 1 brownsarahm spring2022-csc392 433 Oct 30 12:49 hello.s
only an executable, no intermediate files. It still did all of those processes but it didn’t write files for them.
./demohello
Hello world
If we edit the source:
nano hello.c
#include <stdio.h>
void main () {
printf("Hello world!\n");
}
the executable does not change
./hello
Hello world
until we build it again, which we can do from source
gcc -o demohello hello.c -lm
and then run
./demohello
Hello world!
Now it’s changed.
Working with multiple files¶
This all looks a bit different if we have our code split across files.
we will make a new file main.c
nano main.c
/* Used to illustrate separate compilation.
Created: Joe Zachary, October 22, 1992
Modified:
*/
#include <stdio.h>
void main () {
int n;
printf("Please enter a small positive integer: ");
scanf("%d", &n);
printf("The sum of the first n integers is %d\n", sum(n));
printf("The product of the first n integers is %d\n", product(n));
}
Then help.c
nano help.c
/* Used to illustrate separate compilation
Created: Joe Zachary, October 22, 1992
Modified:
*/
/* Requires that "n" be positive. Returns the sum of the
first "n" integers. */
int sum (int n) {
int i;
int total = 0;
for (i = 1; i <= n; i++)
total += i;
return(total);
}
/* Requires that "n" be positive. Returns the product of the
first "n" integers. */
int product (int n) {
int i;
int total = 1;
for (i = 1; i <= n; i++)
total *= i;
return(total);
}
First we will compile and assemble the main.c
gcc -Wall -g -c main.c
main.c:8:6: warning: return type of ‘main’ is not ‘int’ [-Wmain]
void main () {
^
main.c: In function ‘main’:
main.c:12:2: warning: implicit declaration of function ‘sum’ [-Wimplicit-function-declaration]
printf("The sum of the first n integers is %d\n", sum(n));
^
main.c:13:2: warning: implicit declaration of function ‘product’ [-Wimplicit-function-declaration]
printf("The product of the first n integers is %d\n", product(n));
^
we get some warnings, but that is okay
next we do the same for the helpers
gcc -Wall -g -c help.c
finally we link them together
gcc -o demo -lm main.o help.o
now it runs:
./demo
Please enter a small positive integer: 5
The sum of the first n integers is 15
The product of the first n integers is 120
we can modify one part
nano main.c
/* Used to illustrate separate compilation.
Created: Joe Zachary, October 22, 1992
Modified:
*/
#include <stdio.h>
void main () {
int n;
printf("Enter a small positive integer: ");
scanf("%d", &n);
printf("The sum of the first n integers is %d\n", sum(n));
printf("The product of the first n integers is %d\n", product(n));
}
We need to recompile and reassemble that part.
gcc -Wall -g -c main.c
main.c:8:6: warning: return type of ‘main’ is not ‘int’ [-Wmain]
void main () {
^
main.c: In function ‘main’:
main.c:12:2: warning: implicit declaration of function ‘sum’ [-Wimplicit-function-declaration]
printf("The sum of the first n integers is %d\n", sum(n));
^
main.c:13:2: warning: implicit declaration of function ‘product’ [-Wimplicit-function-declaration]
printf("The product of the first n integers is %d\n", product(n));
^
and re-link, but we do not have to recompile or reassemble the help.c file; the original object file works well.
gcc -o demo -lm main.o help.o
and we can run the code
./demo
Enter a small positive integer: 7
The sum of the first n integers is 28
The product of the first n integers is 5040
Why this is important¶
The build process includes different steps, so an error at a different step tells you to look in a different place for the source of the problem.
Consider the following:
Where should you look if it says a linking error?
If it’s a compiler error, where should you look?
What step would catch syntax errors?
Having a modular process means that for large, complex code bases, the parts can be split up. It also means that if you only change one part of the code you only need to recompile that part. For complex code the compilation and the optimizations that happen at compile time can take time. Rebuilding only what changed means you don’t have to pay that cost every time.
Efficient code development means not only less waiting for you, but a smaller environmental impact while you work and when your code is distributed.
Prepare for Next Class¶
Review the notes about floats to prepare for lab.
Think about what you know about how computers execute code to prepare for class.
Badges¶
Review the notes from today
Create some variations of the hello.c we made in class. Make hello2.c print twice with 2 print commands. Make hello5.c print 5 times with a for loop and hello7.c print 7 times with a for loop. Build them all on the command line and make sure they run correctly.
Write a bash script, assembly.sh to compile each program to assembly and print the number of lines in each file.
Put the output of your script in hello_assembly_compare.md. Add to the file some notes on how they are similar or different based on your own reading of them.
Review the notes from today
On Seawulf, modify main.c from class to accept the integer as a command line argument instead of via input while running the program. See this tutorial for an example.
Write a bash script demo_test.sh that runs your compiled program for each integer from 10 to 30 (syntax for a range is {start..end} so this would be {10..30})
Write an sbatch script, batchrun.sh to run your script on a compute node and save the output to a file. The sbatch script should compile and link the program and then call the script. See the options.
Use scp to download your modified main, script files, and output to your local computer and include them in your kwl repo.
Create some variations of the hello.c we made in class. Make hello2.c print twice with 2 print commands. Make hello5.c print 5 times with a for loop and hello7.c print 7 times with a for loop. Build them all on the command line and make sure they run correctly.
Write a bash script, assembly.sh to compile each program to assembly and print the number of lines in each file.
Put the output of your script in hello_assembly_compare.md. Add to the file some notes on how they are similar or different based on your own reading of them.
Experience Report Evidence¶
Questions After Today’s Class¶
Is it possible to reverse engineer source code from compiled code?¶
From the assembly (the output of the compiler), it is definitely possible, but it is lossy. Many different source programs can produce equivalent assembly code.
Doing an experiment around this is a good explore badge topic
From the executable it is a lot harder, but not completely impossible to get something close.
What happens when you give gcc a “preprocessed” file that isn’t actually preprocessed during the compilation stage?¶
This is an explore badge, try this out and see how it works.
What’s actually inside the object file before linking?¶
The object code contains machine instructions, but it cannot run by itself yet: it is missing some dependencies from modular code or precompiled libraries, which the linker fills in. We will learn more about these instruction sets next!
What higher level courses focus on assembly code?¶
There are no full courses exclusively on assembly here, but some sections of CSC411 cover it (the Alvarez/Estevs version does; the Daniels version focuses on optimization using Rust instead of authoring in assembly).
301 is the prereq, but then 402 has more about how programming languages are developed and how to execute code with interpreters and compilers.