15. What happens when I build code in C?
15.1. What is building code?
Building is transforming code from the input format to the final format.
This can mean different things in different contexts. For example:
the course website is built from markdown files into HTML output using jupyter-book
a C program is built from C source code into an executable as the output
We sometimes say that compiling takes code from source to executable, but this process actually has multiple stages, and compiling is only one of them.
We will focus on what has to happen more than how it all happens.
CSC301, 402, 501, 502 go into greater detail on how languages work.
Our goal is to:
(where applicable) give you a preview
get enough understanding of what happens to know where to look when debugging
15.2. Using ssh keys to authenticate
generate a key pair
store the public key on the server
request to log in: your computer tells the server your public key and gets back a session ID from the server
if the server has that public key, it generates a random string, encrypts it with your public key, and sends it back to your computer
your computer decrypts the message with your private key, combines it with the session ID, hashes the result, and sends the hash back
the server hashes its own copy of the message together with the session ID; if the hash it received matches the one it calculated, you are logged in
Lots more networking details are in the full zine, which is available for purchase, or I have a copy if you want to borrow it.
15.3. Generating a Key Pair
We can use ssh-keygen to create the keys:
-f option allows us to specify the file name of the keys
-t option allows us to specify the type of key (the encryption algorithm)
-b option allows us to specify the size of the key in bits
ssh-keygen -f ~/.ssh/seawulf -t rsa -b 4096
Generating public/private rsa key pair.
/Users/brownsarahm/.ssh/seawulf already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/brownsarahm/.ssh/seawulf
Your public key has been saved in /Users/brownsarahm/.ssh/seawulf.pub
The key fingerprint is:
SHA256:mT+Qs5vkOCRjSj9Lym48FO6fIBinZWebNq7aHjBOkJo brownsarahm@68.105.20.172.s.wireless.uri.edu
The key's randomart image is:
+---[RSA 4096]----+
| |
| . |
|o |
|o.. + |
|Eo+.o S |
|+O+o+o. = |
|+*o+*+ o o |
| +*Boo.+ o . |
|.=B+=o..+ |
+----[SHA256]-----+
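15.4. Sending the public key to a server
We use ssh-copy-id to install the public key on the server, again using -i to specify the key file name (it copies the matching .pub file, as the first line of output shows):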
ssh-copy-id -i ~/.ssh/seawulf brownsarahm@seawulf.uri.edu
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/Users/brownsarahm/.ssh/seawulf.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
brownsarahm@seawulf.uri.edu's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'brownsarahm@seawulf.uri.edu'"
and check to make sure that only the key(s) you wanted were added.
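15.5. Logging in
To log in without using a password, you have to tell ssh which key to use with -i. Note that -i takes the private key file (seawulf), not the .pub file. Pressing tab after typing the start of the path lists the files in ~/.ssh:
ssh -i ~/.ssh/
config known_hosts.old seawulf.pub
known_hosts seawulf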
ssh -i ~/.ssh/seawulf brownsarahm@seawulf.uri.edu
Last failed login: Fri Oct 25 11:17:23 EDT 2024 from pool-100-40-65-212.prvdri.fios.verizon.net on ssh:notty
There were 4 failed login attempts since the last successful login.
Last login: Thu Oct 24 12:54:46 2024 from 172.20.105.68
pwd
/home/brownsarahm
ls
bash-lesson.tar.gz SRR307025_2.fastq
compilec SRR307026_1.fastq
dmel-all-r6.19.gtf SRR307026_2.fastq
dmel_unique_protein_isoforms_fb_2016_01.tsv SRR307027_1.fastq
example SRR307027_2.fastq
gene_association.fb SRR307028_1.fastq
SRR307023_1.fastq SRR307028_2.fastq
SRR307023_2.fastq SRR307029_1.fastq
SRR307024_1.fastq SRR307029_2.fastq
SRR307024_2.fastq SRR307030_1.fastq
SRR307025_1.fastq SRR307030_2.fastq
We will make an empty directory to work in for today.
mkdir compilec
and go into it to work
cd compilec/
15.6. An overview
15.7. A simple program
We create a new file with nano
nano hello.c
and put in a simple hello world program, which we can display with cat:
cat hello.c
#include <stdio.h>
void main () {
printf("Hello world\n");
}
we will confirm that it exists
ls
hello.c
15.8. Preprocessing with gcc
First we handle preprocessing, which pulls in the contents of any included headers. We will use the compiler gcc for many steps, and use its options to have it do only part of what it can do:
-E stops after preprocessing
-o gives the name of the file to write, here the .i file
gcc -E -o hello.i hello.c
If it succeeds, we see no output, but we can check the folder
ls
hello.c hello.i
now we have a new file
we can check the size
ls -l
total 24
-rw-r--r--. 1 brownsarahm spring2022-csc392 64 Oct 29 12:58 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 29 13:00 hello.i
If we think that the .i file might be big, what can we use to compare the two to see the impact of preprocessing?
We can inspect what it does using wc
wc -l hello*
6 hello.c
842 hello.i
848 total
we started with just 6 lines of code and we get a lot more after preprocessing
Since it is long, we will first look at the top:
head hello.i
# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 375 "/usr/include/features.h" 3 4
and the end
tail hello.i
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 943 "/usr/include/stdio.h" 3 4
# 2 "hello.c" 2
void main () {
printf("Hello world\n");
}
We see that our original program is at the end of the file, and the beginning is where the #include line has been expanded.
15.9. Compiling
Next we take our preprocessed file and compile it to get assembly code. Again, we use gcc:
-S tells it to stop after producing assembly
we use the preprocessed file as input
gcc -S hello.i
We can check what it created:
ls
hello.c hello.i hello.s
We have a new file as well, with the .s extension.
We can use man to see the documentation for any command, including all of these gcc options:
man gcc
Checking the sizes shows the new file:
ls -l
total 28
-rw-r--r--. 1 brownsarahm spring2022-csc392 64 Oct 29 12:58 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 29 13:00 hello.i
-rw-r--r--. 1 brownsarahm spring2022-csc392 433 Oct 29 13:06 hello.s
we can see how long it is in lines too
wc -l hello.s
25 hello.s
This is longer than the source, but much shorter than the preprocessed file. The headers contain lots of declarations that we might need, but the assembly only contains what our program actually does.
And it’s manageable, so we inspect it directly:
cat hello.s
.file "hello.c"
.section .rodata
.LC0:
.string "Hello world"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %edi
call puts
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)"
.section .note.GNU-stack,"",@progbits
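There are more steps than in our C source and they are lower-level operations, but it is still text stored in the file. Notice that our printf call became call puts: gcc recognizes that printing a fixed string ending in a newline is exactly what puts does.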
Hint
Learning more about assembly languages is a good explore badge topic
15.10. Assembling
Assembling takes the assembly code and produces object code. Assembly is a broad category: there are families of assembly languages for different processors, and it is still written so that humans can read it. It is more complex than source code because it is closer to the hardware. Object code, however, is the specific instructions for your machine's processor and is not human readable.
Again, we use gcc:
-c tells it to stop at the object file
-o again gives the name of the file to write
gcc -c -o hello.o hello.s
Again, check what it does by looking at files
ls -l
total 32
-rw-r--r--. 1 brownsarahm spring2022-csc392 64 Oct 29 12:58 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 29 13:00 hello.i
-rw-r--r--. 1 brownsarahm spring2022-csc392 1496 Oct 29 13:14 hello.o
-rw-r--r--. 1 brownsarahm spring2022-csc392 433 Oct 29 13:06 hello.s
Now we see a new file, the .o file, and we again check its length:
wc -l hello.o
5 hello.o
we can check how many characters and words
wc hello.o
5 17 1496 hello.o
It is not a huge number of characters, but it is still far more than the source:
wc hello.c
6 9 64 hello.c
cat hello.o
ELF>?@@
UH???]?Hello worldGCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)zRx
K A?C
?? hello.cmainputs
???????? .symtab.strtab.shstrtab.rela.text.data.bss.rodata.comment.note.GNU-stack.rela.eh_frame @?0
&PP1P
90\.B?W?R@
?
?0a
this is not human readable though
This file is written in binary, but our terminal reads it back in 8-bit chunks and tries to show each chunk as a character. To get an idea of why that looks like garbage, imagine we had a small object language with only 4 operations:
pick a memory location (00)
store a value (01)
add (10)
move value to ALU (11)
In this case code like:
a = 1
b = 2
c = a + b
Would get translated into assembly:
pick memory 0
store 1
pick memory 1
store 2
move to ALU location 0
move to ALU location 1
add ALU
and then object code like:
00 00
01 01
00 01
01 10
11 00
11 01
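10 00
then the terminal reads it back in 8-bit chunks (the last chunk is padded with zeros)
00000101 00010110 11001101 10000000
which it tries to render as characters numbered 5, 22, 205, and 128 according to UTF-8, ASCII, or whatever encoding your system uses. Many values like these do not correspond to printable characters, which is why the output of cat above looks like garbage.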
15.11. Linking
Now we can link it all together. This program does not have many other dependencies, but linking fills in anything needed from libraries and outputs an executable. Once again we use gcc, giving it the .o file as input:
-o specifies the name of the output file
-lm links the math library (libm); hello.c does not use any math functions, so it is not actually needed here
gcc -o hello hello.o -lm
ls
hello hello.c hello.i hello.o hello.s
Finally we can run our program
./hello
Hello world
ls -l
total 44
-rwxr-xr-x. 1 brownsarahm spring2022-csc392 8360 Oct 29 13:30 hello
-rw-r--r--. 1 brownsarahm spring2022-csc392 64 Oct 29 12:58 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 29 13:00 hello.i
-rw-r--r--. 1 brownsarahm spring2022-csc392 1496 Oct 29 13:14 hello.o
-rw-r--r--. 1 brownsarahm spring2022-csc392 433 Oct 29 13:06 hello.s
./hello
Hello world
15.12. Putting it all together
We can repeat with a different name and work directly from source to executable:
gcc -o demohello hello.c -lm
ls
demohello hello hello.c hello.i hello.o hello.s
and run again.
./demohello
Hello world
15.13. Working with multiple files
This all looks a bit different if we have our code split across files.
we will make a new file main.c
nano main.c
with the following content
/* Used to illustrate separate compilation.
Created: Joe Zachary, October 22, 1992
Modified:
*/
#include <stdio.h>
void main () {
int n;
printf("Please enter a small positive integer: ");
scanf("%d", &n);
printf("The sum of the first n integers is %d\n", sum(n));
printf("The product of the first n integers is %d\n", product(n));
}
Then help.c
nano help.c
/* Used to illustrate separate compilation
Created: Joe Zachary, October 22, 1992
Modified:
*/
/* Requires that "n" be positive. Returns the sum of the
first "n" integers. */
int sum (int n) {
int i;
int total = 0;
for (i = 1; i <= n; i++)
total += i;
return(total);
}
/* Requires that "n" be positive. Returns the product of the
first "n" integers. */
int product (int n) {
int i;
int total = 1;
for (i = 1; i <= n; i++)
total *= i;
return(total);
}
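Now we can compile each part. -Wall turns on most warnings, -g includes debugging information, and -c stops after writing the object file (main.o):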
gcc -Wall -g -c main.c
main.c:8:6: warning: return type of ‘main’ is not ‘int’ [-Wmain]
void main () {
^
main.c: In function ‘main’:
main.c:12:2: warning: implicit declaration of function ‘sum’ [-Wimplicit-function-declaration]
printf("The sum of the first n integers is %d\n", sum(n));
^
main.c:13:2: warning: implicit declaration of function ‘product’ [-Wimplicit-function-declaration]
printf("The product of the first n integers is %d\n", product(n));
^
Now we preprocess, compile and assemble the helper code:
gcc -Wall -g -c help.c
then we can link it all together
gcc -o demo main.o help.o -lm
and run our program
./demo
Please enter a small positive integer: 3
The sum of the first n integers is 6
The product of the first n integers is 6
./demo
Please enter a small positive integer: 8
The sum of the first n integers is 36
The product of the first n integers is 40320
and finally exit from seawulf
exit
logout
Connection to seawulf.uri.edu closed.
15.14. Prepare for Next Class
Review your idethoughts.md from a few weeks ago and add some summary notes.
Think about which features or extensions in your favorite IDE you like the most and think others may not know about. Be prepared to share a small demo of using a feature or explain how it works (as in, open a file that has content in your IDE so that you could show that feature).
15.15. Badges
Create some variations of the hello.c we made in class. Make hello2.c print twice with 2 print commands. Make hello5.c print 5 times with a for loop and hello7.c print 7 times with a for loop. Build them all on the command line and make sure they run correctly.
Write a bash script, assembly.sh, to compile each program to assembly and print the number of lines in each file.
Put the output of your script in hello_assembly_compare.md. Add to the file some notes on how they are similar or different based on your own reading of them.
On Seawulf, modify main.c from class to accept the integer as a command line argument instead of reading it as input while the program runs. See this tutorial for an example.
Write a bash script demo_test.sh that runs your compiled program for each integer from 10 to 30 (the syntax for a range is {start..end}, so this would be {10..30}).
Write an sbatch script, batchrun.sh, to run your script on a compute node and save the output to a file. The sbatch script should compile and link the program and then call the script. See the options.
use scp to download your modified main, script files, and output to your local computer and include them in your kwl repo.
15.16. Experience Report Evidence
15.17. Questions After Today's Class
15.17.1. What's the process like for an interpreted language, like Python?
An interpreter reads and executes code directly, either line by line in a REPL or by running a whole script file, the way bash runs a shell script.
More details on the steps for an interpreted language would make a good explore badge topic.
15.17.2. Is it possible to not link everything together?
If you do not link, then it’s not executable.
15.17.3. What do other compilers have that gcc doesn't? Why have different options instead of one standard compiler for C++, for example?
Different compilers mostly differ in their optimizations, but this is also a good explore badge topic.
15.17.4. Are the steps that we did in class similar when working with languages other than C?
They are pretty common for compiled languages, but comparing and contrasting with other languages like Rust could be a good explore badge.
15.17.5. What programming languages can we use in nano mode other than C++?
You can work with any programming language in any text editor, because source code is plain text.
15.17.6. What steps does an interpreter abstract? Is it an addition in the beginning of the compilation process or is it an entirely different process?
It's a completely different process. For example, the standard Python interpreter, CPython, is written in C, so the interpreter itself is compiled using this process, but it then parses and runs your Python code while the program is running.
15.17.7. Is there any benefit to being able to read a low level language?
Obscure skills can often pay well.
Also, a low level language is needed for working on resource-constrained hardware. It can also be used for writing drivers that connect hardware, like peripherals, to a computer.