What happens when I build code in C?

15. What happens when I build code in C?#

15.1. What is building code?#

Building is transforming code from the input format to the final format.

This can mean different things in different contexts. For example:

the course website is built from markdown files into html output using jupyter-book
a C program is built from C source code into an executable

We sometimes say that compiling takes code from source to executable, but this process is actually multiple stages and compiling is one of those steps.

We will focus on what has to happen more than how it all happens.

CSC301, 402, 501, 502 go into greater detail on how languages work.

Our goal is to:

(where applicable) give you a preview
get enough understanding of what happens to know where to look when debugging

15.2. Using ssh keys to authenticate#

generate a key pair
store the public key on the server
request to log in: tell the server your public key and get back a session ID from the server
if the server has that public key, it generates a random string, encrypts it with your public key, and sends it back to your computer
your computer decrypts the message with your private key, hashes the message together with the session ID, and sends the hash back
the server hashes its own copy of the message and session ID; if the received and calculated hashes match, you are logged in

a toy example

cheatsheet on ssh from julia evans

from wizardzines

Lots more networking details are in the full zine, which is available for purchase; I also have a copy if you want to borrow it.

15.3. Generating a Key Pair#

We can use ssh-keygen to create a key pair.

-f option allows us to specify the file name of the keys.
-t option allows us to specify the encryption algorithm
-b option allows us to specify the size of the key in bits

ssh-keygen -f ~/.ssh/seawulf -t rsa -b 4096

Generating public/private rsa key pair.
/Users/brownsarahm/.ssh/seawulf already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /Users/brownsarahm/.ssh/seawulf
Your public key has been saved in /Users/brownsarahm/.ssh/seawulf.pub
The key fingerprint is:
SHA256:mT+Qs5vkOCRjSj9Lym48FO6fIBinZWebNq7aHjBOkJo brownsarahm@68.105.20.172.s.wireless.uri.edu
The key's randomart image is:
+---[RSA 4096]----+
|                 |
| .               |
|o                |
|o..      +       |
|Eo+.o   S        |
|+O+o+o.  =       |
|+*o+*+  o o      |
| +*Boo.+ o .     |
|.=B+=o..+        |
+----[SHA256]-----+

15.4. Sending the public key to a server#

Here the -i option specifies the key file; ssh-copy-id installs the matching public key (.pub) on the server.

ssh-copy-id -i ~/.ssh/seawulf brownsarahm@seawulf.uri.edu

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/Users/brownsarahm/.ssh/seawulf.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
brownsarahm@seawulf.uri.edu's password: 

Number of key(s) added:        1

Now try logging into the machine, with:   "ssh 'brownsarahm@seawulf.uri.edu'"
and check to make sure that only the key(s) you wanted were added.

15.5. Logging in#

To log in without using a password, you have to tell ssh which key to use:

ssh -i ~/.ssh/

config           known_hosts.old  seawulf.pub      
known_hosts      seawulf          

ssh -i ~/.ssh/seawulf brownsarahm@seawulf.uri.edu

Last failed login: Fri Oct 25 11:17:23 EDT 2024 from pool-100-40-65-212.prvdri.fios.verizon.net on ssh:notty
There were 4 failed login attempts since the last successful login.
Last login: Thu Oct 24 12:54:46 2024 from 172.20.105.68

pwd

/home/brownsarahm

ls

bash-lesson.tar.gz                           SRR307025_2.fastq
compilec                                     SRR307026_1.fastq
dmel-all-r6.19.gtf                           SRR307026_2.fastq
dmel_unique_protein_isoforms_fb_2016_01.tsv  SRR307027_1.fastq
example                                      SRR307027_2.fastq
gene_association.fb                          SRR307028_1.fastq
SRR307023_1.fastq                            SRR307028_2.fastq
SRR307023_2.fastq                            SRR307029_1.fastq
SRR307024_1.fastq                            SRR307029_2.fastq
SRR307024_2.fastq                            SRR307030_1.fastq
SRR307025_1.fastq                            SRR307030_2.fastq

We will make an empty directory to work in for today.

mkdir compilec

and go into it to work

cd compilec/

15.6. An overview#

15.7. A simple program#

nano hello.c

cat hello.c 

and we will put in a simple hello world program

#include <stdio.h>
void main () {

 printf("Hello world\n");

}

we will confirm that it exists

ls

hello.c

15.8. Preprocessing with gcc#

First we handle preprocessing, which pulls in the headers that are included. We will use the compiler gcc.

We will use gcc for many steps, and use its options to have it do subsets of what it can possibly do:

-E stops after preprocessing
-o gives the name of the output file to write, here the .i file

gcc -E -o hello.i hello.c

If it succeeds, we see no output, but we can check the folder

ls

hello.c  hello.i

now we have a new file

we can check the size

ls -l

total 24
-rw-r--r--. 1 brownsarahm spring2022-csc392    64 Oct 29 12:58 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 29 13:00 hello.i

If we think that the .i file might be big, what can we use to compare the two to see the impact of preprocessing?

We can inspect what it does using wc

wc -l hello*

hello.c
hello.i
total

we started with just 6 lines of code and we get a lot more after preprocessing

Since it is long, we will first look at the top

head hello.i

# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 375 "/usr/include/features.h" 3 4

and the end

tail hello.i

extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 943 "/usr/include/stdio.h" 3 4

# 2 "hello.c" 2
void main () {

 printf("Hello world\n");

}

We see that our original program is at the end of the file, and the beginning is where the include line has been expanded.

15.9. Compiling#

Next we take our preprocessed file and compile it to get assembly code.

Again, we use gcc:

-S tells it to produce assembly
we will use the preprocessed file as input

gcc -S hello.i

If it succeeds we again see no output, but we can see what it wrote:

ls

hello.c  hello.i  hello.s

we have a new file as well with the .s extension.

we can use man to see docs for any command

man gcc

we get a new file

ls -l

total 28
-rw-r--r--. 1 brownsarahm spring2022-csc392    64 Oct 29 12:58 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 29 13:00 hello.i
-rw-r--r--. 1 brownsarahm spring2022-csc392   433 Oct 29 13:06 hello.s

we can see how long it is in lines too

wc -l hello.s

25 hello.s

This is longer than the source, but not as long as the expanded header. The header contains lots of information that we might need, but the assembly contains only what our program actually does.

And it’s manageable, so we inspect it directly:

cat hello.s

	.file	"hello.c"
	.section	.rodata
.LC0:
	.string	"Hello world"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$.LC0, %edi
	call	puts
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)"
	.section	.note.GNU-stack,"",@progbits

There are many more steps than in the source and they are lower level operations, but it is still text stored in a file.

Hint

Learning more about assembly languages is a good explore badge topic

15.10. Assembling#

Assembling takes the assembly code and produces object code. Assembly is relatively broad: there are families of assembly languages, and it is still written so that humans can read it. It is harder to follow than source code because it is closer to the hardware. The object code, however, is specific instructions for your machine and is not human readable.

Again, with gcc:

-c tells it to stop at the object file
-o again gives it the name of the file to write

gcc -c -o hello.o hello.s 

Again, check what it does by looking at files

ls -l

total 32
-rw-r--r--. 1 brownsarahm spring2022-csc392    64 Oct 29 12:58 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 29 13:00 hello.i
-rw-r--r--. 1 brownsarahm spring2022-csc392  1496 Oct 29 13:14 hello.o
-rw-r--r--. 1 brownsarahm spring2022-csc392   433 Oct 29 13:06 hello.s

now we see a new file, the .o

and again check its length

wc -l hello.o

5 hello.o

we can check how many characters and words

wc  hello.o

   5   17 1496 hello.o

it is not even that many characters; compare it to the source:

wc  hello.c

 6  9 64 hello.c

cat hello.o

ELF>?@@
UH???]?Hello worldGCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)zRx
K                                                               A?C
??      hello.cmainputs


???????? .symtab.strtab.shstrtab.rela.text.data.bss.rodata.comment.note.GNU-stack.rela.eh_frame @?0
&PP1P
     90\.B?W?R@
?
	?0a

this is not human readable though

This file is written in binary, but our terminal reads it back in 8-bit chunks and tries to show each chunk as a character. To get an idea, imagine we had a small object language where there were 4 total operations:

pick a memory location (00)
store a value (01)
add (10)
move value to ALU (11)

In this case code like:

a = 1
b = 2
c = a+ b

Would get translated into assembly:

pick memory 0
store 1
pick memory 1
store 2
move to ALU location 0
move to ALU location 1
add ALU

and then object code like:

then the terminal reads it in 8-bit chunks

00000101 00001010 11001101 11000000

which would render, according to utf-8 or ascii or whatever encoding your system uses, as the characters numbered 5, 10, 205, 192

15.11. Linking#

Now we can link it all together. In this program there are not a lot of other dependencies, but this step fills in anything from libraries and outputs an executable.

once again with gcc:

-o flag specifies the name for output
-lm tells it to link against the math library (libm); the .o file named on the command line is the object code being linked.

gcc -o  hello hello.o -lm

ls

hello  hello.c  hello.i  hello.o  hello.s

Finally we can run our program

./hello

Hello world

ls -l

total 44
-rwxr-xr-x. 1 brownsarahm spring2022-csc392  8360 Oct 29 13:30 hello
-rw-r--r--. 1 brownsarahm spring2022-csc392    64 Oct 29 12:58 hello.c
-rw-r--r--. 1 brownsarahm spring2022-csc392 16865 Oct 29 13:00 hello.i
-rw-r--r--. 1 brownsarahm spring2022-csc392  1496 Oct 29 13:14 hello.o
-rw-r--r--. 1 brownsarahm spring2022-csc392   433 Oct 29 13:06 hello.s

./hello

Hello world

15.12. Putting it all together#

We can repeat with a different name and work directly from source to executable:

gcc -o demohello hello.c -lm

ls

demohello  hello  hello.c  hello.i  hello.o  hello.s

and run again.

./demohello

Hello world

15.13. Working with multiple files#

This all looks a bit different if we have our code split across files.

we will make a new file main.c

nano main.c

with the following content

/* Used to illustrate separate compilation.
Created: Joe Zachary, October 22, 1992
Modified:
*/

#include <stdio.h>

void main () {
 int n;
 printf("Please enter a small positive integer: ");
 scanf("%d", &n);
 printf("The sum of the first n integers is %d\n", sum(n));
 printf("The product of the first n integers is %d\n", product(n));
}

Then help.c

nano help.c

/* Used to illustrate separate compilation

Created: Joe Zachary, October 22, 1992
Modified:

*/

/* Requires that "n" be positive. Returns the sum of the
  first "n" integers. */

int sum (int n) {
 int i;
 int total = 0;
 for (i = 1; i <= n; i++)
  total += i;
 return(total);
}


/* Requires that "n" be positive. Returns the product of the
  first "n" integers. */

int product (int n) {
 int i;
 int total = 1;
 for (i = 1; i <= n; i++)
  total *= i;
 return(total);
}

now we can compile each part:

gcc -Wall -g -c  main.c 

main.c:8:6: warning: return type of ‘main’ is not ‘int’ [-Wmain]
 void main () {
      ^
main.c: In function ‘main’:
main.c:12:2: warning: implicit declaration of function ‘sum’ [-Wimplicit-function-declaration]
  printf("The sum of the first n integers is %d\n", sum(n));
  ^
main.c:13:2: warning: implicit declaration of function ‘product’ [-Wimplicit-function-declaration]
  printf("The product of the first n integers is %d\n", product(n));
  ^

Now we preprocess, compile and assemble the helper code:

gcc -Wall -g -c help.c 

then we can link it all together

gcc -o demo main.o help.o -lm

and run our program

./demo

Please enter a small positive integer: 3
The sum of the first n integers is 6
The product of the first n integers is 6

./demo

Please enter a small positive integer: 8
The sum of the first n integers is 36
The product of the first n integers is 40320

and finally exit from seawulf

exit

logout
Connection to seawulf.uri.edu closed.

15.14. Prepare for Next Class#

Review your idethoughts.md from a few weeks ago and add some summary notes.
Think about what features or extensions in your favorite IDE you like the most and that you think others may not know about. Be prepared to share a small demo of using a feature or to explain how it works (as in, open a file that has content in your IDE so that you could show that feature).

15.15. Badges#

Review

Create some variations of the hello.c we made in class. Make hello2.c print twice with 2 print commands. Make hello5.c print 5 times with a for loop and hello7.c print 7 times with a for loop. Build them all on the command line and make sure they run correctly.
Write a bash script, assembly.sh to compile each program to assembly and print the number of lines in each file.
Put the output of your script in hello_assembly_compare.md. Add to the file some notes on how they are similar or different based on your own reading of them.

Practice

On Seawulf, modify main.c from class to accept the integer as a command line argument instead of via input while running the program. See this tutorial for an example.
Write a bash script demo_test.sh that runs your compiled program for each integer from 10 to 30 (syntax for a range is {start..end} so this would be {10..30})
Write an sbatch script, batchrun.sh to run your script on a compute node and save the output to a file. The sbatch script should compile and link the program and then call the script. see the options
use scp to download your modified main, script files, and output to your local computer and include them in your kwl repo.

15.16. Experience Report Evidence#

15.17. Questions After Today’s Class#

15.17.1. What’s the process like for an interpreted language, like python?#

An interpreter reads and executes each line in a REPL, or it runs a script file line by line, like how bash runs a shell script.

More details on the steps for an interpreted language are a good explore badge topic.

15.17.2. Is it possible to not link everything together?#

If you do not link, then it’s not executable.

15.17.3. What do other compilers have that gcc doesn't have? Why have different options versus a standard for C++, for example?#

Different compilers mostly differ in their optimizations, but this is also a good explore badge topic.

15.17.4. Are the steps that we did in class similar when working with languages other than C?#

They are pretty common for compiled languages, but comparing and contrasting with other languages like Rust could be a good explore badge.

15.17.5. What programming languages can we use in nano mode other than C++?#

You can work with any programming language in any text editor; all programming languages are plain text.

15.17.6. What steps does an interpreter abstract? Is it an addition in the beginning of the compilation process or is it an entirely different process?#

It’s completely different. For example, most Python interpreters are written in C, so the interpreter itself is compiled with a process like this one, but then it parses and runs the code interactively.

15.17.7. Is there any benefit to being able to read a low level language?#

Obscure skills can often pay well.

Also, a low level language is needed for working on resource constrained hardware. It can also be used for creating drivers to connect hardware to a computer, like peripherals.