13. How does git make a commit?

13.1. Important

If you missed a class and want to make up the experience badge, you must complete the tasks in “Experience Report Evidence”, if that section appears at the end of the notes.

Today we will dig into how git really works. This deep dive will provide a lot of detail about how git creates a commit. It will reinforce important concepts and give you some ideas about how you might fix things when they go wrong.

Later, we will build on this more on the practical side, but these concepts are very important for making sense of the more practical aspects of fixing things in git.

This deep dive into git is to help you build a correct, flexible understanding of git so that you can use it independently and efficiently. The plumbing commands do not need to be a part of your daily use of git, but they are the way that we can dig in and see what actually happens when git creates a commit.

This also serves as an example of a method you could apply to understand another complex system.

Inspecting a system’s components is a really good way to understand it, and understanding it correctly will improve your ability to ask good questions and to look up the right thing to do when you need to fix things.

Also, looking at the parts of git is a good way to reinforce specific design patterns that are common in CS in a practical way. This means that today we will also:

review and practice with the bash commands we have seen so far
see a practical example of hashing
reinforce through examples what a pointer does

Navigate to your GitHub inclass repo.

Recall: git stores important content in files that it uses like variables.

What is one example of a file in git that works like a pointer?

For example:

cat .git/HEAD 

ref: refs/heads/organization

The HEAD file holds the current branch, which is what git compares the working directory with.

What do we find inside that file?

cat .git/refs/heads/organization

99c73d4f2c0c87c3aef7a61366e56e3894a040a5

We find a hash. If you remember, this hash identifies a commit. We can confirm that by looking at the git log.
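The check against the log is easy to script. This is a minimal sketch, assuming the branch is stored as a loose file under .git/refs/heads (once git packs refs, the hash lives in .git/packed-refs instead). The name follow_head is made up for illustration.

```shell
# follow_head is a hypothetical helper name, not a git command.
# Run it from the top level of a repo that has at least one commit.
follow_head() {
    # .git/HEAD contains a line like "ref: refs/heads/organization"
    ref=$(sed -n 's/^ref: //p' .git/HEAD)
    # the branch file contains the hash of the commit the branch points to
    cat ".git/$ref"
}
```

Running follow_head and then git log -1 --format=%H should print the same hash; git rev-parse HEAD performs the same lookup for us.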

cat .git/config

[core]
        repositoryformatversion = 0
        filemode = false
        bare = false
        logallrefupdates = true
        symlinks = false
        ignorecase = true
[remote "origin"]
        url = https://github.com/compsys-progtools/gh-inclass-AymanBx
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
        remote = origin
        merge = refs/heads/main
[branch "1-create-an-about-file"]
        remote = origin
        merge = refs/heads/1-create-an-about-file
[branch "organization"]
        remote = origin
        merge = refs/heads/organization

The config file stores information about the different branches and remotes.

There are many such files:

ls .git

COMMIT_EDITMSG  ORIG_HEAD    hooks/  logs/        refs/
FETCH_HEAD      config       index   objects/
HEAD            description  info/   packed-refs

What file stores information for git and lives outside of the .git directory?

[x] .gitignore
[ ] HEAD
[ ] config
[ ] ORIG_HEAD

.gitignore is a file in the working directory that contains a list of files and patterns to not track. We keep it in the working directory because we want it to be tracked by git like other files. That way it can be pushed to GitHub too, so files that are meant to be ignored are ignored in all local copies of the repo.

We can see it is a hidden file in the working directory with ls -a

ls -a

./          CONTRIBUTING.md         helper_functions.py
../         LICENSE.md              important_classes.py
.git/       README.md               my_secrets/
.github/    about.md                philosophy.md
.gitignore  abstract_base_class.py  scratch.ipynb
.secrets    alternative_classes.py  setup.py
API.md      docs/                   tests/

and what it contains:

cat .gitignore

.secrets
my_secrets/*

13.2. Creating a repo from scratch

We will start in the top level course directory.

cd ..
ls

gh-inclass-AymanBx/  prep/  spring25-kwl-AymanBx/  tiny-book/

Yours should also have your kwl repo, gh inclass repo, course website clone, etc.

We can create an empty repo from scratch using git init <path>

Last time we used an existing directory, as in git init ., because we were working in a directory that already existed.

Last time we used an existing directory like git init ., what does that mean about our path when we used the command?

[ ] we had made a shortcut to the content we want to use for the repo already at .
[x] the folder we wanted to use as the repo was the current working directory
[ ] we wanted to make the repo name interactively after running the command

Today we will create a new directory called test and initialize it as a repo at the same time:

git init test

we get this message again, see context from last week

We can see what it did by first looking at the working directory

ls

gh-inclass-AymanBx/  prep/  spring25-kwl-AymanBx/  
test/  tiny-book/ 

it made a new folder named as we said

and we can go into that directory

cd test/

and then rename the branch

git branch -m main

To clarify we will look at the status

git status

Notice that there are no commits, and no origin.

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

ls .git

HEAD		description	info		refs
config		hooks		objects

we can see the basic requirements of an empty repo here.

What does this tell us about the other files we saw in the gh inclass .git directory?

They are additional variables that git creates as the repo is used.

What command lets us search the file system, meaning the names of files for matches to a pattern?

[ ] search
[ ] grep
[ ] awk
[x] find
[ ] wc
[ ] fsearch

13.3. Searching the file system

We can use the bash command find to search the file system. Note that this does not search the contents of the files, just the names.

find .git/objects/

.git/objects/
.git/objects//pack
.git/objects//info

we have a few items in that directory and the directory itself.

We can limit by type, to only files with the -type option set to f

find .git/objects/ -type f

And we have no results. We have no objects yet because this is an empty repo.
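find can also filter on the file name itself with -name, which takes a shell-style pattern. Here is a small sketch in a throwaway directory. The file names are made up for illustration.

```shell
# build a scratch directory with a few empty files (names are illustrative only)
scratch=$(mktemp -d)
mkdir "$scratch/docs"
touch "$scratch/about.md" "$scratch/docs/API.md" "$scratch/setup.py"

# -name matches against file names, never file contents;
# quote the pattern so the shell does not expand it before find sees it
matches=$(find "$scratch" -type f -name '*.md' | sort)
echo "$matches"
```

Only the two .md files should be listed; setup.py is skipped because its name does not match the pattern.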

Which of the following is not a type of git object?

[ ] commit
[x] branch
[ ] tree
[ ] blob

13.4. Git Objects

Remember our 3 types of objects

blob objects: the content of your files (data)
tree objects: store file names and group files together (organization)
commit objects: store information about the sha values of the snapshots

classDiagram
    class tree{
        List:
        - hash: blob
        - string: type
        - string:file name
    }
    class commit{
        hash: parent
        hash: tree
        string: message
        string: author
        string: time
    }
    class blob{
        binary: contents
    }
    class object{
        hash: name
    }
    object <|-- blob
    object <|-- tree
    object <|-- commit

13.4.1. How to create an object

All git objects are files stored with the name that is the hash of the content in the file

Remember, git is a content-addressable file system, so it uses key-value pairs.

Let’s create our first git object. git uses hashes as the key. We give the hashing function some content, it applies the algorithm and returns us the hash as the reference to that object. We can also write to our .git directory with this.

The git hash-object command works on files, but we do not have any files yet. We can create a file, but we do not have to. Remember, everything is a file.

What file do we always have access to for temporary use?

[ ] tmp.md
[x] stdout
[ ] terminal
[ ] temporary

What bash command sends its input to stdout?

[ ] print
[ ] show
[x] echo
[ ] tmpstore
[ ] stdstore

13.4.2. Note

As a fun exercise, for the remainder of the class try to follow along with exactly the same text that I use in commands: same capitalization and spacing. It isn’t a big deal if you don’t.

When we use things like echo it writes to the stdout file.

echo "test content"

test content

which shows on our terminal.

What allows us to chain commands together using output from one as input to the next?

[ ] a link (&)
[ ] a redirect (>>)
[x] a pipe (|)

We can use a pipe to connect the stdout of one command to the stdin of the next.

Pipes are an important part of computer science too. We are seeing them in the context of real uses, and we will keep seeing them.

echo "test content" | git hash-object --stdin

We can break down this command:

git hash-object takes the content you hand to it and merely returns the unique key
the --stdin option tells git hash-object to get the content to be processed from stdin instead of a file
the | is called a pipe (what we saw before was a redirect); it pipes one process's output into the next command
echo writes to stdout; with the pipe, that output is passed to the stdin of git hash-object

we get back the hash:

d670460b4b4aece5915caf5c68d12f560a9fe3e4

Notice how we all got the same hash. This is because the hash is generated by a hashing algorithm that uses the content we provided to generate the hash
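We can reproduce that hash without git. For a blob, git hashes a short header (the word blob, the size in bytes, and a NUL byte) followed by the content, using SHA-1 by default. This is a sketch, assuming a SHA-1 repository and that the sha1sum tool is installed (on macOS, shasum plays the same role). The name blob_hash is made up for illustration.

```shell
# blob_hash is a hypothetical helper, not part of git
blob_hash() {
    # $1: path to a file; wc -c counts its bytes
    size=$(wc -c < "$1" | tr -d ' ')
    # header "blob <size>\0" followed by the raw content, then SHA-1 of the whole stream
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

tmpfile=$(mktemp)
echo "test content" > "$tmpfile"   # echo adds a newline, so this is 13 bytes
blob_hash "$tmpfile"               # prints d670460b4b4aece5915caf5c68d12f560a9fe3e4
rm "$tmpfile"
```

Because the input to the hash is only the header and the content, everyone who hashes the same content gets the same key, regardless of file name, user, or time.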

Let’s check if it wrote to the directory.

find .git/objects/ -type f

Nothing was written, so there is no output. Now let’s run it again with a slight modification: the -w option tells the command to also write that object to the database.

echo "test content" | git hash-object -w --stdin

d670460b4b4aece5915caf5c68d12f560a9fe3e4

and we can check if it wrote to the directory.

find .git/objects/ -type f

.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

and we see the file we expected it to write!

13.4.3. Viewing git objects

We can try with cat

cat .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

Notice that the first two characters from the hash were used as a folder name

xK??OR04f(I-.QH??+I?+?K?

This is binary output that we cannot read. Fortunately, git provides a utility: git cat-file lets us inspect an object by referencing at least 4 characters from the start of the full hash, as long as they are unique in the repo. Note that this is the hash, not the file name (7046 will not work, but d670 will), because the hash starts with the name of the folder.

cat-file requires an option; -t is for type.

git cat-file -t d670

blob

we see that it is a blob.

Then we can do it with the -p option for pretty print instead to see the content

git cat-file -p d670

test content

where we see the content we put into the hashing function, as expected.

13.4.4. Hashing a file

What allows us to send the outputs of a command to a different file instead of stdout?

[ ] a link (&)
[x] a redirect (>>)
[ ] a pipe (|)

let’s create a file.

echo "version 1" >test.txt

and store it, by hashing it

git hash-object -w test.txt 

83baae61804e65cc73a7201a7252750c76066a30

we can look at what we have.

find .git/objects/ -type f

we see two objects as expected

.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

Now this is the status of our repo.

classDiagram
    class d67046{
        + "test content"
        +(blob)
    }
    class 83baae{
        + version 1
        + (blob)
    }

We can check the type of each object with git cat-file and -t

git cat-file -t 83baa

blob

git cat-file -t d670

blob

both are blob objects, as expected

Notice, however, that we only have one file in the working directory.

ls

test.txt

It is just test.txt; the first blob we made has no file in the working directory associated with it.

The working directory and the git repo are not strictly the same thing, and can differ like this. Mostly they will stay in a closer relationship than we currently have unless we use plumbing commands, but this is good for building a solid understanding of how the .git directory relates to your working directory.

git status

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	test.txt

nothing added to commit but untracked files present (use "git add" to track)

So far, even though we have hashed the object, git still thinks the file is untracked, because it is not in the index and no tree or commit points to it.

When working with porcelain commands, what do we do before we can make a commit?

[ ] hash the content
[ ] push to remote
[x] stage the content (put it in the index)

What porcelain command puts content in the index?

[ ] commit
[ ] stash
[x] add
[ ] pull

This is still what our repo looks like

classDiagram
    class d67046{
        +"test content"
        +(blob)
    }
    class 83baae{
        +version 1
        +(blob)
    }

13.5. Updating the Index

Now, we can add our file as it is to the index.

git update-index --add --cacheinfo 100644 \
 83baae61804e65cc73a7201a7252750c76066a30 test.txt

the \ lets us wrap onto a second line.

the plumbing command git update-index updates (or, in this case, creates) the index, which is the staging area of our repository
the --add option is needed because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)
--cacheinfo is needed because the content we’re adding comes from the object database rather than from a file in the working directory
in this case, we’re specifying a mode of 100644, which means it’s a normal file
then the hash of the object we want to add to the index (the content); here, the blob we made from test.txt
finally the file name of that content
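To look inside the index directly, git ls-files --stage lists each staged entry with its mode, blob hash, and stage number. The sketch below repeats the steps so far in a throwaway repo. The temporary directory is an assumption made for the example, so that it does not touch your test repo.

```shell
# work in a scratch repo so the real test repo is untouched
scratch=$(mktemp -d)
cd "$scratch"
git init -q .

# hash and store the content, keeping the hash in a variable
echo "version 1" > test.txt
blob=$(git hash-object -w test.txt)

# stage that blob under the name test.txt
git update-index --add --cacheinfo 100644 "$blob" test.txt

# show the index: mode, hash, stage number, then the file name
staged=$(git ls-files --stage)
echo "$staged"
```

The single entry should show mode 100644, the hash 83baae61804e65cc73a7201a7252750c76066a30, stage 0, and the name test.txt.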

Again, we check in with status

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Now the file is staged, as expected.

Let’s edit it further.

echo "version 2" >> test.txt

Remember >> allows us to append

We can look at the content to ensure it is as expected

cat test.txt

version 1
version 2

So the file has two lines

Now check status again.

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

We added the first version of the file to the staging area, so that version is ready to commit. We have since changed the file in our working directory relative to the blob we staged, so we also have changes that are not staged.
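This comparison can be checked by hand: the index records the hash of the staged blob, and hashing the working copy (without -w, so nothing is stored) gives a different value once the file has changed. Here is a sketch in a throwaway repo. The variable names are only illustrative.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q .

# stage version 1 with the same plumbing commands used above
echo "version 1" > test.txt
git update-index --add --cacheinfo 100644 "$(git hash-object -w test.txt)" test.txt

# change the working copy after staging
echo "version 2" >> test.txt

# the second field of ls-files --stage is the staged blob's hash
index_hash=$(git ls-files --stage test.txt | cut -d ' ' -f 2)
# hash of the file as it is now in the working directory
work_hash=$(git hash-object test.txt)

echo "index:   $index_hash"
echo "working: $work_hash"
```

The two hashes differ, which is exactly the situation git status reports as modified but not staged.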

We can hash and store this version too.

git hash-object -w test.txt

0c1e7391ca4e59584f8b773ecdbbb9467eba1547

We can then look again at our list of objects.

find .git/objects/ -type f

.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

So now our repo has 3 items, all blobs:

classDiagram
    class d67046{
        +"test content"
        +(blob)
    }
    class 83baae{
        +version 1
        +(blob)
    }
    class 0c1e73{
        +version 1
        +version 2
        +(blob)
    }

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

Hashing the object does not impact the index, which is what git status uses.

13.5.1. Preparing to Commit

When we work with porcelain commands, we use add then commit. We have staged the file, which we know is what happens when we add. What else has to happen to make a commit?

We know that commits are comprised of:

a message
author and time stamp info
a pointer to a tree
a pointer to the parent (except the first commit)

We do not have any of these items yet.

Let’s make a tree next.

Now we can write a tree from the index,

git write-tree

d8329fc1cc938780ffdd9f94e0d364e0ea74f579

and we get a hash

Notice how we didn’t have to pass any arguments to that command. It uses what is in the index to build the tree

Let’s examine the tree; first check the type

git cat-file -t d832

tree

it is as expected

and now we can look at its contents

git cat-file -p d832

100644 blob 83baae61804e65cc73a7201a7252750c76066a30	test.txt

Let’s look at the objects now

find .git/objects/ -type f

.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579

Now this is the status of our repo:

classDiagram
    class d67046{
        +"test content"
        +(blob)
    }
    class 83baae{
        +version 1
        +(blob)
    }
    class d8329f{
        +blob: 83baae
        +filename: test.txt
        +(tree)
    }
    class 0c1e73{
        +version 1
        +version 2
        +(blob)
    }
    d8329f --|> 83baae

Again, we will check in with git via git status

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

Nothing has changed; making the tree does not make a commit.

The tree only keeps track of the objects. There are also still the HEAD, which we have not dealt with, and the index.

cat .git/HEAD

ref: refs/heads/main

ls .git/refs/heads/

The heads folder is still empty. That is an indicator that, as far as git knows, no commits have been made to any branch yet.

13.5.2. Creating a commit manually

We can echo a commit message through a pipe into the commit-tree plumbing command to commit a particular tree object.

echo "first commit"

first commit

the git commit-tree command requires a message via stdin and the tree hash. We will use stdin and a pipe for the message

echo "first commit" | git commit-tree d832

62467cd0c316a93201e98685a9d9f9beed8e6e27

and we get back a hash. Notice that this hash is unique for each of us, because the commit includes the time stamp and our user information.
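One way to convince ourselves that the author and time stamp are the only reason the hashes differ is to pin them. git reads the identity and dates from the GIT_AUTHOR_* and GIT_COMMITTER_* environment variables when they are set. Here is a sketch in a throwaway repo. The name, email, and dates are invented values.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q .

# rebuild the same tree as in class
echo "version 1" > test.txt
git update-index --add --cacheinfo 100644 "$(git hash-object -w test.txt)" test.txt
tree=$(git write-tree)

# pin author and committer info (all values here are invented for the demo)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
export GIT_AUTHOR_DATE="1742319374 -0400"
export GIT_COMMITTER_DATE="1742319374 -0400"

first=$(echo "first commit" | git commit-tree "$tree")
second=$(echo "first commit" | git commit-tree "$tree")

# move the author clock by one second and commit the same tree again
export GIT_AUTHOR_DATE="1742319375 -0400"
third=$(echo "first commit" | git commit-tree "$tree")

echo "$first"
echo "$second"
echo "$third"
```

The first two hashes are identical because every byte of the commit is the same; the third differs even though the tree and message did not change.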

Now we check the final list of objects that we have for today

find .git/objects/ -type f

.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/62/467cd0c316a93201e98685a9d9f9beed8e6e27

Important

Check that you also have 5 objects; 4 of them should match mine. The one that will not match is the commit (62467cd in my case); it should be different because of the different timestamp and commit author.

We can also look at its type

git cat-file -t 62467

commit

and we can look at the content

git cat-file -p 62467

tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author AymanBx <ayman_sandouk@uri.edu> 1742319374 -0400
committer AymanBx <ayman_sandouk@uri.edu> 1742319374 -0400

first commit

Visually, this is what our repo looks like:

classDiagram
    class d67046{
        +test content
        +(blob)
    }
    class 83baae{
        +version 1
        +(blob)
    }
    class d8329f{
        +blob: 83baae
        +filename: test.txt
        +(tree)
    }
    class 0c1e73{
        +version 1
        +version 2
        +(blob)
    }
    class 62467c{
        +tree d8329f
        +author name
        +committer time
        +(commit)
    }
    d8329f --|> 83baae
    62467c --|> d8329f

13.6. What does git status do?

compares the working directory to the current state of the active branch

we can see the working directory with: ls
we can see the active branch in the HEAD file
what is its status?

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   test.txt

We see it is “on main” because we renamed the branch to main, but since we have not recorded the commit in main, git still does not know we made a commit. When we use the porcelain command for commit, it does this automatically; the porcelain commands do many things.

Notice, git says we have no commits yet even though we have written a commit.

In our case because we made the commit manually, we did not update the branch.

This is because the main branch does not point to any commit.

We can verify by looking at the HEAD file

cat .git/HEAD

ref: refs/heads/main

and then viewing that file

cat .git/refs/heads/main

cat: .git/refs/heads/main: No such file or directory

which does not even exist!

ls .git/refs/heads

nothing exists there yet!

But we can see the heads folder exists

ls .git/refs

heads	tags

we can see the objects though:

find .git/objects/ -type f

git cat-file -t 62467

commit

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

This is because git status works off the HEAD file and we have not updated that or set our branch to point to our commit yet.

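13.7. Git References

We can make that file manually, writing the full hash of the commit we created into it (use your own commit hash here, since it differs from mine)

echo 62467cd0c316a93201e98685a9d9f9beed8e6e27 > .git/refs/heads/main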

git status

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

no changes added to commit (use "git add" and/or "git commit -a")
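Writing into .git/refs by hand works, but a typo there leaves the branch pointing at nothing useful. git also provides a plumbing command for this job, git update-ref, which is generally the safer way to move a branch. Here is a sketch in a throwaway repo, with made-up identity values so the example does not depend on your git config.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git symbolic-ref HEAD refs/heads/main    # make sure HEAD names the main branch

# identity values are invented for the demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# blob -> index -> tree -> commit, as in the steps above
echo "version 1" > test.txt
git update-index --add --cacheinfo 100644 "$(git hash-object -w test.txt)" test.txt
commit=$(echo "first commit" | git commit-tree "$(git write-tree)")

# point the branch at the commit without editing files under .git by hand
git update-ref refs/heads/main "$commit"

cat .git/refs/heads/main    # the file now exists and holds the commit hash
git log --oneline
```

After update-ref, git log finds the commit through HEAD and main, just as it does after the manual echo.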

We can see that indeed we have one object that is a commit

git cat-file -p 62467

tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author AymanBx <ayman_sandouk@uri.edu> 1742319374 -0400
committer AymanBx <ayman_sandouk@uri.edu> 1742319374 -0400

first commit

git cat-file -p d8329

100644 blob 83baae61804e65cc73a7201a7252750c76066a30	test.txt

git cat-file -p 83baa

So we now have HEAD -> main -> our commit -> tree -> blob.
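The same chain can be walked with git rev-parse, which resolves names to hashes: HEAD gives the commit, HEAD^{tree} gives the tree it points to, and HEAD:test.txt gives the blob stored under that name in the tree. This sketch is meant to be run inside the test repo after the steps above. The name show_chain is made up for illustration.

```shell
# show_chain is a hypothetical helper; run it inside a repo where HEAD has a commit
show_chain() {
    # $1: path of a file recorded in the commit's tree
    echo "branch: $(git symbolic-ref HEAD)"
    echo "commit: $(git rev-parse HEAD)"
    echo "tree:   $(git rev-parse 'HEAD^{tree}')"
    echo "blob:   $(git rev-parse "HEAD:$1")"
}
```

In the test repo, show_chain test.txt should report refs/heads/main, your own commit hash, the d8329f tree, and the 83baae blob.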

13.8. Experience Report Evidence

Important

You need to have a test repo that matches this for class on Tuesday.

Generate your evidence with the following in your test repo

find .git/objects/ -type f > testobj.md

then append the contents of your commit object to that file.

Move the testobj.md to your kwl repo in the experiences folder.

13.9. Prepare for Next Class

[ ] Review the GitHub Action file from last lab and make note of what, if any, syntax in there is unfamiliar. (note that link will not work on the rendered website, but will work on badge issues)
[ ] Use quote reply or edit to see how I made a relative path to a location within the repo in this issue. (to see another application of paths)
[ ] Check out the github action marketplace to see other actions that are available and try to get a casual level of understanding of the types of things that people use actions for.

13.10. Badges

Review

Make a table in gitplumbingreview.md in your KWL repo that relates the two types of git commands we have seen: plumbing and porcelain. The table should have two columns, one for each type of command (plumbing and porcelain). Each row should have one git porcelain command and at least one of the corresponding git plumbing command(s). Include two rows: add and commit.

Practice

Read more details about git internals to review what we did in class in greater detail. Make a file gitplumbingdetail.md and create a table or mermaid diagram that shows the relationship between at least three porcelain commands and their corresponding plumbing commands (generally more than one each).
Create gitislike.md and explain main git operations we have seen (add, commit, push) in your own words in a way that will either help you remember or how you would explain it to someone else at a high level. This might be analogies or explanations using other programming concepts or concepts from a hobby.