13. How does git make a commit?
13.1. Important
If you missed a class and want to make up the experience badge, you must complete the tasks under “Experience Report Evidence” at the end of the notes, if that section is present.
Today we will dig into how git really works. This will be a deep dive that provides a lot of detail about how git creates a commit. It will reinforce important concepts, which is of practical use: it will give you some ideas about how to fix things when they go wrong.
Later, we will build on this on the practical side, but these concepts are very important for making sense of the more practical aspects of fixing things in git.
This deep dive into git is to help you build a correct, flexible understanding of git so that you can use it independently and efficiently. The plumbing commands do not need to be a part of your daily use of git, but they are the way that we can dig in and see what actually happens when git creates a commit.
This also serves as an example of a method you could apply to understand another complex system.
Inspecting a system’s components is a really good way to understand it, and correctly understanding it will impact your ability to ask good questions and even look up the right thing to do when you need to fix things.
Also, looking at the parts of git is a good way to reinforce specific design patterns that are common in CS in a practical way. This means that today we will also:
review and practice with the bash commands we have seen so far
see a practical example of hashing
reinforce through examples what a pointer does
To follow along, navigate to your GitHub inclass repo.
Recall: git stores important content in files that it uses like variables.
What is one example of a file in git that works like a pointer?
For example:
cat .git/HEAD
ref: refs/heads/organization
HEAD holds a reference to the current branch, which git will compare the working directory with.
What do we find inside that file?
cat .git/refs/heads/organization
99c73d4f2c0c87c3aef7a61366e56e3894a040a5
We find a hash. If you remember, this hash identifies a commit. We can confirm that by looking at the git log.
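The two lookups above form a pointer chain, and we can follow it by hand. The sketch below is only an illustration: it builds a stand-in .git folder in a scratch directory (the demo variable and the folder are assumptions made for this example, not a real repository) and then resolves HEAD in two steps.

```shell
# Build a tiny stand-in for a .git directory in a scratch location
# (hypothetical demo data copied from the output above, not a real repo).
demo=$(mktemp -d)
mkdir -p "$demo/.git/refs/heads"
echo "ref: refs/heads/organization" > "$demo/.git/HEAD"
echo "99c73d4f2c0c87c3aef7a61366e56e3894a040a5" > "$demo/.git/refs/heads/organization"

# Step 1: HEAD holds the path of the current branch file.
ref_path=$(sed 's/^ref: //' "$demo/.git/HEAD")

# Step 2: the branch file holds the hash of a commit.
commit_hash=$(cat "$demo/.git/$ref_path")

echo "HEAD -> $ref_path -> $commit_hash"
```

In a real repo, git rev-parse HEAD does this lookup for us, and it also handles cases this sketch ignores, such as branches stored in packed-refs.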
cat .git/config
[core]
repositoryformatversion = 0
filemode = false
bare = false
logallrefupdates = true
symlinks = false
ignorecase = true
[remote "origin"]
url = https://github.com/compsys-progtools/gh-inclass-AymanBx
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
remote = origin
merge = refs/heads/main
[branch "1-create-an-about-file"]
remote = origin
merge = refs/heads/1-create-an-about-file
[branch "organization"]
remote = origin
merge = refs/heads/organization
This file stores information about the different branches and remotes.
There are many such files:
ls .git
COMMIT_EDITMSG ORIG_HEAD hooks/ logs/ refs/
FETCH_HEAD config index objects/
HEAD description info/ packed-refs
What file stores information for git and lives outside of the .git directory?
[x] .gitignore
[ ] HEAD
[ ] config
[ ] ORIG_HEAD
.gitignore is a file in the working directory that contains a list of files and patterns to not track.
We keep it in the working directory because we want it to be tracked by git like other files. That way it can also be pushed to GitHub, so files that are meant to be ignored are ignored in all local copies of the repo.
We can see it is a hidden file in the working directory with ls -a
ls -a
./ CONTRIBUTING.md helper_functions.py
../ LICENSE.md important_classes.py
.git/ README.md my_secrets/
.github/ about.md philosophy.md
.gitignore abstract_base_class.py scratch.ipynb
.secrets alternative_classes.py setup.py
API.md docs/ tests/
and what it contains:
cat .gitignore
.secrets
my_secrets/*
13.2. Creating a repo from scratch
We will start in the top level course directory.
cd ..
ls
gh-inclass-AymanBx/ prep/ spring25-kwl-AymanBx/ tiny-book/
Yours should also have your kwl repo, gh inclass repo, course website clone, etc.
We can create an empty repo from scratch using git init <path>
Last time we used an existing directory, like git init . because we were working in the directory that already existed.
When we ran git init . last time, what did that mean about our path?
[ ] we had made a shortcut to the content we want to use for the repo already at .
[x] the folder we wanted to use as the repo was the current working directory
[ ] we wanted to make the repo name interactively after running the command
Today we will create a new directory called test and initialize it as a repo at the same time:
git init test
We get this message again; see the context from last week.
We can see what it did by first looking at the working directory
ls
gh-inclass-AymanBx/ prep/ spring25-kwl-AymanBx/
test/ tiny-book/
It made a new folder with the name we gave, and we can go into that directory
cd test/
and then rename the branch
git branch -m main
To clarify, we will look at the status
git status
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
Notice that there are no commits, and no origin.
ls .git
HEAD description info refs
config hooks objects
We can see the basic requirements of an empty repo here.
What does this tell us about the other files we saw in the gh inclass .git directory?
They are additional variables that git creates as the repo is used.
What command lets us search the file system, matching the names of files against a pattern?
[ ] search
[ ] grep
[ ] awk
[x] find
[ ] wc
[ ] fsearch
13.3. Searching the file system
We can use the bash command find to search the file system. Note that this does not search the contents of the files, just the names.
find .git/objects/
.git/objects/
.git/objects//pack
.git/objects//info
We see a few items in that directory, plus the directory itself.
We can limit the results by type, to only files, with the -type option set to f
find .git/objects/ -type f
And we have no results. We have no objects yet because this is an empty repo.
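To see what the -type filter changes, here is a small sketch in a scratch directory. The file and folder names (notes.txt, pack, data.bin) are made up for this example.

```shell
# Scratch directory with one subdirectory and two files (hypothetical names).
demo=$(mktemp -d)
mkdir "$demo/pack"
touch "$demo/notes.txt" "$demo/pack/data.bin"

# No filter: find lists the starting directory, subdirectories, and files.
all_count=$(find "$demo" | wc -l | tr -d ' ')

# -type f keeps only regular files; -type d keeps only directories.
file_count=$(find "$demo" -type f | wc -l | tr -d ' ')
dir_count=$(find "$demo" -type d | wc -l | tr -d ' ')

# -name matches a pattern against file names, not file contents.
txt_files=$(find "$demo" -type f -name '*.txt')

echo "all=$all_count files=$file_count dirs=$dir_count"
echo "$txt_files"
```

The first line printed is all=4 files=2 dirs=2: the two directories are the starting directory itself and pack, which is the same pattern we saw with .git/objects/ before any objects existed.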
Which of the following is not a type of git object?
[ ] commit
[x] branch
[ ] tree
[ ] blob
13.4. Git Objects
Remember our 3 types of objects:
blob objects: the content of your files (data)
tree objects: store file names and group files together (organization)
commit objects: store information about the sha values of the snapshots
13.4.1. How to create an object
All git objects are files stored with a name that is the hash of the content in the file.
Remember git is a content-addressable file system, so it uses key-value pairs.
Let’s create our first git object. git uses hashes as the key. We give the hashing function some content, it applies the algorithm and returns the hash as the reference to that object. We can also write to our .git directory with this.
The git hash-object command works on files, but we do not have any files yet. We can create a file, but we do not have to. Remember, everything is a file.
What file do we always have access to for temporary use?
[ ] tmp.md
[x] stdout
[ ] terminal
[ ] temporary
What bash command sends its input to stdout?
[ ] print
[ ] show
[x] echo
[ ] tmpstore
[ ] stdstore
13.4.2. Note
As a fun exercise, for the remainder of the class try to follow along with exactly the same text that I use in commands: same capitalization and spacing. It isn’t a big deal if you don’t.
When we use things like echo, it writes to the stdout file.
echo "test content"
test content
which shows on our terminal.
What allows us to chain commands together using output from one as input to the next?
[ ] a link (&)
[ ] a redirect (>>)
[x] a pipe (|)
We can use a pipe to connect the stdout of one command to the stdin of the next.
Pipes are an important part of computer science too. We’re seeing them in the context of real uses, and we will keep seeing them.
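As a quick illustration of stdout feeding stdin, here is a sketch that uses only standard tools (wc, tr, and cut are our choices for the example, not part of the git workflow).

```shell
# Without a pipe, echo writes to stdout, which is the terminal.
echo "test content"

# With a pipe, the same bytes become the stdin of the next command.
# wc -c counts bytes; tr removes padding that some wc versions add.
byte_count=$(echo "test content" | wc -c | tr -d ' ')
echo "$byte_count"

# Pipes can be chained: each command reads what the previous one wrote.
upper=$(echo "test content" | tr 'a-z' 'A-Z' | cut -d ' ' -f 1)
echo "$upper"
```

The byte count is 13, not 12, because echo adds a newline at the end of the text it writes.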
echo "test content" | git hash-object --stdin
We can break down this command:
git hash-object takes the content you hand to it and merely returns the unique key
the --stdin option tells git hash-object to get the content to be processed from stdin instead of a file
the | is called a pipe (what we saw before was a redirect); it pipes one process’s output into the next command
echo would write to stdout; with the pipe, it passes that to the stdin of git hash-object
we get back the hash:
d670460b4b4aece5915caf5c68d12f560a9fe3e4
Notice how we all got the same hash. This is because the hashing algorithm is deterministic: the hash is generated from the content we provided, so the same content always gives the same hash.
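We can check this without git at all. The sketch below assumes a repo that uses SHA-1 (the default) and that the sha1sum tool is installed (on macOS, shasum plays the same role). git hashes a short header plus the content, so we build that same input by hand.

```shell
# git hashes a short header followed by the content:
#   "blob <size in bytes>" + NUL byte + content
# "test content" plus the newline that echo adds is 13 bytes.
manual_hash=$(printf 'blob 13\0test content\n' | sha1sum | cut -d ' ' -f 1)
echo "$manual_hash"

# Changing a single character gives a completely different hash.
other_hash=$(printf 'blob 13\0Test content\n' | sha1sum | cut -d ' ' -f 1)
echo "$other_hash"
```

The first hash printed matches the d670460b4b4aece5915caf5c68d12f560a9fe3e4 that git hash-object returned, which is why everyone in the room gets the same key for the same content.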
Let’s check if it wrote to the directory.
find .git/objects/ -type f
Nothing was written yet. Now let’s run it again with a slight modification: the -w option tells the command to also write that object to the database
echo "test content" | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4
and we can check if it wrote to the directory.
find .git/objects/ -type f
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
and we see the file that it was supposed to create!
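The location of that file follows directly from the hash. As a minimal sketch (the variable names are ours), we can rebuild the path with cut:

```shell
hash="d670460b4b4aece5915caf5c68d12f560a9fe3e4"

# First two characters name the folder; the remaining 38 name the file.
folder=$(printf '%s' "$hash" | cut -c 1-2)
file=$(printf '%s' "$hash" | cut -c 3-)

object_path=".git/objects/$folder/$file"
echo "$object_path"
```

This prints .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4. The doubled slash in the find output most likely comes from the trailing slash we typed in .git/objects/ and does not change which file is meant.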
13.4.3. Viewing git objects
We can try with cat
cat .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
xK??OR04f(I-.QH??+I?+?K?
Notice that the first two characters from the hash were used as a folder name.
This is binary output that we cannot understand, because git stores objects compressed. Fortunately, git provides a utility. We can use cat-file to inspect the object by referencing at least 4 characters that are unique, taken from the start of the full hash, not from the file name (7046 will not work, but d670 will), because the hash starts with the name of the folder.
cat-file requires an option; -t is for type
git cat-file -t d670
blob
We see that it is a blob.
Then we can use the -p option, for pretty print, to see the content instead
git cat-file -p d670
test content
This is the content that we put in to the hashing function, as expected.
13.4.4. Hashing a file
What allows us to send the outputs of a command to a different file instead of stdout?
[ ] a link (&)
[x] a redirect (>>)
[ ] a pipe (|)
let’s create a file.
echo "version 1" >test.txt
and store it, by hashing it
git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30
we can look at what we have.
find .git/objects/ -type f
we see two objects as expected
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30
Now this is the status of our repo.
We can check the type of the objects with git cat-file and -t
git cat-file -t 83baa
git cat-file -t d670
blob
it is a blob object as expected
Notice, however, that we only have one file in the working directory.
ls
test.txt
It is the one test.txt; the first blob we made has no file in the working directory associated with it.
The working directory and the git repo are not strictly the same thing, and can be different like this. Mostly they will stay in a closer relationship than we currently have unless we use plumbing commands, but this is good for building a solid understanding of how the .git directory relates to your working directory.
git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
test.txt
nothing added to commit but untracked files present (use "git add" to track)
So far, even though we have hashed the object, git still thinks the file is untracked, because it is not in a tree and there are no commits that point to that part of the tree.
When working with porcelain commands, what do we do before we can make a commit?
[ ] hash the content
[ ] push to remote
[x] stage the content (put it in the index)
What porcelain command puts content in the index?
[ ] commit
[ ] stash
[x] add
[ ] pull
This is still what our repo looks like
13.5. Updating the Index
Now, we can add our file as it is to the index.
git update-index --add --cacheinfo 100644 \
83baae61804e65cc73a7201a7252750c76066a30 test.txt
the \ lets us wrap onto a second line
this is the plumbing command git update-index; it updates (or in this case creates) an index, which is the staging area of our repository
the --add option is needed because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)
the --cacheinfo option is needed because the file we’re adding isn’t in your directory but is in the database
in this case, we’re specifying a mode of 100644, which means it’s a normal file
then the hash of the object we want to add to the index (the content); in our case, we want the hash of the first version of the file, not the most recent one
finally the file name of that content
Again, we check in with status
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Now the file is staged, as expected.
Let’s edit it further.
echo "version 2" >> test.txt
Remember >> allows us to append
We can look at the content to ensure it is as expected
cat test.txt
version 1
version 2
So the file has two lines
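The difference between the two redirects is easy to try in a scratch directory. This is a small sketch; the demo directory and other.txt are names made up for the example.

```shell
demo=$(mktemp -d)

# > creates the file, or replaces its contents if it already exists.
echo "version 1" > "$demo/test.txt"

# >> appends to the end of the file instead.
echo "version 2" >> "$demo/test.txt"
line_count=$(wc -l < "$demo/test.txt" | tr -d ' ')

# Using > on an existing file discards what was there before.
echo "version 3" > "$demo/other.txt"
echo "replaced" > "$demo/other.txt"
other_content=$(cat "$demo/other.txt")

echo "$line_count"
echo "$other_content"
```

This prints 2 and then replaced: the appended file kept both lines, while the file written twice with > only kept the last write.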
Now check status again.
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
We added the first version of the file to the staging area, so that version is ready to commit. But we have since changed the version in our working directory relative to the hash object that we put in the staging area, so we also have changes not staged.
We can hash and store this version too.
git hash-object -w test.txt
0c1e7391ca4e59584f8b773ecdbbb9467eba1547
We can then look again at our list of objects.
find .git/objects/ -type f
.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30
So now our repo has 3 items, all blobs:
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
Hashing the object does not impact the index, which is what git status uses.
13.5.1. Preparing to Commit
When we work with porcelain commands, we use add then commit. We have staged the file, which we know is what happens when we add. What else has to happen to make a commit?
We know that commits are composed of:
a message
author and timestamp info
a pointer to a tree
a pointer to the parent (except the first commit)
We do not have any of these items yet.
Let’s make a tree next. We can write a tree from the index:
git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
and we get a hash.
Notice how we didn’t have to pass any arguments to that command. It uses what is in the index to build the tree.
Let’s examine the tree; first, check the type
git cat-file -t d832
tree
it is as expected
and now we can look at its contents
git cat-file -p d832
100644 blob 83baae61804e65cc73a7201a7252750c76066a30 test.txt
Let’s look at the objects now
find .git/objects/ -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
Now this is the status of our repo:
Again, we will check in with git via git status
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
Nothing has changed; making the tree does not yet make the commit.
The objects directory only keeps track of the objects; there are also still the HEAD, which we have not dealt with, and the index.
cat .git/HEAD
ref: refs/heads/main
ls .git/refs/heads/
The heads folder is still empty. That’s an indicator that, as far as git knows, no commits have been made to any branch yet.
13.5.2. Creating a commit manually
We can echo a commit message through a pipe into the commit-tree plumbing command to commit a particular tree object.
echo "first commit"
first commit
The git commit-tree command requires a message via stdin and the tree hash. We will use stdin and a pipe for the message
echo "first commit" | git commit-tree d832
62467cd0c316a93201e98685a9d9f9beed8e6e27
and we get back a hash. But notice that this hash is unique for each of us, because the commit has information about the timestamp and our user.
Now we check the final list of objects that we have for today
find .git/objects/ -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/62/467cd0c316a93201e98685a9d9f9beed8e6e27
Important
Check that you also have 5 objects. 4 of them should match mine; the one that will not match is the commit (62467cd for me). It should be different because of the different timestamp and commit author.
We can also look at its type
git cat-file -t 62467
commit
and we can look at the content
git cat-file -p 62467
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author AymanBx <ayman_sandouk@uri.edu> 1742319374 -0400
committer AymanBx <ayman_sandouk@uri.edu> 1742319374 -0400
first commit
Visually, this is what our repo looks like:
13.6. What does git status do?
compares the working directory to the current state of the active branch
we can see the working directory with:
ls
we can see the active branch in the HEAD file
what is its status?
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
We see it is “On branch main” because we set the branch to main, but since we have not added the commit to main, git still doesn’t know we made a commit. When we use the porcelain command for commit, it does this automatically; the porcelain commands do many things.
Notice that git says we have no commits yet even though we have written a commit. This is because the main branch does not point to any commit: we made the commit manually, so we did not update the branch.
We can verify by looking at the HEAD file
cat .git/HEAD
ref: refs/heads/main
and then viewing that file
cat .git/refs/heads/main
cat: .git/refs/heads/main: No such file or directory
which does not even exist!
ls .git/refs/heads
Nothing exists there yet! But we can see that the heads folder exists
ls .git/refs
heads tags
we can see the objects though:
find .git/objects/ -type f
git cat-file -t 62467
commit
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
This is because git status works off the HEAD file and we have not updated that or set our branch to point to our commit yet.
13.7. Git References
We can make that file manually, writing the full hash of our commit into it (use your own commit hash here)
echo 62467cd0c316a93201e98685a9d9f9beed8e6e27 > .git/refs/heads/main
git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
no changes added to commit (use "git add" and/or "git commit -a")
We can see that indeed we have one object that is a commit
git cat-file -p 62467
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author AymanBx <ayman_sandouk@uri.edu> 1742319374 -0400
committer AymanBx <ayman_sandouk@uri.edu> 1742319374 -0400
first commit
git cat-file -p d8329
100644 blob 83baae61804e65cc73a7201a7252750c76066a30 test.txt
git cat-file -p 83baa
So we now have HEAD -> main -> our commit -> tree -> blob.
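The whole chain can be rebuilt in one pass. The following is a sketch, not the only way to do it: it works in a scratch repo, sets a placeholder identity (Example Student is made up for this example), and uses git update-ref to write the branch file instead of echo with a redirect. It also asks git which branch HEAD names rather than assuming main.

```shell
# Work in a scratch repository so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Identity for the commit (placeholder values, set only for this sketch).
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"

# 1. blob: hash the file content and store it in .git/objects
echo "version 1" > test.txt
blob=$(git hash-object -w test.txt)

# 2. index: stage that blob under the name test.txt
git update-index --add --cacheinfo 100644 "$blob" test.txt

# 3. tree: snapshot the index
tree=$(git write-tree)

# 4. commit: wrap the tree with a message, author, and timestamp
commit=$(echo "first commit" | git commit-tree "$tree")

# 5. reference: point the current branch (whatever HEAD names) at the commit
branch_ref=$(git symbolic-ref HEAD)
git update-ref "$branch_ref" "$commit"

git log --oneline
```

The blob and tree hashes here match the 83baae6 and d8329fc we saw in class because they depend only on content and file name. The commit hash will differ on every run, since it includes the timestamp and identity.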
13.8. Experience Report Evidence
Important
You need to have a test repo that matches this for class on Tuesday.
Generate your evidence with the following in your test repo
find .git/objects/ -type f > testobj.md
then append the contents of your commit object to that file.
Move testobj.md to the experiences folder in your kwl repo.
13.9. Prepare for Next Class
[ ] Review the GitHub Action file from last lab and make note of what, if any, syntax in there is unfamiliar. (note that link will not work on the rendered website, but will work on badge issues)
[ ] Use quote reply or edit to see how I made a relative path to a location within the repo in this issue. (to see another application of paths)
[ ] Check out the github action marketplace to see other actions that are available and try to get a casual level of understanding of the types of things that people use actions for.
13.11. Badges
Make a table in gitplumbingreview.md in your KWL repo that relates the two types of git commands we have seen: plumbing and porcelain. The table should have two columns, one for each type of command (plumbing and porcelain). Each row should have one git porcelain command and at least one of the corresponding git plumbing command(s). Include two rows: add and commit.
Read more details about git internals to review what we did in class in greater detail. Make a file gitplumbingdetail.md and create a table or mermaid diagram that shows the relationship between at least three porcelain commands and their corresponding plumbing commands (generally more than one each).
Create gitislike.md and explain main git operations we have seen (add, commit, push) in your own words in a way that will either help you remember or how you would explain it to someone else at a high level. This might be analogies or explanations using other programming concepts or concepts from a hobby.