Today we will dig into how git really works. This will be a deep dive and provide a lot of details about how git creates a commit. It will reinforce important concepts that are of practical use when working with git.
You could use the basic commands without understanding how git works, or even with an incorrect understanding. However, to get the most out of git and use it fluently, knowing how it works in detail is a huge advantage.
Today is also an exercise in careful examination of a system. You will do this a lot in your career: whenever you are debugging something, being able to systematically break it down and examine all of the parts is the central skill.
Later, we will build on this more on the practical side, but these concepts are very important for making sense of the more practical aspects of fixing things in git.
This deep dive into git is to help you build a correct, flexible understanding of git so that you can use it independently and efficiently. The plumbing commands do not need to be a part of your daily use of git, but they are the way that we can dig in and see what actually happens when git creates a commit.
Inspecting a system’s components is a really good way to understand it, and understanding it correctly will improve your ability to ask good questions and even look up the right thing to do when you need to fix things.
Outcomes for today¶
Also, looking at the parts of git is a good way to reinforce specific design patterns that are common in CS in a practical way. This means that today we will also:
review and practice with the bash commands we have seen so far
see a practical example of hashing
reinforce through examples what a pointer does
Review¶
Recall: git stores important content in files that it uses like variables.
navigate to your github inclass repo
cd gh-inclass-fa25-brownsarahm/
Some files in git work like pointers:
cat .git/HEAD
ref: refs/heads/organization
.git/HEAD holds the current branch
ls .git
COMMIT_EDITMSG hooks ORIG_HEAD
config index packed-refs
description info REBASE_HEAD
FETCH_HEAD logs refs
HEAD objects
.gitignore is a file in the working directory that contains a list of files and patterns to not track.
We can see it is a hidden file in the working directory with ls -a
ls -a
. docs
.. example.md
.git helper_functions.py
.github important_classes.py
.gitignore LICENSE.md
.secret README.md
abstract_base_class.py scratch.ipynb
alternative_classes.py setup.py
API.md tests
CONTRIBUTING.md
Creating a repo from scratch¶
We will start in the top level course directory.
cd ..
ls
fall25-kwl-brownsarahm
gh-inclass-fa25-brownsarahm
tiny-book
Yours should also have your kwl repo, gh inclass repo, course website clone, etc.
We can confirm that we are not in a git repository already:
git status
fatal: not a git repository (or any of the parent directories): .git
We can create an empty repo from scratch using git init <path>
Last time we used an existing directory, like git init ., because we were working in a directory that already existed.
Today we will create a new directory called test and initialize it as a repo at the same time:
git init test
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/test/.git/
First we go into that directory:
cd test/
I will rename my branch manually, instead of changing the default, so that I keep getting the warning to teach about it.
git branch -m main
To confirm, we will look at the status
git status
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
Notice that there are no commits, and no origin.
Examining an empty repo¶
ls .git
config hooks refs
description info
HEAD objects
We can see the basic requirements of an empty repo here.
We can also see the basic configurations:
cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
Searching the file system¶
We can use the bash command find to search the file system. Note that this does not search the contents of the files, just the names.
find .git/objects/
.git/objects/
.git/objects/pack
.git/objects/info
We have a few items in that directory and the directory itself.
We can limit by type, to only files, with the -type option set to f
find .git/objects/ -type f
And we have no results. We have no objects yet, because this is an empty repo.
Git Objects¶
Remember our 3 types of objects:
blob objects: the content of your files (data)
tree objects: store file names and group files together (organization)
commit objects: store information about the sha values of the snapshots
How to create an object¶
All git objects are stored as files named with the hash of the content in the file.
Remember git is a content-addressable file system, so it uses key-value pairs.
Let’s create our first git object. git uses hashes as the key. We give the hashing function some content, it applies the algorithm and returns us the hash as the reference to that object. We can also write to our .git directory with this.
The git hash-object command works on files, but we do not have any files yet. We can create a file, but we do not have to. Remember, everything is a file.
When we use commands like echo, it writes to the stdout file.
echo "test content"
test content
which shows on our terminal. We can use a pipe to connect the stdout of one command to the stdin of the next.
Pipes (|) are an important concept too. We are seeing them in the context of real uses, and we will keep seeing them. Pipes connect the stdout of one command to the stdin of the next.
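A small sketch of a pipe next to a redirect. The second command here, wc -c (count bytes), and the file name out.txt are stand-ins chosen only for illustration; they are not part of the class session.

```bash
# Work in a throwaway directory so no real repo is touched.
cd "$(mktemp -d)"

# Pipe: the stdout of echo becomes the stdin of wc.
# "test content" is 12 characters and echo adds a newline, so wc counts 13 bytes.
# (tr strips the padding some versions of wc add around the number.)
piped=$(echo "test content" | wc -c | tr -d ' ')
echo "bytes through the pipe: $piped"

# Redirect: the stdout of echo goes into a file instead of the terminal.
echo "test content" > out.txt
cat out.txt
```

The pipe connects two processes; the redirect connects a process to a file.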
We can put these together:
echo "test content" | git hash-object --stdin
We can break down this command:
git hash-object would take the content you handed to it and return the unique key
the --stdin option tells git hash-object to get the content to be processed from stdin instead of a file
the | is called a pipe (what we saw before was a redirect); it pipes a process's output into the next command
echo would write to stdout; with the pipe it passes that to the stdin of git hash-object
we get back the hash:
d670460b4b4aece5915caf5c68d12f560a9fe3e4
Now let’s run it again with a slight modification. The -w option tells the command to also write that object to the database.
echo "test content" | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4
We all got the same hash because we all put the same content into the hashing algorithm (test content),
and we can check if it wrote to the directory.
find .git/objects/ -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
and we see the file that it was supposed to have!
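To see why the hash depends only on the content: for a blob, git hashes a short header (the word blob, a space, the content length in bytes, and a null byte) followed by the content itself, using SHA-1 in a default repository. A sketch that reproduces the hash without git, assuming the sha1sum utility is installed (on macOS, shasum plays the same role):

```bash
# "test content" plus the newline that echo adds is 13 bytes.
# printf writes the header "blob 13", a null byte (\0), then the content.
manual=$(printf 'blob 13\0test content\n' | sha1sum | cut -d ' ' -f 1)
echo "$manual"
# → d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Nothing in the input mentions a user, a time, or a file name, which is why everyone in class got the same key.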
Viewing git objects¶
We can try with cat
cat .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
xK??OR04f(I-.QH??+I?+?K?
This is binary output that we cannot understand.
Fortunately, git provides a utility. We can use cat-file to inspect the object by referencing at least 4 characters from the start of the full hash (not of the file name) that identify it uniquely: 7046 will not work, but d670 will.
cat-file requires an option; -p is for pretty print
git cat-file -p d670
test content
where we see the content we put in to the hashing function
or -t for type:
git cat-file -t d670
blob
we see that it is a blob.
ls
We have a git object but no files at this point
Hashing a file¶
let’s create a file, using a redirect
echo "version 1" > test.txt
and confirm it exists
ls
test.txt
and its content
cat test.txt
version 1
and store it, by hashing it
git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30
we can look at what we have.
find .git/objects/ -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
we see two objects as expected
Now this is the status of our repo.
We can check the type of an object with git cat-file and -t
git cat-file -t 83ba
blob
it is a blob object as expected
git cat-file -p 83ba
version 1
Notice, however, that we only have one file in the working directory.
git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
test.txt
nothing added to commit but untracked files present (use "git add" to track)
It is the one test.txt; the first blob we made had no file in the working directory associated with it.
So far, even though we have hashed the object, git still thinks the file is untracked, because it is not in the index and there is no tree or commit that points to it.
The working directory and the git repo are not strictly the same thing, and can differ like this. Mostly they will stay in a closer relationship than we currently have unless we use plumbing commands, but it is good to build a solid understanding of how the .git directory relates to your working directory.
Updating the Index¶
Now, we can add our file as it is to the index.
the index is also called the staging area
git update-index --add --cacheinfo 100644 \
83baae61804e65cc73a7201a7252750c76066a30 test.txt
the \ lets us wrap onto a second line
this is the plumbing command git update-index, which updates (or in this case creates) the index, the staging area of our repository
the --add option is because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)
--cacheinfo because the file we’re adding isn’t in your directory but is in the database
in this case, we’re specifying a mode of 100644, which means it’s a normal file
then the hash of the object we want to add to the index (the content); in our case, we want the hash of the first version of the file, not the most recent one
finally the file name of that content
This has no output.
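Since the command prints nothing, one way to check what it did is git ls-files --stage, which prints each index entry as mode, hash, stage number, and path. This sketch repeats the steps in a fresh temporary repo; the temporary directory is an assumption made only so that the class test repo is left alone.

```bash
# Throwaway repo, separate from the class test repo.
cd "$(mktemp -d)"
git init -q .

# Same content as in class, so the blob hash should be the same too.
echo "version 1" > test.txt
hash=$(git hash-object -w test.txt)

# Stage that blob under the name test.txt as a normal file (mode 100644).
git update-index --add --cacheinfo 100644 "$hash" test.txt

# Show the index: mode, hash, stage number, path.
staged=$(git ls-files --stage)
echo "$staged"
```

The single line of output should carry the same three pieces we passed to update-index: 100644, the 83baae... hash, and test.txt.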
Again, we check in with status
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Now the file is staged, as expected.
Let’s edit it further.
echo "version 2" >> test.txt
we can check the working directory again
ls
test.txt
We can look at the content to ensure it is as expected
cat test.txt
version 1
version 2
So the file has two lines
Now check status again.
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
We added the first version of the file to the staging area, so that version is ready to commit. We have since changed the version in our working directory relative to the hash object that we put in the staging area, so we also have changes not staged.
We can hash and store this version too.
git hash-object -w test.txt
0c1e7391ca4e59584f8b773ecdbbb9467eba1547
We can then look again at our list of objects.
find .git/objects/ -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
So now our repo has 3 items, all blobs.
The index is not represented by any git objects; it is a temporary space.
and again, check with git:
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
Hashing the object does not impact the index, which is what git status uses.
Preparing to Commit¶
When we work with porcelain commands, we use add then commit. We have staged the file, which we know is what happens when we add. What else has to happen to make a commit?
We know that commits are comprised of:
a message
author and time stamp info
a pointer to a tree
a pointer to the parent (except the first commit)
We do not have any of these items yet, but we know that the message, author and time stamp come at the last bit when we make the commit.
Let’s make a tree next.
Now we can write a tree from the index,
git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
and we get a hash
Let’s examine the tree; first check the type
git cat-file -t d832
tree
it is as expected
and now we can look at its contents
git cat-file -p d832
100644 blob 83baae61804e65cc73a7201a7252750c76066a30 test.txt
it looks a lot like what we put in the index
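The tree listing contains only a mode, a type, a hash, and a file name, with no user or time information, so the same index contents should produce the same tree hash on any machine. A sketch checking this in a temporary repo (the temporary directory is an assumption for illustration only):

```bash
# Throwaway repo, separate from the class test repo.
cd "$(mktemp -d)"
git init -q .

# Stage "version 1" as test.txt using the same plumbing steps as in class.
echo "version 1" > test.txt
git update-index --add --cacheinfo 100644 "$(git hash-object -w test.txt)" test.txt

# Write the tree and show its single entry.
tree=$(git write-tree)
echo "$tree"
git cat-file -p "$tree"
```

If the steps match, the printed tree hash should be the d8329fc... value from class.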
Now this is the status of our repo:
We have 3 blobs, and a tree that points to one of them
Again, we will check in with git via git status
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
Nothing has changed; making the tree does not yet make the commit.
Creating a commit manually¶
We can echo a commit message through a pipe into the commit-tree plumbing command to commit a particular tree.
The git commit-tree command takes a message via stdin and the tree hash. We will use stdin and a pipe for the message:
echo "first commit" | git commit-tree d832
da01135f2fb02e9b53c200708863df82f0daa50c
and we get back a hash. But notice that this hash is unique for each of us, because the commit has information about the time stamp and our user.
The above hash is the one I got during class, but when I re-ran this while typing the notes last fall I got a different hash (d450567fec96cbd8dd514313db9bcb96ad7664b0) even though I have the same name and e-mail because the time changed.
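This can be tested directly, because git reads the identity and dates for a commit from environment variables when they are set. In the sketch below the name and e-mail are hypothetical placeholders, and the repo is a temporary one; only the time stamp changes between the second and third commit.

```bash
# Throwaway repo with the same tree as in class.
cd "$(mktemp -d)"
git init -q .
echo "version 1" > test.txt
git update-index --add --cacheinfo 100644 "$(git hash-object -w test.txt)" test.txt
tree=$(git write-tree)

# Hypothetical identity, fixed so that only the time can vary.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"

# Same tree, message, identity, and time: the hash repeats.
export GIT_AUTHOR_DATE="1759857348 -0400" GIT_COMMITTER_DATE="1759857348 -0400"
first=$(echo "first commit" | git commit-tree "$tree")
again=$(echo "first commit" | git commit-tree "$tree")

# One second later, everything else identical: the hash changes.
export GIT_AUTHOR_DATE="1759857349 -0400" GIT_COMMITTER_DATE="1759857349 -0400"
later=$(echo "first commit" | git commit-tree "$tree")

echo "$first"
echo "$again"
echo "$later"
```

The first two lines printed should be identical and the third different, which is the same hashing behavior as for blobs: same input, same key.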
We can also look at its type
git cat-file -t da01
commit
and we can look at the content
git cat-file -p da01
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Sarah M Brown <brownsarahm@uri.edu> 1759857348 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1759857348 -0400
first commit
Now we check the final list of objects that we have for today
find .git/objects/ -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/da/01135f2fb02e9b53c200708863df82f0daa50c
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
Check that you also have 5 objects. 4 of them should match mine; the one you should not have is da01135, but you should have a different one in its place.
Visually, this is what our repo looks like:
What does git status do?¶
compares the working directory to the current state of the active branch
we can see the working directory with ls
we can see the active branch in the HEAD file
what is its status?
We have seen it is “on main”. This is because we set the branch to main, but our commit has not been written to that branch yet; since we made the commit with plumbing commands, we have to do that directly.
Notice that when we use the porcelain command for commit, it does this automatically; the porcelain commands do many things.
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
Notice, git says we have no commits yet even though we have written a commit.
Because we made the commit manually, we did not update the branch, so the main branch does not point to any commit.
We can verify by looking at the HEAD file
cat .git/HEAD
ref: refs/heads/main
as expected
then we look at that file
cat .git/refs/heads/main
cat: .git/refs/heads/main: No such file or directory
which does not even exist!
we can see the objects though:
find .git/objects/ -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/da/01135f2fb02e9b53c200708863df82f0daa50c
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
Git References¶
We can make that file manually
echo 0c1e7391ca4e59584f8b773ecdbbb9467eba1547 > .git/refs/heads/main
Here you need to use the full hash because we are writing a file directly that git will read, not using a git command.
Also you need to use the hash of your commit
then we check with git
git status
error: bad tree object HEAD
I made a typo (0c1e739 is the hash of one of the blobs, not of the commit), so it failed
so I corrected it:
echo da01135f2fb02e9b53c200708863df82f0daa50c > .git/refs/heads/main
So we now have HEAD -> main and main -> our commit -> tree -> blob.
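Writing the ref file by hand is what made that mistake possible. Git also has a plumbing command for this step, git update-ref, which resolves the hash for you and checks that the object exists. A sketch in a temporary repo; the identity values are hypothetical placeholders, and git symbolic-ref is used here only to make sure HEAD points at main regardless of the default branch name.

```bash
# Throwaway repo with HEAD pointing at main.
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Hypothetical identity so commit-tree works on any machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"

# Build the blob, index entry, tree, and commit as in class.
echo "version 1" > test.txt
git update-index --add --cacheinfo 100644 "$(git hash-object -w test.txt)" test.txt
commit=$(echo "first commit" | git commit-tree "$(git write-tree)")

# Plumbing equivalent of: echo <full commit hash> > .git/refs/heads/main
git update-ref refs/heads/main "$commit"

# main now resolves to our commit, and so does HEAD.
git rev-parse main
git log --oneline
```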
Now git status works:
git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
no changes added to commit (use "git add" and/or "git commit -a")
It still shows the file as modified because we committed only the first line, not the second one.
cat test.txt
version 1
version 2
git diff
diff --git a/test.txt b/test.txt
index 83baae6..0c1e739 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
 version 1
+version 2
The git objects folder¶
ls .git/objects/
0c 83 d6 d8 da info pack
There was a question about why they are organized into folders.
It looks like it is partially about speed of access, but more importantly because folders can have a limit on the number of files in them.
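The folder names follow directly from the hashes: as the find output earlier shows, each object sits in a folder named with the first two characters of its hash, in a file named with the remaining 38. A sketch that rebuilds the path for the first blob we made, using cut to split the string:

```bash
hash=d670460b4b4aece5915caf5c68d12f560a9fe3e4

dir=$(echo "$hash" | cut -c 1-2)    # first two characters: the folder
file=$(echo "$hash" | cut -c 3-)    # remaining 38 characters: the file name

path=".git/objects/$dir/$file"
echo "$path"
# → .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```

With two hexadecimal characters there can be at most 256 such folders, so the objects are spread out instead of piling up in one directory.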
git log
commit da01135f2fb02e9b53c200708863df82f0daa50c (HEAD -> main)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date: Tue Oct 7 13:15:48 2025 -0400
first commit
git cat-file -p da01
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Sarah M Brown <brownsarahm@uri.edu> 1759857348 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1759857348 -0400
first commit
git cat-file -p d832
100644 blob 83baae61804e65cc73a7201a7252750c76066a30 test.txt
git cat-file -p 83baa
version 1
Prepare for Next Class¶
Read the notes from October 7. We will build on these directly in the future. You need to have the test repo with the same status for lab on class on 3/5.
Make sure you have completed all of the steps in the github inclass repo.
Add to your IDE idethoughts.md file, if you have not in a few days, on a dedicated ide_prep branch. This is prep for a few weeks from now, not for October 9; keep this branch open until it is specifically asked for.
Badges¶
Review the notes from today
Make a table in gitplumbingreview.md in your KWL repo that relates the two types of git commands we have seen: plumbing and porcelain. The table should have two columns, one for each type of command (plumbing and porcelain). Each row should have one git porcelain command and at least one of the corresponding git plumbing command(s). Include two rows: add and commit.
Review the notes from today
Read more details about git internals to review what we did in class in greater detail. Make a file gitplumbingdetail.md and create a table or mermaid diagram that shows the relationship between at least three porcelain commands and their corresponding plumbing commands (generally more than one each).
Create gitislike.md and explain the main git operations we have seen (add, commit, push) in your own words, in a way that will either help you remember or that you would use to explain them to someone else at a high level. This might be analogies or explanations using other programming concepts or concepts from a hobby.
Experience Report Evidence¶
You need to have a test repo that matches this for class on Tuesday.
Generate your evidence with the following in your test repo
find .git/objects/ -type f > testobj.md
then append the contents of your commit object to that file.
Move the testobj.md to your kwl repo in the experiences folder.
Questions After Today’s Class¶
What exactly happens inside .git/index—how is it structured to store file paths and hashes?¶
I have not been able to find a clear explanation of how the index stores the information, only how to put things in and get them out (into a tree).
This is a good explore badge topic.
How does git know which tree or commit the HEAD is pointing to when multiple branches exist?¶
It reads the HEAD file. git checkout changes the HEAD file.
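A sketch of that lookup with two branches. The branch name feature, the second commit, and the identity values are hypothetical and exist only for this example. Here HEAD is moved with the plumbing command git symbolic-ref, which rewrites the HEAD file only; unlike git checkout, it does not touch the index or working directory.

```bash
# Throwaway repo.
cd "$(mktemp -d)"
git init -q .

# Hypothetical identity so commit-tree works on any machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"

# Two commits sharing one tree; the second has the first as its parent.
echo "version 1" > test.txt
git update-index --add --cacheinfo 100644 "$(git hash-object -w test.txt)" test.txt
tree=$(git write-tree)
one=$(echo "first commit" | git commit-tree "$tree")
two=$(echo "second commit" | git commit-tree "$tree" -p "$one")

# Two branches, each a pointer to one commit.
git update-ref refs/heads/main "$one"
git update-ref refs/heads/feature "$two"

# HEAD names a branch; the branch names the commit.
git symbolic-ref HEAD refs/heads/main
on_main=$(git rev-parse HEAD)
git symbolic-ref HEAD refs/heads/feature
on_feature=$(git rev-parse HEAD)

echo "HEAD on main:    $on_main"
echo "HEAD on feature: $on_feature"
```

Git never has to search the branches: it follows HEAD to one ref, and that ref to one commit.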
How do git write-tree and git commit-tree relate to git add and git commit under the hood?¶
This is partially your badge for today, so great question!
the book section on the topic has a lot of information
How will this content we learned be applied in future lessons in this class?¶
Next we will learn more advanced git porcelain commands and having a correct understanding of the internals helps make those make more sense.
Is there ever a situation in which creating commits this way is more beneficial?¶
Probably not; it is an exercise in understanding what happens when you use the other commands, so that you can understand what happened if something goes wrong.
Can you have a bunch of trees linked to each other that all hold their own blobs?¶
Yes.
Thinking through how to create this (using porcelain or plumbing commands) is a good idea to practice with the concepts from today.
Is there a reason why the author and committer would be different from what my email is?¶
You can set them with git config. If you are using a computer where you have not run git config, they can be different from what you expect.
How can we display the contents of the index file in a human-readable way?¶
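One option is git ls-files --stage, which lists the mode, hash, stage number, and path of each entry in the index. Digging into the binary format of the file itself is a good explore badge topic!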
We may have covered this already, but what are the folders with hexadecimal names in .git/objects for?¶
They make git more efficient: lookups are faster, and it is more flexible, because some file systems have limits on how many files can be in a folder. This drastically increases how many git objects can be stored on a computer with those limits.
If git stores commits as snapshots, does that mean it saves several duplicate files for each commit?¶
Not quite. Git hashes the file’s content and then writes the content in a file named with the hash, so if a file has not changed, its content hashes to the same name and the new snapshot points to the object that is already stored rather than to a second copy.
This is even faster than comparing, because comparing two files is a more complex operation than the hashing algorithm.
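A sketch of this in a temporary repo, with two hypothetical files, a.txt and b.txt, that have different names but identical content:

```bash
# Throwaway repo with no objects yet.
cd "$(mktemp -d)"
git init -q .

echo "version 1" > a.txt
echo "version 1" > b.txt    # different name, identical content

# Hash and store both files.
hash_a=$(git hash-object -w a.txt)
hash_b=$(git hash-object -w b.txt)

# Count the object files that now exist.
count=$(find .git/objects -type f | wc -l | tr -d ' ')

echo "$hash_a"
echo "$hash_b"
echo "objects stored: $count"
```

Both commands report the same hash and only one object file exists, because the file name is not part of the blob; names live in trees.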
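What are the main git objects that make up a commit?¶
A commit is itself a git object, and it has several parts: a message, author and time stamp info, a pointer to a tree (which in turn points to blobs), and a pointer to its parent (except the first commit).
Is there ever a case where using the plumbing commands is necessary over the porcelain commands?¶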
Not in normal use, but they need to be there for git to work, and we can use them to inspect and learn.