12. How does git make a commit?#
Today we will dig into how git really works. This will be a deep dive that provides a lot of detail about how git creates a commit. It will reinforce important concepts and give you some ideas about how you might fix things when they go wrong.
Later, we will build on this more on the practical side, but these concepts are very important for making sense of the more practical aspects of fixing things in git.
This deep dive in git is to help you build a correct, flexible understanding of git so that you can use it independently and efficiently. The plumbing commands do not need to be a part of your daily use of git, but they are the way that we can dig in and see what actually happens when git creates a commit.
This also serves as an example of a method you could apply to understanding another complex system.
Inspecting a system’s components is a really good way to understand it and correctly understanding it will impact your ability to ask good questions and even look up the right thing to do when you need to fix things.
Also, looking at the parts of git is a good way to reinforce specific design patterns that are common in CS in a practical way. This means that today we will also:
review and practice with the bash commands we have seen so far
see a practical example of hashing
reinforce through examples what a pointer does
Navigate to your folder for class
ls
fall24-brownsarahm tiny-book
gh-inclass-brownsarahm
it should have your other repos
and, not be a git repo, we can confirm with:
git status
fatal: not a git repository (or any of the parent directories): .git
In contrast, if we are in a repo
cd fall24-brownsarahm/
git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
the command works
If yours worked, go up one level to be out of a repo:
cd ../
git status
fatal: not a git repository (or any of the parent directories): .git
Important
Today there were a lot of prismia comprehension check questions.
If you struggled with them, you should be sure to check that you now understand the correct answer
12.1. Creating a repo from scratch#
We can create an empty repo from scratch using git init <path>
Last time we used an existing directory like git init .
because we were working in the directory that already existed
Today we will create a new directory called test
and initialize it as a repo at the same time:
git init test
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/test/.git/
I get this message again, see context from previous class
We can see what it did by first looking at the working directory
ls
fall24-brownsarahm test
gh-inclass-brownsarahm tiny-book
it made a new folder named as we said
and we can go into that directory
cd test/
it is empty as expected:
ls
and then rename the branch
git branch -m main
To clarify we will look at the status
git status
Notice that there are no commits, and no origin.
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
there are no remotes at all
git remote
we do have the .git hidden directory though
ls -a
. .. .git
ls .git/
HEAD description info refs
config hooks objects
For comparison, we can look at the .git directory of a repo that we have been using:
ls ../gh-inclass-brownsarahm/.git
COMMIT_EDITMSG REBASE_HEAD index packed-refs
FETCH_HEAD config info refs
HEAD description logs
ORIG_HEAD hooks objects
In my inclass repo I have more files because, as we do various git operations, git creates other files. For example, COMMIT_EDITMSG is the file that opens in vim when we forget the -m option.
12.2. Searching the file system#
We can use the bash command find to search the file system. Note that this does not search the contents of the files, just the names.
find .git/objects/
.git/objects/
.git/objects//pack
.git/objects//info
we have a few items in that directory and the directory itself.
We can limit by type, to only files, with the -type option set to f
find .git/objects/ -type f
And we have no results. We have no objects yet because this is an empty repo.
12.3. Git Objects#
Remember our 3 types of objects
blob objects: the content of your files (data)
tree objects: stores file names and groups files together (organization)
Commit Objects: stores information about the sha values of the snapshots
12.3.1. How to create an object#
All git objects are files stored with a name that is the hash of the content in the file.
Remember, git is a content-addressable file system, so it uses key-value pairs.
Let’s create our first git object. git uses hashes as the key. We give the hashing function some content, it applies the algorithm and returns us the hash as the reference to that object. We can also write to our .git directory with this.
The git hash-object command works on files, but we do not have any files yet. We can create a file, but we do not have to. Remember, everything is a file.
When we use things like echo, it writes to the stdout file.
echo "test content"
test content
which shows on our terminal. We can use a pipe to connect the stdout of one command to the stdin of the next.
echo "test content" | git hash-object --stdin
We can break down this command:
git hash-object takes the content you hand to it and merely returns the unique key
the --stdin option tells git hash-object to get the content to be processed from stdin instead of a file
the | is called a pipe (what we saw before was a redirect); it pipes the output of one process into the next command
echo would write to stdout; with the pipe, it passes that to the stdin of git hash-object
we get back the hash:
d670460b4b4aece5915caf5c68d12f560a9fe3e4
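To make the hashing step concrete, here is a minimal sketch of what git hash-object computes for a blob: the SHA-1 of a short header (the word blob, the content length in bytes, and a null byte) followed by the content itself. The helper name sha1_hex is made up for this example, and it assumes that either sha1sum or shasum is installed on your machine.

```shell
cd "$(mktemp -d)"   # throwaway directory, outside of any repo

# Hypothetical helper: use whichever SHA-1 tool this machine has.
sha1_hex() {
  if command -v sha1sum >/dev/null 2>&1; then
    sha1sum | cut -d' ' -f1
  else
    shasum -a 1 | cut -d' ' -f1
  fi
}

# "test content" plus the newline that echo adds is 13 bytes.
manual=$(printf 'blob 13\0test content\n' | sha1_hex)

# git does the same calculation for us.
from_git=$(echo "test content" | git hash-object --stdin)

echo "manual:   $manual"
echo "from git: $from_git"
```

Both lines should show the same hash, which is why everyone in class gets the same key for the same content.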
Warning
If you have an odd number of quotes (" or ') or more open brackets ([{ than corresponding close brackets }]), bash will wait for you to finish. Whenever your regular prompt is replaced by >, it is waiting for you to finish the current command.
The command then runs once we close with "
kjsjslf
sdf
jjsdkf
Now let’s run it again with a slight modification. The -w option tells the command to also write that object to the database
echo "test content" | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4
and we can check if it wrote to the repository.
find .git/objects/ -type f
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
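and we see the file that we expected: the first two characters of the hash name the folder and the rest name the file.
We still do not have any files in the working directory or any commits, though.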
git status
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
ls
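As a small sketch of where -w puts things, the commands below rebuild the object's path from its hash in a throwaway repo. The variable names (repo, folder, file, object_path) are just for this example, and it assumes git stores the object as a loose file, which is what we saw above.

```shell
repo=$(mktemp -d)            # throwaway directory, not your class folder
cd "$repo" && git init -q

hash=$(echo "test content" | git hash-object -w --stdin)

# The first two characters name the folder, the remaining 38 name the file.
folder=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
object_path=".git/objects/$folder/$file"

ls "$object_path"
```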
12.3.2. Viewing git objects#
We can try with cat
cat .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
xK??OR04f(I-.QH??+I?+?K?
This is compressed binary output that we cannot read directly. Fortunately, git provides a utility. We can use git cat-file to view the object by referencing at least 4 characters from the start of the full hash that are unique, not the file name. (7046 will not work, but d670 will)
cat-file requires an option; -p is for pretty print
git cat-file -p d670
test content
where we see the content we put in to the hashing function
and -t is for type
git cat-file -t d670
blob
Hint
You can always use the --help option on any git command to learn more about its options
These options cannot be stacked
git cat-file -pt d670
error: options '-t' and '-p' cannot be used together
the developers did not implement it to work, but it also does not really make sense to output both of them, because they each provide a different type of content. Using both asks for 2 different types of content which would then not be usable in a basic pipe so it would be bad design in terms of the unix philosophy to do that.
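Since the options cannot be stacked, one way to get several pieces of information is to ask one question per call. This is a sketch in a throwaway repo; the variable names are made up for the example, and -s (size in bytes) is an option we did not use in class.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
echo "test content" | git hash-object -w --stdin >/dev/null

obj_type=$(git cat-file -t d670)     # what kind of object it is
obj_size=$(git cat-file -s d670)     # size of the content in bytes
obj_content=$(git cat-file -p d670)  # the content itself

echo "$obj_type, $obj_size bytes: $obj_content"
```

Each call returns one kind of output, so each result can be passed on to another command on its own.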
12.3.3. Hashing a file#
let’s create a file
echo "version 1" > test.txt
first we will check in with git:
git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
test.txt
nothing added to commit but untracked files present (use "git add" to track)
it can see the file, but it is untracked.
and now, we store it, by hashing it
git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30
this returns the hash; it is still a hash that is the same for all of us, because it depends only on the content.
and again with git status
git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
test.txt
nothing added to commit but untracked files present (use "git add" to track)
still untracked
but the file does exist in the git objects:
find .git/objects/ -type f
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30
we can check the type of this new object:
git cat-file -t 83baa
blob
also a blob
git cat-file -p 83baa
version 1
Now this is the status of our repo.
Notice, however, that we only have one file in the working directory.
ls
test.txt
it is the one test.txt; the first blob we made has no file in the working directory associated with it.
The working directory and the git repo are not strictly the same thing, and can be different like this. Mostly they will stay in a closer relationship than we currently have unless we use plumbing commands, but it is good to build a solid understanding of how the .git directory relates to your working directory.
So far, even though we have hashed the object, git still thinks the file is untracked, because it is not in the index, it is not in any tree, and there are no commits that point to a tree containing it.
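A quick sketch of what content-addressable means in practice: two files with different names but the same content hash to the same blob. The second file name, copy.txt, is made up for this example.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q

echo "version 1" > test.txt
echo "version 1" > copy.txt      # hypothetical second file, same content

hash_test=$(git hash-object test.txt)   # no -w, so nothing is stored
hash_copy=$(git hash-object copy.txt)

echo "$hash_test"
echo "$hash_copy"
```

The file name is not part of the blob at all; the name only shows up later, in the index and in the tree.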
12.4. Updating the Index#
Now, we can add our file as it is to the index.
git update-index --add --cacheinfo 100644 \
> 83baae61804e65cc73a7201a7252750c76066a30
error: option 'cacheinfo' expects <mode>,<sha1>,<path>
the \ lets us wrap onto a second line; the > above is the new prompt for the second line
this is the plumbing command git update-index; it updates (or in this case creates) the index, the staging area of our repository
the --add option is because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)
the --cacheinfo option is because the file we’re adding isn’t in your directory but is in the database
in this case, we’re specifying a mode of 100644, which means it’s a normal file
then the hash of the object we want to add to the index (the content); in our case, we want the hash of the first version of the file
finally the file name of that content
I forgot the file name above, so I used my up arrow key to get the command back:
git update-index --add --cacheinfo 100644 83baae61804e65cc73a7201a7252750c76066a30 test.txt
note that the \ I originally typed and the > from the prompt are not there.
This command had no output, so let’s check in with git status.
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Now the file is staged, as expected.
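To look at the index more directly than git status allows, one option is git ls-files --stage, which we did not use in class. This sketch repeats the staging step in a throwaway repo, using the comma-separated form that the error message described.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q

echo "version 1" > test.txt
blob=$(git hash-object -w test.txt)

# Same staging step as above, written as <mode>,<sha1>,<path>.
git update-index --add --cacheinfo "100644,$blob,test.txt"

# Each index entry: mode, blob hash, stage number, file name.
index_entry=$(git ls-files --stage)
echo "$index_entry"
```

The entry pairs the file name with the blob hash, which is the link that was missing while the file was untracked.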
Let’s edit it further.
echo "version 2" >> test.txt
We can look at the content to ensure it is as expected
cat test.txt
version 1
version 2
So the file has two lines
Now check status again.
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
We added the first version of the file to the staging area, so that version is ready to commit. We have since changed the version in our working directory relative to the hash object that we put in the staging area, so we also have changes not staged.
we can see we do not yet have a new object for the new version of the file
find .git/objects/ -type f
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30
We can hash and store this version too.
git hash-object -w test.txt
0c1e7391ca4e59584f8b773ecdbbb9467eba1547
We can then look again at our list of objects.
find .git/objects/ -type f
.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30
our new object is there with the first two
So now our repo has 3 items, all blobs
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
hashing the object does not impact the index, which is what git status uses
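One way to see why status reports the file as modified is to compare the hash recorded in the index with the hash of the file as it is now. This is a sketch in a throwaway repo; it stages with git update-index --add test.txt (which hashes and stores the file in one step) instead of the two separate steps we used in class.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q

echo "version 1" > test.txt
git update-index --add test.txt      # stage version 1 (also writes its blob)
echo "version 2" >> test.txt         # change only the working directory

staged=$(git ls-files --stage test.txt | cut -d' ' -f2)
working=$(git hash-object test.txt)  # no -w, so nothing is stored

echo "staged:  $staged"
echo "working: $working"
```

The two hashes differ, and storing the new blob with -w would not change that, because the index still points at the first one.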
12.4.1. Preparing to Commit#
When we work with porcelain commands, we use add then commit. We have staged the file, which we know is what happens when we add. What else has to happen to make a commit?
We know that commits are comprised of:
a message
author and timestamp info
a pointer to a tree
a pointer to the parent (except the first commit)
We do not have any of these items yet.
Let’s make a tree next.
Now we can write a tree from the index,
git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
and we get a hash
Let’s examine the tree, first check the type
git cat-file -t d832
tree
it is as expected
and now we can look at its contents
git cat-file -p d832
100644 blob 83baae61804e65cc73a7201a7252750c76066a30 test.txt
Now this is the status of our repo:
two blobs that are unlinked to anything and one blob that is included in a tree
Again, we will check in with git via git status
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
Nothing has changed; making the tree does not yet make the commit.
This only keeps track of the objects; there is also still the index and the HEAD, which we have not dealt with yet.
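As a sketch of why the tree hash is the same for all of us, the commands below rebuild the same index in a fresh throwaway repo and write a tree from it. The tree only records modes, blob hashes, and file names, so no names or times are involved.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q

echo "version 1" > test.txt
git update-index --add test.txt   # hashes, stores, and stages the file

tree=$(git write-tree)
echo "$tree"
git cat-file -p "$tree"
```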
12.4.2. Creating a commit manually#
We can echo a commit message through a pipe into the commit-tree plumbing command to commit a particular tree object.
The git commit-tree command requires a message via stdin and the tree hash. We will use stdin and a pipe for the message
echo "first commit" | git commit-tree d832
097898b3f2a5a10f2adb96931da78666ff858002
and we get back a hash. But notice that this hash is unique for each of us, because the commit has information about the timestamp and our user.
we can look at this object too:
git cat-file -p 0978
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Sarah M Brown <brownsarahm@uri.edu> 1729186238 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1729186238 -0400
first commit
The above hash is the one I got during class, but when I did this previously I had a different hash (d450567fec96cbd8dd514313db9bcb96ad7664b0) even though I have the same name and e-mail, because the time changed.
We can also look at its type
git cat-file -t 0978
commit
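To check the claim that only the user and time make the commit hash differ, here is a sketch that pins both with git's environment variables. The identity (Example Student, student@example.com) is made up, and the timestamp is simply the one from my commit above.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
echo "version 1" > test.txt
git update-index --add test.txt
tree=$(git write-tree)

# Made-up identity and a fixed time, so nothing depends on this machine.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"
export GIT_AUTHOR_DATE="1729186238 -0400" GIT_COMMITTER_DATE="1729186238 -0400"

first=$(echo "first commit" | git commit-tree "$tree")
again=$(echo "first commit" | git commit-tree "$tree")

# One second later: same tree, same message, different commit hash.
export GIT_AUTHOR_DATE="1729186239 -0400" GIT_COMMITTER_DATE="1729186239 -0400"
later=$(echo "first commit" | git commit-tree "$tree")

printf '%s\n%s\n%s\n' "$first" "$again" "$later"
```

The first two hashes match because every input is identical; the third differs even though only the time changed.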
Now we check the final list of objects that we have for today
find .git/objects/ -type f
.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects//09/7898b3f2a5a10f2adb96931da78666ff858002
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30
You should also have 5 objects like me: 3 blobs, one tree, and one commit. Four of your objects (blobs & tree) will have the same hash as mine, but your commit will have a different hash. My commit is the object at 09/7898b3f2a5a10f2adb96931da78666ff858002; that is the one that you should not have.
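If you want to check the types without running cat-file once per object, one option is the batch mode of git cat-file. This is a sketch in a throwaway repo that repeats today's steps; it assumes a reasonably recent git (the --batch-all-objects option is not in very old versions) and uses a made-up identity for the commit.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q

echo "test content" | git hash-object -w --stdin >/dev/null
echo "version 1" > test.txt
git update-index --add test.txt
tree=$(git write-tree)
echo "version 2" >> test.txt
git hash-object -w test.txt >/dev/null

export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"
echo "first commit" | git commit-tree "$tree" >/dev/null

# One line per object: hash, type, size in bytes.
listing=$(git cat-file --batch-check --batch-all-objects)
echo "$listing"
```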
Visually, this is what our repo looks like:
12.5. What does git status do?#
git status compares the working directory (and the index) to the current state of the active branch
we can see the working directory with ls
we can see the active branch in the HEAD file
What is its status?
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
We see it is “on main”; this is because we set the branch to main. But since we have not written anything there, we have to do it directly. Notice that when we use the porcelain command for commit, it does this automatically; the porcelain commands do many things.
Notice that git says we have no commits yet, even though we have written a commit.
Because we made the commit manually, we did not update the branch, so the main branch does not point to any commit.
We can check that a commit type object exists
git cat-file -t 0978
commit
We can verify by looking at the HEAD file
cat .git/HEAD
ref: refs/heads/main
and then viewing that file
cat .git/refs/heads/main
cat: .git/refs/heads/main: No such file or directory
which does not even exist!
we can also look at the folder where branch pointer files typically live
ls .git/refs/heads/
nothing exists there yet!
12.6. Git References#
We can make that file manually
echo 097898b3f2a5a10f2adb96931da78666ff858002 > .git/refs/heads/main
Important
This file needs to have the full hash, not only a few digits.
and back to git status
git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
no changes added to commit (use "git add" and/or "git commit -a")
now it is as we expect!
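Writing the file by hand shows that a branch is just a file holding a hash. A sketch of a plumbing alternative is git update-ref, which we did not use in class; it writes the same pointer for us. The identity is made up, and git symbolic-ref is used here to point HEAD at main in the throwaway repo.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main      # make HEAD point at main

echo "version 1" > test.txt
git update-index --add test.txt
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"
commit=$(echo "first commit" | git commit-tree "$(git write-tree)")

# Plumbing alternative to echoing the hash into the file by hand.
git update-ref refs/heads/main "$commit"

cat .git/HEAD
cat .git/refs/heads/main
git --no-pager log --oneline
```

After this, the branch file contains the full commit hash and git log can find the commit through HEAD.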
12.7. Prepare for Next Class#
Review the GitHub Action files in your KWL repo and make note of what if any syntax in there is unfamiliar. (note that link will not work on the rendered website, but will work on badge issues)
Use quote reply or edit to see how I made a relative path to a location within the repo in this issue. (to see another application of paths)
Check out the github action marketplace to see other actions that are available and try to get a casual level of understanding of the types of things that people use actions for.
12.8. Badges#
Make a table in gitplumbingreview.md in your KWL repo that relates the two types of git commands we have seen: plumbing and porcelain. The table should have two columns, one for each type of command (plumbing and porcelain). Each row should have one git porcelain command and at least one of the corresponding git plumbing command(s). Include two rows: add and commit.
Read more details about git internals to review what we did in class in greater detail. Make a file gitplumbingdetail.md and create a table or mermaid diagram that shows the relationship between at least three porcelain commands and their corresponding plumbing commands (generally more than one each).
Create gitislike.md and explain main git operations we have seen (add, commit, push) in your own words in a way that will either help you remember or how you would explain it to someone else at a high level. This might be analogies or explanations using other programming concepts or concepts from a hobby.
12.9. Experience Report Evidence#
.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects//09/7898b3f2a5a10f2adb96931da78666ff858002
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30
Important
You need to have a test repo that matches this for Lab next week
Generate your evidence with the following in your test repo
find .git/objects/ -type f > testobj.md
then append the contents of your commit object to that file.
Move the testobj.md to your kwl repo in the experiences folder.
12.10. Questions After Today’s Class#
12.10.1. How can certain parts of commits be accessed and pulled for use?#
We saw that we can look at the output of the git cat-file command.
git cat-file -p 0978
This tells us the tree (which would tell us the blobs) and any future commits would have a parent too. Let’s focus on the tree and try to extract it.
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Sarah M Brown <brownsarahm@uri.edu> 1729186238 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1729186238 -0400
first commit
We can pipe this output into another command to take a subset of it.
This could be a custom written tool (see python click library for example) or you can use sed with a regex-like expression. This is a really detailed task that is a good use of a generative AI tool. I knew what I wanted to do, but could not remember if it was sed or awk that was the right tool, and the regex syntax is not something I use enough to retain. So I asked an LLM “sed or awk to extract the tree of a commit from the git cat-file output” and it gave me:
git cat-file -p 0978 | sed -n 's/^tree \(.*\)/\1/p'
which I ran to confirm that it worked
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
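Another way to get the same piece, without text processing, is git's revision syntax: adding ^{tree} to a commit asks for the tree it points to. This is a sketch in a throwaway repo with a made-up identity; it compares the sed approach with git rev-parse.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
echo "version 1" > test.txt
git update-index --add test.txt
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"
commit=$(echo "first commit" | git commit-tree "$(git write-tree)")

# Text processing on the pretty-printed commit.
with_sed=$(git cat-file -p "$commit" | sed -n 's/^tree \(.*\)/\1/p')

# Asking git directly for the tree of that commit.
with_rev_parse=$(git rev-parse "${commit}^{tree}")

echo "$with_sed"
echo "$with_rev_parse"
```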
12.10.2. How to remove objects after running git commit-tree?#
They are files, so you can delete them with rm.
12.10.3. Difference between tree and blob?#
blobs contain the content of the file, a tree contains a listing of the files in a snapshot (or single folder within a snapshot) that associates the blob objects to their file names.
12.10.4. Because of the nature of how git works off of files, if I were to change the ref head for a branch to point to main’s current pointer- would it then contain the same contents?#
Yes, the branch is just a pointer to a particular commit. A PR compares the branches by showing the changes to the files between the snapshots at those two commits.