How does git make a commit? - CSC311 - Computer Systems and Programming Tools

Today we will dig into how git really works. This will be a deep dive and provide a lot of details about how git creates a commit. It will reinforce important concepts, which are of practical use when using git.

You could use the basic commands without understanding how git works, or even with an incorrect understanding. However, to get the most important advantages out of git and use it fluently, knowing how it works in detail is a huge advantage.

Today is also an exercise in careful examination of a system. You will do this a lot in your career: whenever you are debugging something, being able to systematically break it down and examine all of the parts is the central skill.

Later, we will build on this more on the practical side, but these concepts are very important for making sense of the more practical aspects of fixing things in git.

This deep dive in git is to help you build a correct, flexible understanding of git so that you can use it independently and efficiently. The plumbing commands do not need to be a part of your daily use of git, but they are the way that we can dig in and see what actually happens when git creates a commit.

Inspecting a system’s components is a really good way to understand it, and understanding it correctly will improve your ability to ask good questions and to look up the right thing to do when you need to fix things.

Outcomes for today¶

Also, looking at the parts of git is a good way to reinforce specific design patterns that are common in CS in a practical way. This means that today we will also:

review and practice with the bash commands we have seen so far
see a practical example of hashing
reinforce through examples what a pointer does

Review¶

Recall: git stores important content in files that it uses like variables.

navigate to your github inclass repo

cd gh-inclass-fa25-brownsarahm/

Some files in git work like pointers:

cat .git/HEAD

ref: refs/heads/organization

This file holds a reference to the current branch.

ls .git

COMMIT_EDITMSG	hooks		ORIG_HEAD
config		index		packed-refs
description	info		REBASE_HEAD
FETCH_HEAD	logs		refs
HEAD		objects

.gitignore is a file in the working directory that contains a list of files and patterns to not track.

We can see it is a hidden file in the working directory with ls -a

ls -a

.			docs
..			example.md
.git			helper_functions.py
.github			important_classes.py
.gitignore		LICENSE.md
.secret			README.md
abstract_base_class.py	scratch.ipynb
alternative_classes.py	setup.py
API.md			tests
CONTRIBUTING.md

Creating a repo from scratch¶

We will start in the top level course directory.

cd ..
ls

fall25-kwl-brownsarahm
gh-inclass-fa25-brownsarahm
tiny-book

Yours should also have your kwl repo, gh inclass repo, course website clone, etc.

We can confirm that we are not in a git repository already:

git status

fatal: not a git repository (or any of the parent directories): .git

We can create an empty repo from scratch using git init <path>

Last time we used an existing directory like git init . because we were working in the directory that already existed

Today we will create a new directory called test and initialize it as a repo at the same time:

git init test

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: 	git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: 	git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/test/.git/

First we go into that directory:

cd test/

I will rename my branch manually, rather than changing the global default, so that I keep getting the warning and can teach about it.

git branch -m main

To confirm we will look at the status

git status

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Notice that there are no commits, and no origin.

Examining an empty repo¶

ls .git

config		hooks		refs
description	info
HEAD		objects

we can see the basic requirements of an empty repo here.

We can also see the basic configurations:

cat .git/config

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true

Searching the file system¶

We can use the bash command find to search the file system. Note that this does not search the contents of the files, just the names.

find .git/objects/

.git/objects/
.git/objects/pack
.git/objects/info

we have a few items in that directory and the directory itself.

We can limit by type, to only files with the -type option set to f

find .git/objects/ -type f

And we have no results. We have no objects yet, because this is an empty repo.
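As a small practice sketch of the -type option, the following builds a throwaway directory tree and counts directories and files separately. The directory layout and the file name notes.txt are made up for illustration and are not part of any real repo.

```shell
# Build a small throwaway directory tree (all names here are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/objects/info" "$demo/objects/pack"
touch "$demo/objects/info/notes.txt"

# -type d lists only directories, -type f lists only regular files.
# tr strips the padding that some versions of wc add.
dirs=$(find "$demo/objects" -type d | wc -l | tr -d ' ')
files=$(find "$demo/objects" -type f | wc -l | tr -d ' ')
echo "directories: $dirs, files: $files"

# -name matches on the file name, not on the contents of the file.
find "$demo/objects" -type f -name "*.txt"
```

The directory count includes the starting directory itself, just as the find output above lists .git/objects/ along with its subdirectories.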

Git Objects¶

Remember our 3 types of objects

blob objects: store the content of your files (data)
tree objects: store file names and group files together (organization)
commit objects: store information about the sha values of the snapshots

How to create an object¶

All git objects are files stored with the name that is the hash of the content in the file

Remember, git is a content-addressable file system, so it uses key-value pairs.

Let’s create our first git object. git uses hashes as the key. We give the hashing function some content, it applies the algorithm and returns us the hash as the reference to that object. We can also write to our .git directory with this.

The git hash-object command works on files, but we do not have any files yet. We can create a file, but we do not have to. Remember, everything is a file.

When we use commands like echo it writes to the stdout file.

echo "test content"

test content

which shows on our terminal. We can use a pipe to connect the stdout of one command to the stdin of the next.

Pipes (|) are an important concept too. We are seeing them in the context of real uses, and we will keep seeing them.
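To make the difference between a redirect and a pipe concrete, here is a minimal sketch that sends the same text down both routes and counts the bytes with wc -c. The temporary file is only for illustration.

```shell
# A redirect (>) sends stdout into a file.
tmpfile=$(mktemp)
echo "test content" > "$tmpfile"

# A pipe (|) sends stdout into the stdin of the next command.
# wc -c counts bytes; tr strips the padding that some versions of wc add.
piped_count=$(echo "test content" | wc -c | tr -d ' ')
file_count=$(wc -c < "$tmpfile" | tr -d ' ')

echo "bytes through the pipe: $piped_count"
echo "bytes in the file: $file_count"
```

Both counts are 13: the 12 visible characters plus the newline that echo adds at the end.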

We can put these together:

echo "test content" | git hash-object --stdin

We can break down this command:

git hash-object takes the content you hand to it and returns the unique key
the --stdin option tells git hash-object to get the content to be processed from stdin instead of a file
the | is called a pipe (what we saw before was a redirect); it pipes the output of one process into the next command
echo would normally write to stdout; with the pipe, that output goes to the stdin of git hash-object

we get back the hash:

d670460b4b4aece5915caf5c68d12f560a9fe3e4
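To see where that key comes from, here is a sketch that rebuilds it without git. It assumes the repo uses git's default SHA-1 object format and that either sha1sum or shasum is installed. For a blob, git hashes a short header (the word blob, a space, the size in bytes, and a NUL byte) followed by the content.

```shell
# Pick whichever SHA-1 tool is installed (sha1sum on most Linux systems, shasum on macOS).
if command -v sha1sum > /dev/null; then sha1tool="sha1sum"; else sha1tool="shasum"; fi

content="test content"
# echo adds a newline, so the blob holds one more byte than the visible text.
size=$(echo "$content" | wc -c | tr -d ' ')

# Hash the header ("blob", a space, the size, a NUL byte) followed by the content.
manual_hash=$(printf 'blob %s\000%s\n' "$size" "$content" | $sha1tool | cut -d ' ' -f 1)
echo "$manual_hash"
```

This prints d670460b4b4aece5915caf5c68d12f560a9fe3e4, the same value git hash-object returned, which is why everyone who hashes the same content gets the same key.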

Now let’s run it again with a slight modification. The -w option tells the command to also write that object to the database.

echo "test content" | git hash-object -w --stdin

d670460b4b4aece5915caf5c68d12f560a9fe3e4

and we can check if it wrote to the directory.

find .git/objects/ -type f

.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

and we see the file we expected: the first two characters of the hash name the folder and the rest of the hash names the file.

Viewing git objects¶

We can try with cat

cat .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

xK??OR04f(I-.QH??+I?+?K?

This is compressed binary data (git compresses its objects with zlib), so we cannot read it directly.

Fortunately, git provides a utility. We can use cat-file to inspect the object, referring to it by at least the first 4 characters of the full hash, as long as that prefix is unique. The prefix comes from the full hash, not the file name (7046 will not work, but d670 will).

cat-file requires an option; -p is for pretty print

git cat-file -p d670

test content

where we see the content we put in to the hashing function

or -t for type:

git cat-file -t d670

blob

we see that it is a blob.
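The cat-file options can be collected into one sketch that runs in a throwaway repo, so it does not touch the test repo from class. Besides -t and -p, it uses -s (size in bytes) and -e (exit status only), which are not used elsewhere in these notes.

```shell
# Work in a throwaway repo so the class repo is left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q .

blob=$(echo "test content" | git hash-object -w --stdin)

obj_type=$(git cat-file -t "$blob")   # the kind of object
obj_size=$(git cat-file -s "$blob")   # the size of the content in bytes
obj_body=$(git cat-file -p "$blob")   # the content itself

echo "type: $obj_type, size: $obj_size, content: $obj_body"

# -e prints nothing; it only reports through the exit status whether the object exists.
git cat-file -e "$blob" && echo "object exists"
```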

The working directory is still empty, because that blob exists only in git’s object database:

ls

Hashing a file¶

let’s create a file, using a redirect

echo "version 1" > test.txt

and confirm it exists

ls

test.txt

and its content

cat test.txt

version 1

and store it, by hashing it

git hash-object -w test.txt

83baae61804e65cc73a7201a7252750c76066a30

we can look at what we have.

find .git/objects/ -type f

.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

We see two objects, as expected. At this point our repo holds two blobs and nothing else.

We can check the type of files with -t and git cat-file

git cat-file -t 83ba

blob

it is a blob object as expected

git cat-file -p 83ba

version 1

Notice, however, that we only have one file in the working directory.

git status

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	test.txt

nothing added to commit but untracked files present (use "git add" to track)

It is the one file, test.txt. The first blob we made has no file in the working directory associated with it.

So far, even though we have hashed the object, git still thinks the file is untracked, because it is not in the index and there is no tree or commit that points to it.

Updating the Index¶

Now, we can add our file as it is to the index.

the index is also called the staging area

git update-index --add --cacheinfo 100644 \
 83baae61804e65cc73a7201a7252750c76066a30 test.txt

the \ lets us wrap onto a second line
the plumbing command git update-index updates the index (or, in this case, creates it), which is the staging area of our repository
the --add option is because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)
--cacheinfo because we are adding content that is already in the database, rather than reading the file from the working directory
in this case, we’re specifying a mode of 100644, which means it’s a normal file
then the hash of the object we want to add to the index (the content); in our case, the hash of the version 1 blob
finally, the file name of that content

this has no output
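Since update-index prints nothing, one way to check what landed in the index is git ls-files --stage, which lists each staged entry. The sketch below repeats the steps in a throwaway repo so that it can run on its own.

```shell
# Work in a throwaway repo so the class repo is left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "version 1" > test.txt
blob=$(git hash-object -w test.txt)
git update-index --add --cacheinfo 100644 "$blob" test.txt

# Each staged entry is printed as: mode, object hash, stage number, then the path.
staged=$(git ls-files --stage)
echo "$staged"
```

The hash in the listing is the blob for version 1, which is what git status reports as ready to be committed.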

Again, we check in with status

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Now the file is staged, as expected.

Let’s edit it further.

echo "version 2" >> test.txt

we can check the working directory again

ls

test.txt

We can look at the content to ensure it is as expected

cat test.txt

version 1
version 2

So the file has two lines

Now check status again.

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

We added the first version of the file to the staging area, so that version is ready to commit. Since then we have changed the copy in our working directory, so it no longer matches the blob that we put in the staging area, and we also have changes not staged.

We can hash and store this version too.

git hash-object -w test.txt

0c1e7391ca4e59584f8b773ecdbbb9467eba1547

We can then look again at our list of objects.

find .git/objects/ -type f

.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

So now our repo has 3 items, all blobs

and again, check with git:

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

hashing the object does not impact the index, which is what git status uses

Preparing to Commit¶

When we work with porcelain commands, we use add then commit. We have staged the file, which we know is what happens when we add. What else has to happen to make a commit?

We know that commits are composed of:

a message
author and time stamp info
a pointer to a tree
a pointer to the parent (except the first commit)

We do not have any of these items yet, but we know that the message, author, and time stamp come at the very end, when we make the commit.

Let’s make a tree next. We can write a tree from the index:

git write-tree

d8329fc1cc938780ffdd9f94e0d364e0ea74f579

and we get a hash

Let’s examine the tree. First, check the type:

git cat-file -t d832

tree

it is as expected

and now we can look at its contents

git cat-file -p d832

100644 blob 83baae61804e65cc73a7201a7252750c76066a30	test.txt

it looks a lot like what we put in the index
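The tree is built from the index, not from the working directory. The following sketch, run in a throwaway repo and assuming git's default SHA-1 object format, writes the tree, then edits and hashes the file again without staging it, and writes the tree a second time.

```shell
# Work in a throwaway repo so the class repo is left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "version 1" > test.txt
blob=$(git hash-object -w test.txt)
git update-index --add --cacheinfo 100644 "$blob" test.txt
tree=$(git write-tree)

# Edit and hash the file again, but do not touch the index.
echo "version 2" >> test.txt
git hash-object -w test.txt > /dev/null
tree_again=$(git write-tree)

echo "first tree:  $tree"
echo "second tree: $tree_again"

# ls-tree lists the entries of a tree, one per line.
git ls-tree "$tree"
```

Both hashes are d8329fc1cc938780ffdd9f94e0d364e0ea74f579, because the index still holds only the version 1 blob and a tree contains no time stamp or user information.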

Now this is the status of our repo:

We have 3 blobs, and a tree that points to one of them

Again, we will check in with git via git status

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

Nothing has changed. Making the tree does not make the commit.

Creating a commit manually¶

We can echo a commit message through a pipe into the commit-tree plumbing command to commit a particular tree object.

The git commit-tree command needs a commit message and the tree hash. We will use stdin and a pipe for the message.

echo "first commit" | git commit-tree d832

da01135f2fb02e9b53c200708863df82f0daa50c

and we get back a hash. But notice that this hash is unique for each of us, because the commit has information about the time stamp and our user.

The above hash is the one I got during class, but when I re-ran this while typing the notes last fall I got a different hash (d450567fec96cbd8dd514313db9bcb96ad7664b0) even though I have the same name and e-mail because the time changed.
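The effect of the time stamp can be checked directly. This sketch pins the author and committer information with git's environment variables, using a hypothetical identity and made-up time stamps, and creates the same commit twice before moving the clock forward by one second.

```shell
# Work in a throwaway repo so the class repo is left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "version 1" > test.txt
blob=$(git hash-object -w test.txt)
git update-index --add --cacheinfo 100644 "$blob" test.txt
tree=$(git write-tree)

# Pin the identity and the time stamps (hypothetical values) so the result is repeatable.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"
export GIT_AUTHOR_DATE="1700000000 +0000" GIT_COMMITTER_DATE="1700000000 +0000"

first=$(echo "first commit" | git commit-tree "$tree")
repeat=$(echo "first commit" | git commit-tree "$tree")

# Move the clock forward by one second and create the commit again.
export GIT_AUTHOR_DATE="1700000001 +0000" GIT_COMMITTER_DATE="1700000001 +0000"
later=$(echo "first commit" | git commit-tree "$tree")

echo "same time stamp:  $first and $repeat"
echo "one second later: $later"
```

With everything pinned, the two commits get the same hash; changing only the time stamp gives a different one.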

We can also look at its type

git cat-file -t da01

commit

and we can look at the content

git cat-file -p da01

tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Sarah M Brown <brownsarahm@uri.edu> 1759857348 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1759857348 -0400

first commit

Now we check the final list of objects that we have for today

find .git/objects/ -type f

.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/da/01135f2fb02e9b53c200708863df82f0daa50c
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

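At this point our repo holds a commit that points to a tree, which points to the version 1 blob; the other two blobs are not referenced by anything.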

What does git status do?¶

git status compares the working directory and the index to the current state of the active branch.

we can see the working directory with: ls
we can see the active branch in the HEAD file
what is its status?

We have seen that it is “On branch main”. This is because we set the branch to main, but nothing has been written to that branch yet; since we are using plumbing commands, we have to do that directly.

Notice that when we use the porcelain command for commit, it does this automatically; the porcelain commands do many things.

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

Notice that git says we have no commits yet, even though we have written a commit.

This is because we made the commit manually and did not update the branch, so the main branch does not point to any commit.

We can verify by looking at the HEAD file

cat .git/HEAD

ref: refs/heads/main

as expected

then we look at that file

cat .git/refs/heads/main

cat: .git/refs/heads/main: No such file or directory

which does not even exist!

we can see the objects though:

find .git/objects/ -type f

.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/da/01135f2fb02e9b53c200708863df82f0daa50c
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

Git References¶

We can make that file manually

echo 0c1e7391ca4e59584f8b773ecdbbb9467eba1547 > .git/refs/heads/main

then we check with git

git status

error: bad tree object HEAD

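I used the wrong hash: 0c1e... is the blob for the second version of test.txt, not a commit, so it failed.

So I corrected it, using the commit hash: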

echo da01135f2fb02e9b53c200708863df82f0daa50c > .git/refs/heads/main

So we now have HEAD -> main -> our commit -> tree -> blob.
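Git also has a plumbing command for this step, git update-ref, which writes the branch file for us instead of us using echo. The sketch below uses it in a throwaway repo with a hypothetical identity, then follows the chain of pointers with git symbolic-ref and git rev-parse. It assumes git's default file-based storage for refs.

```shell
# Work in a throwaway repo so the class repo is left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main   # make HEAD point at main, whatever the default name is

# Hypothetical identity, needed because commit-tree records an author and a committer.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"

echo "version 1" > test.txt
blob=$(git hash-object -w test.txt)
git update-index --add --cacheinfo 100644 "$blob" test.txt
tree=$(git write-tree)
commit=$(echo "first commit" | git commit-tree "$tree")

# Point the branch at the commit.
git update-ref refs/heads/main "$commit"

branch_ref=$(git symbolic-ref HEAD)        # HEAD -> branch
head_commit=$(git rev-parse HEAD)          # branch -> commit
head_tree=$(git rev-parse "HEAD^{tree}")   # commit -> tree

echo "$branch_ref -> $head_commit -> $head_tree"
cat .git/refs/heads/main
```

The file .git/refs/heads/main ends up holding the commit hash, the same content we wrote by hand with echo.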

and git status works:

git status

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

no changes added to commit (use "git add" and/or "git commit -a")

It still shows the file as modified, because we committed only the version with the first line, not the one with the second line.

cat test.txt

version 1
version 2

git diff

diff --git a/test.txt b/test.txt
index 83baae6..0c1e739 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
 version 1
+version 2

The git objects folder¶

ls .git/objects/

0c	83	d6	d8	da	info	pack

There was a question about why they are organized into folders.

It looks like it is partially about access speed, but more importantly it is because folders can have a limit on the number of files in them.
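The folder layout follows directly from the hash. As a small sketch, this splits a hash into the folder name (the first two characters) and the file name (the rest) using cut; no repo is needed.

```shell
hash="d670460b4b4aece5915caf5c68d12f560a9fe3e4"

# The first two characters name the folder; the remaining characters name the file.
dir=$(echo "$hash" | cut -c 1-2)
file=$(echo "$hash" | cut -c 3-)

object_path=".git/objects/$dir/$file"
echo "$object_path"
```

This prints .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4, matching the path that find showed for our first blob.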

Finally, we can follow the pointers from the log entry to the commit, its tree, and the blob:

git log

commit da01135f2fb02e9b53c200708863df82f0daa50c (HEAD -> main)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Oct 7 13:15:48 2025 -0400

    first commit

git cat-file -p da01

tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Sarah M Brown <brownsarahm@uri.edu> 1759857348 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1759857348 -0400

first commit

git cat-file -p d832

100644 blob 83baae61804e65cc73a7201a7252750c76066a30	test.txt

git cat-file -p 83baa

version 1

Prepare for Next Class¶

Read the notes from October 7. We will build on these directly in the future. You need to have the test repo with the same status for lab on class on 3/5. Make sure you have completed all of the steps in the github inclass repo.
Add to your IDE idethoughts.md file, on a dedicated ide_prep branch, if you have not done so in a few days. This is prep for a few weeks from now, not for October 9; keep this branch open until it is specifically asked for.

Badges¶

Review

Practice

Review the notes from today
Make a table in gitplumbingreview.md in your KWL repo that relates the two types of git commands we have seen: plumbing and porcelain. The table should have two columns, one for each type of command (plumbing and porcelain). Each row should have one git porcelain command and at least one of the corresponding git plumbing command(s). Include two rows: add and commit.

Experience Report Evidence¶

Generate your evidence with the following in your test repo

find .git/objects/ -type f > testobj.md

then append the contents of your commit object to that file.

Move the testobj.md to your kwl repo in the experiences folder.

Questions After Today’s Class¶

What exactly happens inside .git/index—how is it structured to store file paths and hashes?¶

I have not been able to find a clear explanation of how the index stores the information, only how to put things in and get them out (into a tree).

This is a good explore badge topic.

How does git know which tree or commit the HEAD is pointing to when multiple branches exist?¶

It reads the HEAD file. git checkout changes the HEAD file.

How do git write-tree and git commit-tree relate to git add and git commit under the hood?¶

This is partially your badge for today, so great question!

the book section on the topic has a lot of information

How will this content we learned be applied in future lessons in this class?¶

Next we will learn more advanced git porcelain commands and having a correct understanding of the internals helps make those make more sense.

Is there ever a situation in which creating commits this way is more beneficial?¶

Probably not. It is an exercise in understanding what happens when you use the other commands, so that you can understand what happened if something goes wrong.

Can you have a bunch of trees linked to each other that all hold their own blobs?¶

Yes.

Thinking through how to create this (using porcelain or plumbing commands) is a good idea to practice with the concepts from today.
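As one possible starting point, here is a sketch with plumbing commands in a throwaway repo. Staging a path that contains a folder (the names top.txt and docs/inner.txt are made up) makes git write-tree create a second tree for that folder and link it from the top-level tree.

```shell
# Work in a throwaway repo so the class repo is left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q .

blob=$(echo "hello" | git hash-object -w --stdin)

# Stage the same blob under two hypothetical paths, one of them inside a folder.
git update-index --add --cacheinfo 100644 "$blob" top.txt
git update-index --add --cacheinfo 100644 "$blob" docs/inner.txt

root=$(git write-tree)
listing=$(git cat-file -p "$root")
echo "$listing"

# The entry of type "tree" is the subtree for docs; its hash is the third field.
subtree=$(echo "$listing" | awk '$2 == "tree" {print $3}')
git cat-file -p "$subtree"
```

The top-level tree lists one blob and one tree, and the tree for docs lists its own blob entry. Both blob entries carry the same hash because the content is identical.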

Is there a reason why the author and committer would be different from what my email is?¶

You can set them. If you are using a computer where you have not used git config, they can be different.

How can we display the contents of the index file in a human-readable way?¶

git ls-files --stage lists the entries in the index (mode, hash, stage number, and path). Exploring the index format further is a good explore badge topic!

We may have covered this already, but what are the folders with hexadecimal names in .git/objects for?¶

They make git more efficient, faster, and more flexible. Some file systems have limits on how many files can be in a folder, so this drastically increases how many git objects can be stored on a computer with those limits.

If git stores commits as snapshots, does that mean it saves several duplicate files for each commit?¶

Not quite. Git hashes each file’s content and writes that content to a file named with the hash, so if a file has not changed, its hash is the same and the new snapshot points to the object that is already stored instead of saving another copy.

This is faster than even comparing because comparing two files is a complex operation, more than the hashing algorithm.
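This can be checked with a small sketch in a throwaway repo: two files with different names (a.txt and b.txt are made up) but identical content produce a single object.

```shell
# Work in a throwaway repo so the class repo is left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Two files with different names but identical content.
echo "same content" > a.txt
echo "same content" > b.txt

hash_a=$(git hash-object -w a.txt)
hash_b=$(git hash-object -w b.txt)

# Count the object files that were written.
count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "a.txt: $hash_a"
echo "b.txt: $hash_b"
echo "objects stored: $count"
```

The two hashes match and only one object file exists, because the blob is named by its content and the file names live in the tree instead.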

what are the main git objects that make up a commit?¶

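A commit is itself a git object. It points to a tree, which in turn points to blobs, and it also records the author, the time stamp, the message, and a pointer to the parent commit (except for the first commit).

Is there ever a case where using the plumbing commands is necessary over the porcelain commands?¶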

Not in normal use, but they need to be there for git to work, and we can use them to inspect and learn.