How does git make a commit?

11. How does git make a commit?

Today we will dig into how git really works. This will be a deep dive into the details of how git creates a commit. Conceptually, it will reinforce important ideas; practically, it will give you some ideas about how you might fix things when they go wrong.

Later, we will build on this on the more practical side, but these concepts are essential for making sense of the practical aspects of fixing things in git.

This deep dive into git is meant to help you build a correct, flexible understanding of git so that you can use it independently and efficiently. The plumbing commands do not need to be a part of your daily use of git, but they are the way that we can dig in and see what actually happens when git creates a commit.

This also serves as an example of a method you could apply to understand another complex system.

Inspecting a system’s components is a really good way to understand it. Understanding it correctly will improve your ability to ask good questions and even to look up the right thing to do when you need to fix things.

Also, looking at the parts of git is a good way to reinforce specific design patterns that are common in CS in a practical way. This means that today we will also:

  • review and practice with the bash commands we have seen so far

  • see a practical example of hashing

  • reinforce through examples what a pointer does

Navigate to your GitHub inclass repo.

Recall: git stores important content in files that it uses like variables.

For example:

cat .git/HEAD 
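To see the "files as variables" idea without touching your inclass repo, here is a minimal sketch that makes a throwaway repo in a temporary directory and reads HEAD two ways: with cat, and with the plumbing command git symbolic-ref. It assumes git's default file-based ref storage, where .git/HEAD holds a line like `ref: refs/heads/main`. The variable names (start, tmp, head_file, head_ref) are just for illustration; run it as a script, since it uses cd and exit.

```shell
# Sketch: HEAD is a plain text file whose content is a pointer to a branch.
# Runs in a throwaway repo so your inclass repo is not touched.
start=$(pwd)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/main   # point HEAD at main, similar to git branch -m main

head_file=$(cat .git/HEAD)              # read the "variable" directly
head_ref=$(git symbolic-ref HEAD)       # ask git to read the same pointer
echo "$head_file"   # ref: refs/heads/main
echo "$head_ref"    # refs/heads/main

cd "$start" && rm -rf "$tmp"
```

Notice that HEAD does not hold a commit hash here; it holds the name of a branch, and the branch is what points to a commit.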

.gitignore, by contrast, is a file in the working directory that contains a list of files and patterns that git should not track.

11.1. Creating a repo from scratch

We will start in the top level course directory.

cd inclass/systems
ls
gh-inclass-sp24-brownsarahm	tiny-book
kwl-sp24-brownsarahm

Yours should also have your kwl repo, gh inclass repo, course website clone, etc.

We can create an empty repo from scratch using git init <path>.

Last time we used git init . because we were initializing a directory that already existed.

Today we will create a new directory called test.

git init test
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/test/.git/

We get this hint again; see the context from last week.

Some people did not get the message.

We can see what it did by first looking at the working directory

ls
gh-inclass-sp24-brownsarahm	test
kwl-sp24-brownsarahm		tiny-book

it made a new folder with the name we gave it

and we can go into that directory

cd test/

and then rename the branch

git branch -m main

To confirm, we will look at the status

git status

Notice that there are no commits, and no origin.

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Some people did not get the hint, so I decided to investigate. The message says that the hint only shows if you have not set a default branch name, so it is possible that some students had already set one.

First, though, we checked to make sure that no one was on an older version:

git --version
git version 2.41.0
ls

it has no files

but with -a

ls -a

we see it does have the .git folder

.	..	.git
ls .git
HEAD		description	info		refs
config		hooks		objects

we can see the basic requirements of an empty repo here.

11.2. Searching the file system

We can use the bash command find to search the file system. Note that this searches file names only, not the contents of the files.

find .git/objects/
.git/objects/
.git/objects//pack
.git/objects//info

we see the directory itself and the two directories inside it, pack and info.

We can limit by type, to only files with the -type option set to f

find .git/objects/ -type f

And we have no results: there are no objects yet, because this is an empty repo.

11.3. Git Objects

Remember our 3 types of objects:

  • blob objects: the content of your files (data)

  • tree objects: stores file names and groups files together (organization)

  • commit objects: store the hash of a tree (the snapshot) along with the parent, author, and message (history)

classDiagram
  class tree{
    List:
    - hash: blob
    - string: type
    - string: file name
  }
  class commit{
    hash: parent
    hash: tree
    string: message
    string: author
    string: time
  }
  class blob{
    binary: contents
  }
  class object{
    hash: name
  }
  object <|-- blob
  object <|-- tree
  object <|-- commit

11.3.1. How to create an object

All git objects are stored as files whose name is the hash of the content in the file.

Remember, git is a content-addressable file system, so it uses key-value pairs.

Let’s create our first git object. git uses hashes as the keys: we give the hashing function some content, it applies the algorithm, and it returns the hash as the reference to that object. We can also have it write the object to our .git directory.

The git hash-object command works on files, but we do not have any files yet. We can create a file, but we do not have to. Remember, everything is a file.

When we use a command like echo, it writes to the stdout file.

echo "test content"
test content

which shows on our terminal. We can use a pipe to connect the stdout of one command to the stdin of the next.

Pipes are an important concept too; we’re seeing them in the context of real uses, and we will keep seeing them.

echo "test content" | git hash-object --stdin

We can break down this command:

  • git hash-object would take the content you handed to it and merely return the unique key

  • --stdin option tells git hash-object to get the content to be processed from stdin instead of a file

  • the | is called a pipe (what we saw before was a redirect); it passes the output of one process to the next command

  • echo writes to stdout, and with the pipe, that output goes to the stdin of git hash-object

we get back the hash:

d670460b4b4aece5915caf5c68d12f560a9fe3e4

and we can check if it wrote to the directory.

find .git/objects/ -type f

and it shows nothing

Now let’s run it again with a slight modification. -w option tells the command to also write that object to the database

echo "test content" | git hash-object --stdin -w

we get the same hash back again

d670460b4b4aece5915caf5c68d12f560a9fe3e4

and we can check if it wrote to the directory.

find .git/objects/ -type f
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

and we see the file that it was supposed to write!
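The hash d670460b4b4aece5915caf5c68d12f560a9fe3e4 is not random. As a sketch of what git hash-object computes, assuming a repo that uses SHA-1 (the default), git hashes a short header (the word blob, a space, the size in bytes, and a NUL byte) followed by the content. This reproduces the same hash without git at all; the helper function name sha1 is hypothetical, and it uses sha1sum on Linux or shasum on macOS.

```shell
# Sketch: recompute a blob's hash by hand, without git.
# Assumption: a SHA-1 repo, where git hashes "blob <size>\0" + content.

# sha1sum ships with Linux coreutils; macOS has shasum instead.
sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum; fi
}

content='test content'
# echo adds a newline, so the blob is 12 characters + 1 newline = 13 bytes
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Header, NUL byte, then the exact bytes echo produced
manual_hash=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1 | cut -d ' ' -f 1 )
echo "$manual_hash"   # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the hash depends only on these bytes, anyone who hashes "test content" followed by a newline gets the same name for the object.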

11.3.2. Viewing git objects

We can try with cat

cat .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4 
xK??OR04f(I-.QH??+I?+?K?	

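This is binary output that we cannot read, because git compresses each object (with zlib) before storing it. Fortunately, git provides a utility. We can use cat-file to refer to the object by its hash (at least the first 4 characters, enough to be unique), not by its file name. For example, 7046 will not work, because it is the start of the file name rather than of the hash, but d670 will.

cat-file requires an option: -t prints the object's type and -p pretty-prints its content. First we check the type: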

git cat-file -t d670
blob

we see that it is a blob.

Then we can do it with the -p option instead to see the content

git cat-file -p d670
test content

where we see the content we put into the hashing function
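Here is a sketch, in a throwaway repo, of two things from above: the object's file path is its hash split after the first two hex digits, and git expands an unambiguous prefix of the hash (but not a piece of the file name) to the full hash, here with the plumbing command git rev-parse. The variable names are illustrative; run it as a script.

```shell
# Sketch: where a loose object is stored, and how a short prefix resolves.
# Uses a throwaway repo so your test repo is not changed.
start=$(pwd)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q .
hash=$(echo "test content" | git hash-object --stdin -w)

# First 2 hex digits name the directory; the remaining 38 name the file
dir=$(printf '%s' "$hash" | cut -c 1-2)
file=$(printf '%s' "$hash" | cut -c 3-)
obj_path=".git/objects/$dir/$file"
[ -f "$obj_path" ] && found=yes || found=no
echo "$obj_path exists: $found"

# git expands an unambiguous prefix of 4+ characters to the full hash
full=$(git rev-parse d670)
# A piece of the file name is not a prefix of the hash, so this fails
bad=$(git rev-parse --verify --quiet 7046 || echo "not found")

cd "$start" && rm -rf "$tmp"
```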

ls

11.3.3. Hashing a file

let’s create a file

echo "version 1" >test.txt

and store it, by hashing it

git hash-object -w test.txt 
83baae61804e65cc73a7201a7252750c76066a30
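Hashing the file gives the same result as piping the same text into git hash-object --stdin, because both hand git the same bytes. A minimal sketch in a throwaway repo, assuming no line-ending conversion (such as core.autocrlf) is configured, since that could change what git reads from the file:

```shell
# Sketch: a file and a pipe hand git the same bytes, so they get the same hash.
start=$(pwd)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q .

# Redirect: echo's stdout goes into a file, and git reads the file
echo "version 1" > test.txt
from_file=$(git hash-object test.txt)

# Pipe: echo's stdout becomes git hash-object's stdin
from_pipe=$(echo "version 1" | git hash-object --stdin)

echo "$from_file"   # 83baae61804e65cc73a7201a7252750c76066a30
echo "$from_pipe"   # same hash

cd "$start" && rm -rf "$tmp"
```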

we can look at what we have.

find .git/objects/ -type f

we see two objects as expected

.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

Now this is the status of our repo.

classDiagram
  class d67046{
    + "test content"
    +(blob)
  }
  class 83baae{
    + version 1
    + (blob)
  }

We can check the type of objects with -t and git cat-file

git cat-file -t 83baa
blob

it is a blob object as expected

Notice, however, that we only have one file in the working directory.

ls
test.txt

It is test.txt; the first blob we made has no file in the working directory associated with it.

The working directory and the git repo are not strictly the same thing, and they can differ like this. Mostly they will stay in a closer relationship than they currently have, unless we use plumbing commands, but it is good to build a solid understanding of how the .git directory relates to your working directory.

git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	test.txt

nothing added to commit but untracked files present (use "git add" to track)

So far, even though we have hashed the object, git still thinks the file is untracked, because the file is not in the index (the staging area) and there are no commits whose tree includes it.

11.3.4. Updating the Index

Now, we can add our file as it is to the index.

The \ lets us wrap the command onto a second line.

  • the plumbing command git update-index updates the index, the staging area of our repository (in this case, it creates it)

  • the --add option is needed because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)

  • --cacheinfo tells git that we are giving it the mode, hash, and file name directly, so it takes the content from the object database instead of reading a file from the working directory.

  • in this case, we’re specifying a mode of 100644, which means it’s a normal file.

  • then the hash of the object we want to add to the index (the content); in our case, the blob for the first version of the file.

  • finally the file name of that content

git update-index --add --cacheinfo 100644 \
 83baae61804e65cc73a7201a7252750c76066a30 test.txt

Again, we check in with status

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

We have the file staged as expected

11.3.5. A file can have multiple statuses

Now the file is staged.

Let’s edit it further.

echo "version 2" >> test.txt

We can look at the content to make sure it is as expected

cat test.txt 
version 1
version 2

So the file has two lines

Now check status again.

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

We added the first version of the file to the staging area, so that version is ready to commit. But the file in our working directory has changed relative to the blob we put in the staging area, so we also have changes not staged.

We can hash and store this version too.

git hash-object -w test.txt
0c1e7391ca4e59584f8b773ecdbbb9467eba1547

We can then look again at our list of objects.

find .git/objects/ -type f
.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30
classDiagram
  class d67046{
    +"test content"
    +(blob)
  }
  class 83baae{
    +version 1
    +(blob)
  }
  class 0c1e73{
    +version 1
    +version 2
    +(blob)
  }

So now our repo has 3 items, all blobs

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

Hashing the object does not change the index, and the index is what git status compares against.
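To look at the index directly, the plumbing command git ls-files --stage lists each index entry's mode, blob hash, stage number, and path. This sketch, in a throwaway repo, replays the steps so far and shows that the index still holds the version 1 blob after version 2 is hashed. It also runs the two comparisons that git status reports on: git diff --cached (index vs. the last commit, of which there is none yet) and git diff (working directory vs. index).

```shell
# Sketch: hashing version 2 writes a new blob but leaves the index alone.
start=$(pwd)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q .

echo "version 1" > test.txt
v1=$(git hash-object -w test.txt)
git update-index --add --cacheinfo 100644 "$v1" test.txt
echo "version 2" >> test.txt
v2=$(git hash-object -w test.txt)    # stored in .git/objects only

# Each index line is: mode, blob hash, stage number, tab, path
index_entry=$(git ls-files --stage)
echo "$index_entry"   # still the version 1 blob

# The two comparisons behind git status
staged=$(git diff --cached --name-only)   # index vs. last commit (none yet)
unstaged=$(git diff --name-only)          # working directory vs. index

cd "$start" && rm -rf "$tmp"
```

Both lists name test.txt, which is why git status shows the file under both "Changes to be committed" and "Changes not staged for commit".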

11.3.6. Preparing to Commit

When we work with porcelain commands, we use add and then commit. We have staged the file, which we know is what happens when we add. What else has to happen to make a commit?

We know that commits are comprised of:

  • a message

  • author and time stamp info

  • a pointer to a tree

  • a pointer to the parent (except the first commit)

We do not have any of these items yet.

Let’s make a tree next.

Now we can write a tree from the index,

git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579

and we get a hash

Let's examine the tree; first, check the type

git cat-file -t d832
tree

it is as expected

and now we can look at its contents

git cat-file -p d832
100644 blob 83baae61804e65cc73a7201a7252750c76066a30	test.txt
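A tree object is also just bytes that get hashed. As a sketch, assuming a SHA-1 repo: git hashes the header tree <size> and a NUL byte, followed by one entry per file, where each entry is the mode, a space, the file name, a NUL byte, and then the blob's hash as 20 raw bytes (not 40 hex characters). Rebuilding those bytes by hand should give the same hash as git write-tree. The helper functions sha1 and hex_to_bytes are hypothetical names for this sketch.

```shell
# Sketch: rebuild the tree object's bytes by hand and hash them.
# Assumption (SHA-1 repos): a tree is "tree <size>\0" followed by one entry
# per file, each entry being "<mode> <name>\0" plus the 20 raw hash bytes.

sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum; fi
}

# Turn 40 hex digits into 20 raw bytes, two digits at a time
hex_to_bytes() {
  rest=$1
  while [ -n "$rest" ]; do
    pair=$(printf '%s' "$rest" | cut -c 1-2)
    rest=$(printf '%s' "$rest" | cut -c 3-)
    printf "\\$(printf '%03o' "0x$pair")"   # octal escape -> one byte
  done
}

mode=100644
name=test.txt
blob=83baae61804e65cc73a7201a7252750c76066a30

entry_size=$(( ${#mode} + 1 + ${#name} + 1 + 20 ))   # 6 + 1 + 8 + 1 + 20 = 36
tree_hash=$( { printf 'tree %s\0' "$entry_size"
               printf '%s %s\0' "$mode" "$name"
               hex_to_bytes "$blob"; } | sha1 | cut -d ' ' -f 1 )
echo "$tree_hash"   # d8329fc1cc938780ffdd9f94e0d364e0ea74f579
```

This is why the tree hash is the same for everyone: it depends only on the mode, the file name, and the blob hash.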

Now this is the status of our repo:

classDiagram
  class d67046{
    +"test content"
    +(blob)
  }
  class 83baae{
    +version 1
    +(blob)
  }
  class d8329f{
    +blob: 83baae
    +filename: test.txt
    +(tree)
  }
  class 0c1e73{
    +version 1
    +version 2
    +(blob)
  }
  d8329f --|> 83baae

Again, we will check in with git via git status

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

Nothing has changed: making the tree does not make a commit.

11.3.7. Creating a commit manually

We can echo a commit message through a pipe into the commit-tree plumbing command to create a commit from a particular tree object.

The git commit-tree command needs the tree hash and a message. By default it reads the message from stdin, so we will use a pipe for the message.

echo "first commit" | git commit-tree d832
e3ba10cb02de504d4f48b9af4934ddcc4d0be3df

and we get back a hash. Notice that this hash is unique for each of us, because the commit includes the time stamp and our user information.

We can also look at its type

git cat-file -t e3ba
commit

and we can look at the content

git cat-file -p e3ba
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Sarah M Brown <brownsarahm@uri.edu> 1709231797 -0500
committer Sarah M Brown <brownsarahm@uri.edu> 1709231797 -0500

first commit
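Unlike the blobs and the tree, the commit hash depends on who made it and when. As a sketch in a throwaway repo: when the environment variables GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE and the matching GIT_COMMITTER_* variables are set, git uses them instead of your config and the current time, so fixing them makes commit-tree repeatable. The identity and timestamps below are made up; run it as a script so the exports do not linger in your shell.

```shell
# Sketch: a commit hash depends on who made it and when.
# The identity and timestamps are made up; git reads these environment
# variables in place of your config when they are set.
start=$(pwd)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q .
blob=$(echo "version 1" | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" test.txt
tree=$(git write-tree)                     # same tree as in the notes

export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"
export GIT_AUTHOR_DATE="1709231797 -0500" GIT_COMMITTER_DATE="1709231797 -0500"

c1=$(echo "first commit" | git commit-tree "$tree")
c2=$(echo "first commit" | git commit-tree "$tree")   # identical inputs
export GIT_COMMITTER_DATE="1709231798 -0500"           # one second later
c3=$(echo "first commit" | git commit-tree "$tree")
c1_type=$(git cat-file -t "$c1")

echo "$c1"; echo "$c2"; echo "$c3"   # first two match, third differs

unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE
cd "$start" && rm -rf "$tmp"
```

Changing only the committer time by one second changes the hash, so two people running the same commands at different times get different commits.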

Now we check the final list of objects that we have for today

find .git/objects/ -type f
.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects//e3/ba10cb02de504d4f48b9af4934ddcc4d0be3df
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

Important

Check that you also have 5 objects and that 4 of them match mine. The one you should not have is e3ba10; you should have a different commit object in its place.

Visually, this is what our repo looks like:

classDiagram
  class d67046{
    "test content"
    +(blob)
  }
  class 83baae{
    "version 1"
    +(blob)
  }
  class d8329f{
    blob: 83baae
    filename: test.txt
    +(tree)
  }
  class 0c1e73{
    version 1
    version 2
    +(blob)
  }
  class e3ba10{
    tree d8329f
    author: name, email, time
    committer: name, email, time
    +(commit)
  }
  d8329f --|> 83baae
  e3ba10 --|> d8329f

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

11.4. What does git status do?

It compares the working directory to the index (the staging area), and the index to the commit that the active branch points to.

  • we can see the working directory with: ls

  • we can see the active branch in the HEAD file

  • what is its status?

git status

We see it is “on main”; this is because we renamed the branch to main. But no commit has been written to that branch: when we make a commit with plumbing commands, we have to update the branch ourselves. Notice that when we use the porcelain command for commit, it does this automatically; the porcelain commands do many things.

Notice that git says we have no commits yet, even though we have written a commit.

This is because the main branch does not point to any commit.

In our case, because we made the commit manually, we did not update the branch.
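One way to make main point at a hand-made commit is the plumbing command git update-ref, which writes a commit hash into a ref such as refs/heads/main. Here is a sketch in a throwaway repo with a made-up identity; do not run update-ref in your test repo, since you need it in its current state for next class.

```shell
# Sketch: make a branch point at a hand-made commit with git update-ref.
# Throwaway repo only; keep your test repo as it is for next class.
start=$(pwd)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/main          # HEAD -> main, like in class
blob=$(echo "version 1" | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" test.txt
tree=$(git write-tree)
# Made-up identity, set only for this one command
commit=$(echo "first commit" | GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=ex@example.com \
  GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=ex@example.com git commit-tree "$tree")

# Before: refs/heads/main does not exist, which is why git says "No commits yet"
if git rev-parse --verify --quiet refs/heads/main >/dev/null; then
  before=exists
else
  before=missing
fi

# Write the commit hash into the branch (the pointer)
git update-ref refs/heads/main "$commit"
after=$(git rev-parse refs/heads/main)
subject=$(git log --format=%s)                # the branch now has one commit

cd "$start" && rm -rf "$tmp"
```

The commit object itself does not change; only the branch file that points to it is written.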

We will pick up from here next class.

11.5. Prepare for this class

  1. Read the notes from February 29th. We will build on these directly in the future. You need to have the test repo with the same status for class on 3/5. Make sure you have completed all of the steps in the github inclass repo.

11.6. Badges

  1. Make a table in gitplumbingreview.md in your KWL repo that relates the two types of git commands we have seen: plumbing and porcelain. The table should have two columns, one for each type of command (plumbing and porcelain). Each row should have one git porcelain command and at least one of the corresponding git plumbing command(s). Include two rows: add and commit.

  1. Read more details about git internals to review what we did in class in greater detail. Make a file gitplumbingdetail.md and create a visualization that is compatible with version control (eg can be viewed in plain text and compared line by line, such as table or mermaid graph) that shows the relationship between at least three porcelain commands and their corresponding plumbing commands (generally more than one each).

  2. Create gitislike.md and explain the main git operations we have seen (add, commit, push) in your own words, in a way that will either help you remember them or that you would use to explain them to someone else at a high level. This might be analogies or explanations using other programming concepts or concepts from a hobby.

11.7. Experience Report Evidence

Important

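You need to have a test repo that matches this for class on Tuesday.

Generate your evidence with the following in your test repo:

find .git/objects/ -type f > testobj.md

Then append the contents of your commit object to that file, using your own commit's hash (or at least its first 4 characters) in place of e3ba:

git cat-file -p e3ba >> testobj.md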

Move the testobj.md to your kwl repo in the experiences folder.

11.8. Questions After Today’s Class

11.8.1. After finalizing the repo, how do we go about uploading it to GitHub? Do we push it like we would new content for cloned local repos?

You could add a remote and then push; we will not push this one, but that is possible.

11.8.2. Is there any benefit to using git like this over how we normally use it with git add and git commit?

It is useful for edge cases and inspection. It will not be the way you use git day to day, but it would be useful if you were working on an application that is built on top of git. It is also helpful to go through it this way to see how git really works.

11.8.3. What would happen if a commit is made and a blob or tree object is later deleted?

This is a good explore badge idea. My guess is that it would not do anything immediately, but if you tried to check out that commit it would have an issue and complain about a corrupted repo.

11.8.4. what does the git index do I’m slightly confused about the functionality of this?

The index is the staging area, it is where we store information necessary to make a commit while we get ready to make a commit.

11.8.5. How are hashes made?

git uses a hardened version of the SHA-1 algorithm. We will learn more about that algorithm soon.

11.8.6. Why do four of the objects match for everyone?

The blobs and the tree are the same for everyone, so their hashes are the same, because a hash is a function of the content. Only the commit has unique information (author and time) in it.

11.8.7. using cat on the “test object” gave me xK▒▒OR04f(I-.QH▒▒+I▒+▒K▒, your computer output the same thing but rendered them as ? and other actual symbols, why?

They are both binary content, but we are using different terminal applications that attempt to render it using different character sets.

11.8.8. did any of what we did today synch with remote servers on github?

No, this repo has no remotes.

11.8.9. How do we upload the new repository and connect it to others?

add a remote, like how we did with our tiny-book

11.8.10. Why is it so hard to have a repository inside of another one?

First, it is often just not what you want.

Second, you then have to manage which repo git is working on when you commit.

Learning about uses and best practices for git submodules is a good explore.

11.8.11. I thought we push into remote before we staged the content?

We can only push content that has been added to the .git directory, so without staging, we wouldn’t have the tree.

11.8.12. Do remote repositories like github store the repo in a similar .git folder?

yes! that is the content that gets compressed and sent back and forth.

11.8.13. Why would manually doing the commands be useful or is this just to help us understand how they work?

Mostly just to understand, but plumbing commands can be helpful in fixing if weird stuff happens.

11.8.14. What happens if there is a collision with hashes? Especially if you have a massive repository with thousands of files?

(we will come back to this later)

git initially used plain SHA-1, which is vulnerable to collision attacks, so it now uses a hardened version: it detects inputs that look like known collision attacks and treats them as errors instead of silently producing a colliding hash.

git is working toward changing to SHA-256 to make collisions much less likely.
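As a sketch of what that change looks like, recent versions of git can already create a repository that uses SHA-256 with git init --object-format=sha256 (this assumes a git new enough to support that option, which appeared around git 2.29 as an experimental feature). Hashing the same content in a SHA-1 repo and a SHA-256 repo shows the longer names; both repos are throwaway.

```shell
# Sketch: the same content named by SHA-1 and by SHA-256.
# Assumes a git new enough to support --object-format (about 2.29+).
start=$(pwd)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q --object-format=sha1 sha1-repo
git init -q --object-format=sha256 sha256-repo

# hash-object uses the hash function of the repo it runs in
h1=$(cd sha1-repo && echo "test content" | git hash-object --stdin)
h256=$(cd sha256-repo && echo "test content" | git hash-object --stdin)

echo "$h1 (${#h1} hex digits)"       # 40 hex digits = 160 bits
echo "$h256 (${#h256} hex digits)"   # 64 hex digits = 256 bits

cd "$start" && rm -rf "$tmp"
```

A longer hash means far more possible names, which is what makes an accidental collision so much less likely.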

11.8.15. Why would we hash an object but not add it to a regular commit?

We would not typically do this on purpose, but git add does hash and store objects: before content can be added to the index, it must be hashed. So if you do atypical things at that point, you could end up with something hashed but not linked to a commit for the long term, instead of only temporarily.