
What is a commit?

Defining terms

A commit is the most important unit of git. Later we will talk about what git as a whole is in more detail, but understanding a commit is essential to understanding how to fix things using git.

In CS we often have multiple, overlapping definitions for a term depending on our goal.

In intro classes, we try really hard to only use one definition for each term to let you focus.

Now we need to contend with multiple definitions.

These definitions could be based on what a thing is conceptually, what role it plays, what parts it has, or how it is implemented.

For a commit, we are going to go through all of these today, with a lighter treatment of the implementation now and more detail later.

Conceptually, a commit is a snapshot

Figure 1: git takes a full snapshot of the repo at each commit.

Under the hood, git only makes an additional copy of files that have changed. It uses the same technique to store every snapshot, naming each file's content by a hash of that content, so a file that has not changed maps to an object git already has and does not create a new file inside of git.
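To make this concrete, here is a minimal Python sketch of content-addressed storage. blob_hash follows git's format for naming a blob (the SHA-1 of a "blob <size>" header, a null byte, and the file's bytes); the dictionary store and store_snapshot are hypothetical simplifications, not how git actually writes objects to disk.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Name a file's content the way git names a blob: SHA-1 of header + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def store_snapshot(objects: dict, files: dict) -> dict:
    """Store each file's content in a toy object store keyed by its hash.

    Returns file name -> blob hash, a simplified stand-in for a tree.
    """
    snapshot = {}
    for name, content in files.items():
        h = blob_hash(content)
        objects.setdefault(h, content)  # content already stored is not stored again
        snapshot[name] = h
    return snapshot


objects = {}
first = store_snapshot(objects, {"README.md": b"# notes\n", "setup.py": b"print('hi')\n"})
second = store_snapshot(objects, {"README.md": b"# notes\nmore\n", "setup.py": b"print('hi')\n"})

print(len(objects))                             # 3: two files, plus the changed README
print(first["setup.py"] == second["setup.py"])  # True: unchanged file, same object
```

Both snapshots are complete (each lists every file), yet the unchanged setup.py exists only once in the store.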

A commit’s role is central to git

A commit is the basic unit of what git manages.

All other git things, like branches, are defined relative to commits.

Parts of a commit

We will learn about the structure of a commit by inspecting it.

First we will go back to our gh-inclass repo:

cd Documents/inclass/systems/gh-inclass-fa25-brownsarahm/

From git log

We can use git log to view past commits:

git log
commit e899a0e7ad5a9626a6d5c6b0fd96a410bd42b710 (HEAD -> organization)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Thu Sep 18 13:38:47 2025 -0400

    begin reorg

commit 285dd2104498d173d1926fb59f5513d224a34a14
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Thu Sep 18 13:04:26 2025 -0400

    add note to readme

commit 3300996de3e91ced5c731d759d29a10f011aeb00 (origin/organizing_ac, organizing_ac)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Thu Sep 18 12:34:06 2025 -0400

    add files for organizing activity

commit 11017a59088d4a0b880f770f15fab8c9e086a789 (origin/main, origin/HEAD, mybranchcheckedoutb, my_branch, main)
Merge: c8f4926 99f86bf
Author: Sarah Brown <brownsarahm@uri.edu>

This shows several commits, starting with the most recent; each chunk that begins with commit and a hash is a distinct commit.

Let’s examine one commit in more detail.

Each commit in the log shows a hash, an author, a time stamp, and a message. Here is commit e899a:
commit e899a0e7ad5a9626a6d5c6b0fd96a410bd42b710 
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Thu Sep 18 13:38:47 2025 -0400

    begin reorg

The hash is the unique identifier for the commit, and it is also the name of the file the commit is stored in, inside the object database (.git).

The message is written for people, to describe the contents of the commit in human-understandable terms.

But we know commits are supposed to represent some content, and this view gives us no information about that.

Seeing the full commit content

We can view individual commits with git cat-file and at least 4 characters of the hash (or as many as it takes to be unique). We will try 4 characters, and I will use the commit we inspected above, with full hash e899a0e7ad5a9626a6d5c6b0fd96a410bd42b710.

git cat-file has different modes, and one must be specified for it to work: -t prints the object's type, -s prints its size, -e checks whether it exists, and -p pretty-prints its contents.

We’ll use -p here:

git cat-file -p e899
tree 6f435051d686c4fec112cdfe7c73c65ad9153125
parent 285dd2104498d173d1926fb59f5513d224a34a14
author Sarah M Brown <brownsarahm@uri.edu> 1758217127 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1758217127 -0400

begin reorg

Now we see more detail.

This view doesn't show the hash of the commit anymore, only the contents of the commit itself.

The commit object has five parts: the tree, the parent, the author with a time, the committer with a time, and the message. Here each time is written as seconds since 1970 followed by the time zone offset.

The tree is a pointer to the content of the snapshot, and the parent is the hash of the commit that came before this one.

The committer and author are generally the same; a common time when they differ is a merge commit.

For example, this merge commit from courseutils, made on GitHub:

commit 2a6c732 from courseutils
tree 334b6685f652af4236afd3310d44881c6f0159a3
parent 522f11e939ae38d7081d1e9a1a1aa4b9a93ea4d2
parent d833ad8e4b41756e17a42c9537c04cb7fa8da3a9
author Sarah Brown <brownsarahm@uri.edu> 1737644878 -0500
committer GitHub <noreply@github.com> 1737644878 -0500
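The text that git cat-file -p prints for a commit is simple enough to parse by hand. This sketch, with hypothetical helper names parse_commit and parse_signature, splits the headers from the message, collects every parent line (so the merge commit above gets two), and turns the author's seconds-since-1970 and offset back into the date git log showed. It assumes single-line headers; signed commits carry a multi-line gpgsig header that it does not handle.

```python
from datetime import datetime, timedelta, timezone


def parse_commit(text: str) -> dict:
    """Split `git cat-file -p <commit>` output into headers and a message.

    Assumes single-line headers (no multi-line gpgsig signature).
    """
    headers, _, message = text.partition("\n\n")  # a blank line ends the headers
    commit = {"parents": [], "message": message.strip()}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # merge commits have more than one
        else:
            commit[key] = value
    return commit


def parse_signature(value: str) -> tuple:
    """Split 'Name <email> 1758217127 -0400' into (name and email, datetime)."""
    who, seconds, offset = value.rsplit(" ", 2)
    sign = -1 if offset.startswith("-") else 1
    tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5])))
    return who, datetime.fromtimestamp(int(seconds), tz)


e899 = parse_commit(
    "tree 6f435051d686c4fec112cdfe7c73c65ad9153125\n"
    "parent 285dd2104498d173d1926fb59f5513d224a34a14\n"
    "author Sarah M Brown <brownsarahm@uri.edu> 1758217127 -0400\n"
    "committer Sarah M Brown <brownsarahm@uri.edu> 1758217127 -0400\n"
    "\n"
    "begin reorg\n"
)
merge = parse_commit(
    "tree 334b6685f652af4236afd3310d44881c6f0159a3\n"
    "parent 522f11e939ae38d7081d1e9a1a1aa4b9a93ea4d2\n"
    "parent d833ad8e4b41756e17a42c9537c04cb7fa8da3a9\n"
    "author Sarah Brown <brownsarahm@uri.edu> 1737644878 -0500\n"
    "committer GitHub <noreply@github.com> 1737644878 -0500\n"
)

who, when = parse_signature(e899["author"])
print(who, when)              # Sarah M Brown <brownsarahm@uri.edu> 2025-09-18 13:38:47-04:00
print(len(merge["parents"]))  # 2
```

The converted time matches the Date line git log printed for e899a.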

Trees are objects too

So we can look at them the same way:

git cat-file -p 6f43
040000 tree 263fb9d22090e88edd2bf1847c24c3511de91b49	.github
100644 blob c1b4f81358eaaf467ff4ce4b95171497c28d1622	.gitignore
100644 blob 3a4533a3abbc749f5e1905b30eb187a7350ae71a	API.md
100644 blob 9ece5efa25710c8fad7d9f210928785b5362b06f	CONTRIBUTING.md
100644 blob 2d232a2231c650dc4094606797fe0bd3e0ce4c65	LICENSE.md
100644 blob 19e9a4e91197294600869263508df42d46328d5c	README.md
100644 blob 9d6ffa6ded47d8b6df13ed60e482b188015ee499	abstract_base_class.py
100644 blob 762f01b5cf84f39096d55ca95e46f0519d8cae48	alternative_classes.py
040000 tree 743db376fa76bb3611cfac6935938d179330c7eb	docs
100644 blob 93c08483f44ebdf5ce10e6c0002e641aa0cc8844	example.md
100644 blob f9e70e5b8173525188a6b10ce5979972de4e0d9f	helper_functions.py
100644 blob 762f01b5cf84f39096d55ca95e46f0519d8cae48	important_classes.py
100644 blob d87bf4a5641e0429fd3c371bd2b19d755105ca92	scratch.ipynb
100644 blob 39d5708913a6c708d1a505cde6da544785c086a6	setup.py
040000 tree d3980efccf4856f0c61a6a16ed40be534c5230a5	tests

The tree contains pointers to more objects: other trees (for folders like docs and tests) and blob objects that contain the file content.
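Each line of a tree listing has a mode (100644 for a regular file, 040000 for a folder), a type, a hash, and then, after a tab, a name. This sketch parses a few lines copied from the output above; the function name parse_tree is hypothetical. Grouping the blobs by hash shows something worth noticing in that output: alternative_classes.py and important_classes.py point to the same blob, so git stores their identical content only once.

```python
from collections import defaultdict


def parse_tree(text: str) -> list:
    """Parse `git cat-file -p <tree>` output, one '<mode> <type> <hash>\\t<name>' per line."""
    entries = []
    for line in text.strip().splitlines():
        meta, name = line.split("\t", 1)  # a tab separates the name from the rest
        mode, obj_type, obj_hash = meta.split()
        entries.append({"mode": mode, "type": obj_type, "hash": obj_hash, "name": name})
    return entries


# a few lines copied from the tree listing above
listing = (
    "040000 tree 263fb9d22090e88edd2bf1847c24c3511de91b49\t.github\n"
    "100644 blob c1b4f81358eaaf467ff4ce4b95171497c28d1622\t.gitignore\n"
    "100644 blob 762f01b5cf84f39096d55ca95e46f0519d8cae48\talternative_classes.py\n"
    "100644 blob 762f01b5cf84f39096d55ca95e46f0519d8cae48\timportant_classes.py\n"
    "040000 tree d3980efccf4856f0c61a6a16ed40be534c5230a5\ttests\n"
)

entries = parse_tree(listing)
subtrees = [e["name"] for e in entries if e["type"] == "tree"]

names_by_blob = defaultdict(list)
for e in entries:
    if e["type"] == "blob":
        names_by_blob[e["hash"]].append(e["name"])
shared = {h: names for h, names in names_by_blob.items() if len(names) > 1}

print(subtrees)  # ['.github', 'tests']
print(shared)    # one blob hash listed under two file names
```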

Blobs hold the content of a file

And we can look at them too:

git cat-file -p c1b4
.secret

Since I am tracing the tree of my own most recent commit, and I have not changed anything since,

git status
On branch organization
nothing to commit, working tree clean

the file contents should match:

cat .gitignore
.secret

And they do!

Commits are implemented as files

Commits are stored as files in the .git directory. git itself is a file system, a way of storing information.

Everything the git program uses is stored in the .git directory; you can think of it as all of the variables the program would need if it ran all the time.

ls .git
COMMIT_EDITMSG	FETCH_HEAD	index		objects		REBASE_HEAD
config		HEAD		info		ORIG_HEAD	refs
description	hooks		logs		packed-refs

Most of the content is in the objects folder; git objects are the items that get stored.

ls .git/objects/
0b	1f	37	49	76	9e	bc	d3	e8
11	21	39	4f	82	a4	c1	d6	f9
18	28	3a	63	93	ab	c4	d8	info
19	2d	3c	6f	99	b0	cc	da	pack
1e	33	48	74	9d	b6	ce	e0

We see a lot more folders here than we had commits. This is because there are multiple types of objects that all create entries in this object database.

There are 3 main types: commits, trees, and blobs.

Each of those folders is named with the first 2 characters of at least one hash, the unique identifier for an object. We can list what is in one of those folders:

ls .git/objects/c1
b4f81358eaaf467ff4ce4b95171497c28d1622

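Mine has just one; most will in a small repo like this, but there could be more than one. The file name is the rest of the hash, so the folder and file name together spell out the full hash c1b4f81358eaaf467ff4ce4b95171497c28d1622.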
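The folder-and-file naming is mechanical, so it can be sketched in a few lines. loose_object_path and resolve_prefix are hypothetical helpers: the first shows where a loose object with a given hash lives, and the second is a simplified version of how a short hash like c1b4 gets expanded when it is unique. Objects that git has compressed into the pack folder are not stored this way, so this only covers loose objects.

```python
from pathlib import PurePosixPath


def loose_object_path(obj_hash: str, git_dir: str = ".git") -> PurePosixPath:
    """Where a loose object with this full 40-character hash is stored."""
    if len(obj_hash) != 40:
        raise ValueError("expected a full 40-character SHA-1 hash")
    # first 2 characters name the folder, the other 38 name the file
    return PurePosixPath(git_dir, "objects", obj_hash[:2], obj_hash[2:])


def resolve_prefix(prefix: str, known_hashes: list) -> str:
    """Expand an abbreviated hash if exactly one known hash starts with it."""
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if len(matches) != 1:
        raise KeyError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


known = [
    "c1b4f81358eaaf467ff4ce4b95171497c28d1622",
    "e899a0e7ad5a9626a6d5c6b0fd96a410bd42b710",
]
full = resolve_prefix("c1b4", known)
print(loose_object_path(full))  # .git/objects/c1/b4f81358eaaf467ff4ce4b95171497c28d1622
```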

We can look at the plain file using cat

cat .git/objects/c1/b4f81358eaaf467ff4ce4b95171497c28d1622
xK??OR?`?+NM.J-?&K?

The content of the file is stored in compressed form, not human readable.

The git cat-file plumbing command can parse the file, though, so we can read it:

git cat-file -p c1b4
.secret
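The compression is zlib, and what gets compressed is a short header (the object's type and size, then a null byte) followed by the content; the hash is computed over that same uncompressed data. Below is a minimal sketch of both directions with hypothetical helper names; the example content, .secret followed by a newline, is an assumption about what the .gitignore holds. The leading x in the raw cat output above is the first byte of the zlib header.

```python
import hashlib
import zlib


def write_loose_object(obj_type: str, content: bytes) -> tuple:
    """Return (hash, compressed bytes) for a loose object, following git's format."""
    raw = obj_type.encode() + b" " + str(len(content)).encode() + b"\0" + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def read_loose_object(compressed: bytes) -> tuple:
    """Decompress a loose object and return (type, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")  # header looks like b"blob 8"
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content")
    return obj_type, content


obj_hash, stored = write_loose_object("blob", b".secret\n")
print(stored[:1])                 # b'x'
print(read_loose_object(stored))  # ('blob', b'.secret\n')
```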

The same is true for commit objects:

cat .git/objects/e8/99a0e7ad5a9626a6d5c6b0fd96a410bd42b710
x??A
?0=??%??&-??wO? ?n??Jl??Z??i```?2M?
??a???wl%4!?????9?8I???<S??,Bh?o?[
p?[]?3????M׭?'???!?H??[k?or??s?i??Pu???V?G?

Still not readable.

We can also check the type of an object with -t:

git cat-file -t c1b4
blob
git cat-file -t e899
commit

Both are as expected.

Losing stuff in git is hard

Imagine you had several commits, including some on a new branch:

Figure 2: In this example we have 3 commits, A, B, and C, on a main branch and a second branch. Each commit has a tree, and there are some blob objects. The arrows are the pointers from one object to another that we can trace through.

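Now consider that you switched back to the main branch and then deleted the new branch without merging those commits. Since git branch -d refuses to delete a branch that is not fully merged, this takes the force-delete flag -D:

git checkout main
git branch -D new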

So our situation is like this:

Figure 3: Now the new branch is deleted (drawn dashed).

Normally, when we check out a branch with git checkout, the following happens:

  1. changing the HEAD pointer to point to that branch

  2. going to the commit the branch points to

  3. reading the tree in the commit

  4. for each blob/tree in that tree, creating the file based on the name in the tree and the content in the blob
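Those four steps can be sketched with a toy object database. Everything here is hypothetical: objects sit in a Python dictionary under short labels instead of real hashes, and a real git checkout also updates the index and removes files the new branch does not have, which this sketch skips.

```python
import tempfile
from pathlib import Path

# toy object database: label -> (type, data); labels stand in for hashes
objects = {
    "f1.0": ("blob", b"f1, version 0\n"),
    "f2.1": ("blob", b"f2, version 1\n"),
    "n.0": ("blob", b"notes\n"),
    "tdocs": ("tree", {"notes.md": "n.0"}),
    "tB": ("tree", {"f1": "f1.0", "f2": "f2.1", "docs": "tdocs"}),
    "B": ("commit", {"tree": "tB", "parents": ["A"]}),  # parent A is not needed to check out B
}
refs = {"main": "B"}
head = {"ref": None}


def write_tree(tree: dict, directory: Path) -> None:
    """Create files (from blobs) and folders (from subtrees) for one tree."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, label in tree.items():
        obj_type, data = objects[label]
        if obj_type == "tree":
            write_tree(data, directory / name)  # subtree -> subfolder
        else:
            (directory / name).write_bytes(data)  # name from the tree, content from the blob


def checkout(branch: str, workdir: Path) -> None:
    head["ref"] = branch               # 1. point HEAD at the branch
    _, commit = objects[refs[branch]]  # 2. go to the commit the branch points to
    _, tree = objects[commit["tree"]]  # 3. read the commit's tree
    write_tree(tree, workdir)          # 4. create each file from the tree


with tempfile.TemporaryDirectory() as tmp:
    checkout("main", Path(tmp))
    created = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))

print(created)  # ['docs', 'docs/notes.md', 'f1', 'f2']
```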

Now we have no way to access commit C or the changes to files f1 and f2, because we have no pointer to it. The content is still there, though!

We could recover it manually by:

  1. logging the current commit hash (B)

  2. going to its tree and logging the hash for tB

  3. going to the blob objects from that tree and logging each of those (f3.0, f2.1, f1.0)

  4. going to the commit before it (from B to A)

  5. logging the hash of that commit (A)

  6. going to its tree and logging the hash for tA

  7. going to the blob objects from that tree and logging each of those (f2.0, f1.0)

  8. making a list of all of the hashes in .git/objects

  9. finding which hashes are there but not on our list so far (C, tC, f2.3, f1.2)

  10. checking each of those to find which one is a commit

  11. setting a branch new_fixed to that hash (C)
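Steps 1 through 11 amount to a reachability check: collect everything reachable from the branches that still exist, and whatever is left over is lost. Here is a sketch over a toy version of Figure 2, using the figure's labels in place of hashes; the exact files in each tree are assumed, and reachable and recover are hypothetical names.

```python
def reachable(start: str, objects: dict) -> set:
    """Every label reachable from `start` by following commit, parent, and tree pointers."""
    seen, stack = set(), [start]
    while stack:
        label = stack.pop()
        if label in seen:
            continue
        seen.add(label)
        obj_type, data = objects[label]
        if obj_type == "commit":
            stack.append(data["tree"])
            stack.extend(data["parents"])  # walk back to earlier commits
        elif obj_type == "tree":
            stack.extend(data.values())
    return seen


def recover(refs: dict, objects: dict, new_branch: str) -> set:
    """Find unreachable objects and point `new_branch` at the lost commit, if there is one."""
    reached = set()
    for tip in refs.values():
        reached |= reachable(tip, objects)  # steps 1-7
    lost = set(objects) - reached  # steps 8-9
    lost_commits = {x for x in lost if objects[x][0] == "commit"}  # step 10
    # the newest lost commit is the one no other lost commit lists as a parent
    tips = lost_commits - {p for x in lost_commits for p in objects[x][1]["parents"]}
    if len(tips) == 1:
        refs[new_branch] = tips.pop()  # step 11
    return lost


objects = {
    "f1.0": ("blob", b"1.0"), "f1.2": ("blob", b"1.2"),
    "f2.0": ("blob", b"2.0"), "f2.1": ("blob", b"2.1"), "f2.3": ("blob", b"2.3"),
    "f3.0": ("blob", b"3.0"),
    "tA": ("tree", {"f1": "f1.0", "f2": "f2.0"}),
    "tB": ("tree", {"f1": "f1.0", "f2": "f2.1", "f3": "f3.0"}),
    "tC": ("tree", {"f1": "f1.2", "f2": "f2.3"}),
    "A": ("commit", {"tree": "tA", "parents": []}),
    "B": ("commit", {"tree": "tB", "parents": ["A"]}),
    "C": ("commit", {"tree": "tC", "parents": ["B"]}),
}
refs = {"main": "B"}  # the branch that pointed at C has been deleted

lost = recover(refs, objects, "new_fixed")
print(sorted(lost))       # ['C', 'f1.2', 'f2.3', 'tC']
print(refs["new_fixed"])  # C
```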

Tracing the commit history

We can trace in a real repo by first looking at the HEAD pointer:

cat .git/HEAD
ref: refs/heads/organization

That points to the branch, which is also a file, so we can look at that too:

cat .git/refs/heads/organization
e899a0e7ad5a9626a6d5c6b0fd96a410bd42b710
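Following HEAD to a commit is just reading two small text files, as sketched below in a temporary folder that imitates the two files we just looked at (resolve_head is a hypothetical name). It assumes the branch has its own file under refs/heads; git can also keep refs in the packed-refs file listed earlier, which this sketch does not read.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit hash."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD names a branch file, e.g. refs/heads/organization, which holds the hash
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # a detached HEAD holds the commit hash itself


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/organization\n")
    (git_dir / "refs" / "heads" / "organization").write_text(
        "e899a0e7ad5a9626a6d5c6b0fd96a410bd42b710\n"
    )
    commit_hash = resolve_head(git_dir)

print(commit_hash)  # e899a0e7ad5a9626a6d5c6b0fd96a410bd42b710
```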

Then we look at that commit:

git cat-file -p e899
tree 6f435051d686c4fec112cdfe7c73c65ad9153125
parent 285dd2104498d173d1926fb59f5513d224a34a14
author Sarah M Brown <brownsarahm@uri.edu> 1758217127 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1758217127 -0400

begin reorg

Then its parent:

git cat-file -p 285dd
tree c429050b554c3d504dc964b32f59affcf28f6435
parent 3300996de3e91ced5c731d759d29a10f011aeb00
author Sarah M Brown <brownsarahm@uri.edu> 1758215066 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1758215066 -0400

add note to readme

And we could continue back through the history the same way, one parent at a time.
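Continuing back by hand is a loop: read a commit, note it, move to its parent. The sketch below stores the two commits printed above in a dictionary (a hypothetical stand-in for the object database) and walks first parents, printing one line per commit, roughly like git log --oneline. Because this toy store holds only those two commits, the walk stops when it reaches 3300996, a parent it does not have.

```python
# toy store: full hash -> the parts of each commit we need
commits = {
    "e899a0e7ad5a9626a6d5c6b0fd96a410bd42b710": {
        "parents": ["285dd2104498d173d1926fb59f5513d224a34a14"],
        "message": "begin reorg",
    },
    "285dd2104498d173d1926fb59f5513d224a34a14": {
        "parents": ["3300996de3e91ced5c731d759d29a10f011aeb00"],
        "message": "add note to readme",
    },
}


def first_parent_history(start: str, commits: dict) -> list:
    """Follow first parents from `start` until a commit has no parent or is missing."""
    history = []
    current = start
    while current in commits:
        history.append(current)
        parents = commits[current]["parents"]
        if not parents:  # the very first commit has no parent
            break
        current = parents[0]
    return history


for h in first_parent_history("e899a0e7ad5a9626a6d5c6b0fd96a410bd42b710", commits):
    print(h[:7], commits[h]["message"])
# e899a0e begin reorg
# 285dd21 add note to readme
```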

Commit messages are essential

A git commit message must exist and is always for people, but can also be for machines.

The Conventional Commits standard is a format for commit messages.

If you use it, then you can use automated tools to generate a full changelog when you release code.
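As a rough sketch of what such tools do, the code below matches the Conventional Commits header form (a type, an optional scope in parentheses, an optional ! for a breaking change, then a colon and a description) and groups messages by type. The regular expression is a simplification that assumes lowercase types, and the changelog function is hypothetical, not any particular tool.

```python
import re
from collections import defaultdict

# <type>[(scope)][!]: <description>, assuming lowercase types
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]*)\))?(?P<breaking>!)?: (?P<description>.+)$"
)


def changelog(subjects: list) -> dict:
    """Group commit subject lines by type; lines that don't match go under 'other'."""
    groups = defaultdict(list)
    for subject in subjects:
        match = HEADER.match(subject)
        if match is None:
            groups["other"].append(subject)
            continue
        entry = match["description"]
        if match["breaking"]:
            entry += " (BREAKING)"
        groups[match["type"]].append(entry)
    return dict(groups)


print(changelog([
    "feat(cli): add export command",
    "fix!: change log file format",
    "begin reorg",
]))
# {'feat': ['add export command'], 'fix': ['change log file format (BREAKING)'], 'other': ['begin reorg']}
```

A message like begin reorg still works as a commit message; it just cannot be sorted automatically.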

A tip in Codespaces

A codespace is a virtual machine that you can use VSCode on in the browser. You only have VSCode access to this system, but VSCode with its terminal gives you a lot of power.

First, check the status:

git status
On branch organization
nothing to commit, working tree clean

But it's not pushed, so we do that:

git push
fatal: The current branch organization has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin organization

To have this happen automatically for branches without a tracking
upstream, see 'push.autoSetupRemote' in 'git help config'.

We have seen this before, so we do what git suggests:

git push --set-upstream origin organization
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 16 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (9/9), 1.36 KiB | 1.36 MiB/s, done.
Total 9 (delta 2), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (2/2), completed with 1 local object.
remote: 
remote: Create a pull request for 'organization' on GitHub by visiting:
remote:      https://github.com/compsys-progtools/gh-inclass-fa25-brownsarahm/pull/new/organization
remote: 
To https://github.com/compsys-progtools/gh-inclass-fa25-brownsarahm.git
 * [new branch]      organization -> organization
branch 'organization' set up to track 'origin/organization'.

Then move to the browser:

gh repo view --web
Opening https://github.com/compsys-progtools/gh-inclass-fa25-brownsarahm in your browser.
  1. Navigate to your GitHub inclass repo on GitHub.com.

  2. Use the link in the README or the green Code button to open a new codespace on main.

When your codespace is open, share its name (the first part of the URL, two words).

If VSCode is new to you, use their documentation of the VSCode interface to get oriented to the different parts of the screen.

Prepare for Next Class

  1. review the notes from the last class and make sure you understand. Bring any questions that come up for you or post them as issues on the course website.

  2. Think through and make some notes about what you have learned about design (software or otherwise) so far. Try to answer the questions below in design_before.md. If you do not know how to answer any of the questions, write in what questions you have.

- What past experiences with making decisions about design of software do you have?
- What experiences studying design do you have?
- What processes, decisions, and practices come to mind when you think about designing software?
- From your experiences as a user, how would you describe the design of command line tools vs other GUI based tools?

Badges

Review
Practice
  1. Export your git log for your KWL main branch to a file called gitlog.txt and commit it, as exported, to the branch for this issue. Note that you will need to work between two branches to make this happen. Append a blank line, ## Commands, and another blank line to the file, then add the command history used for this exercise to the end of the file.

  2. In commit-def.md compare two of the four ways we described a commit today in class. How do the two descriptions differ? How does defining it in different ways help add up to improve your understanding?

  3. Find the detailed view of the commit that added today’s notes to the website in github.com and locally. In commit-detail.md include the url to the commit in github and the contents of the commit object with some notes on any differences (if any).

Experience Report Evidence

redirect your history to a file log-2024-02-08.txt and include it with your experience report.

Questions After Today’s Class

What are gonna be our applications of tracing hashes?

This was mostly a learning experience, having this understanding makes the more complex git commands make sense. You generally will not do it yourself, unless you were contributing to git.

I think it's easier to remember long term that this is how git works if you have actually gone through it, rather than just seeing, for example, a slide saying that's what it does.

Is using rf safe as long is its not recursively deleting?

rm can be used safely on specific files for sure

rm -rf can even be used if you know that the folder is not needed

rm -rf ./~ is probably never what you want.

a meme about an AI offering to run rm -rf ~

How can I check the type of a git object?

Use git cat-file -t <hash>, or see the examples above.

Is it possible to accidentally delete commits?

It is possible to delete them, but accidentally is hard to define.

If you are using normal git commands like add, commit, etc then no.

To delete one, you have to manually edit the .git directory, or delete that whole folder for a repo that has not been pushed to any remote.

Besides the Conventional Commit Standard, what else can be used to better format commits?

This is a good explore badge topic

Footnotes
  1. generally your user.name and user.email are used for both author and committer, but you can use environment variables to override