8. What is a commit?#

8.1. Defining terms#

A commit is the most important unit of git. Later we will talk about what git as a whole is in more detail, but understanding a commit is essential to understanding how to fix things using git.

In CS we often have multiple, overlapping definitions for a term depending on our goal.

In intro classes, we try really hard to only use one definition for each term to let you focus.

Now we need to contend with multiple definitions.

These definitions could be based on

  • what it conceptually represents

  • its role in a larger system

  • what its parts are

  • how it is implemented

For a commit, we are going to go through all of these today, with a lighter treatment of the implementation for now and more detail later.

8.2. Conceptually, a commit is a snapshot#

storing data as snapshots over time

git takes a full snapshot of the repo at each commit.

Under the hood, it only stores a new copy of the files that have changed. It uses the same technique to store every snapshot, so a file that has not changed does not create a new file inside of git; the new snapshot points to the copy that is already stored.
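To make the idea of "full snapshots that still share unchanged files" concrete, here is a minimal sketch. It is not git's real storage code: the `objects` dictionary, the `store_snapshot` function, and the file contents are all made up for illustration, and real git adds a small header before hashing.

```python
import hashlib

def store_snapshot(objects, files):
    """Store one snapshot; return a mapping of file name -> content key.

    `objects` is a plain dict standing in for git's object store (an assumption).
    """
    snapshot = {}
    for name, content in files.items():
        key = hashlib.sha1(content).hexdigest()  # the key depends only on the content
        objects.setdefault(key, content)         # content already stored is not stored again
        snapshot[name] = key
    return snapshot

objects = {}
first = store_snapshot(objects, {"README.md": b"hello\n", "LICENSE.md": b"reuse info\n"})
second = store_snapshot(objects, {"README.md": b"hello again\n", "LICENSE.md": b"reuse info\n"})

print(len(objects))                                 # 3 stored contents, not 4
print(first["LICENSE.md"] == second["LICENSE.md"])  # True: the unchanged file is shared
```

Both snapshots list every file, but the store only grows when a file's content is new.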

8.3. A commit’s role is central to git#

A commit is the basic unit of what git manages.

All other git things are defined relative to commits:

  • branches are pointers to commits that move

  • tags are pointers to commits that do not move

  • trees are how file path/organization information is stored for a commit

  • blobs are how file contents are stored when a commit is made
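The difference between a pointer that moves and one that does not can be sketched with a toy model. The `Repo` class and the commit ids `c1` and `c2` are hypothetical names for this sketch only; git's own bookkeeping is more involved.

```python
class Repo:
    """Toy model: branches and tags are just names that map to a commit id."""

    def __init__(self, branch):
        self.branches = {branch: None}  # branch name -> commit id (moves on commit)
        self.tags = {}                  # tag name -> commit id (stays put)
        self.current = branch           # name of the checked-out branch

    def commit(self, commit_id):
        # a new commit moves only the branch that is checked out
        self.branches[self.current] = commit_id

    def tag(self, name):
        # a tag records the commit the current branch points to right now
        self.tags[name] = self.branches[self.current]

repo = Repo("main")
repo.commit("c1")
repo.tag("v1")
repo.commit("c2")

print(repo.branches["main"])  # c2: the branch moved with the new commit
print(repo.tags["v1"])        # c1: the tag stayed where it was created
```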

8.4. Parts of a commit#

We will learn about the structure of a commit by inspecting it.

First we will go back to our gh-inclass repo

cd Documents/inclass/systems/gh-inclass-brownsarahm/

We can use git log to view past commits

git log
commit 1e2ab9259651a73ad277e826d602514d28969c86 (HEAD -> organization)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 13:30:44 2024 -0400

    include readmen content

commit 87c72aeca9bd16700fc8fd8ee719136c13e83e01 (origin/organization)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 13:23:36 2024 -0400

    stop tracking

commit d2d1fac72642204bfdcebc7703d786615b8de934
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 13:16:10 2024 -0400

    organized files into foleders and ignore private

commit a3904a0a5e7adbcbf9fe439c387fb4dbd7846c51
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 12:46:19 2024 -0400

    Revert "start organizing"
    
    This reverts commit 9120d9d88aa587e4ffda1ee9aa8c3dcf8f764f7e.

Here we see some parts:

  • hash (the long alphanumeric string)

  • merge parents (a Merge: line, shown only if the commit is a merge)

  • author

  • time stamp

  • message

But we know commits are supposed to represent some content, and we have no information about that in this view.

The hash is the unique identifier of each commit.

We can view individual commits with git cat-file and at least 4 characters of the hash, or enough to be unique. We will try 4 characters, and I will use the most recent commit shown above, whose hash starts with 1e2a.

git cat-file has different modes:

  • -p for pretty print

  • -t to return the type

git cat-file -p 1e2a
tree 7c055c5ff9309a982982db0b890bc2a02926d7e3
parent 87c72aeca9bd16700fc8fd8ee719136c13e83e01
author Sarah M Brown <brownsarahm@uri.edu> 1727199044 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1727199044 -0400

include readmen content

Here we see the actual parts of a commit file:

  • a pointer to a tree

  • a pointer to a parent commit (highlighted)

  • author info with timestamp

  • committer info with timestamp

  • commit message
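The parts listed above can be pulled out of the printed text with a few lines of code. This is a minimal sketch that assumes the simple layout shown in the output above (header lines, a blank line, then the message); the function name `parse_commit` is made up, and extra headers such as signatures are not handled.

```python
def parse_commit(text):
    """Split the text of a commit object into header fields and the message."""
    header, _, message = text.partition("\n\n")
    fields = {"parents": []}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parents"].append(value)  # a merge commit has more than one parent
        else:
            fields[key] = value
    fields["message"] = message.strip()
    return fields

# the text printed by `git cat-file -p 1e2a` in these notes
raw = """tree 7c055c5ff9309a982982db0b890bc2a02926d7e3
parent 87c72aeca9bd16700fc8fd8ee719136c13e83e01
author Sarah M Brown <brownsarahm@uri.edu> 1727199044 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1727199044 -0400

include readmen content
"""

commit = parse_commit(raw)
print(commit["tree"])
print(commit["parents"])
print(commit["message"])
```

Keeping the parents in a list leaves room for merge commits, which have more than one parent line.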

8.4.1. Commit parents help us trace back#

Each commit points to its parent, so the history is kind of like a linked list.

We can use the hash of the parent in the output above to look at the previous commit:

git cat-file -p 87c7
tree 06895b0f89062a5d9d12b5de5a068dc253f27092
parent d2d1fac72642204bfdcebc7703d786615b8de934
author Sarah M Brown <brownsarahm@uri.edu> 1727198616 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1727198616 -0400

stop tracking
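The linked-list comparison can be sketched directly. The table below only uses the two parent pointers printed in the outputs above; the last entry is set to `None` purely because these notes do not print that commit's parent, not because it has none. The names `PARENTS` and `history` are made up for this sketch.

```python
# commit hash -> parent hash, copied from the two cat-file outputs above
PARENTS = {
    "1e2ab9259651a73ad277e826d602514d28969c86": "87c72aeca9bd16700fc8fd8ee719136c13e83e01",
    "87c72aeca9bd16700fc8fd8ee719136c13e83e01": "d2d1fac72642204bfdcebc7703d786615b8de934",
    # parent not printed in these notes, so the sketch stops here
    "d2d1fac72642204bfdcebc7703d786615b8de934": None,
}

def history(start, parents):
    """Follow parent pointers from `start`, like walking a linked list."""
    current = start
    while current is not None:
        yield current
        current = parents.get(current)

for commit_hash in history("1e2ab9259651a73ad277e826d602514d28969c86", PARENTS):
    print(commit_hash[:7])
```

This prints the commits from newest to oldest, which is the order git log uses.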

8.4.2. Commit trees are the hash of the content#

The snapshot is stored via a tree. We can use git cat-file to look at the tree object too.

The tree being a separate object from the overall commit is what allows us to “edit” a message or “change” the parent of a commit; we actually make a new commit with the same tree.

Let’s look at the tree for that commit.

git cat-file -p 0689
040000 tree 263fb9d22090e88edd2bf1847c24c3511de91b49	.github
100644 blob 9fdc6b1b8d6b0916ef50b0a37e8c31999117016d	.gitignore
100644 blob 9ece5efa25710c8fad7d9f210928785b5362b06f	CONTRIBUTING.md
100644 blob 2d232a2231c650dc4094606797fe0bd3e0ce4c65	LICENSE.md
100644 blob b8eb6e89c6295e574ee5e3363d51c917a16797ff	README.md
040000 tree f596404cd28ea4bad49ff73fb4884049ab0e31f2	docs
100644 blob 39d5708913a6c708d1a505cde6da544785c086a6	setup.py
040000 tree 8c3cc97ca6446c270ca0b8f7d4ce640a6e81e468	src
040000 tree d3980efccf4856f0c61a6a16ed40be534c5230a5	tests

In this output we have several columns:

  • mode (indicates a normal file or a directory in the working directory; here 100644 is a normal file and 040000 is a directory)

  • git object type (blob or tree)

  • hash of the object

  • its file name in the working directory

On the line for LICENSE.md, we all have the same hash (as long as you picked a commit and tree from after that file was created). This is because the hash is computed from the contents, and our files all have the same contents.
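Here is a minimal sketch of why equal contents give equal hashes. It assumes the default SHA-1 object format (the 40-character hashes in these notes fit that), where git hashes a short header followed by the file's bytes. The function name `git_blob_hash` is made up for this example.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Name a blob the way git does: SHA-1 of a short header plus the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_hash(b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the hash git gives every empty file)

# the file name is not an input, so two files with equal contents get equal hashes
print(git_blob_hash(b"same text\n") == git_blob_hash(b"same text\n"))  # True
```

Nothing about the author, the time, or the file name goes into the blob's hash, so the same file in everyone's repo gets the same name.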

8.4.3. Trees point to blobs of the file content#

We can also use git cat-file to view a blob.

git cat-file -p 2d23
the info on how the code can be reused


8.5. Commits are implemented as files#

Commits are stored in the .git directory as files. git itself is a file system, or a way of storing information.

Everything the git program uses is stored in the .git directory. You can think of it as holding all of the variables the program would need if it ran all the time.

ls .git
COMMIT_EDITMSG	REBASE_HEAD	index		packed-refs
FETCH_HEAD	config		info		refs
HEAD		description	logs
ORIG_HEAD	hooks		objects

The ones in all caps are mostly simple pointers, and the others use other formats.

Recall that we have seen the HEAD pointer before:

cat .git/HEAD
ref: refs/heads/organization

HEAD stores our current branch.
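To show what "simple pointer" means here, this sketch follows HEAD to a commit hash by reading two small text files. It builds a stand-in directory by hand rather than touching a real repository, and it only handles refs stored as individual files (a real repository may also keep them in packed-refs). The function name `resolve_head` is made up.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit hash (loose ref files only, no packed-refs)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD names a branch; that branch's file holds the commit hash
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD: the file holds a hash directly

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)  # stand-in for .git, built by hand for this sketch
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/organization\n")
    (git_dir / "refs" / "heads" / "organization").write_text(
        "1e2ab9259651a73ad277e826d602514d28969c86\n"
    )
    tip = resolve_head(git_dir)

print(tip)  # 1e2ab9259651a73ad277e826d602514d28969c86
```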

Most of the content is in the objects folder; git objects are the items that get stored.

ls .git/objects/
06	29	46	72	93	ab	c7	e9
0c	2d	4c	76	94	b0	ca	f1
0e	38	5b	7a	99	b1	cb	f5
10	39	5f	7c	9d	b8	d2	f9
19	3a	62	85	9e	c0	d3	info
1e	3c	63	87	9f	c2	d8	pack
1f	3d	66	8c	a3	c3	dd
25	45	70	91	a8	c5	e0

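We see a lot more folders here than we had commits. This is because there are three main types of objects (commits, trees, and blobs), and each object is stored as its own file. Each folder name is the first two characters of an object's hash, and the file inside is named with the rest of the hash.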
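One way to picture how an object ends up in one of these folders is the sketch below. It assumes the usual layout for a single ("loose") object: a header with the type and size, then the content, all compressed with zlib, in a path split after the first two characters of the hash. The temporary directory is a stand-in for .git/objects, and `write_object` and `read_object` are made-up names.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_object(objects_dir: Path, kind: str, content: bytes) -> str:
    """Store content the way a loose git object is laid out on disk."""
    data = kind.encode() + b" " + str(len(content)).encode() + b"\0" + content
    name = hashlib.sha1(data).hexdigest()
    path = objects_dir / name[:2] / name[2:]  # folder = first 2 characters of the hash
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))     # compressed, so the raw file is unreadable
    return name

def read_object(objects_dir: Path, name: str):
    """Return (type, content) for a full 40-character object name."""
    data = zlib.decompress((objects_dir / name[:2] / name[2:]).read_bytes())
    header, _, content = data.partition(b"\0")
    kind, _, _size = header.partition(b" ")
    return kind.decode(), content

with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp)  # stand-in for .git/objects, not a real repository
    name = write_object(objects_dir, "blob", b"example\n")
    stored = [p.name for p in objects_dir.iterdir()]
    kind, content = read_object(objects_dir, name)

print(stored == [name[:2]])  # True: one folder, named by the first two characters
print(kind, content)         # blob b'example\n'
```

Reading the object back is roughly what git cat-file -t and -p do for us.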

8.5.1. A commit is a type of git object#

This is a class diagram for the git objects:

classDiagram
    class tree{
        List:
        - hash: blob
        - string: type
        - string: file name
    }
    class commit{
        hash: parent
        hash: tree
        string: message
        string: author
        string: time
    }
    class blob{
        binary: contents
    }
    class object{
        hash: name
    }
    object <|-- blob
    object <|-- tree
    object <|-- commit

cat .git/objects/29/245e4b9cce937fb9e50bc3762ab19c6a7a12c3 
x%?A
?0Fa?9?nt!?]? *(
??x?1??`Ld2???V?????eS/???P???1?aLL?EUT???!=????fu??~?
                                                      ??.???x?TItƤ???|)?>?'#?Fܢhϔ?%?Cu?ڮ.??ђGb?????|Ez8

The file is compressed, so cat prints unreadable bytes. git cat-file can decode the object for us: -t shows that it is a blob and -p prints its contents.

git cat-file -t 2924
blob
git cat-file -p 2924
# Sarah Brown

tenure year: 2027
- i skied competiively in hs 
<<<<<<< HEAD
- i started at uri in 2020
=======
- i went to Northeastern
>>>>>>> 62dcf61 (local second fun fact)
git status
On branch organization
Your branch is ahead of 'origin/organization' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
ls
CONTRIBUTING.md	README.md	scratch.ipynb	src
LICENSE.md	docs		setup.py	tests

8.6. Commit messages are essential#

A git commit message must exist and is always for people, but can also be for machines.

The conventional commits standard is a format for commit messages.

If you use it, then automated tools can generate a full change log when you release code.

Tooling and examples of conventional commits
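To give a feel for how a tool can build a change log from messages, here is a minimal sketch. It assumes the basic subject-line shape from the conventional commits standard, `type(optional scope)!: description`, and ignores message bodies and footers. The commit subjects and the names `PATTERN` and `changelog` are made up for illustration; real tools do much more.

```python
import re
from collections import defaultdict

# type, optional (scope), optional ! for a breaking change, then ": description"
PATTERN = re.compile(
    r"^(?P<type>\w+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)

def changelog(subjects):
    """Group commit subject lines by type; skip lines that do not match."""
    groups = defaultdict(list)
    for subject in subjects:
        match = PATTERN.match(subject)
        if match:
            note = match["description"]
            if match["breaking"]:
                note += " (BREAKING)"
            groups[match["type"]].append(note)
    return dict(groups)

# hypothetical commit subjects, made up for this sketch
log = changelog([
    "feat: add login page",
    "fix(parser): handle empty input",
    "feat!: drop python 3.8 support",
    "add more stuff",
])
print(log)
```

The last subject does not follow the format, so this sketch leaves it out of the change log. That is the practical reason to stick to the format once you adopt it.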

8.7. Git tags point to commits and do not move#

To see this, we first make a new commit so that the branch pointer moves forward:

echo "example" >> README.md

git commit -a -m 'add more stuff'
[organization e8ab736] add more stuff
 1 file changed, 1 insertion(+)
git log 
commit e8ab736a7c843e8515e484b136ffcbccfc162618 (HEAD -> organization)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Oct 1 13:33:22 2024 -0400

    add more stuff

commit 1e2ab9259651a73ad277e826d602514d28969c86
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 13:30:44 2024 -0400

    include readmen content

commit 87c72aeca9bd16700fc8fd8ee719136c13e83e01 (origin/organization)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 13:23:36 2024 -0400

    stop tracking

commit d2d1fac72642204bfdcebc7703d786615b8de934
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 13:16:10 2024 -0400

    organized files into foleders and ignore private

commit a3904a0a5e7adbcbf9fe439c387fb4dbd7846c51
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 12:46:19 2024 -0400

Next we check out the previous commit by its hash:

git checkout 1e2a
Note: switching to '1e2a'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 1e2ab92 include readmen content

Then we create a tag at this commit and list the tags:

git tag 0.0.1

git tag
0.0.1
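
Back on the branch, the log shows the tag on the commit where we created it, while the branch points to the newer commit:

git checkout organization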
Previous HEAD position was 1e2ab92 include readmen content
Switched to branch 'organization'
Your branch is ahead of 'origin/organization' by 2 commits.
  (use "git push" to publish your local commits)
git log
commit e8ab736a7c843e8515e484b136ffcbccfc162618 (HEAD -> organization)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Oct 1 13:33:22 2024 -0400

    add more stuff

commit 1e2ab9259651a73ad277e826d602514d28969c86 (tag: 0.0.1)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 13:30:44 2024 -0400

    include readmen content

commit 87c72aeca9bd16700fc8fd8ee719136c13e83e01 (origin/organization)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 13:23:36 2024 -0400

    stop tracking

commit d2d1fac72642204bfdcebc7703d786615b8de934
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 13:16:10 2024 -0400

    organized files into foleders and ignore private

commit a3904a0a5e7adbcbf9fe439c387fb4dbd7846c51
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Sep 24 12:46:19 2024 -0400

Tags, like branches, are stored as files inside the .git directory, under refs:

ls .git
COMMIT_EDITMSG	REBASE_HEAD	index		packed-refs
FETCH_HEAD	config		info		refs
HEAD		description	logs
ORIG_HEAD	hooks		objects
ls .git/refs/
heads	remotes	tags
ls .git/refs/tags/
0.0.1
ls .git/refs/heads/
1-create-a-readme	my_branch
fun_fact		my_branch_checkedoutb
main			organization

8.8. Badges#

  1. Export your git log for your KWL main branch to a file called gitlog.txt and commit that as exported to the branch for this issue. Note that you will need to work between two branches to make this happen. Append a blank line, ## Commands, and another blank line to the file, then append the command history used for this exercise to the end of the file.

  2. In commit-def.md compare two of the four ways we described a commit today in class. How do the two descriptions differ? How does defining it in different ways help add up to improve your understanding?

  1. Explore the tools for conventional commits and then pick one to try out. Work on the branch for this badge and use one of the tools that helps with making conventional commits (e.g., in VSCode or a CLI for it) for a series of commits adding “features” and “bug fixes”, telling the story of a code project in a file called commit-story.md. For each edit, add short phrases like ‘new feature 1’ or ‘next bug fix’ to the single file each time, but use conventional commits for each commit. In total, make at least 5 different types of changes (types per the conventional commits standard), including 2 breaking changes, and at least 10 total commits to the file.

  2. Learn about options for how git can display commit history. Try out a few different options. Choose two and write them both to a file, gitlog-compare.md. Using a text editor, wrap each log with three backticks to make them “code blocks” and then add text to the file describing a use case where that format in particular would be helpful. Do this after the above so that your git log examples include your conventional commits.

8.9. Experience Report Evidence#

8.10. Questions After Today’s Class#

8.10.1. Besides a linked list, what other data structures or algorithms does git use on its inner workings?#

The main algorithm it uses is hashing.

More detail here is a good explore badge.

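8.10.2. Why did running ls .git/objects show no files even though I have a commit history?#

git compresses content into packfiles at times. The objects are then stored together under .git/objects/pack instead of as separate files in the two-character folders. This is rare, but can happen.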

8.10.3. Is the only reason I would need to sign a commit to show that I am the one dispersing it so it doesn’t seem like spam or a virus? Or should signing commits become a frequent practice?#

The answer is yes to both questions: signing shows that a commit really came from you, and it is a good habit to sign your commits routinely.

8.10.4. When I entered “git cat-file -p 39d5” in the command line, the result was “file with function with instructions for pip”. What does this mean?#

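Your blob object 39d5 is a file whose contents are:

file with function with instructions for pip

The blob itself does not record a file name; the name is stored in the tree that points to the blob. In the tree we printed in class, the hash starting with 39d5 is listed as setup.py.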

8.10.5. What is the reason for tags? Are they means of labeling? Or means of access?#

They are for labeling and for creating fixed “versions” that can be used for a “release”. For example, in class I use tags to create releases and then those make notifications, but for software, releases are like the times we push to users.