12. What is git?#

Last class we created a local repo, then created an empty repo on GitHub. We linked the two repos and attempted to push our local changes to the “empty” repo, but weren’t successful.

The reason is the way the new GitHub repo was created. When we used GitHub Classroom to fork my template tiny-book repo as your new empty repo, GitHub made two commits, so it was not actually an empty repo.

cd Documents/systems/tiny-book
git push -u origin main
To https://github.com/compsys-progtools/tiny-book-AymanBx
 ! [rejected]        main -> main (non-fast-forward)
error: failed to push some refs to 'https://github.com/compsys-progtools/tiny-book-AymanBx'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. If you want to integrate the remote changes,
hint: use 'git pull' before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

Let’s check it out

gh repo view --web

Let’s compare that to what we have locally

git log
commit 1da3fa4d2b1b14e3a92358455c2320697af43867 (HEAD -> main)
Author: AymanBx <ayman_sandouk@uri.edu>
Date:   Tue Mar 4 13:42:31 2025 -0500

    jupyter book template

When we attempted to pull, we weren’t fully successful: because the push command failed, the local main was never linked to the remote main, so git did not know which branch to pull from.

git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 5 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (5/5), 1.70 KiB | 102.00 KiB/s, done.
From https://github.com/compsys-progtools/tiny-book-AymanBx
 * [new branch]      feedback   -> origin/feedback
 * [new branch]      main       -> origin/main
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=origin/<branch> main

We should follow git’s instructions and set the upstream branch:

git branch --set-upstream-to=origin/main main
branch 'main' set up to track 'origin/main'.

Now we should be able to pull successfully

git pull
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint:   git config pull.rebase false  # merge
hint:   git config pull.rebase true   # rebase
hint:   git config pull.ff only       # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Need to specify how to reconcile divergent branches.

Git recognized that the local and remote repos have different commit histories (divergent branches), and it will not guess how to combine them.

To reconcile this kind of divergence (the remote main has commits that my local main knows nothing about), we rebase.

A rebase updates my current branch with the new commits from main (or any branch I want to rebase onto) and then replays my own commits on top of them.

git pull --rebase
Successfully rebased and updated refs/heads/main.

What did that do?

git log
Author: AymanBx <ayman_sandouk@uri.edu>
Date:   Tue Mar 4 13:42:31 2025 -0500

    jupyter book template

commit d781535217b324d9cb2ce6cb45ae565b54ee786f (origin/main)
Author: github-classroom[bot] <66690702+github-classroom[bot]@users.noreply.github.com>
Date:   Tue Oct 8 16:54:12 2024 +0000

    Setting up GitHub Classroom Feedback

commit 72bcbb8cbd2769d21aad3c23c8fbe477d0260ced (origin/feedback)
Author: github-classroom[bot] <66690702+github-classroom[bot]@users.noreply.github.com>
Date:   Tue Oct 8 16:54:12 2024 +0000

    GitHub Classroom Feedback
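
The log above shows our commit sitting on top of the two commits GitHub made. As a rough sketch of why that happens, the toy model below treats a branch as a list of commits and names each commit by hashing its parent's name together with its message. The functions `commit_id`, `build`, and `rebase` are hypothetical helpers made up for this illustration; real git hashes much more than this and handles many more cases.

```python
import hashlib

def commit_id(parent, message):
    """Toy commit name: a short hash of the parent's name and the message."""
    return hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()[:7]

def build(messages, parent=None):
    """Stack commits (oldest first) on top of `parent`."""
    history = []
    for message in messages:
        parent = commit_id(parent, message)
        history.append((parent, message))
    return history

def rebase(local, upstream):
    """Replay the commits that exist only locally on top of the upstream tip."""
    upstream_ids = {cid for cid, _ in upstream}
    to_replay = [msg for cid, msg in local if cid not in upstream_ids]
    tip = upstream[-1][0] if upstream else None
    return upstream + build(to_replay, parent=tip)

# The two histories from this section, oldest commit first.
remote_main = build(["GitHub Classroom Feedback",
                     "Setting up GitHub Classroom Feedback"])
local_main = build(["jupyter book template"])

rebased = rebase(local_main, remote_main)
for cid, message in reversed(rebased):  # newest first, like git log
    print(cid, message)
```

In this toy model the replayed commit gets a new name, because its parent changed. That is consistent with the real session: our commit started out as 1da3fa4, while the push output reports 60bd457 as the new tip of main.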

Now we should be able to push

git push
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 8 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 16.33 KiB | 8.17 MiB/s, done.
Total 12 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/compsys-progtools/tiny-book-aymanbx.git
   d781535..60bd457  main -> main

A couple of weeks ago, we learned what a commit is. Then we took a break from how git works to talk more about the unix philosophy and how developers communicate about code.

Today we will learn what git is more formally.

study tip

We will go in and out of topics at times in order to provide what is called spaced repetition: repeating material or key concepts with breaks in between.

Using git correctly is a really important goal of this course because git is an opportunity for you to demonstrate a wide range of both practical and conceptual understanding.

So, I have elected to interleave other topics with git to give core git ideas some time to simmer and give you time to practice them before we build on them in more depth.

Also, we are both learning git and using git as a motivating example of other important topics.

12.1. Why so much git?#

Today, we are going to learn what git is and later we will learn more details of how it is implemented.

Remember we are spending so much time with git for two reasons:

  1. it is an important developer tool

  2. it demonstrates important conceptual ideas that occur in other areas of CS

The git book is the official reference on git.

It is also available in other spoken languages, if that is helpful for you.

12.2. git definition#

From here, we have the full definition of git

git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.

We do not start from that point, because these documents were written for a target audience of working developers who are familiar with other, older version control systems and are learning an additional one.

Have you used another version control system before?

Most of you, however, have probably not used another version control system.

Let’s break down the definition

12.3. Git is a File system#

Content-addressable filesystem means a key-value data store.

What are some examples of key-value pairs that you have seen in computer science broadly, and in this course specifically, so far?

  • python dictionaries

  • pointers (address,content)

  • parameter, passed values

  • yaml files

What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.
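
To make "content-addressable" concrete, here is a minimal sketch of a key-value store where the key is computed from the content instead of being chosen by the user. `ContentStore` is a hypothetical class written for this illustration. It hashes the raw bytes with SHA-1, whereas git adds a small header before hashing, so the keys below will not match git's.

```python
import hashlib

class ContentStore:
    """Toy content-addressable store: the key is derived from the content."""

    def __init__(self):
        self._objects = {}  # hex digest -> content bytes

    def put(self, content: bytes) -> str:
        """Store content and hand back the key needed to retrieve it."""
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        """Retrieve content by the key that put() returned."""
        return self._objects[key]

store = ContentStore()
key = store.put(b"hello git\n")
same_key = store.put(b"hello git\n")    # identical content, identical key
other_key = store.put(b"hello git!\n")  # one character changed, new key

print(key)
print(store.get(key))
```

Two consequences are worth noticing: storing the same content twice costs nothing extra, and any change to the content produces a different key.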

12.4. Git is a Version Control System#

In the before times

PhD comics final.doc comic

git stores snapshots of your work each time you commit.

What unit of git is how it represents a snapshot?

  • [ ] branch

  • [ ] blob

  • [x] commit

  • [ ] tag

snapshot of 5 versions of 3 files

it uses 3 stages:

3 stages in git

These three stages are described in relation to your working directory, and potentially remotes.

So in the broader context, the git visual cheatsheet is a more complete picture and has commands overlaid with the concepts.

12.5. Git has two sets of commands#

  • Porcelain: the user friendly VCS

  • Plumbing: the internal workings- a toolkit for a VCS

Which of the following commands are porcelain commands?

git commit
git cat-file
git add
git status
git hash-object

We have so far used git as a version control system. A version control system, in general, will have operations like commit, push, pull, clone. These may work differently under the hood or be called different things, but those are what something needs to have in order to keep track of different versions.

The plumbing commands reveal the way that git performs version control operations. This means, they implement the git file system operations for the git version control system.

You can think of the plumbing vs porcelain commands like public/private methods. As a user, you only need the public methods (porcelain commands) but those use the private ones to get things done (plumbing commands). We will use the plumbing commands over the next few classes to examine what git really does when we call the porcelain commands that we will typically use.

Example?
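
One way to picture the public/private analogy is a toy class whose public methods (the "porcelain") are built entirely from private helpers (the "plumbing"). `ToyVCS` and all of its methods are hypothetical, and the text formats it stores are simplified stand-ins, not git's real object formats.

```python
import hashlib

class ToyVCS:
    """Toy version control: public 'porcelain' built on private 'plumbing'."""

    def __init__(self):
        self._objects = {}  # content-addressable store: hash -> bytes
        self._staged = {}   # file name -> hash of its content
        self.head = None    # hash of the most recent commit

    # --- plumbing: low-level building blocks ---
    def _hash_object(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def _write_tree(self) -> str:
        listing = "".join(f"{key} {name}\n"
                          for name, key in sorted(self._staged.items()))
        return self._hash_object(listing.encode())

    # --- porcelain: what a user would normally call ---
    def add(self, name: str, content: bytes) -> None:
        self._staged[name] = self._hash_object(content)

    def commit(self, message: str) -> str:
        lines = [f"tree {self._write_tree()}"]
        if self.head is not None:
            lines.append(f"parent {self.head}")
        record = "\n".join(lines) + f"\n\n{message}\n"
        self.head = self._hash_object(record.encode())
        return self.head

vcs = ToyVCS()
vcs.add("about.md", b"Second semester Masters\n")
first = vcs.commit("add about.md")
print(first)
```

A user of this class only needs `add` and `commit`, but reading `_hash_object` and `_write_tree` is what explains how a commit is actually stored.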

12.6. Git is distributed#

What does that mean?

Git runs locally. It can run in many places, and has commands to help sync across remotes, but git does not require one copy of the repository to be the “official” copy and the others to be subordinate. git just sees repositories.

For human reasons, we like to have one “official” copy and treat the others as other copies, but that is a social choice, not a technological requirement of git. Even though we will typically use it with an official copy and other copies, having a tool that does not care makes the tool more flexible and allows us to create workflows, or networks of copies, that have any relationship we want.

It’s about the workflows, or the ways we socially use the tool.

12.6.1. Subversion Workflow#

subversion workflow

subversion is an older VCS

12.6.2. Centralized Manager#

integration manager workflow

12.6.3. dictator and lieutenants#

dictator and lieutenants workflow

This is a variant of a multiple-repository workflow. It’s generally used by huge projects with hundreds of collaborators; one famous example is the Linux kernel. Various integration managers are in charge of certain parts of the repository; they’re called lieutenants. All the lieutenants have one integration manager known as the benevolent dictator. The benevolent dictator pushes from their directory to a reference repository from which all the collaborators need to pull.

12.7. How does git do all these things?#

We can use the bash command find to search the file system. Note that this does not search the contents of the files, just the names.

find .git/objects/ -type f
.git/objects/06/d56f40c838b64eb048a63e036125964a069a3a
.git/objects/0e/2e3b27f61b5908c4bb75a1ca680ee4053aa992
.git/objects/1d/a3fa4d2b1b14e3a92358455c2320697af43867
.git/objects/29/a422c19251aeaeb907175e9b3219a9bed6c616
.git/objects/2b/d9785b546aa1af7d6e41a48d33a0af811082dd
.git/objects/5f/534f8051f6a94d40e57e58242ef0113fae4fd1
.git/objects/6e/b15166db3ad944529be060af334deb2c022bbd
.git/objects/74/d5c7101ed8c8c1a6f87e31debd9445df1f0e71
.git/objects/78/3ec6aa5afe2f0a66087d01a112f543e1ed287e
.git/objects/7e/821e45db31376729c73f3616fb24db2b655a95
.git/objects/a0/57a320dcd595f3f0e0d250c3af4a5653596914
.git/objects/d6/f9d92349c768da1863b412674f25cd27d23cfb
.git/objects/e3/5d8850c9688b1ce82711694692cc574a799396
.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
.git/objects/f8/cdc73cb2be06824f521837366ec95b73d55ef8
.git/objects/fa/eea606145667f54d220a0c17ffe8d22db07146
.git/objects/fd/b7176c429a73d5335e127b27d530b8aaa07c7d

We searched for anything of the type file with the option -type f

This is a lot of files! It’s more than we have in our working directory.

We can see that by looking at the working directory with ls

ls
_build/      _toc.yml  logo.png               markdown.md      references.bib
_config.yml  intro.md  markdown-notebooks.md  notebooks.ipynb  requirements.txt

And remember, _build is not being tracked. That means git knows nothing about it or about its content

This is a consequence of git taking snapshots and tracking both the actual contents of our working directory and our commit messages and other metadata about each commit.

12.8. Git Variables#

The program git does not run continuously the entire time you are using it for a project. It runs quick commands each time you tell it to. Its goal is to manage files, so this makes sense. This also means that important information that git needs is saved in files.

We can see the files that it has by listing the directory:

ls .git
COMMIT_EDITMSG  HEAD    description  index  logs/     refs/
FETCH_HEAD      config  hooks/       info/  objects/

The files in all caps are like git’s variables.

Let’s look at the one called HEAD. We have interacted with HEAD before when resolving merge conflicts.

cat .git/HEAD 
ref: refs/heads/main

HEAD is a pointer to the currently checked out branch. Do you remember where we see this pointer?

The other files with HEAD in their name are similarly pointers to other references, named corresponding to other things.

12.9. Git Objects#

There are 3 types:

  • blob objects: the content of your files (data)

  • tree objects: stores file names and groups files together (organization)

  • Commit Objects: stores information about the sha values of the snapshots

mermaid.live

classDiagram
    class tree{
        List:
        - hash: blob
        - string: type
        - string: file name
    }
    class commit{
        hash: parent
        hash: tree
        string: message
        string: author
        string: time
    }
    class blob{
        binary: contents
    }
    class object{
        hash: name
    }
    object <|-- blob
    object <|-- tree
    object <|-- commit
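
Every object in the diagram has a hash as its name. As a sketch of where that name comes from, assuming a repository that uses git's default SHA-1 format: git hashes a short header (the object type, the content size in bytes, and a null byte) followed by the content. `git_object_name` is a hypothetical helper name chosen for this example.

```python
import hashlib

def git_object_name(obj_type: str, body: bytes) -> str:
    """Name of a git object: SHA-1 of '<type> <size>' + null byte + content."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# A blob with no content at all (an empty file).
empty_blob = git_object_name("blob", b"")
print(empty_blob)
```

The name of the empty blob is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which is the same name that appears in the find listing earlier as .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391. Any repository that has committed an empty file will contain that same object.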

12.10. Examining git objects#

Which of the following commands we have seen so far is a plumbing command?

  • [ ] git commit

  • [ ] git push

  • [x] git cat-file

  • [ ] git pull

git cat-file -t 

Let’s do so in the inclass repo

cd ../gh-inclass-AymanBx/

Let’s remember what has been done in this repo

git log
commit 99c73d4f2c0c87c3aef7a61366e56e3894a040a5 (HEAD -> organization, origin/organization)
Author: AymanBx <ayman_sandouk@uri.edu>
Date:   Tue Feb 25 13:02:20 2025 -0500

    Organized all documentation files.

commit 93d4ae322683efa703dedbb9acb8431807a38afd
Author: AymanBx <ayman_sandouk@uri.edu>
Date:   Tue Feb 25 12:53:36 2025 -0500

    Added overview of files in readme.md

commit 4ccfb5fe36073d15eaa4271d5f1355c0d9e1635b
Author: AymanBx <ayman_sandouk@uri.edu>
Date:   Thu Feb 20 13:13:59 2025 -0500

    added age to readme

commit 581ac5ef92ba5b4a647f30fc91a9d405e6900007 (origin/main, origin/HEAD, main)
Merge: 162a47b 7b56104
Author: Ayman Sandouk <111829133+AymanBx@users.noreply.github.com>
Date:   Thu Feb 20 11:19:10 2025 -0500

    Merge pull request #3 from compsys-progtools/organizing_ac

    Feb 20 in class activity

commit 7b561041f188ea7c2fa4d66d32957f639c574bd5 (origin/organizing_ac)
Author: AymanBx <ayman_sandouk@uri.edu>
Date:   Thu Feb 20 11:01:56 2025 -0500

    add files for organizing activity

What branch are we on at the moment? We can tell from git status or from what we saw in git log. But all these commands are really getting their info from the HEAD pointer, and we now know how to read it ourselves:

cat .git/HEAD
ref: refs/heads/organization

Let’s try to trace what we can find in that file

cat .git/refs/heads/organization
99c73d4f2c0c87c3aef7a61366e56e3894a040a5

We see a hash (made of 40 hexadecimal characters). A hash in git can represent multiple things. One of its uses is as the name of a git object; the objects hold the different information of a git repository.
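
The pointer-following we just did by hand (HEAD, then the branch file, then the hash) can be sketched in a few lines. `resolve_head` is a hypothetical helper, and to keep the example self-contained it runs against a throwaway directory that imitates the two files we looked at. It only covers refs stored as individual files; git can also keep refs in a packed-refs file, which this sketch ignores.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to the commit hash it currently points at."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD names a branch file, e.g. refs/heads/organization
        ref = head[len("ref: "):]
        return (git_dir / ref).read_text().strip()
    # Detached HEAD: the file holds a commit hash directly
    return head

# Build a tiny stand-in for .git with the same two files we inspected.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/organization\n")
    (git_dir / "refs" / "heads" / "organization").write_text(
        "99c73d4f2c0c87c3aef7a61366e56e3894a040a5\n")
    resolved = resolve_head(git_dir)

print(resolved)
```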

Let’s confirm that we can find that file

ls .git/objects/
0f/  23/  39/  4e/  56/  76/  83/  99/  a6/  ce/  e4/
16/  2d/  3a/  4f/  58/  7b/  8b/  9d/  a9/  d7/  f9/
1e/  33/  3c/  50/  63/  7f/  92/  9e/  ab/  d8/  info/
1f/  35/  4c/  54/  65/  80/  93/  a2/  b0/  e0/  pack/

The first two characters of the hash can be found here as a directory 99/

If we look at the contents of that folder

ls .git/objects/99
42c3c00dcde51d52a1c110413522043eca07cd
c73d4f2c0c87c3aef7a61366e56e3894a040a5

We see two files, one of them is c73d4f2c0c87c3aef7a61366e56e3894a040a5 which is the remainder of that hash
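
The directory layout we just observed (first two characters as a folder, the remaining 38 as the file name) is simple enough to write down as a function. `loose_object_path` is a hypothetical name, and the sketch only describes objects stored as individual files, not ones that git has moved into the pack/ directory.

```python
from pathlib import PurePosixPath

def loose_object_path(sha: str, git_dir: str = ".git") -> PurePosixPath:
    """Where an object stored as its own file lives under .git/objects."""
    if len(sha) != 40:
        raise ValueError("expected a full 40-character hash")
    int(sha, 16)  # raises ValueError if the name is not hexadecimal
    return PurePosixPath(git_dir) / "objects" / sha[:2] / sha[2:]

path = loose_object_path("99c73d4f2c0c87c3aef7a61366e56e3894a040a5")
print(path)
```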

Can we read what’s in that file?

cat .git/objects/99/c73d4f2c0c87c3aef7a61366e56e3894a040a5
x▒▒mj1▒;▒xP▒▒D▒
               򒼴▒▒D▒    hO▒G迁a`B]▒▒A(▒▒▒"▒▒U>  2Ih▒▒▒▒!R:▒▒r8{`▒▒▒ ▒B▒B')▒▒"E▒▒Scٞ▒▒Z ▒▒={P▒P▒r}▒▒b▒u▒ϣ▒=▒q▒▒*▒▒ъ▒n▒▒▒`▒▒▒}a▒?▒b

Not very readable. But we’ve talked about how this is supposed to represent a state of the repository: it is supposed to represent a commit. We can confirm that by looking at the git log from earlier. The first commit listed there (the most recent one) has that same hash!

Let’s see if there’s another way to view the content of that file

git cat-file -t 99c73
commit

That’s the type -t!

git cat-file -p 99c73
tree 7fc74e8f774bf2e6f25652bbab56ee45f21703c0
parent 93d4ae322683efa703dedbb9acb8431807a38afd
author AymanBx <ayman_sandouk@uri.edu> 1740506540 -0500
committer AymanBx <ayman_sandouk@uri.edu> 1740506540 -0500

Organized all documentation files.

We see that the contents are detailed information about the last commit. We saw some of this in the log, but here we find all of the information. We had to use git cat-file to view the content, because if we view the file in its raw form we get a bunch of weird characters. That’s because git compresses objects (with zlib) before writing them to disk, which makes the way git saves its information more space efficient.

Cool! Let’s try one more thing. How can I find how the contents of a file are saved by git? Let’s try to find a blob object. Let’s check out the other file that was in the 99/ directory.
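
To see why cat shows gibberish while git cat-file shows text, here is a sketch of the round trip using Python's zlib module. The helper names `encode_loose_object` and `decode_loose_object` are hypothetical. The commit text is copied from the output above, but the compressed bytes produced here are not guaranteed to be byte-for-byte identical to the file git wrote, since compression settings can differ.

```python
import zlib

def encode_loose_object(obj_type: str, body: bytes) -> bytes:
    """Header ('<type> <size>' + null byte) plus body, compressed with zlib."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return zlib.compress(header + body)

def decode_loose_object(raw: bytes):
    """Undo the compression and split the header from the body."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match the content")
    return obj_type, body

body = (
    b"tree 7fc74e8f774bf2e6f25652bbab56ee45f21703c0\n"
    b"parent 93d4ae322683efa703dedbb9acb8431807a38afd\n"
    b"author AymanBx <ayman_sandouk@uri.edu> 1740506540 -0500\n"
    b"committer AymanBx <ayman_sandouk@uri.edu> 1740506540 -0500\n"
    b"\n"
    b"Organized all documentation files.\n"
)

raw = encode_loose_object("commit", body)
obj_type, decoded = decode_loose_object(raw)

print(raw[:12])          # unreadable bytes, like the cat output
print(obj_type)          # what `git cat-file -t` reports
print(decoded.decode())  # what `git cat-file -p` reports
```

The compressed bytes start with the letter x, which is the first byte of a zlib stream made with default settings. The same x is visible at the start of the raw cat output.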

git cat-file -t 9942c
commit

Ok, we found another commit. Let’s view it

git cat-file -p 9942c
tree a2ea9badc19dc1d94af5957a8f6f128a3b316f49
parent 8bd4ea38fe31186b9e5d0c1e19c9aef748c6dce6
author Ayman Sandouk <111829133+AymanBx@users.noreply.github.com> 1739298243 -0500
committer GitHub <noreply@github.com> 1739298243 -0500
gpgsig -----BEGIN PGP SIGNATURE-----

 wsFcBAABCAAQBQJnq5XECRC1aQ7uu5UhlAAARUEQAB+Dnsi+XCvn+Vnb202qnpST
 E8WlCMlKStERUEjHItTWHv+mY9YODKi66uBTDNvtJVZFr6DLJ2Xyr+Z5ee1T3nEE
 FwA4xDjh3uegfJnleL2n/ebwlmQ+8xhtlzfTvPg3rHHZwOB9kKQC1QbTiYO/FmMJ
 yhu3KNS/7piXPjp7C8dYAI5sAyGUHBH/uGlVkvIkRmfuK8UgkLxQjlk3By5zHNLx
 /4SlPB8NTf0qFMdag6DszlTn994FVKNPfbLikU433CeUw0yEUYUUZ0fil6f8nHnQ
 4pju9SIbhMWwUcVRVaiA85XWW4EezqloaoMeZnBKgF/TekpzsDuPAIv7mRCa8Sbc
 gk6NpHldfG9IOBYzFE7jSNa/JVT+em8IQq42pkXBB7Mu7ndKIovLAOSHLawquezd
 thvwOf01TvaHkVhH9Cd7Noq9kjf/qKq9sHJXRKUNNlhPM6FiNV9MWmaAGwgHvljg
 HjrN/9ANxpakpW6tVGWkLCyCi+Ip9o22MSUYOYntI9cfKvEmeUYYv0zvMvJoIndZ
 9krQr2zsfMHryG8U536+esEWQl7L9SFYZJljDakrRRY58lISXt4Un+D2zeMYNUIS
 CY09UaZ5CIMHT0znKgWjaR+pQGKM4960QxhFN6s22B5+6InzQuWbetIOlCWXu/pa
 fcqFgLJibxp0Zup3wABx
 =tYyc
 -----END PGP SIGNATURE-----


Update about.md

Looks like we updated the about.md file in this commit. Let’s keep tracing. This commit points to the tree a2ea9badc19dc1d94af5957a8f6f128a3b316f49. Let’s check it out.

git cat-file -p a2ea9
040000 tree 263fb9d22090e88edd2bf1847c24c3511de91b49    .github
100644 blob 920e41a47826309b0bca92645e1ce0c734830ed0    about.md

Cool, we found that blob that represents the about.md at that particular commit. Let’s keep digging

git cat-file -t 920e4
blob
git cat-file -p 920e4
Second semster Masters

Expected graduation May 2026
- I graduated highschool abroad

Hmm interesting. I remember about.md being different than that. wait,

cat about.md
Second semster Masters

Expected graduation May 2026
- I got my BS in 2023
- I graduated highschool abroad

Yeah, that’s right, we added another fun fact the time we got a merge conflict.

So that confirms it. We were just able to find an old version of about.md from an old commit. This is how git stores all the information that tracks all the changes that occurred.
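
The trace we just followed, from commit to tree to blob, can be written as a small lookup. In this sketch the dictionary `objects` stands in for the output of git cat-file -p: its entries are copied from the session above, with the commit abridged to the lines we need. The helper functions are hypothetical and only handle files at the top level of the tree.

```python
# Pretty-printed objects copied (commit abridged) from the session above.
objects = {
    "9942c3c00dcde51d52a1c110413522043eca07cd": (
        "tree a2ea9badc19dc1d94af5957a8f6f128a3b316f49\n"
        "parent 8bd4ea38fe31186b9e5d0c1e19c9aef748c6dce6\n"
        "\n"
        "Update about.md\n"
    ),
    "a2ea9badc19dc1d94af5957a8f6f128a3b316f49": (
        "040000 tree 263fb9d22090e88edd2bf1847c24c3511de91b49\t.github\n"
        "100644 blob 920e41a47826309b0bca92645e1ce0c734830ed0\tabout.md\n"
    ),
    "920e41a47826309b0bca92645e1ce0c734830ed0": (
        "Second semster Masters\n"
        "\n"
        "Expected graduation May 2026\n"
        "- I graduated highschool abroad\n"
    ),
}

def tree_of(commit_text: str) -> str:
    """Hash of the tree that a commit points to."""
    for line in commit_text.splitlines():
        if line.startswith("tree "):
            return line.split()[1]
    raise ValueError("commit has no tree line")

def entry_in_tree(tree_text: str, name: str):
    """Return (type, hash) for the entry called `name` in a tree listing."""
    for line in tree_text.splitlines():
        mode, obj_type, sha, entry_name = line.split(None, 3)
        if entry_name == name:
            return obj_type, sha
    raise KeyError(name)

def file_at_commit(objects: dict, commit_sha: str, name: str) -> str:
    """Content of a top-level file as it was in the given commit."""
    tree_sha = tree_of(objects[commit_sha])
    obj_type, sha = entry_in_tree(objects[tree_sha], name)
    if obj_type != "blob":
        raise ValueError(f"{name} is a {obj_type}, not a file")
    return objects[sha]

old_about = file_at_commit(
    objects, "9942c3c00dcde51d52a1c110413522043eca07cd", "about.md")
print(old_about)
```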

One final thing, what happens if we try to view the contents of that hash without using git cat-file?

cat .git/objects/92/0e41a47826309b0bca92645e1ce0c734830ed0
xK▒▒OR▒0NM▒▒KQ(N▒-.I-R▒MQ▒\\▒▒▒%▒)
▒E▒)▒▒%▒▒y@▒J##3.]O▒8PEFfzFqrF~~▒BbRQ~b
▒6▒

Same as earlier, we just get a bunch of weird characters because of how git compresses the information into a binary.

12.11. Prepare for Next Class#

  1. Take a few minutes to think about what you know about hashing and numbers. Create hash_num_prep.md with two sections: ## Hashing with a few bullet points summarizing key points about hashing, and ## Numbers with what types of number representations you know.

  2. Review notes from How do git branches work. Focus on resolving merge conflicts in preparation for the next lab.

  3. Review notes from What is a commit & What is git (both notes should be fixed by 3/10) to be fully prepared for this class after a nice break. Bring questions if you come up with any. You may qualify for a community badge if you post or contribute in a discussion thread related to the concepts from those classes, provided you ask meaningful questions that your classmates feel intrigued to discuss.

12.12. Experience Report Evidence#

Append the contents of one of your trees or commits and one blob or tree inside of that first one to the bottom of your experience report.

12.13. Badges#

  1. Read about different workflows in git and describe which one you prefer to work with and why in favorite_git_workflow.md in your kwl repo. Two good places to read from are Git Book and the atlassian Docs

  2. Update your kwl chart with what you have learned or new questions in the want to know column

  3. Separate from what you added from the previous step. Add to your kwl table the following rows and fill them out. git branches, merge conflicts, commits. Try to include an older understanding about them in the know column and a newer understanding in the learned column

  4. In commit_contents.md, redirect the content of your most recent commit, its tree, and the contents of one blob. Edit the file or use echo to put markdown headings between the different objects. Add a title # Complete Commit to the file and at the bottom of the file add a ## Reflection subheading with some notes on how, if at all, this exercise helps you understand what a commit is.

  • git log to see the most recent commit

  • git cat-file that hash

  • git cat-file the tree hash

  • git cat-file the hash of one of those files

  1. Read about different workflows in git and add responses to the below in a workflows.md in your kwl repo. Two good places to read from are Git Book and the atlassian Docs

  2. Update your kwl chart with what you have learned or new questions in the want to know column

  3. Separate from what you added from the previous step. Add to your kwl table the following rows and fill them out. git branches, merge conflicts, commits. Try to include an older understanding about them in the know column and a newer understanding in the learned column

  4. Add the hash of the content of your completed workflows.md file and put that in the comment of your badge PR for this badge. Try to do this from your local CLI, but full credit even if you use the website interface

## Workflow Reflection

1. Why is it important that git can be used with different workflows?
1. Which workflow do you think you would like to work with best and why?
1. Describe a scenario that might make it better for the whole team to use a workflow other than the one you prefer.  

12.14. Questions#