12. What is git?#
Last class we created a local repo, then created an empty repo on GitHub. Linked both repos and attmpted to push local changes on the “empty” repo but weren’t successful
The reason for that is the way the new GitHub repo was created. When we used GitHub Classroom to fork my template tiny-book repo as your new empty repo GitHub made two commits. Making it not an empty repo.
cd Documents/systems/tiny-book
git push -u origin main
To https://github.com/compsys-progtools/tiny-book-AymanBx
! [rejected] main -> main (non-fast-forward)
error: failed to push some refs to 'https://github.com/compsys-progtools/tiny-book-AymanBx'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. If you want to integrate the remote changes,
hint: use 'git pull' before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
Let’s check it out
gh repo view --web
Let’s compare that to what we have locally
git log
commit 1da3fa4d2b1b14e3a92358455c2320697af43867 (HEAD -> main)
Author: AymanBx <ayman_sandouk@uri.edu>
Date: Tue Mar 4 13:42:31 2025 -0500
jupyter book template
When we attempted to pull
we weren’t fully successful because we failed to link the two mains because the push
command failed.
git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 5 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (5/5), 1.70 KiB | 102.00 KiB/s, done.
From https://github.com/compsys-progtools/tiny-book-AymanBx
* [new branch] feedback -> origin/feedback
* [new branch] main -> origin/main
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.
git pull <remote> <branch>
If you wish to set tracking information for this branch you can do so with:
git branch --set-upstream-to=origin/<branch> main
We should follow git instructions
git branch --set-upstream-to=origin/main main
branch 'main' set up to track 'origin/main'.
Now we should be able to pull successfully
git pull
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint: git config pull.rebase false # merge
hint: git config pull.rebase true # rebase
hint: git config pull.ff only # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Need to specify how to reconcile divergent branches.
Git recognized the different commit history as in the repos as a conflict
To resolve this type of conflict (main had unrelated changes on it that are unknown by my branch)
We rebase
rebase
is updating my current branch with new commits that occurred on main (or any branch I want to rebase with)
git pull --rebase
Successfully rebased and updated refs/heads/main.
What did that do?
git log
Author: AymanBx <ayman_sandouk@uri.edu>
Date: Tue Mar 4 13:42:31 2025 -0500
jupyter book template
commit d781535217b324d9cb2ce6cb45ae565b54ee786f (origin/main)
Author: github-classroom[bot] <66690702+github-classroom[bot]@users.noreply.github.com>
Date: Tue Oct 8 16:54:12 2024 +0000
Setting up GitHub Classroom Feedback
commit 72bcbb8cbd2769d21aad3c23c8fbe477d0260ced (origin/feedback)
Author: github-classroom[bot] <66690702+github-classroom[bot]@users.noreply.github.com>
Date: Tue Oct 8 16:54:12 2024 +0000
GitHub Classroom Feedback
Now we should be able to push
git push
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 8 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 16.33 KiB | 8.17 MiB/s, done.
Total 12 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/compsys-progtools/tiny-book-aymanbx.git
d781535..60bd457 main -> main
A couple of weeks ago, we learned about what a commit is and then we took a break from how git works, to talk more about unix philosophy and how developers communicate about code
Today we will learn what git is more formally.
study tip
We will go in and out of topics at times, in order to provides what is called spaced repetition, repeating material or key concepts with breaks in between.
Using git correctly is a really important goal of this course because git is an opportunity for you to demonstrate a wide range of both practical and conceptual understanding.
So, I have elected to interleave other topics with git to give core git ideas some time to simmer and give you time to practice them before we build on them with more depth at git.
Also, we are both learning git and using git as a motivating example of other key important topics.
12.1. Why so much git?#
Today, we are going to learn what git is and later we will learn more details of how it is implemented.
Remember we are spending so much time with git for two reasons:
it is an important developer tool
it demonstrates important conceptual ideas that occur in other areas of CS
git book is the official reference on git.
this includes other spoken languages as well if that is helpful for you.
12.2. git definition#
From here, we have the full definition of git
We do not start from that point, because these documents were written for target audience of working developers who are familar with other, old version control systems and learning an additional one.
Have you used another version control system before?
Most of you, however, have probably not used another version control system.
Let’s break down the definition
12.3. Git is a File system#
Content-addressable filesystem means a key-value data store.
What are some examples of key-value pairs that you have seen in computer science broadly, and in this course specficially, so far?
python dictionaries
pointers (address,content)
parameter, passed values
yaml files
some examples of key-value pairs that you have seen in computer science broadly, and in this course specficially
python dictionaries
pointers (address,content)
parameter, passed values
yaml files
What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.
12.4. Git is a Version Control System#
In the before times
git stores snapshots of your work each time you commit.
What unit of git is how it represents a snapshot?
[ ] branch
[ ] blob
[x] commit
[ ] tag
it uses 3 stages:
These three stages are the in relation to your working directory, and potentially remotes.
So in broader context, the git visual cheatsheet is a more complete picture and has commands overlayed with the concept.
12.5. Git has two sets of commands#
Porcelain: the user friendly VCS
Plumbing: the internal workings- a toolkit for a VCS
Which of the following commands are porcelain commands?
git commit
git cat-file
git add
git status
git hash-object
Which of the following commands are porcelain commands?
git commit
git cat-file
git add
git status
git hash-object
We have so far used git as a version control system. A version control system, in general, will have operations like commit, push, pull, clone. These may work differently under the hood or be called different things, but those are what something needs to have in order to keep track of different versions.
The plumbing commands reveal the way that git performs version control operations. This means, they implement the git file system operations for the git version control system.
You can think of the plumbing vs porcelain commands like public/private methods. As a user, you only need the public methods (porcelain commands) but those use the private ones to get things done (plumbing commands). We will use the plumbing commands over the next few classes to examine what git really does when we call the porcelain commands that we will typically use.
Example?
12.6. Git is distributed#
What does that mean?
Git runs locally. It can run in many places, and has commands to help sync across remotes, but git does not require one copy of the repository to be the “official” copy and the others to be subordinate. git just sees repositories.
For human reasons, we like to have one “official” copy and treat the others as other copies, but that is a social choice, not a technological requirement of git. Even though we will typically use it with an offical copy and other copies, having a tool that does not care, makes the tool more flexible and allows us to create workflows, or networks of copies that have any relationship we want.
It’s about the workflows, or the ways we socially use the tool.
12.6.1. Subversion Workflow#
subversion is an older VCS
12.6.2. Centralized Manager#
12.6.3. dictator and lieutenants#
This is a variant of a multiple-repository workflow. It’s generally used by huge projects with hundreds of collaborators; one famous example is the Linux kernel. Various integration managers are in charge of certain parts of the repository; they’re called lieutenants. All the lieutenants have one integration manager known as the benevolent dictator. The benevolent dictator pushes from their directory to a reference repository from which all the collaborators need to pull.
12.7. How does git do all these things?#
We can use the bash command find
to search the file system. Note that this does not search the contents of the files, just the names.
find objects/ -type f
.git/objects/06/d56f40c838b64eb048a63e036125964a069a3a
.git/objects/0e/2e3b27f61b5908c4bb75a1ca680ee4053aa992
.git/objects/1d/a3fa4d2b1b14e3a92358455c2320697af43867
.git/objects/29/a422c19251aeaeb907175e9b3219a9bed6c616
.git/objects/2b/d9785b546aa1af7d6e41a48d33a0af811082dd
.git/objects/5f/534f8051f6a94d40e57e58242ef0113fae4fd1
.git/objects/6e/b15166db3ad944529be060af334deb2c022bbd
.git/objects/74/d5c7101ed8c8c1a6f87e31debd9445df1f0e71
.git/objects/78/3ec6aa5afe2f0a66087d01a112f543e1ed287e
.git/objects/7e/821e45db31376729c73f3616fb24db2b655a95
.git/objects/a0/57a320dcd595f3f0e0d250c3af4a5653596914
.git/objects/d6/f9d92349c768da1863b412674f25cd27d23cfb
.git/objects/e3/5d8850c9688b1ce82711694692cc574a799396
.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
.git/objects/f8/cdc73cb2be06824f521837366ec95b73d55ef8
.git/objects/fa/eea606145667f54d220a0c17ffe8d22db07146
.git/objects/fd/b7176c429a73d5335e127b27d530b8aaa07c7d
We searched for anything of the type file
with the option -type f
This is a lot of files! It’s more than we have in our working directory.
We can see that by looking at the working directory with ls
ls
_build/ _toc.yml logo.png markdown.md references.bib
_config.yml intro.md markdown-notebooks.md notebooks.ipynb requirements.txt
And remember, _build
is not being tracked. That means git knows nothing about it or about its content
This is a consequence of git taking snap shots and tracking both the actual contents of our working directory and our commit messages and other meta data about each commit.
12.8. Git Variables#
the program git
does not run continously the entire time you are using it for a project. It runs quick commands each time you tell it to, it’s goal is to manage files, so this makes sense. This also means that important information that git
needs is also saved in files.
We can see the files that it has by listing the directory:
ls .git
COMMIT_EDITMSG HEAD description index logs/ refs/
FETCH_HEAD config hooks/ info/ objects/
the files in all caps are like gits variables.
Lets look at the one called HEAD
we have interacted with HEAD
before when resolving merge conflicts.
cat .git/HEAD
ref: refs/heads/main
HEAD
is a pointer to the currently checked out branch. Do you remember where we see this pointer?
The other files with HEAD
in their name are similarly pointers to other references, named corresponding to other things.
12.9. Git Objects#
There are 3 types:
blob objects: the content of your files (data)
tree objects: stores file names and groups files together (organization)
Commit Objects: stores information about the sha values of the snapshots
12.10. Examining git objects#
Which of the following commands we have seen so far is a plumbing command?
[ ] git commit
[ ] git push
[x] git cat-file
[ ] git pull
git cat-file -t
Let’s do so in the inclass repo
cd ../gh-inclass-AymanBx/
Let’s remember what has been done in this repo
git log
commit 99c73d4f2c0c87c3aef7a61366e56e3894a040a5 (HEAD -> organization, origin/organization)
Author: AymanBx <ayman_sandouk@uri.edu>
Date: Tue Feb 25 13:02:20 2025 -0500
Organized all documentation files.
commit 93d4ae322683efa703dedbb9acb8431807a38afd
Author: AymanBx <ayman_sandouk@uri.edu>
Date: Tue Feb 25 12:53:36 2025 -0500
Added overview of files in readme.md
commit 4ccfb5fe36073d15eaa4271d5f1355c0d9e1635b
Author: AymanBx <ayman_sandouk@uri.edu>
Date: Thu Feb 20 13:13:59 2025 -0500
added age to readme
commit 581ac5ef92ba5b4a647f30fc91a9d405e6900007 (origin/main, origin/HEAD, main)
Merge: 162a47b 7b56104
Author: Ayman Sandouk <111829133+AymanBx@users.noreply.github.com>
Date: Thu Feb 20 11:19:10 2025 -0500
Merge pull request #3 from compsys-progtools/organizing_ac
Feb 20 in class activity
commit 7b561041f188ea7c2fa4d66d32957f639c574bd5 (origin/organizing_ac)
Author: AymanBx <ayman_sandouk@uri.edu>
Date: Thu Feb 20 11:01:56 2025 -0500
add files for organizing activity
What branch are we on at the moment?
We can tell from git status
or from what we saw on git log
But all these commands are really getting their info from the HEAD pointer, and we now know how to do so our selves:
cat .git/HEAD
ref: refs/heads/organization
Let’s try to trace what we can find in that file
cat .git/refs/heads/organization
99c73d4f2c0c87c3aef7a61366e56e3894a040a5
We see a hash (made of 32 characters) A hash in git can be a representation of multiple things. One of its uses is that it is used as a names of git objects that hold the different information of a git repository.
Let’s confirm that we can find that file
ls .git/objects/
0f/ 23/ 39/ 4e/ 56/ 76/ 83/ 99/ a6/ ce/ e4/
16/ 2d/ 3a/ 4f/ 58/ 7b/ 8b/ 9d/ a9/ d7/ f9/
1e/ 33/ 3c/ 50/ 63/ 7f/ 92/ 9e/ ab/ d8/ info/
1f/ 35/ 4c/ 54/ 65/ 80/ 93/ a2/ b0/ e0/ pack/
The first two characters of the hash can be found here as a directory 99/
If we look at the contents of that folder
ls .git/objects/99
42c3c00dcde51d52a1c110413522043eca07cd
c73d4f2c0c87c3aef7a61366e56e3894a040a5
We see two files, one of them is c73d4f2c0c87c3aef7a61366e56e3894a040a5
which is the remainder of that hash
Can we read what’s in that file?
cat .git/objects/99/c73d4f2c0c87c3aef7a61366e56e3894a040a5
x▒▒mj1▒;▒xP▒▒D▒
▒▒D▒ hO▒G迁a`B]▒▒A(▒▒▒"▒▒U> 2Ih▒▒▒▒!R:▒▒r8{`▒▒▒ ▒B▒B')▒▒"E▒▒Scٞ▒▒Z ▒▒={P▒P▒r}▒▒b▒u▒ϣ▒=▒q▒▒*▒▒ъ▒n▒▒▒`▒▒▒}a▒?▒b
Not very readable. But We’ve talked about how this is supposed to represent a status of the repository. It is supposed to represent a commit. We can confirm that by looking at the git log
from earlier. The very first commit was represented by that same hash!
Let’s see if there’s another way to view the content of that file
git cat-file -t 99c73
commit
That’s the type -t
!
git cat-file -p 99c73
tree 7fc74e8f774bf2e6f25652bbab56ee45f21703c0
parent 93d4ae322683efa703dedbb9acb8431807a38afd
author AymanBx <ayman_sandouk@uri.edu> 1740506540 -0500
committer AymanBx <ayman_sandouk@uri.edu> 1740506540 -0500
Organized all documentation files.
We see that the contetns is detailed information about the last commit, we see some of this in the log. But here we find all the information.
We had to use git cat-file
specifically to view the content, because if we view it in its raw shape we get a bunch of weird characters. That’s because git compressed this information in binary to save disk space. Making the way git saves its information more space efficient.
Cool!
Let’s try one more thing. How can I find how the contents of a file is saved by git.
Lets try to find a blob
object.
Let’s check out the other file that was in the 99/
directory
git cat-file -t 9942c
commit
Ok, we found another commit. Let’s view it
git cat-file -p 9942c
tree a2ea9badc19dc1d94af5957a8f6f128a3b316f49
parent 8bd4ea38fe31186b9e5d0c1e19c9aef748c6dce6
author Ayman Sandouk <111829133+AymanBx@users.noreply.github.com> 1739298243 -0500
committer GitHub <noreply@github.com> 1739298243 -0500
gpgsig -----BEGIN PGP SIGNATURE-----
wsFcBAABCAAQBQJnq5XECRC1aQ7uu5UhlAAARUEQAB+Dnsi+XCvn+Vnb202qnpST
E8WlCMlKStERUEjHItTWHv+mY9YODKi66uBTDNvtJVZFr6DLJ2Xyr+Z5ee1T3nEE
FwA4xDjh3uegfJnleL2n/ebwlmQ+8xhtlzfTvPg3rHHZwOB9kKQC1QbTiYO/FmMJ
yhu3KNS/7piXPjp7C8dYAI5sAyGUHBH/uGlVkvIkRmfuK8UgkLxQjlk3By5zHNLx
/4SlPB8NTf0qFMdag6DszlTn994FVKNPfbLikU433CeUw0yEUYUUZ0fil6f8nHnQ
4pju9SIbhMWwUcVRVaiA85XWW4EezqloaoMeZnBKgF/TekpzsDuPAIv7mRCa8Sbc
gk6NpHldfG9IOBYzFE7jSNa/JVT+em8IQq42pkXBB7Mu7ndKIovLAOSHLawquezd
thvwOf01TvaHkVhH9Cd7Noq9kjf/qKq9sHJXRKUNNlhPM6FiNV9MWmaAGwgHvljg
HjrN/9ANxpakpW6tVGWkLCyCi+Ip9o22MSUYOYntI9cfKvEmeUYYv0zvMvJoIndZ
9krQr2zsfMHryG8U536+esEWQl7L9SFYZJljDakrRRY58lISXt4Un+D2zeMYNUIS
CY09UaZ5CIMHT0znKgWjaR+pQGKM4960QxhFN6s22B5+6InzQuWbetIOlCWXu/pa
fcqFgLJibxp0Zup3wABx
=tYyc
-----END PGP SIGNATURE-----
Update about.md
Looks like we updated the about.md file in this commit. Let’s keep tracing. This commit is made of the tree a2ea9badc19dc1d94af5957a8f6f128a3b316f49
. Let’s check it out
git cat-file -p a2ea9
040000 tree 263fb9d22090e88edd2bf1847c24c3511de91b49 .github
100644 blob 920e41a47826309b0bca92645e1ce0c734830ed0 about.md
Cool, we found that blob that represents the about.md at that particular commit. Let’s keep digging
git cat-file -t 920e4
blob
git cat-file -p 920e4
Second semster Masters
Expected graduation May 2026
- I graduated highschool abroad
Hmm interesting. I remember about.md being different than that. wait,
cat about.md
Second semster Masters
Expected graduation May 2026
- I got my BS in 2023
- I graduated highschool abroad
Yeah, that’s write, we had added another fun fact that time we got a merge conflict.
So that confirms it. We were just able to find an old version of about.md from an old commit. This is how git stores all the information that tracks all the changes that occurred.
One final thing, what happens if we try to view the contents of that hash without using git cat-file
?
cat .git/objects/92/0e41a47826309b0bca92645e1ce0c734830ed0
xK▒▒OR▒0NM▒▒KQ(N▒-.I-R▒MQ▒\\▒▒▒%▒)
▒E▒)▒▒%▒▒y@▒J##3.]O▒8PEFfzFqrF~~▒BbRQ~b
▒6▒
Same as earlier, we just get a bunch of weird characters because of how git compresses the information into a binary.
12.11. Prepare for Next Class#
Take a few minutes to think what you know about hashing and numbers. Create hash_num_prep.md with two sections:
## Hashing
with a few bullet points summarzing key points about hashing, and## Numbers
with what types of number representations you know.Review notes from How do git branches work. Focus on resolving merge conflict in preperation to next lab.
Review notes from What is a commit & What is git (Both notes should be fixed by 3/10) to be fully prepared for this class after a nice break. Bring questions if you come up with any. You may qualify for a community badge if you post/contribute in a discussion thread related to the concepts mentioned in said classes if your ask meaningful questions that your classmates feel intrigued to discuss.
12.12. Experience Report Evidence#
Append the contents of one of your trees or commits and one blob or tree inside of that first one to the bottom of your experience report.
12.13. Badges#
Read about different workflows in git and describe which one you prefer to work with and why in favorite_git_workflow.md in your kwl repo. Two good places to read from are Git Book and the atlassian Docs
Update your kwl chart with what you have learned or new questions in the want to know column
Separate from what you added from the previous step. Add to your kwl table the following rows and fill them out.
git branches
,merge conflicts
,commits
. Try to include an older understanding about them in theknow
column and a newer understanding in thelearned
columnIn commit_contents.md, redirect the content of your most recent commit, its tree, and the contents of one blob. Edit the file or use
echo
to put markdown headings between the different objects. Add a title# Complete Commit
to the file and at the bottom of the file add## Reflection
subheading with some notes on how, if at all this excercise helps you understand what a commit is.
git log
to see most recent commitgit cat-file
that hashgit cat-file
the tree hasgit cat-file
hash of those files
Read about different workflows in git and add responses to the below in a workflows.md in your kwl repo. Two good places to read from are Git Book and the atlassian Docs
Update your kwl chart with what you have learned or new questions in the want to know column
Separate from what you added from the previous step. Add to your kwl table the following rows and fill them out.
git branches
,merge conflicts
,commits
. Try to include an older understanding about them in theknow
column and a newer understanding in thelearned
columnAdd the hash of the content of your completed workflows.md file and put that in the comment of your badge PR for this badge. Try to do this from your local CLI, but full credit even if you use the website interface
## Workflow Reflection
1. Why is it important that git can be used with different workflows?
1. Which workflow do you think you would like to work with best and why?
1. Describe a scenario that might make it better for the whole team to use a workflow other than the one you prefer.