10. What is git?#

10.1. Admin#

Important

These notes are a bit different than the ones I made in class

  • Please start commenting/expressing interest on build/explore ideas. Next lab will be focused on that.

  • I will be merging the courseutils update soon, so you can see the docs more easily

  • creating a codetour will be a part of many builds

  • creating documentation websites is also a part of many builds

10.1.1. Support for build and explore proposals#

10.1.2. Build and Explore ideas#

10.1.3. get credit for this class toward your major#

The department has decided that this course is acceptable for the 300+ level elective degree requirement in the CS major, but it requires that you complete a form called a “Curriculum Modification”.

I added a template form from Professor DiPippo (CS Undergraduate Chair) to the course website repo, in the img folder; it is a PDF.

Complete this form and send it to Professor DiPippo for her signature to make this course count toward your major requirements.

The form indicates a different email address to send it to, but that is for after Professor DiPippo signs the form. Do not send it directly to the Dean’s office (casforms).

10.2. Why define git now?#

Two weeks ago we learned what a commit is, and then we took a break from how git works to talk more about how developers communicate about code, programming languages, and the unix philosophy.

study tip

We will go in and out of topics at times in order to provide what is called spaced repetition: repeating material or key concepts with breaks in between.

Using git correctly is a really important goal of this course because git is an opportunity for you to demonstrate a wide range of both practical and conceptual understanding.

So, I have elected to interleave other topics with git to give core git ideas some time to simmer and give you time to practice them before we build on them in more depth.

Also, we are both learning git and using git as a motivating example for other important topics.

Today, we are going to learn what git is and what its parts are. Next we will see in greater detail how it works.

The git book is the official reference on git.

It is also available in other spoken languages, if that is helpful for you.

10.3. Git Definition#

From here, we have the full definition of git

git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.

We do not start from that point, because these documents were written for a target audience of working developers who are familiar with other, older version control systems and are learning an additional one.

Most of you, however, have probably not used another version control system.

Let’s break down the definition

10.3.1. Git is a File system#

Content-addressable filesystem means a key-value data store.

Some examples of key-value pairs that you have seen in computer science broadly, and in this course specifically:

  • python dictionaries

  • pointers (address,content)

  • parameter, passed values

  • yaml files

What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.
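To make "content-addressable" concrete, here is a minimal sketch of a key-value store whose keys are computed from the content itself. The class name `ToyObjectStore` and its methods are hypothetical, made up for illustration. It assumes the default SHA-1 object format and the blob header that git prepends before hashing; it keeps everything in a Python dictionary rather than in files.

```python
import hashlib


class ToyObjectStore:
    """A toy content-addressable store: the key is derived from the content."""

    def __init__(self):
        # key (hex digest) -> content (bytes)
        self._objects = {}

    def put(self, content: bytes) -> str:
        # Git hashes a short header ("blob <size>\0") followed by the content.
        header = b"blob " + str(len(content)).encode() + b"\0"
        key = hashlib.sha1(header + content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        # Retrieve the content using the key we were handed back.
        return self._objects[key]


store = ToyObjectStore()
key = store.put(b"test content\n")
print(key)             # a 40-character hexadecimal key
print(store.get(key))  # the original bytes
```

Because the key depends only on the bytes, storing the same content twice gives the same key, and in a SHA-1 repository the key should match what `git hash-object` reports for the same bytes.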

10.3.2. Git is a Version Control System#

In the before times

PhD comics final.doc comic

git stores snapshots of your work each time you commit.

snapshot of 5 versions of 3 files

it uses 3 stages:

3 stages in git

These three stages are in relation to your working directory, and potentially remotes.

So, in broader context, the git visual cheatsheet is a more complete picture and has commands overlaid on the concepts.

10.4. Git has two sets of commands#

  • Porcelain: the user friendly VCS

  • Plumbing: the internal workings, a toolkit for a VCS

We have so far used git as a version control system. A version control system, in general, will have operations like commit, push, pull, clone. These may work differently under the hood or be called different things, but those are what something needs to have in order to keep track of different versions.

The plumbing commands reveal the way that git performs version control operations. This means, they implement file system operations for the git version control system.

You can think of the plumbing vs porcelain commands like public/private methods. As a user, you only need the public methods (porcelain commands) but those use the private ones to get things done (plumbing commands). We will use the plumbing commands over the next few classes to examine what git really does when we call the porcelain commands that we will typically use.
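The public/private analogy can be sketched as a toy class. Everything here is hypothetical and simplified: `ToyVCS`, `commit`, and `_hash_object` are illustrative names, not git's API, and the tree and commit contents are plain text rather than git's real formats. The point is only that one public "porcelain" method is built from several calls to a private "plumbing" helper.

```python
import hashlib


class ToyVCS:
    """Hypothetical toy model: one porcelain method built on a plumbing helper."""

    def __init__(self):
        self._objects = {}  # hash -> (kind, content)
        self.head = None    # hash of the most recent commit, if any

    def _hash_object(self, kind: str, content: bytes) -> str:
        # "Plumbing": store one object and hand back its key.
        header = f"{kind} {len(content)}".encode() + b"\0"
        key = hashlib.sha1(header + content).hexdigest()
        self._objects[key] = (kind, content)
        return key

    def commit(self, files: dict, message: str) -> str:
        # "Porcelain": the one call a user makes.
        entries = []
        for name in sorted(files):
            blob_key = self._hash_object("blob", files[name])
            entries.append(f"blob {blob_key} {name}")
        # Simplified text tree (real git trees use a binary format).
        tree_key = self._hash_object("tree", "\n".join(entries).encode())
        lines = [f"tree {tree_key}"]
        if self.head is not None:
            lines.append(f"parent {self.head}")
        lines += ["", message]
        self.head = self._hash_object("commit", "\n".join(lines).encode())
        return self.head


vcs = ToyVCS()
first = vcs.commit({"README.md": b"hello\n"}, "first commit")
second = vcs.commit({"README.md": b"hello again\n"}, "second commit")
print(first != second)     # True: different snapshots get different keys
print(len(vcs._objects))   # 6: two blobs, two trees, two commits
```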

10.5. Git is distributed#

Git runs locally. It can run in many places, and has commands to help sync across remotes, but git does not require one copy of the repository to be the “official” copy and the others to be subordinate. git just sees repositories.

For human reasons, we like to have one “official” copy and treat the others as local copies, but that is a social choice, not a technological requirement of git. Even though we will typically use it with an official copy and other copies, having a tool that does not care makes the tool more flexible and allows us to create workflows, or networks of copies, that have any relationship we want.

It’s about the workflows, or the ways we socially use the tool.

10.5.1. Subversion Workflow#

subversion workflow

10.5.2. Integration Manager#

integration manager workflow

10.5.3. Dictator and Lieutenants#

dictator and lieutenants workflow

10.6. How does git do all these things?#

Let’s look at git again in our github-inclass repo

cd Documents/inclass/systems/gh-inclass-sp24-brownsarahm/

We can use the bash command find to search the file system. Note that this does not search the contents of the files, just the names.

find .git/objects/ -type f
.git/objects//0d/6ef1d3db0729088d515b35b588f39af5b770fd
.git/objects//0c/67356c8bb0c86efa2c42cb9db9837312559020
.git/objects//9b/87264e80fd24198ff8df6d071eaccc6a913b20
.git/objects//9e/ce5efa25710c8fad7d9f210928785b5362b06f
.git/objects//9e/a743825de328435ce38670898e4109619a4d15
.git/objects//3c/7ace1d193b78129c10d61609cbe97b2686f202
.git/objects//67/1765c67072a10ea8fc89704a8f66d504fcf80f
.git/objects//93/c08483f44ebdf5ce10e6c0002e641aa0cc8844
.git/objects//05/1781ee70ce3f052d0a233d2efc5a34f921877e
.git/objects//9d/6ffa6ded47d8b6df13ed60e482b188015ee499
.git/objects//d8/7bf4a5641e0429fd3c371bd2b19d755105ca92
.git/objects//ab/e403c82238c3877b3437c1ab3e32499a60e24f
.git/objects//fc/a59e8cca05bb0861f9348a40fe8300b3d55637
.git/objects//f5/b2c8dd1d307b1114d70a7e5fe7dc37cc429d35
.git/objects//e3/b192aa0cd490226e8adcd81d3d0b95adb5676b
.git/objects//c8/dc2d1d10e6c27cad4f3be278016f3cf90ac1ca
.git/objects//ed/7b329f735e91c76a22236dc728034914ec29ee
.git/objects//pack/pack-ca5f27a0787f6fe7c793b73ecd54664cc2f6af14.rev
.git/objects//pack/pack-ca5f27a0787f6fe7c793b73ecd54664cc2f6af14.idx
.git/objects//pack/pack-ca5f27a0787f6fe7c793b73ecd54664cc2f6af14.pack
.git/objects//29/ffc88519103085ed3a2ab01cffb3c99d70fc6a
.git/objects//42/f16e84e01cc75ddf80fd19b150de5818db596c
.git/objects//1f/6ef3acccb91bc0c33661854903ec5e7d666fb5
.git/objects//74/3db376fa76bb3611cfac6935938d179330c7eb
.git/objects//7e/78493266aeef733137c5193cc0a0e465e4093d
.git/objects//26/0c9c309922970f80bfa2c93cc23bcfbb962740
.git/objects//81/c6f187f146caaaf43d97bc1bb8ed237142f4c3
.git/objects//86/c90cf823148f4a3b53b04ab5498605cbf6ccb1
.git/objects//9f/399466ad6a1ad572e104209f4469eb8cd48516
.git/objects//6b/eb67c7f42a4829d5b0512a19efefe8de7f9eec
.git/objects//07/897fdccf1f5e214e33ab58bc755be90edbc7e2
.git/objects//07/959c0c04e0d4a4b2df24ab5bd08b17e3e58938
.git/objects//00/4e5366de9503d7e3fe263171a7bcb4e0af88eb
.git/objects//3a/4533a3abbc749f5e1905b30eb187a7350ae71a
.git/objects//5b/03a3e101e049f7a0184c6b0a1adc4787056af0
.git/objects//39/d5708913a6c708d1a505cde6da544785c086a6
.git/objects//63/063aaa7d1dad782e0349d897fcf33ff16a7e5d
.git/objects//d3/980efccf4856f0c61a6a16ed40be534c5230a5
.git/objects//d5/c7c4543f48ca7fb5cf75881fd0dbe21d75f51c
.git/objects//aa/02c633322970a4095e564aeb8fd96a3d4f31ab
.git/objects//b0/20e21272ba05cff974b1f789bdf5cfd2f5d355
.git/objects//f9/e70e5b8173525188a6b10ce5979972de4e0d9f
.git/objects//e0/f340c90f46a0103ee3dac9cfdc5520e4848731
.git/objects//79/8561538cfdec1f1da1eb1271f453fc06a40ca4
.git/objects//79/4a48e2f15fd84d1c60e2cf38ce0b4497e3f2ad
.git/objects//2d/232a2231c650dc4094606797fe0bd3e0ce4c65
.git/objects//1e/2a45fbca5ce7bf775827f5f4dbe23b6561cff4
.git/objects//1e/97f0d64fc62f40bbeb0eba19fa696fe47b7ccc
.git/objects//8c/3cc97ca6446c270ca0b8f7d4ce640a6e81e468
.git/objects//76/2f01b5cf84f39096d55ca95e46f0519d8cae48
.git/objects//2b/95364a88284fac03923db4408737c65d842465
.git/objects//7a/1650f9ccb43520716b69be9da7142beeade018
.git/objects//8e/c01b21b8d76092341a9b91e067f3526c3d45db

This is a lot of files! It’s more than we have in our working directory.

We can see that by looking at the working directory with ls

ls
API.md		README.md	example.md	scratch.ipynb
CONTRIBUTING.md	about.md	my_sectrest	setup.py
LICENSE.md	docs		pkg		tests

This is a consequence of git taking snapshots and tracking both the actual contents of our working directory and our commit messages and other metadata about each commit.
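The paths in the find output above encode each object's hash: the two-character directory name is the start of the hash and the file name is the rest. The helpers below are a small sketch with hypothetical names (`path_to_hash`, `hash_to_path`). They only apply to loose objects, not to the files in the pack directory.

```python
def path_to_hash(path: str) -> str:
    # Drop empty pieces so the doubled slash in the find output is harmless.
    parts = [piece for piece in path.split("/") if piece]
    # Directory name (2 characters) + file name (38 characters) = full hash.
    return parts[-2] + parts[-1]


def hash_to_path(object_hash: str) -> str:
    # The inverse: split the hash after its first two characters.
    return f".git/objects/{object_hash[:2]}/{object_hash[2:]}"


full = path_to_hash(".git/objects//8e/c01b21b8d76092341a9b91e067f3526c3d45db")
print(full)      # 8ec01b21b8d76092341a9b91e067f3526c3d45db
print(full[:4])  # 8ec0
```

A short prefix like this is what the git plumbing commands accept in place of the full hash, as long as it is unambiguous in the repository.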

10.6.1. Git Objects#

There are 3 main types (a fourth type, the tag object, stores annotated tags):

  • blob objects: the content of your files (data)

  • tree objects: store file names and group files together (organization)

  • commit objects: store the hash of the snapshot’s tree, the parent commit(s), the author, the time, and the message

classDiagram
    class tree{
        List:
        - hash: blob
        - string: type
        - string: file name
    }
    class commit{
        hash: parent
        hash: tree
        string: message
        string: author
        string: time
    }
    class blob{
        binary: contents
    }
    class object{
        hash: name
    }
    object <|-- blob
    object <|-- tree
    object <|-- commit

Let’s inspect the last item from the list of objects above,

Mine was:

.git/objects//8e/c01b21b8d76092341a9b91e067f3526c3d45db

So the minimum 4 characters that I can use to check what type it is are 8ec0.

git cat-file -t 8ec0
tree

Since mine is a tree, when I look at the content, I see a list of the objects it contains:

git cat-file -p 8ec0 
040000 tree 95b60ce8cdec1bc4e1df1416e0c0e6ecbd3e7a8c	.github
100644 blob 3a4533a3abbc749f5e1905b30eb187a7350ae71a	API.md
100644 blob 9ece5efa25710c8fad7d9f210928785b5362b06f	CONTRIBUTING.md
100644 blob 2d232a2231c650dc4094606797fe0bd3e0ce4c65	LICENSE.md
100644 blob c8dc2d1d10e6c27cad4f3be278016f3cf90ac1ca	README.md
100644 blob 1e97f0d64fc62f40bbeb0eba19fa696fe47b7ccc	_config.yml
100644 blob e0f340c90f46a0103ee3dac9cfdc5520e4848731	_toc.yml
100644 blob 9ea743825de328435ce38670898e4109619a4d15	about.md
100644 blob 9d6ffa6ded47d8b6df13ed60e482b188015ee499	abstract_base_class.py
100644 blob 762f01b5cf84f39096d55ca95e46f0519d8cae48	alternative_classes.py
100644 blob 93c08483f44ebdf5ce10e6c0002e641aa0cc8844	example.md
100644 blob f9e70e5b8173525188a6b10ce5979972de4e0d9f	helper_functions.py
100644 blob 762f01b5cf84f39096d55ca95e46f0519d8cae48	important_classes.py
100644 blob 3c7ace1d193b78129c10d61609cbe97b2686f202	philosophy.md
100644 blob d87bf4a5641e0429fd3c371bd2b19d755105ca92	scratch.ipynb
100644 blob 39d5708913a6c708d1a505cde6da544785c086a6	setup.py
100644 blob 1f6ef3acccb91bc0c33661854903ec5e7d666fb5	test_abc.py
100644 blob 63063aaa7d1dad782e0349d897fcf33ff16a7e5d	tests_alt.py
100644 blob abe403c82238c3877b3437c1ab3e32499a60e24f	tests_helpers.py
100644 blob b020e21272ba05cff974b1f789bdf5cfd2f5d355	tests_imp.py

I see the tuples: mode, type, hash of content, and file name for each item.
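Each line of the listing above has four fields: the mode, type, and hash separated by spaces, then a tab, then the file name. As a sketch, we could parse text in that shape into named tuples. `TreeEntry` and `parse_tree_listing` are hypothetical names, and the sample is just the first two lines of my output.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str   # file mode, e.g. 100644 for a regular file
    type: str   # "blob" or "tree"
    hash: str   # key of the object this entry points to
    name: str   # file or directory name


def parse_tree_listing(text: str) -> list[TreeEntry]:
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # The name comes after a tab, so names with spaces stay intact.
        meta, name = line.split("\t", 1)
        mode, obj_type, obj_hash = meta.split()
        entries.append(TreeEntry(mode, obj_type, obj_hash, name))
    return entries


sample = (
    "040000 tree 95b60ce8cdec1bc4e1df1416e0c0e6ecbd3e7a8c\t.github\n"
    "100644 blob 3a4533a3abbc749f5e1905b30eb187a7350ae71a\tAPI.md\n"
)
for entry in parse_tree_listing(sample):
    print(entry.type, entry.hash[:4], entry.name)
# tree 95b6 .github
# blob 3a45 API.md
```

Entries of type tree point to other trees, which is how subdirectories such as .github are represented.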

10.7. Git Variables#

The program git does not run continuously the entire time you are using it for a project. It runs quick commands each time you tell it to. Its goal is to manage files, so this makes sense. This also means that important information that git needs is also saved in files.

We can see the files that it has by listing the directory:

ls .git
COMMIT_EDITMSG	REBASE_HEAD	index		packed-refs
FETCH_HEAD	config		info		refs
HEAD		description	logs
ORIG_HEAD	hooks		objects

The files in all caps are like git’s variables.

Let’s look at the one called HEAD. We have interacted with HEAD before when resolving merge conflicts.

cat .git/HEAD 
ref: refs/heads/organization

HEAD is a pointer to the currently checked out branch.

The other files with HEAD in their name are similarly pointers to other references, named corresponding to other things.

Important

We did some exercises at this point in class; you can review the questions on Prismia.

We saw the local HEAD already, so we can also compare to the origin manually by using FETCH_HEAD, which is updated on each fetch or pull.

Important

This is corrected relative to what was said in class. I had previously used a different source that described ORIG_HEAD as related to origin, but it is orig as in original. It is created when you move the HEAD dramatically and holds the previous value from before that change, so you can go back. source

cat .git/FETCH_HEAD 
e3b192aa0cd490226e8adcd81d3d0b95adb5676b

But we see that FETCH_HEAD points to a commit, not a branch, so to compare them we have to follow the pointer from HEAD to see what commit it is at.

So, remember that branches are pointers, and look at what commit the current branch points to:

cat .git/refs/heads/organization 
e3b192aa0cd490226e8adcd81d3d0b95adb5676b

Here we can see that they are currently the same, but that is not guaranteed: FETCH_HEAD can even point to a commit from a different branch, so the two are often not the same.
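The two cat commands we ran to follow HEAD can be sketched as one function. This is a simplified illustration, not how git itself resolves references: `resolve_head` is a hypothetical name, it only handles a branch stored as its own file under refs/heads (not one recorded in packed-refs), and the demo uses a throwaway directory that imitates the two files we looked at.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to the commit hash it ultimately points to."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD names a branch; the branch file holds the commit hash.
        ref_path = git_dir / head[len("ref: "):]
        return ref_path.read_text().strip()
    # Otherwise HEAD holds a commit hash directly (a detached HEAD).
    return head


# Demo in a temporary directory that mimics the files from the notes.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/organization\n")
    (git_dir / "refs" / "heads" / "organization").write_text(
        "e3b192aa0cd490226e8adcd81d3d0b95adb5676b\n"
    )
    print(resolve_head(git_dir))  # e3b192aa0cd490226e8adcd81d3d0b95adb5676b
```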

10.8. Collaboration Tools#

Let’s look at the gh inclass repo in browser

gh repo view --web
Opening github.com/compsys-progtools/gh-inclass-sp24-brownsarahm in your browser.

We added a collaborator to each of your gh inclass repos and followed the steps in the “add a collaborator” issue.

If you are making up class, ping @brownsarahm in a comment on the issue to get assigned a partner.

10.9. Prepare for next class#

Create a file gitcommandsbreakdown.md and, for each command in the template below, break down what steps it must do based on what we have learned so far about git objects. I started the first one as an example based on what we discussed today; fill in the others. In class on Thursday, we will make a commit using plumbing commands, so thinking about what you already know about commits will prepare you to learn this material.

# What git commands do

## `git status`

- check the branch of the HEAD pointer
- compare the HEAD pointer to the FETCH_HEAD, if different trace back through parent commits to find out how many commits apart they are and which is ahead (or if both ahead and behind)
- compare the snapshot at the HEAD pointer's commit to the current working directory
- if staging is not empty, compare that to the working directory

## `git commit`

- 

## `git add`

- 

10.10. Badges#

  1. Read about different workflows in git and describe which one you prefer to work with and why in favorite_git_workflow.md in your kwl repo. Two good places to read from are the Git Book and the Atlassian docs.

  2. Update your kwl chart with what you have learned or new questions in the want to know column

  3. In commit_contents.md, redirect the content of your most recent commit, its tree, and the contents of each tree and blob in that tree to the same file. Edit the file or use echo to put markdown headings between the different objects. Add a title # Complete Commit to the file and at the bottom of the file add a ## Reflection subheading with some notes on how, if at all, this exercise helps you understand what a commit is.

  1. Read about different workflows in git and add responses to the questions below in a workflows.md in your kwl repo. Two good places to read from are the Git Book and the Atlassian docs.

  2. Update your kwl chart with what you have learned or new questions in the want to know column

  3. Find the hash of the blob object that contains the content of your workflows.md file and put that in the comment of your badge PR for this badge.

## Workflow Reflection

1. Why is it important that git can be used with different workflows?
1. Which workflow do you think you would like to work with best and why?
1. Describe a scenario that might make it better for the whole team to use a workflow other than the one you prefer.  

10.11. Experience Report Evidence#

Append the contents of one of your trees or commits and one blob or tree inside of that first one to the bottom of your experience report.

10.12. Questions After Today’s Class#

10.12.1. Why is python so hard to install on a computer?#

The complexity is that there can be multiple versions and there are multiple ways to do things.

10.12.2. What is a pack?#

A pack is a compressed bundle of many objects that git creates when you push, or occasionally on its own when you have accumulated a lot of content.

You may have seen one after running the command, find .git/objects/ -type f. It would look something like: .git/objects/pack/pack-ef31f9fc1517ed75ff4829e12728cdc8884be12e.idx

You can read more in the git internals chapter of the book.

10.12.3. What are the exact differences between HEAD, ORIG_HEAD, and FETCH_HEAD?#

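All three are pointers to a commit (or a branch).

  • HEAD is the locally checked out branch (or commit).

  • ORIG_HEAD is where your HEAD was before the last operation that moved it dramatically (for example a merge, rebase, or reset). source

  • FETCH_HEAD is what was most recently fetched from the remote; it is updated on each fetch or pull.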

10.12.4. Why hasn’t a more user friendly interface been implemented for git?#

There are a lot of attempts, but they are all severely limited in function and generally pretty strongly disliked by people who understand how git works. I would not teach any of them because I have seen that they reinforce misconceptions and incorrect understanding of git in ways that prevent people from learning the more advanced features.

Also, user friendly is relative. Relative to the audience and to the alternative. To experienced devs familiar with other VCS, git is user friendly.

10.12.5. How does git know what files are in the staging area and what files are not?#

The index file, which is not easily human readable, contains the contents of the staging area. We will learn more about this next class. This is a great question.

10.12.6. When we switch branches how does it update that list of files in the staging area?#

It does not: the staging area is separate from branches. This is useful, though. For example, you could add files to the staging area, then use git status and realize you are still on main when you wanted to commit to a branch. You can then create and switch to the new branch and finish the commit, without having to unstage and start over.

10.12.7. If GitHub didn’t exist, where would git be used?#

Other popular git hosts are Bitbucket and GitLab, but there are lots of others, and you can host your own.

This is a good explore topic.

10.12.8. Does the HEAD just point to the most recent commit for the entire repository?#

Not always, but most of the time, yes. It points to what is currently checked out.

We will be working with this more over the next few classes, so it is okay if this is a little unclear for now; there will be more time to work with it.