10. What is git?#

Last week, we learned about what a commit is and then we took a break from how git works, to talk more about how developers communicate about code

Today we will learn what git is more formally.

Tip

We will go in and out of topics at times, in order to provides what is called spaced repetition, repeating material or key concepts with breaks in between.

Using git correctly is a really important goal of this course because git is an opportunity for you to demonstrate a wide range of both practical and conceptual understanding.

So, I have elected to interleave other topics with git to give core git ideas some time to simmer and give you time to practice them before we build on them with more depth at git.

Also, we are both learning git and using git as a motivating example of other key important topics.

10.1. Why so much git?#

Today, we are going to learn what git is and later we will learn more details of how it is implemented.

Remember we are spending so much time with git for two reasons:

  1. it is an important developer tool

  2. it demonstrates important conceptual ideas that occur in other areas of CS

git book is the official reference on git.

this includes other spoken languages as well if that is helpful for you.

10.2. git definition#

From here, we have the full definition of git

git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.

We do not start from that point, because these documents were written for target audience of working developers who are familar with other, old version control systems and learning an additional one.

Most of you, however, have probably not used another version control system.

10.3. Git is a File system#

Content-addressable filesystem means a key-value data store.

some examples of key-value pairs that you have seen in computer science broadly, and in this course specficially

  • python dictionaries

  • pointers (address,content)

  • parameter, passed values

  • yaml files

What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.

10.4. Git is a Version Control System#

In the before times

PhD comics final.doc comic

git stores snapshots of your work each time you commit.

snapshot of 5 versions of 3 files

it uses 3 stages:

3 stages in git

These three stages are the in relation to your working directory, and potentially remotes.

So in broader context, the git visual cheatsheet is a more complete picture and has commands overlayed with the concept.

10.5. Git has two sets of commands#

  • Porcelain: the user friendly VCS

  • Plumbing: the internal workings- a toolkit for a VCS

We have so far used git as a version control system. A version control system, in general, will have operations like commit, push, pull, clone. These may work differently under the hood or be called different things, but those are what something needs to have in order to keep track of different versions.

The plumbing commands reveal the way that git performs version control operations. This means, they implement the git file system operations for the git version control system.

You can think of the plumbing vs porcelain commands like public/private methods. As a user, you only need the public methods (porcelain commands) but those use the private ones to get things done (plumbing commands). We will use the plumbing commands over the next few classes to examine what git really does when we call the porcelain commands that we will typically use.

10.6. Git is distributed#

What does that mean?

Git runs locally. It can run in many places, and has commands to help sync across remotes, but git does not require one copy of the repository to be the “official” copy and the others to be subordinate. git just sees repositories.

For human reasons, we like to have one “official” copy and treat the others as local copies, but that is a social choice, not a technological requirement of git. Even though we will typically use it with an offical copy and other copies, having a tool tht does not care, makes the tool more flexible and allows us to create workflows, or networks of copies that have any relationship we want.

It’s about the workflows, or the ways we socially use the tool.

10.6.1. Subversion Workflow#

subversion workflow

subversion is an older VCS

10.6.2. Integration Manager#

integration manager workflow

10.6.3. dictator and lieutenants#

dictator and lieutenants workflow

10.7. Starting a git repo locally#

We made this folder, but we have not used any git operations on it yet, it is actually not a git repo, which we could tell from the output above, but let’s use git to inspect and get another hint.

Now we will return to the tiny-book folder we made last class. This is a jupyterbook, meaning the command jupyter-book build . works in the directory.

cd tiny-book/
ls
_build			markdown-notebooks.md
_config.yml		markdown.md
_toc.yml		notebooks.ipynb
intro.md		references.bib
logo.png		requirements.txt

we see all fo the files

ls -a
.			logo.png
..			markdown-notebooks.md
_build			markdown.md
_config.yml		notebooks.ipynb
_toc.yml		references.bib
intro.md		requirements.txt

but no .git!

and if we try a git command

git status
fatal: not a git repository (or any of the parent directories): .git

This tells us the .git directory is missing form the current path and all parent directories.

pwd
/Users/brownsarahm/Documents/inclass/systems/tiny-book

nor is any other fold in this list, since it checks the parents.

To make it a git repo, we use git init with the path we want to initialize, which currently is .

git init .
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: 	git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: 	git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/tiny-book/.git/

we’ll change our default branch to main

git branch -m main

and check in with git now

git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	_build/
	_config.yml
	_toc.yml
	intro.md
	logo.png
	markdown-notebooks.md
	markdown.md
	notebooks.ipynb
	references.bib
	requirements.txt

nothing added to commit but untracked files present (use "git add" to track)

this time it works and we see a two important things:

  • there are no previous commits

  • all of the files are untracked

10.8. Handling Built files#

The built site files are compeltey redundant, content wise, to the original markdown files.

the build folder is compeltely redundant, we can delete it

rm -rf _build/

confirm it is gone

ls
_config.yml		markdown.md
_toc.yml		notebooks.ipynb
intro.md		references.bib
logo.png		requirements.txt
markdown-notebooks.md

then re-build to replace it

jupyter-book build .
Running Jupyter-Book v1.0.2
Source Folder: /Users/brownsarahm/Documents/inclass/systems/tiny-book
Config Path: /Users/brownsarahm/Documents/inclass/systems/tiny-book/_config.yml
Output Path: /Users/brownsarahm/Documents/inclass/systems/tiny-book/_build/html
Running Sphinx v7.4.7
loading translations [en]... done
making output directory... done
[etoc] Changing master_doc to 'intro'
checking bibtex cache... out of date
parsing bibtex file /Users/brownsarahm/Documents/inclass/systems/tiny-book/references.bib... parsed 5 entries
myst v2.0.0: MdParserConfig(commonmark_only=False, gfm_only=False, enable_extensions={'substitution', 'tasklist', 'dollarmath', 'linkify', 'colon_fence'}, disable_syntax=[], all_links_external=False, url_schemes=('mailto', 'http', 'https'), ref_domains=None, fence_as_directive=set(), number_code_blocks=[], title_to_header=False, heading_anchors=0, heading_slug_func=None, html_meta={}, footnote_transition=True, words_per_minute=200, substitutions={}, linkify_fuzzy_links=True, dmath_allow_labels=True, dmath_allow_space=True, dmath_allow_digits=True, dmath_double_inline=False, update_mathjax=True, mathjax_classes='tex2jax_process|mathjax_process|math|output_area', enable_checkboxes=False, suppress_warnings=[], highlight_code_blocks=True)
myst-nb v1.1.1: NbParserConfig(custom_formats={}, metadata_key='mystnb', cell_metadata_key='mystnb', kernel_rgx_aliases={}, eval_name_regex='^[a-zA-Z_][a-zA-Z0-9_]*$', execution_mode='force', execution_cache_path='', execution_excludepatterns=[], execution_timeout=30, execution_in_temp=False, execution_allow_errors=False, execution_raise_on_error=False, execution_show_tb=False, merge_streams=False, render_plugin='default', remove_code_source=False, remove_code_outputs=False, code_prompt_show='Show code cell {type}', code_prompt_hide='Hide code cell {type}', number_source_lines=False, output_stderr='show', render_text_lexer='myst-ansi', render_error_lexer='ipythontb', render_image_options={}, render_figure_options={}, render_markdown_format='commonmark', output_folder='build', append_css=True, metadata_to_fm=False)
Using jupyter-cache at: /Users/brownsarahm/Documents/inclass/systems/tiny-book/_build/.jupyter_cache
sphinx-multitoc-numbering v0.1.3: Loaded
building [mo]: targets for 0 po files that are out of date
writing output... 
building [html]: targets for 4 source files that are out of date
updating environment: [new config] 4 added, 0 changed, 0 removed
/Users/brownsarahm/Documents/inclass/systems/tiny-book/markdown-notebooks.md: Executing notebook using local CWD [mystnb]
/Users/brownsarahm/Documents/inclass/systems/tiny-book/markdown-notebooks.md: Executed notebook in 1.28 seconds [mystnb]
/Users/brownsarahm/Documents/inclass/systems/tiny-book/notebooks.ipynb: Executing notebook using local CWD [mystnb]
/Users/brownsarahm/Documents/inclass/systems/tiny-book/notebooks.ipynb: Executed notebook in 2.10 seconds [mystnb]

looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
copying assets... 
copying static files... done
copying extra files... done
copying assets: done
writing output... [100%] notebooks
generating indices... genindex done
writing additional pages... search done
copying images... [100%] _build/jupyter_execute/a31e63b1f6ca34376ef17d2b6c277648c6b47bb0a75c5165999735167a988593.png
dumping search index in English (code: en)... done
dumping object inventory... done
[etoc] missing index.html written as redirect to 'intro.html'
build succeeded.

The HTML pages are in _build/html.

===============================================================================

Finished generating HTML for book.
Your book's HTML pages are here:
    _build/html/
You can look at your book by opening this file in a browser:
    _build/html/index.html
Or paste this line directly into your browser bar:
    file:///Users/brownsarahm/Documents/inclass/systems/tiny-book/_build/html/index.html            

===============================================================================

and see it is back, or view the file in our browser

ls
_build			markdown-notebooks.md
_config.yml		markdown.md
_toc.yml		notebooks.ipynb
intro.md		references.bib
logo.png		requirements.txt

it contains derived content, not content that a person manually authored

ls _build/
html		jupyter_execute
ls _build/html/
_images			markdown-notebooks.html
_sources		markdown.html
_sphinx_design_static	notebooks.html
_static			objects.inv
genindex.html		search.html
index.html		searchindex.js
intro.html

since they are fully redundant,

We do not want to keep track of changes for the built files since they are generated from the source files. It’s redundant and makes it less clear where someone should update content.

Git helps us with this with the .gitignore

echo "_build/" > .gitignore

now it does not show!

git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitignore
	_config.yml
	_toc.yml
	intro.md
	logo.png
	markdown-notebooks.md
	markdown.md
	notebooks.ipynb
	references.bib
	requirements.txt

nothing added to commit but untracked files present (use "git add" to track)

Now, we can add and commit our book!

git add .
git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   .gitignore
	new file:   _config.yml
	new file:   _toc.yml
	new file:   intro.md
	new file:   logo.png
	new file:   markdown-notebooks.md
	new file:   markdown.md
	new file:   notebooks.ipynb
	new file:   references.bib
	new file:   requirements.txt
git commit -m 'tempatle'
[main (root-commit) eb9ef8e] tempatle
 10 files changed, 342 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 _config.yml
 create mode 100644 _toc.yml
 create mode 100644 intro.md
 create mode 100644 logo.png
 create mode 100644 markdown-notebooks.md
 create mode 100644 markdown.md
 create mode 100644 notebooks.ipynb
 create mode 100644 references.bib
 create mode 100644 requirements.txt

10.9. How do I push a repo that I made locally to GitHub?#

Right now, we do not have any remotes, so if we try to push it will fail. Next we will see how to fix that.

First let’s confirm

git push
fatal: No configured push destination.
Either specify the URL from the command-line or configure a remote repository using

    git remote add <name> <url>

and then push using the remote name

    git push <name>

it fails, but it tells us how to fix it. This is why inspection is so powerful in developer tools, that is where we developers give one another hints.

Right now, we do not have any remotes

For today, we will create an empty github repo shared with me, by accepting the assignment linked in prismia or ask a TA/instructor if you are making up class.

More generally, you can create a repo

That default page for an empty repo if you do not initiate it with any files will give you the instructions for what remote to add.

Now we add the remote

git remote add origin https://github.com/compsys-progtools/tiny-book-brownsarahm-1.git

Then we can push

git push
fatal: The current branch main has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin main

To have this happen automatically for branches without a tracking
upstream, see 'push.autoSetupRemote' in 'git help config'.

once we link the branch

git push -u origin main
To https://github.com/compsys-progtools/tiny-book-brownsarahm-1.git
 ! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'https://github.com/compsys-progtools/tiny-book-brownsarahm-1.git'
hint: Updates were rejected because the remote contains work that you do not
hint: have locally. This is usually caused by another repository pushing to
hint: the same ref. If you want to integrate the remote changes, use
hint: 'git pull' before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

except github classroom added a commit, so we have diverged histories

we can fix this by pulling

git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 5 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (5/5), 1.70 KiB | 348.00 KiB/s, done.
From https://github.com/compsys-progtools/tiny-book-brownsarahm-1
 * [new branch]      feedback   -> origin/feedback
 * [new branch]      main       -> origin/main
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=origin/<branch> main

which requires the branch to be better linked

git branch --set-upstream-to=origin/main main
branch 'main' set up to track 'origin/main'.
git pull
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint:   git config pull.rebase false  # merge
hint:   git config pull.rebase true   # rebase
hint:   git config pull.ff only       # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Need to specify how to reconcile divergent branches.

and we need the option for how to work. I want our local commits to be the most recent so I will rebase

git pull --rebase
Successfully rebased and updated refs/heads/main.

We can see that in the log it is as we expected.

git log
commit 60bd457a4c3523eae25be36ac685207d605c9676 (HEAD -> main)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Oct 8 12:51:52 2024 -0400

    tempatle

commit d781535217b324d9cb2ce6cb45ae565b54ee786f (origin/main)
Author: github-classroom[bot] <66690702+github-classroom[bot]@users.noreply.github.com>
Date:   Tue Oct 8 16:54:12 2024 +0000

    Setting up GitHub Classroom Feedback

commit 72bcbb8cbd2769d21aad3c23c8fbe477d0260ced (origin/feedback)
Author: github-classroom[bot] <66690702+github-classroom[bot]@users.noreply.github.com>
Date:   Tue Oct 8 16:54:12 2024 +0000

    GitHub Classroom Feedback

Note the commits made more recently in time appear older in the commit history.

Now we can push!

git push
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 8 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 16.33 KiB | 8.17 MiB/s, done.
Total 12 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/compsys-progtools/tiny-book-brownsarahm-1.git
   d781535..60bd457  main -> main

10.10. How does git do all these things?#

We can use the bash command find to search the file system note that this does not search the contents of the files, just the names.

find .git/objects/ -type f
.git/objects//69/fa449dd96e2405945b2e4cff2fd0ab8b102097
.git/objects//0e/2e3b27f61b5908c4bb75a1ca680ee4053aa992
.git/objects//60/bd457a4c3523eae25be36ac685207d605c9676
.git/objects//5f/534f8051f6a94d40e57e58242ef0113fae4fd1
.git/objects//5f/2d0ff4056de4c043c720fd80db5ff75502d286
.git/objects//d7/81535217b324d9cb2ce6cb45ae565b54ee786f
.git/objects//eb/9ef8efc08fa2067507e7af41fcc904a9a11cb5
.git/objects//fd/b7176c429a73d5335e127b27d530b8aaa07c7d
.git/objects//29/a422c19251aeaeb907175e9b3219a9bed6c616
.git/objects//74/d5c7101ed8c8c1a6f87e31debd9445df1f0e71
.git/objects//7e/821e45db31376729c73f3616fb24db2b655a95
.git/objects//72/bcbb8cbd2769d21aad3c23c8fbe477d0260ced
.git/objects//06/d56f40c838b64eb048a63e036125964a069a3a
.git/objects//a0/57a320dcd595f3f0e0d250c3af4a5653596914
.git/objects//e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
.git/objects//fa/eea606145667f54d220a0c17ffe8d22db07146
.git/objects//c2/48258b7f940fe3a133a2f317b574821da8bf7b
.git/objects//f8/cdc73cb2be06824f521837366ec95b73d55ef8
.git/objects//78/3ec6aa5afe2f0a66087d01a112f543e1ed287e

This is a lot of files! It’s more than we have in our working directory.

This is a consequence of git taking snap shots and tracking both the actual contents of our working directory and our commit messages and other meta data about each commit.

10.11. Git Variables#

the program git does not run continously the entire time you are using it for a project. It runs quick commands each time you tell it to, it’s goal is to manage files, so this makes sense. This also means that important information that git needs is also saved in files.

We can see the files that it has by listing the directory:

ls .git
AUTO_MERGE	ORIG_HEAD	index		refs
COMMIT_EDITMSG	config		info
FETCH_HEAD	description	logs
HEAD		hooks		objects

the files in all caps are like gits variables.

Lets look at the one called HEAD we have interacted with HEAD before when resolving merge conflicts.

cat .git/HEAD 
ref: refs/heads/main

HEAD is a pointer to the currently checked out branch.

The other files with HEAD in their name are similarly pointers to other references, named corresponding to other things.

10.12. Git Objects#

There are 3 types:

  • blob objects: the content of your files (data)

  • tree objects: stores file names and groups files together (organization)

  • Commit Objects: stores information about the sha values of the snapshots

classDiagram class tree{ List: - hash: blob - string: type - string:file name } class commit{ hash: parent hash: tree string: message string: author string: time } class blob{ binary: contents } class object{ hash: name } object <|-- blob object <|-- tree object <|-- commit

10.13. Prepare for Next Class#

  1. Take a few minutes to think what you know about hashing and numbers. Create hash_num_prep.md with two sections: ## Hashing with a few bullet points summarzing key points about hashing, and ## Numbers with what types of number representations you know.

  2. start commenting/expressing interest on build/explore ideas. Next lab will be focused on that.

10.14. Badges#

  1. Read about different workflows in git and describe which one you prefer to work with and why in favorite_git_workflow.md in your kwl repo. Two good places to read from are Git Book and the atlassian Docs

  2. Update your kwl chart with what you have learned or new questions in the want to know column

  3. In commit_contents.md, redirect the content of your most recent commit, its tree, and the contents of one blob. Edit the file or use echo to put markdown headings between the different objects. Add a title # Complete Commit to the file and at the bottom of the file add ## Reflection subheading with some notes on how, if at all this excercise helps you understand what a commit is.

  1. Read about different workflows in git and add responses to the below in a workflows.md in your kwl repo. Two good places to read from are Git Book and the atlassian Docs

  2. Update your kwl chart with what you have learned or new questions in the want to know column

  3. Add the hash of the content of your completed workflows.md file and put that in the comment of your badge PR for this badge. Try to do this from your local CLI, but full credit even if you use the website interface

## Workflow Reflection

1. Why is it important that git can be used with different workflows?
1. Which workflow do you think you would like to work with best and why?
1. Describe a scenario that might make it better for the whole team to use a workflow other than the one you prefer.  

10.15. Experience Report Evidence#

10.16. Questions After Today’s Class#

10.16.1. Can you clarify the difference between working in local repos, github repos, and github codespaces and how they all integrate?#

  • a local repo is a folder on your computer

  • a github repo is the copy on the github.com server that the website interface interacts with

  • a github codespace is a virtual machine on an Azure server that you can use VScode with

  • a codespace is a local copy as far as git is concerned, since you have local access

  • both types of local repos can be pushed and pulled to github.com

10.16.2. how were the older verion controls and what similar features did it have to git that remains present in git present day?#

this is a good explore badge topic.

10.16.3. why is there a difference between Porcealin and Plumbing commands when they both interact with the file system and are dependent on its contents?#

is it the operatiions that are performed by the commands? Or is it more to do with the level that they interact with the file system.

the porcelain commands essentially call the plumbing comamnds. In normal day to day use you only need the porcelain commands, plumbing are the low level details.

10.16.4. Can the list of commit (objects) ever become long enough to cause issues?#

Technically, yes, practically no. the badges for today will have you investigate this idea.