10. What is git?#
Last week, we learned what a commit is, and then we took a break from how git works to talk more about how developers communicate about code.
Today we will learn what git is more formally.
Tip
We will go in and out of topics at times, in order to provide what is called spaced repetition: repeating material or key concepts with breaks in between.
Using git correctly is a really important goal of this course because git is an opportunity for you to demonstrate a wide range of both practical and conceptual understanding.
So, I have elected to interleave other topics with git to give core git ideas some time to simmer and give you time to practice them before we build on them in more depth.
Also, we are both learning git and using git as a motivating example for other important topics.
10.1. Why so much git?#
Today, we are going to learn what git is and later we will learn more details of how it is implemented.
Remember we are spending so much time with git for two reasons:
it is an important developer tool
it demonstrates important conceptual ideas that occur in other areas of CS
The git book is the official reference on git.
It is also available in other spoken languages, if that is helpful for you.
10.2. git definition#
From here, we have the full definition of git
We do not start from that point, because these documents were written for a target audience of working developers who are familiar with other, older version control systems and are learning an additional one.
Most of you, however, have probably not used another version control system.
10.3. Git is a File system#
Content-addressable filesystem means a key-value data store.
Some examples of key-value pairs that you have seen in computer science broadly, and in this course specifically:
python dictionaries
pointers (address,content)
parameter, passed values
yaml files
What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.
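As a minimal sketch of that idea, the toy class below builds a content-addressable store on top of a Python dictionary. The class name ToyStore and its methods are hypothetical names chosen for illustration. Hashing the raw bytes with SHA-1 is a simplifying assumption; git's real key computation and on-disk storage differ in detail.

```python
import hashlib


class ToyStore:
    """A toy content-addressable store: the key is derived from the content."""

    def __init__(self):
        # maps hex digest (key) -> bytes (value)
        self._objects = {}

    def put(self, content: bytes) -> str:
        """Store content and hand back the key needed to retrieve it."""
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        """Retrieve content using the key returned by put."""
        return self._objects[key]


store = ToyStore()
key = store.put(b"hello git\n")
same_key = store.put(b"hello git\n")    # identical content gives the identical key
other_key = store.put(b"hello git!\n")  # any change in content gives a different key

print(key == same_key)   # True
print(key == other_key)  # False
print(store.get(key))    # b'hello git\n'
```

Because the key comes from the content itself, the caller never chooses a name: storing the same bytes twice cannot create two different entries.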
10.4. Git is a Version Control System#
In the before times
git stores snapshots of your work each time you commit.
it uses 3 stages:
These three stages are in relation to your working directory, and potentially remotes.
So in broader context, the git visual cheatsheet is a more complete picture and has commands overlaid with the concepts.
10.5. Git has two sets of commands#
Porcelain: the user friendly VCS
Plumbing: the internal workings, a toolkit for a VCS
We have so far used git as a version control system. A version control system, in general, will have operations like commit, push, pull, clone. These may work differently under the hood or be called different things, but those are what something needs to have in order to keep track of different versions.
The plumbing commands reveal the way that git performs version control operations. This means, they implement the git file system operations for the git version control system.
You can think of the plumbing vs porcelain commands like public/private methods. As a user, you only need the public methods (porcelain commands) but those use the private ones to get things done (plumbing commands). We will use the plumbing commands over the next few classes to examine what git really does when we call the porcelain commands that we will typically use.
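The public/private analogy can be sketched as a toy Python class. Everything here is hypothetical and for illustration only: ToyGit is not how git is implemented, and the private helper names only loosely echo real plumbing commands such as git hash-object, git write-tree, and git commit-tree. The stored formats are made up for the sketch.

```python
import hashlib


class ToyGit:
    """Toy model: one public "porcelain" method built from private "plumbing" helpers."""

    def __init__(self):
        self.objects = {}  # key -> stored content
        self.head = None   # key of the most recent commit

    # "plumbing": low-level helpers, private by convention (leading underscore)
    def _hash_object(self, content: str) -> str:
        key = hashlib.sha1(content.encode()).hexdigest()
        self.objects[key] = content
        return key

    def _write_tree(self, files: dict) -> str:
        # store each file's content, then store a listing of key and name
        lines = [f"{self._hash_object(text)} {name}" for name, text in sorted(files.items())]
        return self._hash_object("\n".join(lines))

    def _commit_tree(self, tree_key: str, message: str) -> str:
        return self._hash_object(f"tree {tree_key}\nparent {self.head}\n\n{message}")

    # "porcelain": the user-friendly operation
    def commit(self, files: dict, message: str) -> str:
        tree_key = self._write_tree(files)
        self.head = self._commit_tree(tree_key, message)
        return self.head


repo = ToyGit()
first = repo.commit({"intro.md": "# Hello\n"}, "first commit")
second = repo.commit({"intro.md": "# Hello\nmore\n"}, "second commit")

print(first != second)      # True
print(repo.head == second)  # True
```

A user of this class only calls commit, but reading the private helpers shows what a commit actually does, which is the same reason we will look at git's plumbing commands.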
10.6. Git is distributed#
What does that mean?
Git runs locally. It can run in many places, and has commands to help sync across remotes, but git does not require one copy of the repository to be the “official” copy and the others to be subordinate. git just sees repositories.
For human reasons, we like to have one “official” copy and treat the others as local copies, but that is a social choice, not a technological requirement of git. Even though we will typically use it with an official copy and other copies, having a tool that does not care makes the tool more flexible and allows us to create workflows, or networks of copies, that have any relationship we want.
It’s about the workflows, or the ways we socially use the tool.
10.6.1. Subversion Workflow#
subversion is an older VCS
10.6.2. Integration Manager#
10.6.3. dictator and lieutenants#
10.7. Starting a git repo locally#
We made this folder, but we have not used any git operations on it yet. It is actually not a git repo, which we could tell from the output above, but let’s use git to inspect and get another hint.
Now we will return to the tiny-book folder we made last class. This is a jupyterbook, meaning the command jupyter-book build . works in the directory.
cd tiny-book/
ls
_build markdown-notebooks.md
_config.yml markdown.md
_toc.yml notebooks.ipynb
intro.md references.bib
logo.png requirements.txt
we see all of the files
ls -a
. logo.png
.. markdown-notebooks.md
_build markdown.md
_config.yml notebooks.ipynb
_toc.yml references.bib
intro.md requirements.txt
but no .git!
and if we try a git command
git status
fatal: not a git repository (or any of the parent directories): .git
This tells us the .git directory is missing from the current path and all parent directories.
pwd
/Users/brownsarahm/Documents/inclass/systems/tiny-book
None of the other folders in this path is a git repo either, since git checks the parents.
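The parent-directory check can be sketched in Python. This is a simplified illustration, not git's actual lookup code: the function name find_repo_root is hypothetical, and real git has additional rules (for example, environment variables can change where it looks).

```python
from pathlib import Path
import tempfile


def find_repo_root(start: Path):
    """Return the closest directory at or above start that contains .git, or None."""
    start = start.resolve()
    for folder in [start, *start.parents]:
        if (folder / ".git").exists():
            return folder
    return None


# build a throwaway folder tree to try it on
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp).resolve() / "tiny-book"
    nested = repo / "_build" / "html"
    nested.mkdir(parents=True)
    (repo / ".git").mkdir()  # stands in for what `git init` creates

    found_from_nested = find_repo_root(nested)

print(found_from_nested == repo)  # True
```

The search starts in the current folder and moves up one parent at a time, which is why a git command works from any subfolder of a repo and why the error message mentions the parent directories.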
To make it a git repo, we use git init with the path we want to initialize, which currently is .
git init .
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/tiny-book/.git/
we’ll rename our initial branch to main
git branch -m main
and check in with git now
git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
_build/
_config.yml
_toc.yml
intro.md
logo.png
markdown-notebooks.md
markdown.md
notebooks.ipynb
references.bib
requirements.txt
nothing added to commit but untracked files present (use "git add" to track)
this time it works and we see two important things:
there are no previous commits
all of the files are untracked
10.8. Handling Built files#
The built site files are completely redundant, content wise, to the original markdown files.
the build folder is completely redundant, so we can delete it
rm -rf _build/
confirm it is gone
ls
_config.yml markdown.md
_toc.yml notebooks.ipynb
intro.md references.bib
logo.png requirements.txt
markdown-notebooks.md
then re-build to replace it
jupyter-book build .
Running Jupyter-Book v1.0.2
Source Folder: /Users/brownsarahm/Documents/inclass/systems/tiny-book
Config Path: /Users/brownsarahm/Documents/inclass/systems/tiny-book/_config.yml
Output Path: /Users/brownsarahm/Documents/inclass/systems/tiny-book/_build/html
Running Sphinx v7.4.7
loading translations [en]... done
making output directory... done
[etoc] Changing master_doc to 'intro'
checking bibtex cache... out of date
parsing bibtex file /Users/brownsarahm/Documents/inclass/systems/tiny-book/references.bib... parsed 5 entries
myst v2.0.0: MdParserConfig(commonmark_only=False, gfm_only=False, enable_extensions={'substitution', 'tasklist', 'dollarmath', 'linkify', 'colon_fence'}, disable_syntax=[], all_links_external=False, url_schemes=('mailto', 'http', 'https'), ref_domains=None, fence_as_directive=set(), number_code_blocks=[], title_to_header=False, heading_anchors=0, heading_slug_func=None, html_meta={}, footnote_transition=True, words_per_minute=200, substitutions={}, linkify_fuzzy_links=True, dmath_allow_labels=True, dmath_allow_space=True, dmath_allow_digits=True, dmath_double_inline=False, update_mathjax=True, mathjax_classes='tex2jax_process|mathjax_process|math|output_area', enable_checkboxes=False, suppress_warnings=[], highlight_code_blocks=True)
myst-nb v1.1.1: NbParserConfig(custom_formats={}, metadata_key='mystnb', cell_metadata_key='mystnb', kernel_rgx_aliases={}, eval_name_regex='^[a-zA-Z_][a-zA-Z0-9_]*$', execution_mode='force', execution_cache_path='', execution_excludepatterns=[], execution_timeout=30, execution_in_temp=False, execution_allow_errors=False, execution_raise_on_error=False, execution_show_tb=False, merge_streams=False, render_plugin='default', remove_code_source=False, remove_code_outputs=False, code_prompt_show='Show code cell {type}', code_prompt_hide='Hide code cell {type}', number_source_lines=False, output_stderr='show', render_text_lexer='myst-ansi', render_error_lexer='ipythontb', render_image_options={}, render_figure_options={}, render_markdown_format='commonmark', output_folder='build', append_css=True, metadata_to_fm=False)
Using jupyter-cache at: /Users/brownsarahm/Documents/inclass/systems/tiny-book/_build/.jupyter_cache
sphinx-multitoc-numbering v0.1.3: Loaded
building [mo]: targets for 0 po files that are out of date
writing output...
building [html]: targets for 4 source files that are out of date
updating environment: [new config] 4 added, 0 changed, 0 removed
/Users/brownsarahm/Documents/inclass/systems/tiny-book/markdown-notebooks.md: Executing notebook using local CWD [mystnb]
/Users/brownsarahm/Documents/inclass/systems/tiny-book/markdown-notebooks.md: Executed notebook in 1.28 seconds [mystnb]
/Users/brownsarahm/Documents/inclass/systems/tiny-book/notebooks.ipynb: Executing notebook using local CWD [mystnb]
/Users/brownsarahm/Documents/inclass/systems/tiny-book/notebooks.ipynb: Executed notebook in 2.10 seconds [mystnb]
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
copying assets...
copying static files... done
copying extra files... done
copying assets: done
writing output... [100%] notebooks
generating indices... genindex done
writing additional pages... search done
copying images... [100%] _build/jupyter_execute/a31e63b1f6ca34376ef17d2b6c277648c6b47bb0a75c5165999735167a988593.png
dumping search index in English (code: en)... done
dumping object inventory... done
[etoc] missing index.html written as redirect to 'intro.html'
build succeeded.
The HTML pages are in _build/html.
===============================================================================
Finished generating HTML for book.
Your book's HTML pages are here:
_build/html/
You can look at your book by opening this file in a browser:
_build/html/index.html
Or paste this line directly into your browser bar:
file:///Users/brownsarahm/Documents/inclass/systems/tiny-book/_build/html/index.html
===============================================================================
and see it is back, or view the file in our browser
ls
_build markdown-notebooks.md
_config.yml markdown.md
_toc.yml notebooks.ipynb
intro.md references.bib
logo.png requirements.txt
it contains derived content, not content that a person manually authored
ls _build/
html jupyter_execute
ls _build/html/
_images markdown-notebooks.html
_sources markdown.html
_sphinx_design_static notebooks.html
_static objects.inv
genindex.html search.html
index.html searchindex.js
intro.html
Since the built files are fully redundant, we do not want to keep track of changes for them: they are generated from the source files. Tracking them would also make it less clear where someone should update content.
Git helps us with this through the .gitignore file
echo "_build/" > .gitignore
now it does not show!
git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitignore
_config.yml
_toc.yml
intro.md
logo.png
markdown-notebooks.md
markdown.md
notebooks.ipynb
references.bib
requirements.txt
nothing added to commit but untracked files present (use "git add" to track)
Now, we can add and commit our book!
git add .
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: .gitignore
new file: _config.yml
new file: _toc.yml
new file: intro.md
new file: logo.png
new file: markdown-notebooks.md
new file: markdown.md
new file: notebooks.ipynb
new file: references.bib
new file: requirements.txt
git commit -m 'tempatle'
[main (root-commit) eb9ef8e] tempatle
10 files changed, 342 insertions(+)
create mode 100644 .gitignore
create mode 100644 _config.yml
create mode 100644 _toc.yml
create mode 100644 intro.md
create mode 100644 logo.png
create mode 100644 markdown-notebooks.md
create mode 100644 markdown.md
create mode 100644 notebooks.ipynb
create mode 100644 references.bib
create mode 100644 requirements.txt
10.9. How do I push a repo that I made locally to GitHub?#
Right now, we do not have any remotes, so if we try to push it will fail. Next we will see how to fix that.
First let’s confirm
git push
fatal: No configured push destination.
Either specify the URL from the command-line or configure a remote repository using
git remote add <name> <url>
and then push using the remote name
git push <name>
it fails, but it tells us how to fix it. This is why inspection is so powerful in developer tools; that is where we developers give one another hints.
Right now, we do not have any remotes.
For today, we will create an empty GitHub repo shared with me by accepting the assignment linked in prismia; ask a TA/instructor if you are making up class.
More generally, you can create a repo
The default page for an empty repo (one that you did not initialize with any files) will give you the instructions for what remote to add.
Now we add the remote
git remote add origin https://github.com/compsys-progtools/tiny-book-brownsarahm-1.git
Then we can push
git push
fatal: The current branch main has no upstream branch.
To push the current branch and set the remote as upstream, use
git push --set-upstream origin main
To have this happen automatically for branches without a tracking
upstream, see 'push.autoSetupRemote' in 'git help config'.
once we link the branch
git push -u origin main
To https://github.com/compsys-progtools/tiny-book-brownsarahm-1.git
! [rejected] main -> main (fetch first)
error: failed to push some refs to 'https://github.com/compsys-progtools/tiny-book-brownsarahm-1.git'
hint: Updates were rejected because the remote contains work that you do not
hint: have locally. This is usually caused by another repository pushing to
hint: the same ref. If you want to integrate the remote changes, use
hint: 'git pull' before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
except github classroom added a commit, so we have diverged histories
we can fix this by pulling
git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 5 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (5/5), 1.70 KiB | 348.00 KiB/s, done.
From https://github.com/compsys-progtools/tiny-book-brownsarahm-1
* [new branch] feedback -> origin/feedback
* [new branch] main -> origin/main
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.
git pull <remote> <branch>
If you wish to set tracking information for this branch you can do so with:
git branch --set-upstream-to=origin/<branch> main
which requires the branch to be better linked
git branch --set-upstream-to=origin/main main
branch 'main' set up to track 'origin/main'.
git pull
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint: git config pull.rebase false # merge
hint: git config pull.rebase true # rebase
hint: git config pull.ff only # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Need to specify how to reconcile divergent branches.
and we need to pick an option for how to reconcile them. I want our local commits to be the most recent, so I will rebase
git pull --rebase
Successfully rebased and updated refs/heads/main.
We can see that in the log it is as we expected.
git log
commit 60bd457a4c3523eae25be36ac685207d605c9676 (HEAD -> main)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date: Tue Oct 8 12:51:52 2024 -0400
tempatle
commit d781535217b324d9cb2ce6cb45ae565b54ee786f (origin/main)
Author: github-classroom[bot] <66690702+github-classroom[bot]@users.noreply.github.com>
Date: Tue Oct 8 16:54:12 2024 +0000
Setting up GitHub Classroom Feedback
commit 72bcbb8cbd2769d21aad3c23c8fbe477d0260ced (origin/feedback)
Author: github-classroom[bot] <66690702+github-classroom[bot]@users.noreply.github.com>
Date: Tue Oct 8 16:54:12 2024 +0000
GitHub Classroom Feedback
Note the commits made more recently in time appear older in the commit history.
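To see why, here is a small sketch that models the three commits from the log above as a dictionary. The short hashes, dates, and messages are copied from that log. The parent links are inferred from the order the log prints them, and treating 72bcbb8 as having no parent is an assumption based on the log showing nothing after it. The function name walk_history is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4))

# short hash -> (parent short hash, author date, message)
commits = {
    "60bd457": ("d781535", datetime(2024, 10, 8, 12, 51, 52, tzinfo=EDT), "tempatle"),
    "d781535": ("72bcbb8", datetime(2024, 10, 8, 16, 54, 12, tzinfo=timezone.utc),
                "Setting up GitHub Classroom Feedback"),
    "72bcbb8": (None, datetime(2024, 10, 8, 16, 54, 12, tzinfo=timezone.utc),
                "GitHub Classroom Feedback"),
}


def walk_history(start):
    """Follow parent links from start; dates play no role in the order."""
    order = []
    current = start
    while current is not None:
        order.append(current)
        current = commits[current][0]
    return order


log_order = walk_history("60bd457")
print(log_order)  # ['60bd457', 'd781535', '72bcbb8']

# the commit at the top of the log has the earliest date
head_date = commits["60bd457"][1]
parent_date = commits["d781535"][1]
print(head_date < parent_date)  # True
```

The history is ordered by which commit points to which parent. Rebasing placed our commit on top of the GitHub Classroom commits, so it comes first in the log even though its date (12:51 -0400, which is 16:51 UTC) is earlier than theirs (16:54 UTC).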
Now we can push!
git push
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 8 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 16.33 KiB | 8.17 MiB/s, done.
Total 12 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/compsys-progtools/tiny-book-brownsarahm-1.git
d781535..60bd457 main -> main
10.10. How does git do all these things?#
We can use the bash command find to search the file system. Note that this does not search the contents of the files, just the names.
find .git/objects/ -type f
.git/objects//69/fa449dd96e2405945b2e4cff2fd0ab8b102097
.git/objects//0e/2e3b27f61b5908c4bb75a1ca680ee4053aa992
.git/objects//60/bd457a4c3523eae25be36ac685207d605c9676
.git/objects//5f/534f8051f6a94d40e57e58242ef0113fae4fd1
.git/objects//5f/2d0ff4056de4c043c720fd80db5ff75502d286
.git/objects//d7/81535217b324d9cb2ce6cb45ae565b54ee786f
.git/objects//eb/9ef8efc08fa2067507e7af41fcc904a9a11cb5
.git/objects//fd/b7176c429a73d5335e127b27d530b8aaa07c7d
.git/objects//29/a422c19251aeaeb907175e9b3219a9bed6c616
.git/objects//74/d5c7101ed8c8c1a6f87e31debd9445df1f0e71
.git/objects//7e/821e45db31376729c73f3616fb24db2b655a95
.git/objects//72/bcbb8cbd2769d21aad3c23c8fbe477d0260ced
.git/objects//06/d56f40c838b64eb048a63e036125964a069a3a
.git/objects//a0/57a320dcd595f3f0e0d250c3af4a5653596914
.git/objects//e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
.git/objects//fa/eea606145667f54d220a0c17ffe8d22db07146
.git/objects//c2/48258b7f940fe3a133a2f317b574821da8bf7b
.git/objects//f8/cdc73cb2be06824f521837366ec95b73d55ef8
.git/objects//78/3ec6aa5afe2f0a66087d01a112f543e1ed287e
This is a lot of files! It’s more than we have in our working directory.
This is a consequence of git taking snapshots and tracking both the actual contents of our working directory and our commit messages and other metadata about each commit.
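The file names in that listing are the keys of git's key-value store. The sketch below shows how the key for file content stored as a blob is computed in a SHA-1 repository, and how a key maps to a path under .git/objects. The function names blob_key and object_path are hypothetical; the calculation uses only Python's hashlib.

```python
import hashlib


def blob_key(content: bytes) -> str:
    """Compute the key for file content stored as a blob (SHA-1 repositories)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def object_path(key: str) -> str:
    """Loose objects live in a folder named by the first two characters of the key."""
    return f".git/objects/{key[:2]}/{key[2:]}"


empty_key = blob_key(b"")
print(empty_key)               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(object_path(empty_key))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The key computed for empty content matches one of the paths in the find output above, which suggests that one of the tracked files has no content.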
10.11. Git Variables#
The program git does not run continuously the entire time you are using it for a project. It runs quick commands each time you tell it to. Its goal is to manage files, so this makes sense. This also means that important information that git needs is also saved in files.
We can see the files that it has by listing the directory:
ls .git
AUTO_MERGE ORIG_HEAD index refs
COMMIT_EDITMSG config info
FETCH_HEAD description logs
HEAD hooks objects
The files in all caps are like git’s variables.
Let’s look at the one called HEAD.
We have interacted with HEAD before when resolving merge conflicts.
cat .git/HEAD
ref: refs/heads/main
HEAD is a pointer to the currently checked out branch.
The other files with HEAD in their name are similarly pointers to other references, named corresponding to other things.
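Following that pointer can be sketched in a few lines of Python. This is a simplified illustration that builds a stand-in .git folder in a temporary directory instead of reading a real repository. The function name resolve_head is hypothetical, and the sketch assumes each branch is stored as its own file under refs/heads, which is not the only way real git can store references.

```python
from pathlib import Path
import tempfile


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to the commit hash it ultimately points to."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref_name = head[len("ref: "):]  # e.g. refs/heads/main
        return (git_dir / ref_name).read_text().strip()
    return head  # otherwise the file holds a hash directly


# build a stand-in .git folder rather than touching a real repository
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    (git_dir / "refs" / "heads" / "main").write_text(
        "60bd457a4c3523eae25be36ac685207d605c9676\n"
    )
    resolved = resolve_head(git_dir)

print(resolved)  # 60bd457a4c3523eae25be36ac685207d605c9676
```

Two small text files are enough to answer "what commit am I on?": HEAD names the branch, and the branch file holds the hash of its most recent commit.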
10.12. Git Objects#
There are 3 types:
blob objects: the content of your files (data)
tree objects: store file names and group files together (organization)
commit objects: store information about the sha values of the snapshots
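All of these object types share one storage layout when saved as individual files: a short header naming the type and size, a NUL byte, the body, all compressed with zlib. The sketch below round-trips that layout in memory. The function names encode_object and decode_object are hypothetical, and the example body is made up.

```python
import zlib


def encode_object(kind: str, body: bytes) -> bytes:
    """Build the stored bytes: header, NUL byte, body, then zlib-compress."""
    header = f"{kind} {len(body)}".encode()
    return zlib.compress(header + b"\0" + body)


def decode_object(raw: bytes):
    """Undo encode_object: return (kind, size, body)."""
    data = zlib.decompress(raw)
    header, body = data.split(b"\0", 1)
    kind, size = header.decode().split(" ")
    return kind, int(size), body


stored = encode_object("blob", b"# Hello\n")
kind, size, body = decode_object(stored)
print(kind, size)  # blob 8
print(body)        # b'# Hello\n'
```

The type in the header is how git can tell a blob from a tree or a commit after reading an object back by its key.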
10.13. Prepare for Next Class#
Take a few minutes to think about what you know about hashing and numbers. Create hash_num_prep.md with two sections: ## Hashing with a few bullet points summarizing key points about hashing, and ## Numbers with what types of number representations you know.
Start commenting/expressing interest on build/explore ideas. Next lab will be focused on that.
10.14. Badges#
Read about different workflows in git and describe which one you prefer to work with and why in favorite_git_workflow.md in your kwl repo. Two good places to read from are Git Book and the atlassian Docs
Update your kwl chart with what you have learned or new questions in the want to know column
In commit_contents.md, redirect the content of your most recent commit, its tree, and the contents of one blob. Edit the file or use echo to put markdown headings between the different objects. Add a title # Complete Commit to the file and at the bottom of the file add a ## Reflection subheading with some notes on how, if at all, this exercise helps you understand what a commit is.
Read about different workflows in git and add responses to the below in a workflows.md in your kwl repo. Two good places to read from are Git Book and the atlassian Docs
Update your kwl chart with what you have learned or new questions in the want to know column
Find the hash of the content of your completed workflows.md file and put it in a comment on your PR for this badge. Try to do this from your local CLI, but you will get full credit even if you use the website interface.
## Workflow Reflection
1. Why is it important that git can be used with different workflows?
1. Which workflow do you think you would like to work with best and why?
1. Describe a scenario that might make it better for the whole team to use a workflow other than the one you prefer.
10.15. Experience Report Evidence#
10.16. Questions After Today’s Class#
10.16.1. Can you clarify the difference between working in local repos, github repos, and github codespaces and how they all integrate?#
a local repo is a folder on your computer
a github repo is the copy on the github.com server that the website interface interacts with
a github codespace is a virtual machine on an Azure server that you can use VS Code with
a codespace is a local copy as far as git is concerned, since you have local access
both types of local repos can be pushed and pulled to github.com
10.16.2. how were the older version control systems and what similar features did they have to git that remain present in git today?#
this is a good explore badge topic.
10.16.3. why is there a difference between Porcelain and Plumbing commands when they both interact with the file system and are dependent on its contents?#
Is it the operations that are performed by the commands? Or is it more to do with the level at which they interact with the file system?
The porcelain commands essentially call the plumbing commands. In normal day to day use you only need the porcelain commands; the plumbing commands are the low level details.
10.16.4. Can the list of commit (objects) ever become long enough to cause issues?#
Technically, yes; practically, no. The badges for today will have you investigate this idea.