Authors: Henry Milner, Andrew Do. Some of the material in this notebook is inspired by lectures by Prof. George Necula in CS 169.
Your first reason to use git in this class (and likely many classes and projects to come): It's the only way to interact with other developers, because everyone uses it.
"Everyone?" Yes. Github, the biggest host for public git repositories, has 20 million repositories. There are probably many more private repositories. (You can create either.)
Better reasons:
However, git can be a little confusing. Many confusions happen because people don't understand the fundamentals you'll learn today. If you've got the basics, the impact of other confusions will be bounded, and you can probably figure out how to search for a solution.
We made a special repository for this section (it takes 5 seconds) here:
https://github.com/DS-100/git-intro
We'll use a Jupyter notebook, but you can run any of these commands in a Bash shell. Note that `cd` is a magic command in Jupyter that doesn't have a `!` in front of it. `!cd` only works for the line you write it on.
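The same difference can be seen from plain Python. The sketch below assumes a POSIX system with a shell and a `/tmp` directory. It shows that a `cd` run in a child shell (which is what `!cd` amounts to) leaves the notebook process's working directory untouched, while `os.chdir` (roughly what the `cd` magic does) changes it.

```python
import os
import subprocess

start = os.getcwd()

# Like `!cd /tmp`: the directory change happens in a child shell and is lost
# as soon as that shell exits.
subprocess.run("cd /tmp", shell=True, check=True)
after_child_cd = os.getcwd()

# Like the `cd` magic: the directory change happens in this process and sticks.
os.chdir("/tmp")
after_chdir = os.getcwd()

print(after_child_cd == start)  # the child shell's cd did not affect this process
print(after_chdir)
```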
We'll check out the repo in the /tmp folder, which the OS will wipe when you reboot. Obviously, don't do that if you want to keep the repo.
In [42]:
cd /tmp
In [30]:
# Delete the repo if it happens to already exist:
!rm -rf git-intro
In [31]:
# Create the repo
!git clone https://github.com/DS-100/git-intro git-intro
In [44]:
!ls -lh | grep git-intro
In [45]:
cd git-intro
In [37]:
# What files are in the repo?
!ls -lh
In [46]:
# What about hidden files?
!ls -alh
The special `.git` directory is where git stores all its magic. If you delete it (or this whole directory), the repository won't be a repository any more.
In [49]:
# What's the current status, according to git?
!git status
In [50]:
# What's the history of the repo?
!git log
In [51]:
# What does README.md look like currently?
!cat README.md
In [140]:
# We can use Python to compute the filename.
# Then we can reference Python variables in
# ! shell commands using {}, because Jupyter
# is magic.
import datetime
our_id = datetime.datetime.now().microsecond
filename = "our_file_{:d}.txt".format(our_id)
filename
In [106]:
!echo "The quick brown fox \
jumped over the lazy dog." > "{filename}"
!ls
Creating the file only changed the local filesystem. We can go to the repository page on Github to verify that the file hasn't been added yet. You probably wouldn't want your changes to be published immediately to the world!
In [107]:
!git add "{filename}"
If you check again, our file still hasn't been published to the world. In git, you package together your new files and updates to old files, and then you create a new version called a "commit."
Git maintains a "staging" or "index" area for files that you've marked for committing with `git add`.
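One way to picture the staging area is as a third copy of your files that sits between the working directory and the next commit. The following is a toy model in plain Python, not how git is actually implemented; the names `working`, `index`, `commits`, `add`, and `commit` are made up for illustration.

```python
# Toy model: each "area" maps file names to file contents.
working = {"README.md": "hello"}   # files on disk
index = {"README.md": "hello"}     # staged snapshot (what the next commit will contain)
commits = [dict(index)]            # history: a list of committed snapshots


def add(name):
    """Like `git add <name>`: copy one file from the working directory into the index."""
    index[name] = working[name]


def commit():
    """Like `git commit`: record the index (not the working directory) as a new snapshot."""
    commits.append(dict(index))


# Create two new files, but stage only one of them.
working["our_file.txt"] = "The quick brown fox"
working["scratch.txt"] = "not ready yet"
add("our_file.txt")
commit()

print(sorted(commits[-1]))  # the new snapshot contains only what was staged
```

In this model, `scratch.txt` exists on disk but is absent from the new snapshot, which is the situation `git status` reports as an untracked file.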
In [108]:
!git status
In [109]:
!git commit -m 'Added our new file, "{filename}"'
In [110]:
!git status
In [111]:
!git log
Now our local repository has this new commit in it. Notice that the log shows the message we wrote when we made the commit. It is very tempting to write something like "stuff" here. But then it will be very hard to understand your history, and you'll lose some of the benefits of git.
For the same reason, try to make each commit a self-contained idea: You fixed a particular bug, added a particular feature, etc.
Our commit hasn't been published to other repositories yet, including the one on Github. We can check again to verify that.
To publish a commit we've created locally to another repository, we use `git push`. Git remembers that we checked out from the Github repository, and by default it will push to that repository. Just to be sure, let's find the name git has given to that repository, and pass that explicitly to `git push`.
In [112]:
!git remote -v
In [113]:
!git help push
In [120]:
!git push origin
Now our commit is finally visible on Github. Even if we spill coffee on our laptop, our new state will be safely recorded in the remote repository.
Oops, we didn't want that file! In fact, if you look at the history, people have been adding a bunch of silly files. We don't want any of them.
Once a commit is created, git basically never forgets about it or its contents (unless you try really hard). When your local filesystem doesn't have any outstanding changes, it's easy to switch back to an older commit.
We have previously given the name `first` to the first commit in the repo, which had basically nothing in it. (We'll soon see how to assign names to commits.)
In [132]:
!git help branch
In [145]:
!git branch --list
In [146]:
# Let's make a new name for the first commit, "going-back",
# with our ID in there so we don't conflict with other
# sections.
!git branch going-back-{our_id} first
In [147]:
!git branch --list
In [148]:
!git checkout going-back-{our_id}
In [149]:
!ls
In [150]:
!git status
In [162]:
!git log --graph --decorate first going-back-{our_id} master
Note: we can always get back to the commit we made with:
git checkout master
Git informs us that we've switched to the `going-back` "branch," and in the local filesystem, neither the file we created nor any other files, other than README.md, are there any more. What do you think would happen if we made some changes and made a new commit now?
- The `master` branch would disappear.
- The `master` branch would now refer to our new commit.
- The `master` branch would still refer to our last commit. The `first` branch would refer to the new commit.
- The `master` branch would still refer to our last commit. The `first` branch would still refer to the first commit in the repository.
Let's find out.
In [152]:
new_filename = "our_second_file_{}.txt".format(our_id)
new_filename
In [153]:
!echo "Text for our second file!" > {new_filename}
!ls
In [154]:
!git add {new_filename}
!git commit -m'Adding our second file!'
In [157]:
!git status
In [161]:
!git log --graph --decorate first going-back-{our_id} master
Every commit is a snapshot of some files. A commit can never be changed. It has a unique ID assigned by git, like `20f97c1`.
Humans can't work with IDs like that, so git lets us give names like `master` or `first` to commits, using `git branch <name> <commit ID>`. These names are called "branches" or "refs" or "tags." They're just names. Often `master` is used for the most up-to-date commit in a repository, but not always.
At any point in time, your repository is pointing to a commit. Except in unusual cases, that commit will have a name. Git gives that name its own name: `HEAD`. Remember: `HEAD` is a special kind of name. It refers to other names rather than to a commit.
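When you commit:
1. Git creates the new commit and records the current commit as its parent. That is what the lines in the `git log --graph` output are showing.
2. Git updates whatever name `HEAD` points to (your "current branch"). Now that name refers to the new commit.
Can you list all the pieces that make up the full state of your git repository?
- The commits, each with its ID and a link to its parent commit(s).
- The refs (names such as `master`), each pointing to a commit.
- `HEAD`, which points to a ref.
- The files in your working directory.
- The staging area: changes you've marked with `git add` but haven't committed yet. (You can find out what's staged with `git status`. The staging area is confusing, so use it sparingly. Usually you should stage things and then immediately create a commit.)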
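These rules are small enough to model in a few lines of Python. The sketch below is a toy, not git's real data structures: the commit IDs are made-up strings, and `repo`, `branch`, `checkout`, and `commit` are hypothetical names. It replays what happened above: `HEAD` holds a branch name, and committing moves only that branch.

```python
repo = {
    "parents": {"c1": None, "c2": "c1"},      # commit ID -> parent commit ID
    "refs": {"first": "c1", "master": "c2"},  # branch name -> commit ID
    "HEAD": "master",                         # HEAD holds a branch name, not a commit ID
}


def branch(name, commit_id):
    """Like `git branch <name> <commit ID>`: create a new name for a commit."""
    repo["refs"][name] = commit_id


def checkout(name):
    """Like `git checkout <name>`: make HEAD refer to another branch."""
    repo["HEAD"] = name


def commit(new_id):
    """Like `git commit`: the new commit's parent is the current commit,
    and only the branch that HEAD refers to moves to the new commit."""
    current_branch = repo["HEAD"]
    repo["parents"][new_id] = repo["refs"][current_branch]
    repo["refs"][current_branch] = new_id


branch("going-back", repo["refs"]["first"])
checkout("going-back")
commit("c3")

print(repo["refs"])  # master and first are unchanged; going-back has moved
```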
In [171]:
!git push origin going-back-{our_id}
Here `origin` is the name (according to `git remote -v`) of the repository you want to push to. If you omit a remote name, `origin` is also the default. Normally that's what you want.
`going-back-{our_id}` (whatever the value of `{our_id}`) is a branch in your repository. If you omit a branch name here, your current branch (the branch `HEAD` refers to) is the default.
What do you think git does?
A few things happen:
1. Git finds all the commits in `going-back-{our_id}`'s history - all of its ancestors.
2. It sends those commits to `origin`, and they're added to that repository. (If `origin` already has a bunch of them, of course those don't need to be sent.)
3. It updates `going-back-{our_id}` in `origin` to point to the same commit yours does.
However, suppose someone else has updated `going-back-{our_id}` since you last got it?
    456 (your going-back-{our_id})
      \   345 (origin's going-back-{our_id}, pushed by someone else)
       \   /
        \ /
        234 (going-back-{our_id} when you last pulled it from origin)
         |
        123
How do you think git handles that?
The answer may surprise you: git gives up and tells you you're not allowed to push. Instead, you have to pull the remote commits and merge them in your repository, then push after merging.
error: failed to push some refs to 'https://github.com/DS-100/git-intro.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
We'll go over merging next, but the end result after merging will look like this:
    567 (your going-back-{our_id})
     | \
     |  \
     |   \
    456   \
      \   345 (origin's going-back-{our_id}, pushed by someone else)
       \   /
        \ /
        234 (going-back-{our_id} when you last pulled it from origin)
         |
        123
Then `git push origin going-back-{our_id}` would succeed, since there are now no conflicts. We're updating `going-back-{our_id}` to a commit that's a descendant of the current commit `going-back-{our_id}` names in `origin`.
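The rule at work can be stated as: by default, a push is accepted only if the commit the remote branch currently names is an ancestor of the commit you are pushing. Here is a minimal sketch of that check, using the commit IDs from the diagrams above. The names `parents`, `is_ancestor`, and `can_push` are illustrative, not git commands, and a real server may apply further checks.

```python
# Commit graph after the merge: commit ID -> list of parent commit IDs.
parents = {
    "123": [],
    "234": ["123"],
    "345": ["234"],          # pushed to origin by someone else
    "456": ["234"],          # your local commit
    "567": ["456", "345"],   # your merge commit
}


def is_ancestor(ancestor, commit):
    """Return True if `ancestor` is `commit` itself or reachable through its parents."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def can_push(local_tip, remote_tip):
    """A non-forced push only moves the remote branch forward along its own history."""
    return is_ancestor(remote_tip, local_tip)


print(can_push("456", "345"))  # before merging
print(can_push("567", "345"))  # after merging
```

Before the merge, `345` is not in the history of `456`, so the check fails; the merge commit `567` has `345` as a parent, so the same push would then go through.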
So it remains to see how to accomplish a merge. We need to start with pulling updates from other repositories.
In [165]:
cd /tmp
In [166]:
!git clone https://github.com/DS-100/git-intro git-intro-2
In [172]:
cd /tmp/git-intro-2
In [175]:
!git checkout going-back-{our_id}
In [176]:
third_filename = "our_third_file_{}.txt".format(our_id)
third_filename
In [177]:
!echo "Someone else added this third file!" > {third_filename}
!git add {third_filename}
!git commit -m"Adding a third file!"
!git push
Now we go back to our original repo.
In [178]:
cd /tmp/git-intro
You might just want the update. Or maybe you want to push your own commit to the same branch, and your `git push` failed.
Git has a command called `pull` that you could use. But it's complicated, and it's easier to break it down into two steps: fetching and merging.
Since `git` commits are never destroyed, it's always safe to fetch commits from another repository. (Refs can be changed, so that's not true for refs. That's the source of the problem with our `push` before!)
In [164]:
!git help fetch
In [ ]:
!git fetch origin
In [184]:
!git log --graph --decorate going-back-{our_id} origin/going-back-{our_id}
Now we need to update our ref to the newer commit. In this case, it's easy, because we didn't have any further commits. Git calls that a "fast-forward" merge.
In [191]:
!git merge origin/going-back-{our_id} --ff-only
In [192]:
!git log --graph --decorate
As a shortcut, you can do fetch and fast-forward merge with a single command:
git pull --ff-only origin going-back-{our_id}
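Fetching and fast-forwarding, the two steps just described, can be pictured as two separate updates to your refs. In this toy model (all names are hypothetical, and one shared `parents` table stands in for the commits that a real fetch would download), `fetch` only refreshes the remote-tracking name `origin/<branch>`, and `merge_ff_only` moves your own branch only when that is a pure fast-forward.

```python
parents = {"123": [], "234": ["123"], "345": ["234"]}

local_refs = {"going-back": "234", "origin/going-back": "234"}
origin_refs = {"going-back": "345"}   # someone else pushed commit 345


def ancestors(commit):
    """All commits reachable from `commit`, including itself."""
    found, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def fetch(branch):
    """Like `git fetch origin`: update our copy of origin's ref; our own branch is untouched."""
    local_refs["origin/" + branch] = origin_refs[branch]


def merge_ff_only(branch):
    """Like `git merge --ff-only origin/<branch>`: move our branch forward, or refuse."""
    ours, theirs = local_refs[branch], local_refs["origin/" + branch]
    if ours not in ancestors(theirs):
        raise RuntimeError("not a fast-forward; a real merge is needed")
    local_refs[branch] = theirs


fetch("going-back")
after_fetch = dict(local_refs)   # our branch has not moved yet
merge_ff_only("going-back")
print(after_fetch)
print(local_refs)
```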
In this class, you have three repositories:
- Your local repository, on your own computer, where you do your work
- `ds100`, which contains blank copies of assignments
- `origin`, where you submit your assignments
`ds100` will be updated regularly with commits that add new assignments. You'll never push to `ds100`. But you will pull from it regularly to get the new assignments.
When you pull from `ds100`, you don't want to just use the latest commit from that repo. Then you'd be starting from scratch, without all your work on previous assignments.
Instead, you want to merge the `ds100` updates so that you get the new assignments but don't clobber your own work.
In the git log, after a few assignments, this will look something like this:
    (ds100/master)                    (master on local repo)
    ...  ---------------------------efg (merged with 345 to get hw2)
     |  /                            |
    345 (hw2)                       def (worked on lab2)
     |                               |
     |   ---------------------------cde (merged with 234 to get lab2)
     |  /                            |
    234 (lab2)                      bcd (finishing touches on hw1)
     |                               |
     |   ---------------------------abc (your work on hw1)
     |  /                            ^ not a merge
    123 (hw1)
Consider the first merge only. The current commit is `bcd`, and you want to get lab 2. From what you know so far, how should you merge?
Answer: Assuming we're on the `master` branch in our repo, and there are no uncommitted changes to the working files:
!git fetch ds100
!git merge ds100/master
That doesn't finish things for us, though. How will the merge work? How will git reconcile your changes to the `hw1` files with the addition of `lab2` files?
Git tries to intelligently include all the changes introduced in the two merged branches since their last common ancestor. In this case, the changes are independent - one branch introduced new files in the `lab2` directory, and the other edited files in the `hw1` directory. So git will just do it.
Git assumes that changes introduced in separate files, or in separate lines of the same file, can be applied together. If two branches change the same line of the same file, it will give up and ask you to reconcile the changes. You'll then need to edit the file and follow the instructions to mark it as fixed. We won't go over an example of that today.
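The rule of thumb in the paragraph above can be sketched as a line-by-line three-way merge. This is a deliberate simplification: it assumes all three versions have the same number of lines, whereas git's real merge works on groups of changed lines and handles insertions and deletions. `three_way_merge` and `MergeConflict` are names invented for this example.

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same line in different ways."""


def three_way_merge(base, ours, theirs):
    """Merge two edited versions of a file, given their common ancestor.

    Each argument is a list of lines; all three must have the same length.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy merge only handles in-place line edits")
    merged = []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:          # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:        # only their side changed this line
            merged.append(t)
        elif t == b:        # only our side changed this line
            merged.append(o)
        else:               # both changed it differently: a human must decide
            raise MergeConflict("line {}: {!r} vs {!r}".format(number, o, t))
    return merged


base = ["alpha", "beta", "gamma"]
ours = ["ALPHA", "beta", "gamma"]      # we edited line 1
theirs = ["alpha", "beta", "GAMMA"]    # they edited line 3
print(three_way_merge(base, ours, theirs))
```

If both sides had edited line 1 differently, the function would raise `MergeConflict` instead of guessing, which mirrors git stopping and asking you to reconcile the changes.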
Note that sometimes git's assumption about independence is not true. For example, suppose you are working on a project and you create a new code file `A` that imports code from another file `B`. Your coworker deletes file `B`. Git will merrily apply both changes, but your code in file `A` won't work any more. So you need to apply human judgment when merging. If you write informative commit messages, it's much easier to find such problems.
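To make this concrete, here is a toy file-level merge in Python, where each version of the project is a dictionary from file name to contents. The helper `merge_files` and the file names `a.py` and `b.py` are invented for illustration, and the sketch ignores the case where both sides touch the same file. The merge succeeds even though the result is broken.

```python
def merge_files(base, ours, theirs):
    """Apply each side's file additions, edits, and deletions relative to `base`.

    Assumes the two sides never touch the same file.
    """
    merged = dict(base)
    for side in (ours, theirs):
        for name in set(base) | set(side):
            if name not in side:                  # this side deleted the file
                merged.pop(name, None)
            elif side[name] != base.get(name):    # this side added or edited it
                merged[name] = side[name]
    return merged


base = {"b.py": "def helper(): ..."}
ours = {"b.py": "def helper(): ...", "a.py": "from b import helper"}  # we add a.py
theirs = {}                                                            # coworker deletes b.py

merged = merge_files(base, ours, theirs)
print(sorted(merged))  # a.py survives, but the module it imports is gone
```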
The instructions on the course website tell you to get changes from `ds100` with this command:
git pull -s recursive -X ours --no-edit ds100 master
What does this do? It's basically what we just went through, with some extra options that let you avoid dealing with merges:
- `git pull ds100 master`: Pull from ds100, updating the `master` branch. Equivalent to `git fetch ds100; git merge ds100/master` as seen above.
- `-s recursive -X ours`: If git finds that you and the `ds100` repo have made changes to the same line in a file, always take your changes and delete the `ds100` repo changes. It will do this instead of asking you to reconcile the changes.
- `--no-edit`: Normally, git will ask you to create a commit message to describe the merge commit. This option generates a default message for you.
Some other tips and commands:
- Take care with commands like `git merge` or `git checkout` that might update your current branch. If you have outstanding uncommitted changes, it can be complicated to keep them intact. Generally you should commit your changes before running such commands.
- `git diff`: See the unstaged changes in your working directory, compared with the index. Use `git diff HEAD` to compare the working directory with the most recent commit instead.
- `git diff <commit_or_ref> <other_commit_or_ref>`: See all the changes between two commits.
- `git add -u`: Add to the index (in preparation for a commit) all tracked files that have been changed or deleted.
- `git add -A`: Add to the index (in preparation for a commit) all changes, including new untracked files.
- `git rm <file>`: Delete a file. This change happens in the index, so it will show up in your next commit.
- `git mv <file> <new_name>`: Rename a file. Again, this change happens in the index, so it will show up in your next commit.
- `git checkout -- <file>`: Reset a file to its state in the index (which matches the current commit if you haven't staged changes to it), eliminating changes in the working directory.
- `git rm --cached <file>`: Remove a file from the index without deleting it from the working directory. (Useful if you `git add` a new file you didn't mean to add.) For a file that is already tracked, this stages its deletion; to unstage a modification instead, use `git reset HEAD <file>`.
- `git checkout -b <new_branch_name>`: Create a new branch at the current commit and check it out, making `HEAD` point to it.