In [1]:
:set -XOverloadedStrings

import Data.Maybe (fromJust)
import qualified Data.Set as S

import Prelude hiding ((^^))

import Duffer
import Duffer.Loose
import Duffer.Loose.Objects
import Duffer.WithRepo
import Duffer.Unified

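-- Run a duffer action against this project's own repository.
duffer = withRepo "../.git"
-- Notebook-only shortcuts: these crash on a missing ref or object instead of returning Nothing.
resolveRef' = fmap fromJust . resolveRef
readObject' = fmap fromJust . readObject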

Let's start with a raw representation of the most recent commit:


In [2]:
:!git show --format=raw -s


commit cebcb3302302809f25c190c5add9bb720dcb0493
tree ad20585018a493aa864df6a1a180795d9e1bf962
parent a593fffc837b481fffb09708c978d7ae0c4bd1d3
author Vaibhav Sagar <vaibhavsagar@gmail.com> 1501773615 +0800
committer Vaibhav Sagar <vaibhavsagar@gmail.com> 1501805943 +0800

    Refactor GitObject data type

I'm currently on the master branch, so another way to get to this object is as follows:


In [3]:
duffer (resolveRef' "refs/heads/master")


tree ad20585018a493aa864df6a1a180795d9e1bf962
parent a593fffc837b481fffb09708c978d7ae0c4bd1d3
author Vaibhav Sagar <vaibhavsagar@gmail.com> 1501773615 +0800
committer Vaibhav Sagar <vaibhavsagar@gmail.com> 1501805943 +0800

Refactor GitObject data type

A commit refers to a tree, which is git's way of storing a directory. An example tree is illustrated at the following source.

Source: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects

We can view the pretty-printed contents of a git object with git cat-file -p. The tree associated with a commit represents the root project folder as of that commit.


In [4]:
:!git cat-file -p master^{tree}


040000 tree f8db550f29991a4c0cf30bab423abc1714b76dca	.ci
100644 blob 417c681b2e9c4d6d1f5da3fcd1afb3dd2c35d364	.gitignore
100644 blob f4f9a6020fe720baefcd13ab406b3096e49359b6	.hlint.yaml
100644 blob 7209b5526b6c91a5c76f23ae1a6d22f744def2d2	.travis.yml
100644 blob cdd96137b2bafb26259aaaebe802c05e6e2e1049	README.md
040000 tree 9095a580da426e1e279cbed785650be4632d327d	duffer-json
040000 tree 9d9da5d80cb1ecd48b8a7ee4f27c52659a814d0b	duffer-streaming
040000 tree de6051b24241a2b797d2c769b67a5c95c9d19301	duffer
040000 tree 783002195bf5521f37cd14645321ad464830a173	ihaskell-duffer
040000 tree 0211e67887f7d6c3f6a7bb57c1b03e2e12f9378d	notebooks
040000 tree dff6c74841b8ad1f0a986b9d3dd46f5afa4947eb	presentation
100644 blob 173e4da4e6a9aa80371700f2ff475b71584f03fe	release.nix
100644 blob f1cec41def89f4011c1df0486564de0e6d94f7dc	stack.yaml

Again, we can obtain almost identical (modulo formatting) output with duffer:


In [5]:
duffer $ do
    GitCommit master <- resolveRef' "refs/heads/master"
    let tree  =  commitTreeRef master
    readObject' tree


040000	tree	f8db550f29991a4c0cf30bab423abc1714b76dca	.ci
100644	blob	417c681b2e9c4d6d1f5da3fcd1afb3dd2c35d364	.gitignore
100644	blob	f4f9a6020fe720baefcd13ab406b3096e49359b6	.hlint.yaml
100644	blob	7209b5526b6c91a5c76f23ae1a6d22f744def2d2	.travis.yml
100644	blob	cdd96137b2bafb26259aaaebe802c05e6e2e1049	README.md
040000	tree	9095a580da426e1e279cbed785650be4632d327d	duffer-json
040000	tree	9d9da5d80cb1ecd48b8a7ee4f27c52659a814d0b	duffer-streaming
040000	tree	de6051b24241a2b797d2c769b67a5c95c9d19301	duffer
040000	tree	783002195bf5521f37cd14645321ad464830a173	ihaskell-duffer
040000	tree	0211e67887f7d6c3f6a7bb57c1b03e2e12f9378d	notebooks
040000	tree	dff6c74841b8ad1f0a986b9d3dd46f5afa4947eb	presentation
100644	blob	173e4da4e6a9aa80371700f2ff475b71584f03fe	release.nix
100644	blob	f1cec41def89f4011c1df0486564de0e6d94f7dc	stack.yaml
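
Each line of the git cat-file -p listing above has the same shape: a mode, an object type, a hash, then a tab and a name. As a rough sketch of that structure, here is one way to pull those fields apart using only base. The Entry record and parseEntry function are made up for illustration; they are not duffer's TreeEntry or its parser.

```haskell
-- Hypothetical record for one line of `git cat-file -p <tree>` output.
-- This is an illustration only, not duffer's TreeEntry type.
data Entry = Entry
  { entryMode :: String  -- e.g. "100644" (regular file) or "040000" (directory)
  , entryType :: String  -- "blob" or "tree"
  , entryHash :: String  -- 40-character hex SHA1 of the referenced object
  , entryName :: String  -- file or directory name
  } deriving (Show, Eq)

-- Parse "<mode> <type> <hash>\t<name>".
-- Returns Nothing if the line does not have that shape.
parseEntry :: String -> Maybe Entry
parseEntry line =
  case break (== '\t') line of
    (meta, '\t' : name)
      | [mode, typ, hash] <- words meta
      , not (null name)
      -> Just (Entry mode typ hash name)
    _ -> Nothing

main :: IO ()
main = print (parseEntry "100644 blob cdd96137b2bafb26259aaaebe802c05e6e2e1049\tREADME.md")
```

Splitting on the tab first matters because names may contain spaces, while the three fields before the tab never do.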

git implements a giant hashtable on the filesystem using SHA1 as the hashing function. It stores all the past files and directory listings as zlib-compressed files (with a header denoting object type and length) under .git/objects as follows:

  1. Compute a SHA1 hash of the header followed by the content.
  2. zlib-compress the header and content.
  3. Take the first 2 characters of the hex-encoded hash. This is the subdirectory under .git/objects where the content will be stored.
  4. The remaining 38 characters of the hash are the filename.

Source: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
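
The last two steps above are plain string manipulation, so they are easy to sketch. The helper names here (looseObjectPath and withHeader) are hypothetical and not part of duffer, and the hashing and compression steps are left out because they need libraries beyond base.

```haskell
-- Steps 3 and 4: the first two hex characters of the hash name the
-- subdirectory, and the remaining 38 name the file.
looseObjectPath :: FilePath -> String -> FilePath
looseObjectPath gitDir hash = gitDir ++ "/objects/" ++ dir ++ "/" ++ file
  where
    (dir, file) = splitAt 2 hash

-- The header mentioned above: object type, a space, the content length in
-- bytes, then a NUL byte. Assumes ASCII content, so that the number of
-- characters equals the number of bytes.
withHeader :: String -> String -> String
withHeader objType content =
  objType ++ " " ++ show (length content) ++ "\NUL" ++ content

main :: IO ()
main = do
  putStrLn (looseObjectPath "../.git" "4bd9b179bb166b85e3e889f9f263f1b5a26f3e34")
  print (withHeader "blob" "hello world")
```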

For example, a decompressed commit looks like:


In [6]:
:!cat ../.git/objects/4b/d9b179bb166b85e3e889f9f263f1b5a26f3e34 | zlib-flate -uncompress


commit 287tree 0b36647819f93c5523b5967d19cb131d88ab1be4
parent 2577894fb379a7cbe2e3bfd3ba325f4e451bbb5f
parent d72ae27a9ae58d49235ff9761cfae816b004d9b1
author Vaibhav Sagar <vaibhavsagar@gmail.com> 1473766850 -0400
committer Vaibhav Sagar <vaibhavsagar@gmail.com> 1473766850 -0400

Add porcelain.
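
The commit 287 at the start of that output is the header: the object type, a space, and the length of the content in bytes. A NUL byte separates it from the content, which is why it looks glued to tree when printed. Here is a minimal sketch of splitting the two apart, where splitObject is a hypothetical helper rather than a duffer function:

```haskell
import Text.Read (readMaybe)

-- Split raw decompressed object data into (type, declared length, content).
-- Returns Nothing if there is no NUL-terminated "<type> <length>" header.
splitObject :: String -> Maybe (String, Int, String)
splitObject raw =
  case break (== '\NUL') raw of
    (header, '\NUL' : body)
      | [objType, len] <- words header
      , Just n <- readMaybe len
      -> Just (objType, n, body)
    _ -> Nothing

main :: IO ()
main = print (splitObject "blob 11\NULhello world")
-- prints: Just ("blob",11,"hello world")
```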

In [7]:
duffer $ readObject' "4bd9b179bb166b85e3e889f9f263f1b5a26f3e34"


tree 0b36647819f93c5523b5967d19cb131d88ab1be4
parent 2577894fb379a7cbe2e3bfd3ba325f4e451bbb5f
parent d72ae27a9ae58d49235ff9761cfae816b004d9b1
author Vaibhav Sagar <vaibhavsagar@gmail.com> 1473766850 -0400
committer Vaibhav Sagar <vaibhavsagar@gmail.com> 1473766850 -0400

Add porcelain.

In [8]:
:!git branch


  bits
  datatype-refactoring
  gh-pages
* master
  prime-cache

In [9]:
duffer $ do
    current  <- resolveRef' "refs/heads/master"
    -- Step to the first parent of master, then one generation further back.
    parent   <- fromJust <$> current ^^ 1
    fromJust <$> parent ~~ 1


tree 839e183f0ea13dde4aba8b0963546f75fe78ba50
parent 6e86f14c06f8251715d7e077c38a0bef3626779e
author Vaibhav Sagar <vaibhavsagar@gmail.com> 1501721402 +0800
committer Vaibhav Sagar <vaibhavsagar@gmail.com> 1501721402 +0800

Use (&&&)

As mentioned previously, the hash of a git object uniquely identifies it in the giant hashtable that is git.


In [10]:
tree <- duffer $ readObject' "a28aded05daa52ff5d0c77cd6186b1ce0faf7c8c"
hash tree


"a28aded05daa52ff5d0c77cd6186b1ce0faf7c8c"

git refers to files as blobs.


In [11]:
duffer $ readObject' "b75f4c9dbe3b61cacba052f23461834468832e41"


Copyright Vaibhav Sagar (c) 2015

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above
      copyright notice, this list of conditions and the following
      disclaimer in the documentation and/or other materials provided
      with the distribution.

    * Neither the name of Vaibhav Sagar nor the names of other
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The last type of git object is a tag, which gives a name to another git object.


In [12]:
duffer $ readObject' "d4b1e0343313ab60688cf0ddfa8ae5d8fe60ec23"


object 25354a5cfebca0261cdaa87ebef3a6b9dcb9c13a
type commit
tag test
tagger Vaibhav Sagar <vaibhavsagar@gmail.com> 1459935215 +1000

Test tag.

duffer is pretty great at reading git repositories, but that's not all it can do. It can also add content to a git repository:


In [13]:
import Data.ByteString.UTF8 (fromString)
-- Wrap the bytes in a blob and write it as a loose object; the result is the object's hash.
blob = GitBlob $ Blob (fromString "hello world")
duffer $ writeLooseObject blob


"95d09f2b10159347eece71399a7e2e907ea3df4f"

In [14]:
:!git cat-file -p 95d09f2b10159347eece71399a7e2e907ea3df4f


hello world

In [15]:
:!git branch


  bits
  datatype-refactoring
  gh-pages
* master
  prime-cache

In [16]:
currentCommitObject = resolveRef' "refs/heads/master"
-- Create a new branch pointing at the commit that master currently refers to.
duffer $ currentCommitObject >>= updateRef "refs/heads/new-branch"


"cebcb3302302809f25c190c5add9bb720dcb0493"

In [17]:
:!git branch


  bits
  datatype-refactoring
  gh-pages
* master
  new-branch
  prime-cache

In [18]:
rootTreeObject <- duffer $ currentCommitObject >>= (\(GitCommit c) -> return $ commitTreeRef c) >>= readObject'
rootTreeObject


040000	tree	f8db550f29991a4c0cf30bab423abc1714b76dca	.ci
100644	blob	417c681b2e9c4d6d1f5da3fcd1afb3dd2c35d364	.gitignore
100644	blob	f4f9a6020fe720baefcd13ab406b3096e49359b6	.hlint.yaml
100644	blob	7209b5526b6c91a5c76f23ae1a6d22f744def2d2	.travis.yml
100644	blob	cdd96137b2bafb26259aaaebe802c05e6e2e1049	README.md
040000	tree	9095a580da426e1e279cbed785650be4632d327d	duffer-json
040000	tree	9d9da5d80cb1ecd48b8a7ee4f27c52659a814d0b	duffer-streaming
040000	tree	de6051b24241a2b797d2c769b67a5c95c9d19301	duffer
040000	tree	783002195bf5521f37cd14645321ad464830a173	ihaskell-duffer
040000	tree	0211e67887f7d6c3f6a7bb57c1b03e2e12f9378d	notebooks
040000	tree	dff6c74841b8ad1f0a986b9d3dd46f5afa4947eb	presentation
100644	blob	173e4da4e6a9aa80371700f2ff475b71584f03fe	release.nix
100644	blob	f1cec41def89f4011c1df0486564de0e6d94f7dc	stack.yaml

In [19]:
GitTree rootTree = rootTreeObject
-- A tree entry for the "hello world" blob written earlier.
newFile = TreeEntry Regular "new-file" "95d09f2b10159347eece71399a7e2e907ea3df4f"
duffer $ do
    -- Add the new entry to the root tree and write the result as a new tree object.
    let entries    =  treeEntries rootTree
    let newEntries =  S.insert newFile entries
    newTree        <- writeLooseObject . GitTree $ Tree newEntries
    let me         =  PersonTime "Vaibhav Sagar" "vaibhavsagar@gmail.com" "1461156164" "+1000"
    -- Write a commit that refers to the new tree, then point new-branch at it.
    let newCommit  =  GitCommit $ Commit newTree ["d76238fed6c656183a4d4dcf287217a061043869"] me me Nothing "New commit."
    newHead        <- writeLooseObject newCommit
    updateRef "refs/heads/new-branch" newCommit


"5fdd80cfd4669041838341ca3e8d0bf9fa798bd2"

In [20]:
newTree = duffer $ resolveRef' "refs/heads/new-branch" >>= (\(GitCommit c) -> return $ commitTreeRef c) >>= readObject'
newTree


040000	tree	f8db550f29991a4c0cf30bab423abc1714b76dca	.ci
100644	blob	417c681b2e9c4d6d1f5da3fcd1afb3dd2c35d364	.gitignore
100644	blob	f4f9a6020fe720baefcd13ab406b3096e49359b6	.hlint.yaml
100644	blob	7209b5526b6c91a5c76f23ae1a6d22f744def2d2	.travis.yml
100644	blob	cdd96137b2bafb26259aaaebe802c05e6e2e1049	README.md
040000	tree	9095a580da426e1e279cbed785650be4632d327d	duffer-json
040000	tree	9d9da5d80cb1ecd48b8a7ee4f27c52659a814d0b	duffer-streaming
040000	tree	de6051b24241a2b797d2c769b67a5c95c9d19301	duffer
040000	tree	783002195bf5521f37cd14645321ad464830a173	ihaskell-duffer
100644	blob	95d09f2b10159347eece71399a7e2e907ea3df4f	new-file
040000	tree	0211e67887f7d6c3f6a7bb57c1b03e2e12f9378d	notebooks
040000	tree	dff6c74841b8ad1f0a986b9d3dd46f5afa4947eb	presentation
100644	blob	173e4da4e6a9aa80371700f2ff475b71584f03fe	release.nix
100644	blob	f1cec41def89f4011c1df0486564de0e6d94f7dc	stack.yaml