Loading From Git

This notebook shows the different ways to obtain a GitRepo (or MultiGitRepo) object and how to use its data.

  • Single Repo:
    • Remote: get_repo("https://github.com/sbenthall/bigbang.git", in_type="remote")
    • Local: get_repo("~/urap/bigbang/archives/sample_git_repos/bigbang", in_type="local")
    • Name: get_repo("bigbang", in_type="name")
  • Multiple Repos:
    • With repo names: get_multi_repo(repo_names=["bigbang", "django"])
    • With repo objects: get_multi_repo(repos=[{list of existing GitRepo objects}])
    • With GitHub organization names: get_org_multirepo("glass-bead-labs")

Repo Locations

As of now, repos are cloned into archives/sample_git_repos/{repo_name}. Their caches are stored at archives/sample_git_repos/{repo_name}_backup.csv.

Caches

Caches are stored at archives/sample_git_repos/{repo_name}_backup.csv. Each cache is a .csv dump of a GitRepo object's commit_data attribute, which is a pandas dataframe of all commit information. We can initialize a GitRepo object by passing the cache's pandas dataframe to the GitRepo init function. Before the init function can use the cache as its commit data, however, it has to do some processing. It converts the "Touched File" column of the cache dataframe from a unicode string "[file1, file2, file3]" into an actual list ["file1", "file2", "file3"]. It also converts the time index of the cache from strings to datetime objects.
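The two conversions described above can be sketched in plain Python. This is only an illustration, not the GitRepo init code: the helper names parse_touched_files and parse_commit_time are hypothetical, and the sketch assumes that no file name contains a comma or square brackets and that timestamps look like the ones in the commit_data output shown later (for example 2015-04-10 18:18:13).

```python
from datetime import datetime


def parse_touched_files(raw):
    """Turn a cached string such as "[file1, file2]" back into a list.

    Assumes no file name contains a comma or square brackets.
    """
    inner = raw.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    # An empty "[]" means the commit touched no files.
    return [name.strip() for name in inner.split(",") if name.strip()]


def parse_commit_time(raw):
    """Convert a cached timestamp string into a datetime object."""
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")


files = parse_touched_files("[bigbang/git_repo.py, bigbang/repo_loader.py]")
when = parse_commit_time("2015-04-10 18:18:13")
print(files)
print(when.isoformat())
```

With a real cache, the same two functions could be applied to the "Touched File" column and to the index of the dataframe after reading the .csv file.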

Single Repos

A single repo can be loaded in three ways: from a GitHub URL, from a local path to a repo, or by the name of a repo. All three return a GitRepo object.

Remote

A remote call to get_repo extracts the repo's name from its git URL. Thus, https://github.com/sbenthall/bigbang.git yields bigbang as its name. It then checks whether the repo already exists locally. If it doesn't, it sends a shell command to clone the remote repository to archives/sample_git_repos/{repo_name}. It then returns get_repo({name}, in_type="name"). Before returning, however, it caches the GitRepo object at archives/sample_git_repos/{repo_name}_backup.csv to make loading faster the next time.
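The name extraction described above might look roughly like the following. Both helpers (repo_name_from_url and clone_target) are hypothetical names used for illustration; the actual logic in repo_loader may differ, for example in how it handles hyphens or trailing slashes.

```python
import os


def repo_name_from_url(url):
    """Derive a repo name from a git URL (hypothetical helper)."""
    # Drop any trailing slash, then keep the last path component.
    last = url.rstrip("/").split("/")[-1]
    # Remove the ".git" suffix if present.
    if last.endswith(".git"):
        last = last[: -len(".git")]
    return last


def clone_target(url, base_dir="archives/sample_git_repos"):
    """Return the directory the repo would be cloned into."""
    return os.path.join(base_dir, repo_name_from_url(url))


print(repo_name_from_url("https://github.com/sbenthall/bigbang.git"))  # prints: bigbang
print(clone_target("https://github.com/sbenthall/bigbang.git"))
```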

Local

A local call is the simplest. It first extracts the repo name from the filepath. Thus, ~/urap/bigbang/archives/sample_git_repos/bigbang yields bigbang. It then checks whether a git repo exists at the given path. If it does, it initializes a GitRepo object, which only needs a name and a filepath to a git repo. Note that this option neither checks for nor creates a cache.

Name

This is the preferred and easiest way to load a git repository. It relies on the assumptions above about where a git repo and its cache are stored. It first checks whether a cache exists. If it does, it loads a GitRepo object from that cache.

If no cache is found, the function constructs a filepath from the name, using the rule above about repo locations. It then delegates to get_repo(filepath, in_type="local"). Before returning the result, it caches it.
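The cache-first lookup described above can be sketched as follows. This is a simplified model under stated assumptions: paths_for and load_by_name are hypothetical names, and the three callables (load_cache, load_local, save_cache) are placeholders for whatever repo_loader really does to build a GitRepo from a cache, build it from a clone, and write the cache.

```python
import os
import tempfile


def paths_for(name, base_dir="archives/sample_git_repos"):
    """Return (repo_path, cache_path) for a repo name."""
    repo_path = os.path.join(base_dir, name)
    cache_path = os.path.join(base_dir, name + "_backup.csv")
    return repo_path, cache_path


def load_by_name(name, load_cache, load_local, save_cache,
                 base_dir="archives/sample_git_repos"):
    """Cache-first lookup; the three callables are placeholders."""
    repo_path, cache_path = paths_for(name, base_dir)
    if os.path.exists(cache_path):
        # Fast path: rebuild the repo object from its cached commit data.
        return load_cache(cache_path)
    # Slow path: read the clone itself, then cache the result for next time.
    repo = load_local(repo_path)
    save_cache(repo, cache_path)
    return repo


def save_text(repo, path):
    """Stand-in for writing commit_data to a .csv file."""
    with open(path, "w") as handle:
        handle.write(repo)


# Demo in a temporary directory, with strings standing in for GitRepo objects.
with tempfile.TemporaryDirectory() as base:
    first = load_by_name("bigbang", lambda p: "from cache",
                         lambda p: "from clone", save_text, base)
    second = load_by_name("bigbang", lambda p: "from cache",
                          lambda p: "from clone", save_text, base)
    print(first, "|", second)
```

The first call takes the slow path and writes the cache file; the second call finds that file and takes the fast path.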


In [1]:
from bigbang import repo_loader # The file that handles most loading

repo = repo_loader.get_repo("https://github.com/sbenthall/bigbang.git", in_type = "remote" )
# repo = repo_loader.get_repo("../",  in_type = "local" ) # I commented this out because it may take too long
repo = repo_loader.get_repo("bigbang", in_type = "name")
repo.commit_data


Out[1]:
Unnamed: 0 Commit Message Committer Email Committer Name HEXSHA Parent Commit Time Touched File Person-ID
0 2015-04-13 22:49:33 Merge pull request #195 from jesscxu/master\n\... sbenthall@gmail.com Sebastian Benthall e6f985d15ff4736a08e2112b6c7ff0c0d0836a75 [02d30c7ba4b02e899c4f098531812ca390983c0b, 5b5... 2015-04-13 22:49:33 [examples/viz/git/glass.json, examples/viz/git... 1
1 2015-04-13 22:44:21 Adding d3 visualization of GitDiff.ipynb graph\n jcxu@berkeley.edu Jessica Xu 5b54cc96d652a07b12b5c31d4f5ad5269e1aec37 [02d30c7ba4b02e899c4f098531812ca390983c0b] 2015-04-13 22:44:21 [examples/viz/git/glass.json, examples/viz/git... 2
2 2015-04-10 21:59:33 Merge pull request #194 from vsporeddy/master\... sbenthall@gmail.com Sebastian Benthall 02d30c7ba4b02e899c4f098531812ca390983c0b [3723718c356155a8c2c2104e813d61263a1f23c7, 2ec... 2015-04-10 21:59:33 [examples/File Dependency Network.ipynb] 1
3 2015-04-10 18:19:22 Changed to directed graph vs.poreddy@gmail.com Venkata Poreddy 2ec31ee60878a08e5738dfa40245740e79dde97c [f5316bf07da3d4d51ac3bc1875b24d10693daa02] 2015-04-10 18:19:22 [examples/File Dependency Network.ipynb] 3
4 2015-04-10 18:18:13 Merge pull request #3 from sbenthall/master\n\... vs.poreddy@gmail.com Venkata Poreddy f5316bf07da3d4d51ac3bc1875b24d10693daa02 [9aacab2a8eb5e7eabcb227caea5a82d99e5f8835, 372... 2015-04-10 18:18:13 [bigbang/git_repo.py, bigbang/repo_loader.py] 3
5 2015-04-10 17:54:34 Merge pull request #192 from Aryan-Barbarian/m... sbenthall@gmail.com Sebastian Benthall 3723718c356155a8c2c2104e813d61263a1f23c7 [a22c55ea0887bdff8f62e50d2abdca02f6fdbce6, ed6... 2015-04-10 17:54:34 [bigbang/git_repo.py, bigbang/repo_loader.py] 1
6 2015-04-10 17:53:13 Merge pull request #193 from vsporeddy/master\... sbenthall@gmail.com Sebastian Benthall a22c55ea0887bdff8f62e50d2abdca02f6fdbce6 [2b1f678c8ad75458b6a6b7484bed0ca72baee298, 9aa... 2015-04-10 17:53:13 [bigbang/get_dependencies.py, examples/File De... 1
7 2015-04-10 17:30:29 Fixed an issue where git repos with hyphens in... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh ed60740e26981e216542a258c0c5aa0afa50af95 [8dac7fc397738b057d7fbdcd2bea1552e6f88339] 2015-04-10 17:30:29 [bigbang/repo_loader.py] 4
8 2015-04-10 16:55:36 Update File Dependency Network.ipynb vs.poreddy@gmail.com Venkata Poreddy 9aacab2a8eb5e7eabcb227caea5a82d99e5f8835 [465c3a275bc341e2dab9d43c0363c2a7fff59b15] 2015-04-10 16:55:36 [examples/File Dependency Network.ipynb] 3
9 2015-04-10 16:54:44 Create get_dependencies.py vs.poreddy@gmail.com Venkata Poreddy 465c3a275bc341e2dab9d43c0363c2a7fff59b15 [95e074b3e32017adf92e74a8fb19e471bf95f1ee] 2015-04-10 16:54:44 [bigbang/get_dependencies.py] 3
10 2015-04-10 16:53:57 Update requirements.txt vs.poreddy@gmail.com Venkata Poreddy 95e074b3e32017adf92e74a8fb19e471bf95f1ee [68a5743f1cfe1241cb2608739418850b0b285360] 2015-04-10 16:53:57 [requirements.txt] 3
11 2015-04-10 16:53:31 Create File Dependency Network.ipynb vs.poreddy@gmail.com Venkata Poreddy 68a5743f1cfe1241cb2608739418850b0b285360 [be536710f94ec072e04431e7cd043ad24f5f1afb] 2015-04-10 16:53:31 [examples/File Dependency Network.ipynb] 3
12 2015-04-10 16:18:26 Merge pull request #2 from sbenthall/master\n\... vs.poreddy@gmail.com Venkata Poreddy be536710f94ec072e04431e7cd043ad24f5f1afb [3287f61619d148ccb7deb77c4821812d1dc9cff0, 2b1... 2015-04-10 16:18:26 [.gitignore, README.md, bigbang/archive.py, bi... 3
13 2015-04-10 11:06:56 Warning people how long git diffs will take\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 8dac7fc397738b057d7fbdcd2bea1552e6f88339 [0db0b375fcb90522f6a8700d87820e8fd91e5343] 2015-04-10 11:06:56 [bigbang/git_repo.py] 4
14 2015-04-10 10:56:57 Fixed another bug with repo loading logic\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 0db0b375fcb90522f6a8700d87820e8fd91e5343 [a121a04579461d4a520fbe4113f0cd0b3a052911] 2015-04-10 10:56:57 [bigbang/git_repo.py, bigbang/repo_loader.py] 4
15 2015-04-10 10:35:54 Fixed repo loading bug. The answer fetched was... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh a121a04579461d4a520fbe4113f0cd0b3a052911 [2b1f678c8ad75458b6a6b7484bed0ca72baee298] 2015-04-10 10:35:54 [bigbang/git_repo.py, bigbang/repo_loader.py] 4
16 2015-04-06 23:30:06 Merge pull request #190 from dwins/setting_wit... sbenthall@gmail.com Sebastian Benthall 2b1f678c8ad75458b6a6b7484bed0ca72baee298 [48dfc9b5472471b5a8768f56566c6246c63aa3fe, c03... 2015-04-06 23:30:06 [bigbang/archive.py] 1
17 2015-04-06 23:21:00 Merge branch 'raj4-master'\n sbenthall@gmail.com sb 48dfc9b5472471b5a8768f56566c6246c63aa3fe [ff0a46b3afac4995517d7dc0ad1281f457e818b4, bc5... 2015-04-06 23:21:00 [examples/Collaboration Robustness.ipynb] 1
18 2015-04-06 23:20:37 Merge branch 'master' of https://github.com/ra... sbenthall@gmail.com sb bc5ccc1fe3034f939ef2f74789a949d2f3604694 [ff0a46b3afac4995517d7dc0ad1281f457e818b4, 039... 2015-04-06 23:20:37 [examples/Collaboration Robustness.ipynb] 1
19 2015-04-06 23:13:58 Merge branch 'cool9210-master'\n sbenthall@gmail.com sb ff0a46b3afac4995517d7dc0ad1281f457e818b4 [6856dc4b4b7ce515c34c180f5ff72dd1b2676b1e, 505... 2015-04-06 23:13:58 [bigbang/twopeople.py] 1
20 2015-04-06 23:13:27 Merge branch 'master' of https://github.com/co... sbenthall@gmail.com sb 505689d8494bab11e69f0687364dbba2a461b532 [6856dc4b4b7ce515c34c180f5ff72dd1b2676b1e, 3fa... 2015-04-06 23:13:27 [bigbang/twopeople.py] 1
21 2015-04-03 21:41:36 Avoid SettingWithCopyWarning\n\nfixes #162\n cdwinslow@gmail.com David Winslow c03e3d20fae49a6d2f0458a4132af557b7ec355b [6856dc4b4b7ce515c34c180f5ff72dd1b2676b1e] 2015-04-03 21:41:36 [bigbang/archive.py] 5
22 2015-04-02 23:45:44 committing twopeople\n kdkim@berkeley.edu Ki Deuk Kim 3fa34b21dc5e7d6c7a7154fcda9473f4b0f18f93 [e57bd1d4a81466b73027808d1f55fb9b4c671072] 2015-04-02 23:45:44 [bigbang/twopeople.py] 6
23 2015-04-02 23:26:23 updated robustness notebook\n r.agrawal@berkeley.edu Raj Agrawal 039df37b77929fe52b183dfbf436254b95a4742d [a69e75b9e36afaf1a1b7af1f51ef00e9c3468095] 2015-04-02 23:26:23 [bigbang/twopeople.py, examples/Collaboration ... 7
24 2015-04-01 04:14:15 Merge branch 'dwins-email_character_sets'\n sbenthall@gmail.com sb 6856dc4b4b7ce515c34c180f5ff72dd1b2676b1e [05d773f13331693d796a75daac2529b2efb8ccff, 561... 2015-04-01 04:14:15 [bigbang/mailman.py] 1
25 2015-03-31 20:34:42 Consistently represent email data as Unicode\n cdwinslow@gmail.com David Winslow 56140670a9f627e226d449c17d29544be6f5598d [05d773f13331693d796a75daac2529b2efb8ccff] 2015-03-31 20:34:42 [bigbang/mailman.py] 5
26 2015-03-31 04:50:46 changing type attribute to be keyed to string ... sbenthall@gmail.com sb 05d773f13331693d796a75daac2529b2efb8ccff [3e1c1f07f1b0d4a55751405b65004bd2b469945f] 2015-03-31 04:50:46 [examples/Git Diffs.ipynb] 1
27 2015-03-30 01:08:56 Merge pull request #182 from Aryan-Barbarian/g... sbenthall@gmail.com Sebastian Benthall 3e1c1f07f1b0d4a55751405b65004bd2b469945f [11905640d44377fb0c007cd340ab780e408f2d10, a71... 2015-03-30 01:08:56 [.gitignore, README.md, bigbang/git_repo.py, b... 1
28 2015-03-24 04:43:47 Added the option to override the cache and for... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh a713fad3a49cbb803cac33b01cfa3283fe20840f [225b0ee0c3b4db0cda06155eacc1b7d945572306] 2015-03-24 04:43:47 [bigbang/git_repo.py, bigbang/repo_loader.py, ... 4
29 2015-03-24 04:17:58 Fixed bugs relating to caching the data.\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 225b0ee0c3b4db0cda06155eacc1b7d945572306 [d51c62ea197eedbe3ff7ff63ebb2c1a9a497b21f] 2015-03-24 04:17:58 [bigbang/git_repo.py, bigbang/repo_loader.py, ... 4
30 2015-03-24 03:55:41 Repo Loader wasn't importing pandas\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh d51c62ea197eedbe3ff7ff63ebb2c1a9a497b21f [c5919b8d0fc2482b172923e58e51dad54ff209f9] 2015-03-24 03:55:41 [bigbang/repo_loader.py] 4
31 2015-03-24 03:54:51 Repo Loader tries to cache now?\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh c5919b8d0fc2482b172923e58e51dad54ff209f9 [fa5688b0711d68ec0ffa436d7f31c73907c81e35] 2015-03-24 03:54:51 [bigbang/repo_loader.py] 4
32 2015-03-24 03:40:36 Git Repo takes flags for initialization now. N... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh fa5688b0711d68ec0ffa436d7f31c73907c81e35 [c886ee31fbd48f17afc1b3158983591a17389dfd] 2015-03-24 03:40:36 [bigbang/git_repo.py] 4
33 2015-03-22 19:51:47 Fixed issues in the ipython notebooks regardin... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh c886ee31fbd48f17afc1b3158983591a17389dfd [d5187fadf9a8529bfc57ac9bade890cd7167a20b] 2015-03-22 19:51:47 [examples/Committer Dominance.ipynb, examples/... 4
34 2015-03-22 19:32:33 Moved git files into the main bigbang library.... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh d5187fadf9a8529bfc57ac9bade890cd7167a20b [89de558656441f4f4e2ec16cc96d757c073d4772] 2015-03-22 19:32:33 [bigbang/git_repo.py, bigbang/repo_loader.py, ... 4
35 2015-03-17 21:26:03 Fixing the readme\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 89de558656441f4f4e2ec16cc96d757c073d4772 [befc9ba1742ca9cd8eb2dfc03be3289ab1d1a99d] 2015-03-17 21:26:03 [README.md] 4
36 2015-03-17 21:14:42 One more tweak to the README\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh befc9ba1742ca9cd8eb2dfc03be3289ab1d1a99d [d0f9f1f7e62d9471b8aba0e52831bd93f7fb6501] 2015-03-17 21:14:42 [README.md] 4
37 2015-03-17 21:10:21 Updated README\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh d0f9f1f7e62d9471b8aba0e52831bd93f7fb6501 [b7c4d709b0a07972c90b336a0f7a667981416b7a] 2015-03-17 21:10:21 [README.md] 4
38 2015-03-17 20:41:16 The repo loader can now correctly fetch files.\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh b7c4d709b0a07972c90b336a0f7a667981416b7a [974c7a2e1765365dd40705e6ae7b41d9f984a118] 2015-03-17 20:41:16 [git_data/RepoLoader.py] 4
39 2015-03-17 20:27:58 Small bug with repo loader\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 974c7a2e1765365dd40705e6ae7b41d9f984a118 [598cf71c6697e4e346894bb58dfbeb30bda3c4aa] 2015-03-17 20:27:58 [git_data/RepoLoader.py] 4
40 2015-03-17 20:26:12 RepoLoader generates the sample git directory ... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 598cf71c6697e4e346894bb58dfbeb30bda3c4aa [a0f02f7f9a401c79815df5f5f52ca483dd6c007b] 2015-03-17 20:26:12 [git_data/RepoLoader.py] 4
41 2015-03-17 20:25:37 Moved a lot of git repo loading functionality ... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh a0f02f7f9a401c79815df5f5f52ca483dd6c007b [8c102702f168ba86a8bb81802fe61db70361dfb0] 2015-03-17 20:25:37 [git_data/RepoLoader.py] 4
42 2015-03-17 20:06:49 Very rough first draft of repo loader\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 8c102702f168ba86a8bb81802fe61db70361dfb0 [296dd9a35d2aa006b8f8e9c32852b073e961b3bd] 2015-03-17 20:06:49 [bin/collect_git.py, git_data/RepoLoader.py] 4
43 2015-03-17 19:14:08 collect git now imports from Repository Loader\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 296dd9a35d2aa006b8f8e9c32852b073e961b3bd [0ff39bd05a7b4b792459b991a0f726422c7d2ef0] 2015-03-17 19:14:08 [bin/collect_git.py, git_data/GitRepo.py, git_... 4
44 2015-03-17 18:53:17 Slight cleanup in collect git script\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 0ff39bd05a7b4b792459b991a0f726422c7d2ef0 [4f5104300b17035460a9f5e7819f8999da72e75b] 2015-03-17 18:53:17 [bin/collect_git.py] 4
45 2015-03-17 18:31:44 Merge remote-tracking branch 'upstream/master'... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 4f5104300b17035460a9f5e7819f8999da72e75b [f54194242ea036274d788039e77b2619020434dd, 119... 2015-03-17 18:31:44 [bigbang/mailman.py, bigbang/twopeople.py, req... 4
46 2015-03-17 00:03:27 Merge branch 'raj4-master'\n sbenthall@gmail.com sb 11905640d44377fb0c007cd340ab780e408f2d10 [00f13d97385763b699b52b562fc204d80149098b, 9f6... 2015-03-17 00:03:27 [bigbang/twopeople.py] 1
47 2015-03-17 00:03:11 Merge branch 'master' of https://github.com/ra... sbenthall@gmail.com sb 9f6c74e01dbbdd14468befa8cde1de82d08d7935 [00f13d97385763b699b52b562fc204d80149098b, a69... 2015-03-17 00:03:11 [bigbang/twopeople.py] 1
48 2015-03-16 23:56:02 functions to create df\n r.agrawal@berkeley.edu Raj Agrawal a69e75b9e36afaf1a1b7af1f51ef00e9c3468095 [847720442d7cab223a6c83f0bd9db37ca28bdfbd] 2015-03-16 23:56:02 [bigbang/twopeople.py] 7
49 2015-03-14 20:18:01 fixing variable reference in data collection e... sbenthall@gmail.com sb 00f13d97385763b699b52b562fc204d80149098b [701212ecb79f1b400c2e293d98ff582c750532d0] 2015-03-14 20:18:01 [bigbang/mailman.py] 1
50 2015-03-12 21:51:25 adding jsonschema as a pip requirement\n sbenthall@gmail.com sb 701212ecb79f1b400c2e293d98ff582c750532d0 [aef98ed18e82a52ca4dfc593769f99f4618f8edb] 2015-03-12 21:51:25 [requirements.txt] 1
51 2015-03-10 20:33:56 git will now ignore the git_locals.json file, ... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh f54194242ea036274d788039e77b2619020434dd [aef98ed18e82a52ca4dfc593769f99f4618f8edb] 2015-03-10 20:33:56 [.gitignore] 4
52 2015-03-10 00:06:20 Merge branch 'cool9210-master'\n sbenthall@gmail.com sb aef98ed18e82a52ca4dfc593769f99f4618f8edb [a87af8aed3e0e2fb964579b8a7144361d4c19d2f, e57... 2015-03-10 00:06:20 [examples/Collaboration Robustness.ipynb] 1
53 2015-03-10 00:01:08 Merge branch 'master' of https://github.com/co... kdkim@berkeley.edu Ki Deuk Kim e57bd1d4a81466b73027808d1f55fb9b4c671072 [0547569578a496cf80d153ca9cf2d20849c1736c, 4ba... 2015-03-10 00:01:08 [] 6
54 2015-03-09 23:52:56 This change is adding duration, reciprocity, a... kdkim@berkeley.edu Ki Deuk Kim 0547569578a496cf80d153ca9cf2d20849c1736c [a87af8aed3e0e2fb964579b8a7144361d4c19d2f] 2015-03-09 23:52:56 [examples/Collaboration Robustness.ipynb] 6
55 2015-03-09 23:40:02 Merge branch 'raj4-master'\n sbenthall@gmail.com sb a87af8aed3e0e2fb964579b8a7144361d4c19d2f [8c450a41c5446db94c0cff7151a8ef2297c43a07, 847... 2015-03-09 23:40:02 [bigbang/twopeople.py] 1
56 2015-03-09 23:31:02 first commit\n r.agrawal@berkeley.edu Raj Agrawal 847720442d7cab223a6c83f0bd9db37ca28bdfbd [8c450a41c5446db94c0cff7151a8ef2297c43a07] 2015-03-09 23:31:02 [bigbang/twopeople.py] 7
57 2015-03-09 23:20:54 Create twopeople.py kdkim@berkeley.edu Ki Deuk Kim 4ba2d1df3cb06eec91795ff22489b5533690dcfa [8c450a41c5446db94c0cff7151a8ef2297c43a07] 2015-03-09 23:20:54 [bigbang/twopeople.py] 6
58 2015-03-05 22:47:56 Merge branch 'vsporeddy'\n sbenthall@gmail.com sb 8c450a41c5446db94c0cff7151a8ef2297c43a07 [0b47f504de03817db97e0d3556c98f7c252bc0f9, fef... 2015-03-05 22:47:56 [examples/Git Diffs.ipynb] 1
59 2015-03-04 05:48:45 Update Git Diffs.ipynb\n\nAdded node colors an... vs.poreddy@gmail.com Venkata Poreddy fefb82dbc2b827cafb47edea9678f43f2a411681 [0b47f504de03817db97e0d3556c98f7c252bc0f9] 2015-03-04 05:48:45 [examples/Git Diffs.ipynb] 3
... ... ... ... ... ... ... ... ...

372 rows × 9 columns

MultiRepos

These are the ways we can get MultiGitRepo objects. A MultiGitRepo is a GitRepo created from a list of GitRepos. Its commit_data contains the commit_data of all of its GitRepos. The only difference is that each entry has an extra attribute, Repo Name, which tells us which repo the commit originally came from.
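To make the extra Repo Name attribute concrete, here is a small plain-Python stand-in for the merge. It is not how MultiGitRepo is implemented: merge_commit_data is a hypothetical function, commit rows are modeled as dictionaries rather than dataframe rows, and the bead.glass row is a made-up placeholder rather than real commit data.

```python
def merge_commit_data(repos):
    """Combine per-repo commit rows and tag each with its repo name.

    `repos` maps a repo name to a list of commit dicts; this is a
    plain-Python stand-in for combining pandas dataframes.
    """
    merged = []
    for repo_name, commits in repos.items():
        for commit in commits:
            row = dict(commit)            # copy so the input is untouched
            row["Repo Name"] = repo_name  # the extra MultiGitRepo attribute
            merged.append(row)
    return merged


sample = {
    "bigbang": [
        {"Commit Message": "Changed to directed graph", "Person-ID": 3},
        {"Commit Message": "Update requirements.txt", "Person-ID": 3},
    ],
    # Placeholder row, not taken from the real bead.glass history.
    "bead.glass": [{"Commit Message": "placeholder commit", "Person-ID": 1}],
}

for row in merge_commit_data(sample):
    print(row["Repo Name"], "-", row["Commit Message"])
```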

List of Repos / List of Repo Names (get_multi_repo)

This is rather simple. We can call the get_multi_repo method with either a list of repo names, such as ["bigbang", "django", "scipy"], or a list of actual GitRepo objects. It returns the merged MultiGitRepo. Note that this will not work unless a local clone or cache exists for every repo name (e.g. if you ask for ["bigbang", "django", "scipy"], you must already have a local copy of each of those in your sample_git_repos directory).

Github Organization's Repos (get_org_multirepo)

This is more useful to us. We can use this method to get a MultiGitRepo that contains the information from every repo in a GitHub organization. It requires the organization's name exactly as it appears on GitHub (edX, glass-bead-labs, codeforamerica, etc.).

It will look for examples/{org_name}_urls.txt, which should be a file that contains the git URLs of all projects that belong to that organization. If this file doesn't exist yet, it will make a call to the GitHub API. This requires a stable internet connection, and it may randomly stall on requests that do not time out.

The function will then use the list of git urls and the get_repo method to get each repo. It will use this list of repos to create a MultiGitRepo object, using get_multi_repo.
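The URL lookup described above could be sketched like this. Several details are assumptions rather than facts about repo_loader: the function names are hypothetical, the file is assumed to hold one URL per line, the sketch writes the file after fetching, and only the first page of API results (up to 100 repos) is requested. The explicit timeout is one way to avoid the stalling requests mentioned above.

```python
import json
import os
import tempfile
import urllib.request


def fetch_org_clone_urls(org_name, timeout=30):
    """Ask the GitHub API for an organization's repos (first page only)."""
    api_url = "https://api.github.com/orgs/{}/repos?per_page=100".format(org_name)
    # An explicit timeout keeps a hung request from blocking forever.
    with urllib.request.urlopen(api_url, timeout=timeout) as response:
        repos = json.loads(response.read().decode("utf-8"))
    return [repo["clone_url"] for repo in repos]


def org_urls(org_name, examples_dir="examples", fetch=fetch_org_clone_urls):
    """Read {org_name}_urls.txt if present, else fetch the URLs and save them."""
    path = os.path.join(examples_dir, org_name + "_urls.txt")
    if os.path.exists(path):
        with open(path) as handle:
            return [line.strip() for line in handle if line.strip()]
    urls = fetch(org_name)
    with open(path, "w") as handle:
        handle.write("\n".join(urls) + "\n")
    return urls


# Offline demo: a stub replaces the network call, and "demo-org" is made up.
with tempfile.TemporaryDirectory() as tmp:
    stub = lambda org: ["https://github.com/sbenthall/bigbang.git"]
    first = org_urls("demo-org", tmp, fetch=stub)              # calls the stub
    second = org_urls("demo-org", tmp, fetch=lambda org: [])   # read from file
    print(first == second)
```

Each URL in the returned list could then be passed to get_repo with in_type="remote", and the resulting repos to get_multi_repo.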

Note that the examples below will not work if you don't have an internet connection, and may take some time to process. The first call may also fail if you do not have all of the repositories.


In [2]:
# Using GitHub API
multirepo = repo_loader.get_org_multirepo("glass-bead-labs")

# List of repo names
multirepo = repo_loader.get_multi_repo(repo_names = ["bigbang","bead.glass"])

# List of actual repos
repo1 = repo_loader.get_repo("bigbang", in_type="name")
repo2 = repo_loader.get_repo("bead.glass", in_type="name")
multirepo = repo_loader.get_multi_repo(repos = [repo1, repo2])

multirepo.commit_data


Out[2]:
Unnamed: 0 Commit Message Committer Email Committer Name HEXSHA Parent Commit Time Touched File Person-ID Repo Name
0 2015-04-13 22:49:33 Merge pull request #195 from jesscxu/master\n\... sbenthall@gmail.com Sebastian Benthall e6f985d15ff4736a08e2112b6c7ff0c0d0836a75 [02d30c7ba4b02e899c4f098531812ca390983c0b, 5b5... 2015-04-13 22:49:33 [examples/viz/git/glass.json, examples/viz/git... 1 bigbang
1 2015-04-13 22:44:21 Adding d3 visualization of GitDiff.ipynb graph\n jcxu@berkeley.edu Jessica Xu 5b54cc96d652a07b12b5c31d4f5ad5269e1aec37 [02d30c7ba4b02e899c4f098531812ca390983c0b] 2015-04-13 22:44:21 [examples/viz/git/glass.json, examples/viz/git... 2 bigbang
2 2015-04-10 21:59:33 Merge pull request #194 from vsporeddy/master\... sbenthall@gmail.com Sebastian Benthall 02d30c7ba4b02e899c4f098531812ca390983c0b [3723718c356155a8c2c2104e813d61263a1f23c7, 2ec... 2015-04-10 21:59:33 [examples/File Dependency Network.ipynb] 1 bigbang
3 2015-04-10 18:19:22 Changed to directed graph vs.poreddy@gmail.com Venkata Poreddy 2ec31ee60878a08e5738dfa40245740e79dde97c [f5316bf07da3d4d51ac3bc1875b24d10693daa02] 2015-04-10 18:19:22 [examples/File Dependency Network.ipynb] 3 bigbang
4 2015-04-10 18:18:13 Merge pull request #3 from sbenthall/master\n\... vs.poreddy@gmail.com Venkata Poreddy f5316bf07da3d4d51ac3bc1875b24d10693daa02 [9aacab2a8eb5e7eabcb227caea5a82d99e5f8835, 372... 2015-04-10 18:18:13 [bigbang/git_repo.py, bigbang/repo_loader.py] 3 bigbang
5 2015-04-10 17:54:34 Merge pull request #192 from Aryan-Barbarian/m... sbenthall@gmail.com Sebastian Benthall 3723718c356155a8c2c2104e813d61263a1f23c7 [a22c55ea0887bdff8f62e50d2abdca02f6fdbce6, ed6... 2015-04-10 17:54:34 [bigbang/git_repo.py, bigbang/repo_loader.py] 1 bigbang
6 2015-04-10 17:53:13 Merge pull request #193 from vsporeddy/master\... sbenthall@gmail.com Sebastian Benthall a22c55ea0887bdff8f62e50d2abdca02f6fdbce6 [2b1f678c8ad75458b6a6b7484bed0ca72baee298, 9aa... 2015-04-10 17:53:13 [bigbang/get_dependencies.py, examples/File De... 1 bigbang
7 2015-04-10 17:30:29 Fixed an issue where git repos with hyphens in... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh ed60740e26981e216542a258c0c5aa0afa50af95 [8dac7fc397738b057d7fbdcd2bea1552e6f88339] 2015-04-10 17:30:29 [bigbang/repo_loader.py] 4 bigbang
8 2015-04-10 16:55:36 Update File Dependency Network.ipynb vs.poreddy@gmail.com Venkata Poreddy 9aacab2a8eb5e7eabcb227caea5a82d99e5f8835 [465c3a275bc341e2dab9d43c0363c2a7fff59b15] 2015-04-10 16:55:36 [examples/File Dependency Network.ipynb] 3 bigbang
9 2015-04-10 16:54:44 Create get_dependencies.py vs.poreddy@gmail.com Venkata Poreddy 465c3a275bc341e2dab9d43c0363c2a7fff59b15 [95e074b3e32017adf92e74a8fb19e471bf95f1ee] 2015-04-10 16:54:44 [bigbang/get_dependencies.py] 3 bigbang
10 2015-04-10 16:53:57 Update requirements.txt vs.poreddy@gmail.com Venkata Poreddy 95e074b3e32017adf92e74a8fb19e471bf95f1ee [68a5743f1cfe1241cb2608739418850b0b285360] 2015-04-10 16:53:57 [requirements.txt] 3 bigbang
11 2015-04-10 16:53:31 Create File Dependency Network.ipynb vs.poreddy@gmail.com Venkata Poreddy 68a5743f1cfe1241cb2608739418850b0b285360 [be536710f94ec072e04431e7cd043ad24f5f1afb] 2015-04-10 16:53:31 [examples/File Dependency Network.ipynb] 3 bigbang
12 2015-04-10 16:18:26 Merge pull request #2 from sbenthall/master\n\... vs.poreddy@gmail.com Venkata Poreddy be536710f94ec072e04431e7cd043ad24f5f1afb [3287f61619d148ccb7deb77c4821812d1dc9cff0, 2b1... 2015-04-10 16:18:26 [.gitignore, README.md, bigbang/archive.py, bi... 3 bigbang
13 2015-04-10 11:06:56 Warning people how long git diffs will take\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 8dac7fc397738b057d7fbdcd2bea1552e6f88339 [0db0b375fcb90522f6a8700d87820e8fd91e5343] 2015-04-10 11:06:56 [bigbang/git_repo.py] 4 bigbang
14 2015-04-10 10:56:57 Fixed another bug with repo loading logic\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 0db0b375fcb90522f6a8700d87820e8fd91e5343 [a121a04579461d4a520fbe4113f0cd0b3a052911] 2015-04-10 10:56:57 [bigbang/git_repo.py, bigbang/repo_loader.py] 4 bigbang
15 2015-04-10 10:35:54 Fixed repo loading bug. The answer fetched was... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh a121a04579461d4a520fbe4113f0cd0b3a052911 [2b1f678c8ad75458b6a6b7484bed0ca72baee298] 2015-04-10 10:35:54 [bigbang/git_repo.py, bigbang/repo_loader.py] 4 bigbang
16 2015-04-06 23:30:06 Merge pull request #190 from dwins/setting_wit... sbenthall@gmail.com Sebastian Benthall 2b1f678c8ad75458b6a6b7484bed0ca72baee298 [48dfc9b5472471b5a8768f56566c6246c63aa3fe, c03... 2015-04-06 23:30:06 [bigbang/archive.py] 1 bigbang
17 2015-04-06 23:21:00 Merge branch 'raj4-master'\n sbenthall@gmail.com sb 48dfc9b5472471b5a8768f56566c6246c63aa3fe [ff0a46b3afac4995517d7dc0ad1281f457e818b4, bc5... 2015-04-06 23:21:00 [examples/Collaboration Robustness.ipynb] 1 bigbang
18 2015-04-06 23:20:37 Merge branch 'master' of https://github.com/ra... sbenthall@gmail.com sb bc5ccc1fe3034f939ef2f74789a949d2f3604694 [ff0a46b3afac4995517d7dc0ad1281f457e818b4, 039... 2015-04-06 23:20:37 [examples/Collaboration Robustness.ipynb] 1 bigbang
19 2015-04-06 23:13:58 Merge branch 'cool9210-master'\n sbenthall@gmail.com sb ff0a46b3afac4995517d7dc0ad1281f457e818b4 [6856dc4b4b7ce515c34c180f5ff72dd1b2676b1e, 505... 2015-04-06 23:13:58 [bigbang/twopeople.py] 1 bigbang
20 2015-04-06 23:13:27 Merge branch 'master' of https://github.com/co... sbenthall@gmail.com sb 505689d8494bab11e69f0687364dbba2a461b532 [6856dc4b4b7ce515c34c180f5ff72dd1b2676b1e, 3fa... 2015-04-06 23:13:27 [bigbang/twopeople.py] 1 bigbang
21 2015-04-03 21:41:36 Avoid SettingWithCopyWarning\n\nfixes #162\n cdwinslow@gmail.com David Winslow c03e3d20fae49a6d2f0458a4132af557b7ec355b [6856dc4b4b7ce515c34c180f5ff72dd1b2676b1e] 2015-04-03 21:41:36 [bigbang/archive.py] 5 bigbang
22 2015-04-02 23:45:44 committing twopeople\n kdkim@berkeley.edu Ki Deuk Kim 3fa34b21dc5e7d6c7a7154fcda9473f4b0f18f93 [e57bd1d4a81466b73027808d1f55fb9b4c671072] 2015-04-02 23:45:44 [bigbang/twopeople.py] 6 bigbang
23 2015-04-02 23:26:23 updated robustness notebook\n r.agrawal@berkeley.edu Raj Agrawal 039df37b77929fe52b183dfbf436254b95a4742d [a69e75b9e36afaf1a1b7af1f51ef00e9c3468095] 2015-04-02 23:26:23 [bigbang/twopeople.py, examples/Collaboration ... 7 bigbang
24 2015-04-01 04:14:15 Merge branch 'dwins-email_character_sets'\n sbenthall@gmail.com sb 6856dc4b4b7ce515c34c180f5ff72dd1b2676b1e [05d773f13331693d796a75daac2529b2efb8ccff, 561... 2015-04-01 04:14:15 [bigbang/mailman.py] 1 bigbang
25 2015-03-31 20:34:42 Consistently represent email data as Unicode\n cdwinslow@gmail.com David Winslow 56140670a9f627e226d449c17d29544be6f5598d [05d773f13331693d796a75daac2529b2efb8ccff] 2015-03-31 20:34:42 [bigbang/mailman.py] 5 bigbang
26 2015-03-31 04:50:46 changing type attribute to be keyed to string ... sbenthall@gmail.com sb 05d773f13331693d796a75daac2529b2efb8ccff [3e1c1f07f1b0d4a55751405b65004bd2b469945f] 2015-03-31 04:50:46 [examples/Git Diffs.ipynb] 1 bigbang
27 2015-03-30 01:08:56 Merge pull request #182 from Aryan-Barbarian/g... sbenthall@gmail.com Sebastian Benthall 3e1c1f07f1b0d4a55751405b65004bd2b469945f [11905640d44377fb0c007cd340ab780e408f2d10, a71... 2015-03-30 01:08:56 [.gitignore, README.md, bigbang/git_repo.py, b... 1 bigbang
28 2015-03-24 04:43:47 Added the option to override the cache and for... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh a713fad3a49cbb803cac33b01cfa3283fe20840f [225b0ee0c3b4db0cda06155eacc1b7d945572306] 2015-03-24 04:43:47 [bigbang/git_repo.py, bigbang/repo_loader.py, ... 4 bigbang
29 2015-03-24 04:17:58 Fixed bugs relating to caching the data.\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 225b0ee0c3b4db0cda06155eacc1b7d945572306 [d51c62ea197eedbe3ff7ff63ebb2c1a9a497b21f] 2015-03-24 04:17:58 [bigbang/git_repo.py, bigbang/repo_loader.py, ... 4 bigbang
30 2015-03-24 03:55:41 Repo Loader wasn't importing pandas\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh d51c62ea197eedbe3ff7ff63ebb2c1a9a497b21f [c5919b8d0fc2482b172923e58e51dad54ff209f9] 2015-03-24 03:55:41 [bigbang/repo_loader.py] 4 bigbang
31 2015-03-24 03:54:51 Repo Loader tries to cache now?\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh c5919b8d0fc2482b172923e58e51dad54ff209f9 [fa5688b0711d68ec0ffa436d7f31c73907c81e35] 2015-03-24 03:54:51 [bigbang/repo_loader.py] 4 bigbang
32 2015-03-24 03:40:36 Git Repo takes flags for initialization now. N... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh fa5688b0711d68ec0ffa436d7f31c73907c81e35 [c886ee31fbd48f17afc1b3158983591a17389dfd] 2015-03-24 03:40:36 [bigbang/git_repo.py] 4 bigbang
33 2015-03-22 19:51:47 Fixed issues in the ipython notebooks regardin... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh c886ee31fbd48f17afc1b3158983591a17389dfd [d5187fadf9a8529bfc57ac9bade890cd7167a20b] 2015-03-22 19:51:47 [examples/Committer Dominance.ipynb, examples/... 4 bigbang
34 2015-03-22 19:32:33 Moved git files into the main bigbang library.... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh d5187fadf9a8529bfc57ac9bade890cd7167a20b [89de558656441f4f4e2ec16cc96d757c073d4772] 2015-03-22 19:32:33 [bigbang/git_repo.py, bigbang/repo_loader.py, ... 4 bigbang
35 2015-03-17 21:26:03 Fixing the readme\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 89de558656441f4f4e2ec16cc96d757c073d4772 [befc9ba1742ca9cd8eb2dfc03be3289ab1d1a99d] 2015-03-17 21:26:03 [README.md] 4 bigbang
36 2015-03-17 21:14:42 One more tweak to the README\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh befc9ba1742ca9cd8eb2dfc03be3289ab1d1a99d [d0f9f1f7e62d9471b8aba0e52831bd93f7fb6501] 2015-03-17 21:14:42 [README.md] 4 bigbang
37 2015-03-17 21:10:21 Updated README\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh d0f9f1f7e62d9471b8aba0e52831bd93f7fb6501 [b7c4d709b0a07972c90b336a0f7a667981416b7a] 2015-03-17 21:10:21 [README.md] 4 bigbang
38 2015-03-17 20:41:16 The repo loader can now correctly fetch files.\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh b7c4d709b0a07972c90b336a0f7a667981416b7a [974c7a2e1765365dd40705e6ae7b41d9f984a118] 2015-03-17 20:41:16 [git_data/RepoLoader.py] 4 bigbang
39 2015-03-17 20:27:58 Small bug with repo loader\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 974c7a2e1765365dd40705e6ae7b41d9f984a118 [598cf71c6697e4e346894bb58dfbeb30bda3c4aa] 2015-03-17 20:27:58 [git_data/RepoLoader.py] 4 bigbang
40 2015-03-17 20:26:12 RepoLoader generates the sample git directory ... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 598cf71c6697e4e346894bb58dfbeb30bda3c4aa [a0f02f7f9a401c79815df5f5f52ca483dd6c007b] 2015-03-17 20:26:12 [git_data/RepoLoader.py] 4 bigbang
41 2015-03-17 20:25:37 Moved a lot of git repo loading functionality ... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh a0f02f7f9a401c79815df5f5f52ca483dd6c007b [8c102702f168ba86a8bb81802fe61db70361dfb0] 2015-03-17 20:25:37 [git_data/RepoLoader.py] 4 bigbang
42 2015-03-17 20:06:49 Very rough first draft of repo loader\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 8c102702f168ba86a8bb81802fe61db70361dfb0 [296dd9a35d2aa006b8f8e9c32852b073e961b3bd] 2015-03-17 20:06:49 [bin/collect_git.py, git_data/RepoLoader.py] 4 bigbang
43 2015-03-17 19:14:08 collect git now imports from Repository Loader\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 296dd9a35d2aa006b8f8e9c32852b073e961b3bd [0ff39bd05a7b4b792459b991a0f726422c7d2ef0] 2015-03-17 19:14:08 [bin/collect_git.py, git_data/GitRepo.py, git_... 4 bigbang
44 2015-03-17 18:53:17 Slight cleanup in collect git script\n aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 0ff39bd05a7b4b792459b991a0f726422c7d2ef0 [4f5104300b17035460a9f5e7819f8999da72e75b] 2015-03-17 18:53:17 [bin/collect_git.py] 4 bigbang
45 2015-03-17 18:31:44 Merge remote-tracking branch 'upstream/master'... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh 4f5104300b17035460a9f5e7819f8999da72e75b [f54194242ea036274d788039e77b2619020434dd, 119... 2015-03-17 18:31:44 [bigbang/mailman.py, bigbang/twopeople.py, req... 4 bigbang
46 2015-03-17 00:03:27 Merge branch 'raj4-master'\n sbenthall@gmail.com sb 11905640d44377fb0c007cd340ab780e408f2d10 [00f13d97385763b699b52b562fc204d80149098b, 9f6... 2015-03-17 00:03:27 [bigbang/twopeople.py] 1 bigbang
47 2015-03-17 00:03:11 Merge branch 'master' of https://github.com/ra... sbenthall@gmail.com sb 9f6c74e01dbbdd14468befa8cde1de82d08d7935 [00f13d97385763b699b52b562fc204d80149098b, a69... 2015-03-17 00:03:11 [bigbang/twopeople.py] 1 bigbang
48 2015-03-16 23:56:02 functions to create df\n r.agrawal@berkeley.edu Raj Agrawal a69e75b9e36afaf1a1b7af1f51ef00e9c3468095 [847720442d7cab223a6c83f0bd9db37ca28bdfbd] 2015-03-16 23:56:02 [bigbang/twopeople.py] 7 bigbang
49 2015-03-14 20:18:01 fixing variable reference in data collection e... sbenthall@gmail.com sb 00f13d97385763b699b52b562fc204d80149098b [701212ecb79f1b400c2e293d98ff582c750532d0] 2015-03-14 20:18:01 [bigbang/mailman.py] 1 bigbang
50 2015-03-12 21:51:25 adding jsonschema as a pip requirement\n sbenthall@gmail.com sb 701212ecb79f1b400c2e293d98ff582c750532d0 [aef98ed18e82a52ca4dfc593769f99f4618f8edb] 2015-03-12 21:51:25 [requirements.txt] 1 bigbang
51 2015-03-10 20:33:56 git will now ignore the git_locals.json file, ... aryan.falahatpisheh@berkeley.edu Aryan Falahatpisheh f54194242ea036274d788039e77b2619020434dd [aef98ed18e82a52ca4dfc593769f99f4618f8edb] 2015-03-10 20:33:56 [.gitignore] 4 bigbang
52 2015-03-10 00:06:20 Merge branch 'cool9210-master'\n sbenthall@gmail.com sb aef98ed18e82a52ca4dfc593769f99f4618f8edb [a87af8aed3e0e2fb964579b8a7144361d4c19d2f, e57... 2015-03-10 00:06:20 [examples/Collaboration Robustness.ipynb] 1 bigbang
53 2015-03-10 00:01:08 Merge branch 'master' of https://github.com/co... kdkim@berkeley.edu Ki Deuk Kim e57bd1d4a81466b73027808d1f55fb9b4c671072 [0547569578a496cf80d153ca9cf2d20849c1736c, 4ba... 2015-03-10 00:01:08 [] 6 bigbang
54 2015-03-09 23:52:56 This change is adding duration, reciprocity, a... kdkim@berkeley.edu Ki Deuk Kim 0547569578a496cf80d153ca9cf2d20849c1736c [a87af8aed3e0e2fb964579b8a7144361d4c19d2f] 2015-03-09 23:52:56 [examples/Collaboration Robustness.ipynb] 6 bigbang
55 2015-03-09 23:40:02 Merge branch 'raj4-master'\n sbenthall@gmail.com sb a87af8aed3e0e2fb964579b8a7144361d4c19d2f [8c450a41c5446db94c0cff7151a8ef2297c43a07, 847... 2015-03-09 23:40:02 [bigbang/twopeople.py] 1 bigbang
56 2015-03-09 23:31:02 first commit\n r.agrawal@berkeley.edu Raj Agrawal 847720442d7cab223a6c83f0bd9db37ca28bdfbd [8c450a41c5446db94c0cff7151a8ef2297c43a07] 2015-03-09 23:31:02 [bigbang/twopeople.py] 7 bigbang
57 2015-03-09 23:20:54 Create twopeople.py kdkim@berkeley.edu Ki Deuk Kim 4ba2d1df3cb06eec91795ff22489b5533690dcfa [8c450a41c5446db94c0cff7151a8ef2297c43a07] 2015-03-09 23:20:54 [bigbang/twopeople.py] 6 bigbang
58 2015-03-05 22:47:56 Merge branch 'vsporeddy'\n sbenthall@gmail.com sb 8c450a41c5446db94c0cff7151a8ef2297c43a07 [0b47f504de03817db97e0d3556c98f7c252bc0f9, fef... 2015-03-05 22:47:56 [examples/Git Diffs.ipynb] 1 bigbang
59 2015-03-04 05:48:45 Update Git Diffs.ipynb\n\nAdded node colors an... vs.poreddy@gmail.com Venkata Poreddy fefb82dbc2b827cafb47edea9678f43f2a411681 [0b47f504de03817db97e0d3556c98f7c252bc0f9] 2015-03-04 05:48:45 [examples/Git Diffs.ipynb] 3 bigbang
... ... ... ... ... ... ... ... ... ...

412 rows × 10 columns