In [2]:
print 'Make the "Get the Data" widget code.'


Make the "Get the Data" widget code.

In [1]:
import bs4
import requests

def makeCode(folder):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    fileNames = gitSoup.select('.js-directory-link') #get tag with URL for each file
    return fileNames
    
print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
makeCode(myFolder)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20150811-fuel-use-AC
Out[1]:
[<a class="js-directory-link" href="/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/README.md" id="04c6e90faac2675aa89e2176d2eec7d8-428bb317930f94d3a174967f86ffa08c04b952d1" title="README.md">README.md</a>,
 <a class="js-directory-link" href="/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/fuel-use-AC.csv" id="cb9409ebfb9b0ad578daea945dfbe21f-d4b1c8f4468160d784451dc933eba65441eaab1d" title="fuel-use-AC.csv">fuel-use-AC.csv</a>,
 <a class="js-directory-link" href="/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/fuel-use-AC.xlsx" id="23bc8c02182b77ef466b2a1ce58bb10b-4c6c7444f11ca245d6cb65a8f2083aa1a9a90efe" title="fuel-use-AC.xlsx">fuel-use-AC.xlsx</a>]

In [3]:
import bs4
import requests

def makeCode(folder):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    urls = []
    for f in files:
        urls.append(f.get('href'))
    print urls

        
    
print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
makeCode(myFolder)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20150811-fuel-use-AC
['/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/README.md', '/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/fuel-use-AC.csv', '/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/fuel-use-AC.xlsx']

In [13]:
import bs4
import requests

def makeCode(folder):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    urls = []
    for f in files:
        urls.append(f.get('href')) #put urls into list
    # print urls
    urls = [u for u in urls if "README.md" not in u] #get README out of list
    #print urls
    #not yet stripping "/InsideEnergy/Data-for-stories/blob/master" off the front of each url
    return urls
        
    
print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
makeCode(myFolder)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20150811-fuel-use-AC
Out[13]:
['/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/fuel-use-AC.csv',
 '/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/fuel-use-AC.xlsx']
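
Removing items from a list while looping over it can skip entries, because the loop position keeps advancing as the list shrinks. A minimal sketch of the README filter written as a list comprehension instead; `drop_readme` is a hypothetical helper name, and the hrefs are copied from the scraped output earlier in this notebook. It is written to run on Python 2 or 3 and makes no network request.

```python
def drop_readme(hrefs):
    """Return a new list of hrefs without any README.md entries."""
    return [h for h in hrefs if "README.md" not in h]

# Sample hrefs copied from the scraped output earlier in the notebook.
hrefs = [
    "/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/README.md",
    "/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/fuel-use-AC.csv",
    "/InsideEnergy/Data-for-stories/blob/master/20150811-fuel-use-AC/fuel-use-AC.xlsx",
]

# The original list is left untouched; a filtered copy is returned.
data_hrefs = drop_readme(hrefs)
print(data_hrefs)
```

Building a new list avoids mutating `hrefs` during iteration, so every entry is checked exactly once.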

In [29]:
import bs4
import requests

def stripUrls(folder):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    urls = []
    for f in files:
        urls.append(f.get('href')) #put urls into list
    # print urls
    urls = [u for u in urls if "README.md" not in u] #get README out of list
    #print urls
    halfUrls = []
    for v in urls:
        if "/InsideEnergy/Data-for-stories/blob/master" in v:
            w = v.replace("/InsideEnergy/Data-for-stories/blob/master", "")
            halfUrls.append(w)
    return halfUrls
        
print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
stripUrls(myFolder)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20150811-fuel-use-AC
Out[29]:
['/20150811-fuel-use-AC/fuel-use-AC.csv',
 '/20150811-fuel-use-AC/fuel-use-AC.xlsx']

In [55]:
import bs4
import requests

def makeCode(folder):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    urls = []
    for f in files:
        urls.append(f.get('href')) #put urls into list
    # print urls
    urls = [u for u in urls if "README.md" not in u] #get README out of list
    #print urls
    halfUrls = []
    for v in urls:
        if "/InsideEnergy/Data-for-stories/blob/master" in v:
            w = v.replace("/InsideEnergy/Data-for-stories/blob/master", "")
            halfUrls.append(w)
    csvFile = halfUrls[0]
    codeHasCsv = "http://rawgit.com/insideenergy/Data-for-stories/master%s" % csvFile
    print codeHasCsv
    xlsFile = halfUrls[1]
    codeHasXls = "http://rawgit.com/insideenergy/Data-for-stories/master%s" % xlsFile
    print codeHasXls

print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
makeCode(myFolder)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20140915-mining-worker-fatalities
http://rawgit.com/insideenergy/Data-for-stories/master/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.csv
http://rawgit.com/insideenergy/Data-for-stories/master/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.xlsx
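
The cell above takes `halfUrls[0]` as the CSV and `halfUrls[1]` as the XLSX, which depends on the order GitHub lists the files in. A minimal sketch that picks each file by its extension instead; `raw_url` and `pick_by_extension` are hypothetical helper names, while the prefix and rawgit base are the strings already used in this notebook. The sample hrefs are deliberately listed XLSX first to show that order no longer matters.

```python
BLOB_PREFIX = "/InsideEnergy/Data-for-stories/blob/master"
RAW_BASE = "http://rawgit.com/insideenergy/Data-for-stories/master"

def raw_url(href):
    """Turn a GitHub blob href into the matching rawgit URL."""
    if not href.startswith(BLOB_PREFIX):
        raise ValueError("unexpected href: " + href)
    return RAW_BASE + href[len(BLOB_PREFIX):]

def pick_by_extension(hrefs, extension):
    """Return the first href ending with the given extension, or None."""
    for href in hrefs:
        if href.lower().endswith(extension):
            return href
    return None

# Sample hrefs (assumed shape, based on the outputs above), XLSX listed first.
hrefs = [
    "/InsideEnergy/Data-for-stories/blob/master/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.xlsx",
    "/InsideEnergy/Data-for-stories/blob/master/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.csv",
]

csv_url = raw_url(pick_by_extension(hrefs, ".csv"))
xls_url = raw_url(pick_by_extension(hrefs, ".xlsx"))
print(csv_url)
print(xls_url)
```

`pick_by_extension` returns `None` when a folder has no file of that type, so a caller would need to check for that before calling `raw_url`.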

In [58]:
import bs4
import requests

def makeCode(folder):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    urls = []
    for f in files:
        urls.append(f.get('href')) #put urls into list
    # print urls
    urls = [u for u in urls if "README.md" not in u] #get README out of list
    #print urls
    halfUrls = []
    for v in urls:
        if "/InsideEnergy/Data-for-stories/blob/master" in v:
            w = v.replace("/InsideEnergy/Data-for-stories/blob/master", "")
            halfUrls.append(w)
    csvFile = halfUrls[0]
    codeHasCsv = "http://rawgit.com/insideenergy/Data-for-stories/master%s" % csvFile
    #print codeHasCsv
    xlsFile = halfUrls[1]
    codeHasXls = "http://rawgit.com/insideenergy/Data-for-stories/master%s" % xlsFile
    #print codeHasXls
    widgetCode = "<small><strong> Get the data: <a href='" + codeHasCsv + "'>CSV</a> | <a href='" + codeHasXls + "'>XLS</a> | <a href='GOOGLE SHEETS LINK YOU JUST MADE' target='_blank'>Google Sheets</a> | Source and notes: <a href='" + folder + "'>Github</a> </strong></small>"       
    print widgetCode

print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
makeCode(myFolder)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20140915-mining-worker-fatalities
<small><strong> Get the data: <a href='http://rawgit.com/insideenergy/Data-for-stories/master/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.csv'>CSV</a> | <a href='http://rawgit.com/insideenergy/Data-for-stories/master/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.xlsx'>XLS</a> | <a href='GOOGLE SHEETS LINK YOU JUST MADE' target='_blank'>Google Sheets</a> | Source and notes: <a href='https://github.com/InsideEnergy/Data-for-stories/tree/master/20140915-mining-worker-fatalities'>Github</a> </strong></small>

In [59]:
import bs4
import requests

def makeCode(folder, sheet):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    
    urls = []
    for f in files:
        urls.append(f.get('href')) #put urls into list
   
    urls = [u for u in urls if "README.md" not in u] #get README out of list
            
    halfUrls = []
    for v in urls:
        if "/InsideEnergy/Data-for-stories/blob/master" in v:
            w = v.replace("/InsideEnergy/Data-for-stories/blob/master", "")
            halfUrls.append(w) #strip extra stuff off front of url
    csvFile = halfUrls[0]
    codeHasCsv = "http://rawgit.com/insideenergy/Data-for-stories/master%s" % csvFile
    xlsFile = halfUrls[1]
    codeHasXls = "http://rawgit.com/insideenergy/Data-for-stories/master%s" % xlsFile

    #now concatenate the code together
    widgetCode = "<small><strong> Get the data: <a href='" + codeHasCsv + "'>CSV</a> | <a href='" + codeHasXls + "'>XLS</a> | <a href='" + sheet + "' target='_blank'>Google Sheets</a> | Source and notes: <a href='" + folder + "'>Github</a> </strong></small>"       
    print widgetCode

print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
print "Enter Google Sheets URL for public viewing:"
mySheet = raw_input()
makeCode(myFolder, mySheet)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20140822-solar-installations
Enter Google Sheets URL for public viewing:
https://docs.google.com/spreadsheets/d/1ChpGgdUabNpMeowjeqDyz_5h1Toimgsauyovpi6iW_E/edit?usp=sharing
<small><strong> Get the data: <a href='http://rawgit.com/insideenergy/Data-for-stories/master/20140822-solar-installations/median-solar-installation-price-1998-2012.csv'>CSV</a> | <a href='http://rawgit.com/insideenergy/Data-for-stories/master/20140822-solar-installations/median-solar-installation-price-1998-2012.xlsx'>XLS</a> | <a href='https://docs.google.com/spreadsheets/d/1ChpGgdUabNpMeowjeqDyz_5h1Toimgsauyovpi6iW_E/edit?usp=sharing' target='_blank'>Google Sheets</a> | Source and notes: <a href='https://github.com/InsideEnergy/Data-for-stories/tree/master/20140822-solar-installations'>Github</a> </strong></small>
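
The long `+` concatenation that builds the widget is easy to break when a quote or space goes missing. A minimal sketch of the same markup built from a named template; `WIDGET_TEMPLATE` and `build_widget` are hypothetical names, and the template text is meant to reproduce the `widgetCode` string from the cell above character for character. It is written to run on Python 2 or 3.

```python
# Template mirroring the widgetCode string assembled in the notebook.
WIDGET_TEMPLATE = (
    "<small><strong> Get the data: "
    "<a href='{csv}'>CSV</a> | "
    "<a href='{xls}'>XLS</a> | "
    "<a href='{sheet}' target='_blank'>Google Sheets</a> | "
    "Source and notes: <a href='{folder}'>Github</a> "
    "</strong></small>"
)

def build_widget(csv_url, xls_url, sheet_url, folder_url):
    """Fill the four links into the 'Get the data' widget markup."""
    return WIDGET_TEMPLATE.format(
        csv=csv_url, xls=xls_url, sheet=sheet_url, folder=folder_url
    )

# URLs taken from the solar-installations run shown in the notebook.
base = "http://rawgit.com/insideenergy/Data-for-stories/master/20140822-solar-installations/"
widget = build_widget(
    base + "median-solar-installation-price-1998-2012.csv",
    base + "median-solar-installation-price-1998-2012.xlsx",
    "https://docs.google.com/spreadsheets/d/1ChpGgdUabNpMeowjeqDyz_5h1Toimgsauyovpi6iW_E/edit?usp=sharing",
    "https://github.com/InsideEnergy/Data-for-stories/tree/master/20140822-solar-installations",
)
print(widget)
```

Keeping the markup in one template means a change to the widget layout only has to be made in one place.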

In [61]:
import bs4
import requests

def makeCode(folder, sheet):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    
    urls = []
    for f in files:
        urls.append(f.get('href')) #put urls into list
   
    urls = [u for u in urls if "README.md" not in u] #get README out of list
            
    halfUrls = []
    for v in urls:
        if "/InsideEnergy/Data-for-stories/blob/master" in v:
            w = v.replace("/InsideEnergy/Data-for-stories/blob/master", "")
            halfUrls.append(w) #strip extra stuff off front of url
    csvFile = halfUrls[0]
    codeHasCsv = "http://rawgit.com/insideenergy/Data-for-stories/master%s" % csvFile
    xlsFile = halfUrls[1]
    codeHasXls = "http://rawgit.com/insideenergy/Data-for-stories/master%s" % xlsFile

    #now concatenate the code together
    widgetCode = "<small><strong> Get the data: <a href='" + codeHasCsv + "'>CSV</a> | <a href='" + codeHasXls + "'>XLS</a> | <a href='" + sheet + "' target='_blank'>Google Sheets</a> | Source and notes: <a href='" + folder + "'>Github</a> </strong></small>"       
    print widgetCode

print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
print "Enter Google Sheets URL for public viewing:"
mySheet = raw_input()
print "~~~~~~~~~~Widget Code - Paste this below your chart~~~~~~~~~~"
makeCode(myFolder, mySheet)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20140915-mining-worker-fatalities
Enter Google Sheets URL for public viewing:
https://docs.google.com/spreadsheets/d/1ChpGgdUabNpMeowjeqDyz_5h1Toimgsauyovpi6iW_E/edit?usp=sharing
~~~~~~~~~~Widget Code - Paste this below your chart~~~~~~~~~~
<small><strong> Get the data: <a href='http://rawgit.com/insideenergy/Data-for-stories/master/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.csv'>CSV</a> | <a href='http://rawgit.com/insideenergy/Data-for-stories/master/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.xlsx'>XLS</a> | <a href='https://docs.google.com/spreadsheets/d/1ChpGgdUabNpMeowjeqDyz_5h1Toimgsauyovpi6iW_E/edit?usp=sharing' target='_blank'>Google Sheets</a> | Source and notes: <a href='https://github.com/InsideEnergy/Data-for-stories/tree/master/20140915-mining-worker-fatalities'>Github</a> </strong></small>

In [1]:
#new function needs to strip off .csv and .xlsx
#needs to say, if two items match, get rid of duplicate
#then add each into its own widget code, enter new sheets input for each one
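
The first two steps of this plan (strip the extension, then drop duplicates) can be sketched without touching the network. This is only an illustration under assumptions: `base_names` is a hypothetical helper name, and the sample paths assume a folder that holds a `.csv` and an `.xlsx` for each of two datasets. It is written to run on Python 2 or 3.

```python
import os.path

def base_names(paths):
    """Strip .csv/.xlsx extensions and drop duplicates, keeping first-seen order."""
    seen = []
    for path in paths:
        root, ext = os.path.splitext(path)
        # Ignore anything that is not a data file, and skip repeats.
        if ext.lower() in (".csv", ".xlsx") and root not in seen:
            seen.append(root)
    return seen

# Hypothetical sample input: two datasets, each with a CSV and an XLSX.
half_urls = [
    "/20150326-oilprices-hiring/nd-unemployment-claims.csv",
    "/20150326-oilprices-hiring/nd-unemployment-claims.xlsx",
    "/20150326-oilprices-hiring/oilgas-job-openings.csv",
    "/20150326-oilprices-hiring/oilgas-job-openings.xlsx",
]

names = base_names(half_urls)
for name in names:
    print(name)
```

`os.path.splitext` only removes the final extension, so a hyphenated or dotted file name such as `mining-worker-fatalities-1930-2013.csv` keeps the rest of its name intact.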

In [9]:
import bs4
import requests

def stripUrls(folder):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    urls = []
    for f in files:
        urls.append(f.get('href')) #put urls into list
    # print urls
    urls = [u for u in urls if "README.md" not in u] #get README out of list
    #print urls
    halfUrls = []
    for v in urls:
        if "/InsideEnergy/Data-for-stories/blob/master" in v:
            w = v.replace("/InsideEnergy/Data-for-stories/blob/master", "")
            halfUrls.append(w) #strip extra stuff off front of url
    return halfUrls
        
print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
stripUrls(myFolder)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20140915-mining-worker-fatalities
Out[9]:
['/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.csv',
 '/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.xlsx']

In [2]:
import bs4
import requests

def stripUrls(folder):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    urls = []
    for f in files:
        urls.append(f.get('href')) #put urls into list
    # print urls
    urls = [u for u in urls if "README.md" not in u] #get README out of list

    halfUrls = []
    for v in urls:
        if "/InsideEnergy/Data-for-stories/blob/master" in v:
            w = v.replace("/InsideEnergy/Data-for-stories/blob/master", "")
            halfUrls.append(w) #strip extra stuff off front of url
    print halfUrls
    justFolders = []
    for x in halfUrls:
        if ".csv" in x:
            y = x.replace(".csv", "")
            justFolders.append(y)
        if ".xlsx" in x:
            z = x.replace(".xlsx", "")
            justFolders.append(z)
    print justFolders #gets file extensions off
    
        
print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
stripUrls(myFolder)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20140915-mining-worker-fatalities
['/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.csv', '/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013.xlsx']
['/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013', '/20140915-mining-worker-fatalities/mining-worker-fatalities-1930-2013']

In [7]:
import bs4
import requests

def makeCode(folder):
    x = requests.get(folder)
    x.raise_for_status()
    gitSoup = bs4.BeautifulSoup(x.text)
    files = gitSoup.select('.js-directory-link') #get tag with URL for each file
    
    urls = [f.get('href') for f in files if 'README.md' not in f.get('href')] #put urls in list without readme file

    halfUrls = []
    for v in urls:
        if "/InsideEnergy/Data-for-stories/blob/master" in v:
            w = v.replace("/InsideEnergy/Data-for-stories/blob/master", "")
            halfUrls.append(w) #strip extra stuff off front of url

    justFolders = []
    for x in halfUrls:
        if ".csv" in x:
            y = x.replace(".csv", "")
            justFolders.append(y)
        if ".xlsx" in x:
            z = x.replace(".xlsx", "")
            justFolders.append(z) #gets file extensions off 
    
    noDuplicates = []
    for z in justFolders:
        if z not in noDuplicates:
            noDuplicates.append(z) #gets rid of duplicates
    
    #now concatenate the code for each folder name, and ask for corresponding Google Sheets URL
    for i in noDuplicates:
        print "Enter the Google Sheets URL for public viewing that corresponds with " + i
        mySheet = raw_input()
        print "~~~~~~~~~~Widget code for " + i + "~~~~~~~~~~"
        print
        print '<small><strong> Get the data: <a href="http://rawgit.com/insideenergy/Data-for-stories/master' + i + '.csv">CSV</a> | <a href="http://rawgit.com/insideenergy/Data-for-stories/master' + i + '.xlsx">XLS</a> | <a href="' + mySheet + '" target="_blank">Google Sheets</a> | Source and notes: <a href="' + folder + '">Github</a> </strong></small>'
        print
        
        
print 'Make the "Get the Data" widget code.'
print "Enter GitHub URL of your new folder inside 'Data-for-stories':"
myFolder = raw_input()
makeCode(myFolder)


Make the "Get the Data" widget code.
Enter GitHub URL of your new folder inside 'Data-for-stories':
https://github.com/InsideEnergy/Data-for-stories/tree/master/20150326-oilprices-hiring
Enter the Google Sheets URL for public viewing that corresponds with /20150326-oilprices-hiring/nd-unemployment-claims
https://docs.google.com/spreadsheets/d/1SsVKi3xkMZZBeWbzgFWUXUQna_gG2e2CbvkNzdNQj-U/edit?usp=sharing
~~~~~~~~~~Widget code for /20150326-oilprices-hiring/nd-unemployment-claims~~~~~~~~~~

<small><strong> Get the data: <a href="http://rawgit.com/insideenergy/Data-for-stories/master/20150326-oilprices-hiring/nd-unemployment-claims.csv">CSV</a> | <a href="http://rawgit.com/insideenergy/Data-for-stories/master/20150326-oilprices-hiring/nd-unemployment-claims.xlsx">XLS</a> | <a href="https://docs.google.com/spreadsheets/d/1SsVKi3xkMZZBeWbzgFWUXUQna_gG2e2CbvkNzdNQj-U/edit?usp=sharing" target="_blank">Google Sheets</a> | Source and notes: <a href="https://github.com/InsideEnergy/Data-for-stories/tree/master/20150326-oilprices-hiring">Github</a> </strong></small>

Enter the Google Sheets URL for public viewing that corresponds with /20150326-oilprices-hiring/oilgas-job-openings
https://docs.google.com/spreadsheets/d/1LfrI2lGkFx0gzZxH5WmStLuQ2bcPaiJHJUwlJJMIfnI/edit?usp=sharing
~~~~~~~~~~Widget code for /20150326-oilprices-hiring/oilgas-job-openings~~~~~~~~~~

<small><strong> Get the data: <a href="http://rawgit.com/insideenergy/Data-for-stories/master/20150326-oilprices-hiring/oilgas-job-openings.csv">CSV</a> | <a href="http://rawgit.com/insideenergy/Data-for-stories/master/20150326-oilprices-hiring/oilgas-job-openings.xlsx">XLS</a> | <a href="https://docs.google.com/spreadsheets/d/1LfrI2lGkFx0gzZxH5WmStLuQ2bcPaiJHJUwlJJMIfnI/edit?usp=sharing" target="_blank">Google Sheets</a> | Source and notes: <a href="https://github.com/InsideEnergy/Data-for-stories/tree/master/20150326-oilprices-hiring">Github</a> </strong></small>

