1a. A demo of using pandas to bulk-download files

Building on the method we just looked at for grabbing a single file with pandas, here we see how effective Python can be at downloading industrial water-use data for every US state.

Compare running this script with doing the same downloads by hand, one state at a time.


In [ ]:
#import os and pandas
import os
import pandas as pd

In [ ]:
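#The 'us' module is useful for generating lists of US states.
#Import the 'us' module, installing it first if needed
try:
    import us
except ImportError:
    #pip.main() was removed in pip 10, so run pip in a subprocess instead
    import subprocess
    import sys
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'us'])
    import us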
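If installing packages isn't an option (for example, on a locked-down machine), a hard-coded list of postal abbreviations can stand in for us.STATES. This is a minimal sketch under that assumption: STATE_ABBRS is a hypothetical name, and the list covers only the 50 states (no DC or territories), so it may not match us.STATES exactly.

```python
#Hypothetical fallback for when the 'us' module can't be installed:
#lowercase postal abbreviations for the 50 states (no DC or territories)
STATE_ABBRS = [
    'al', 'ak', 'az', 'ar', 'ca', 'co', 'ct', 'de', 'fl', 'ga',
    'hi', 'id', 'il', 'in', 'ia', 'ks', 'ky', 'la', 'me', 'md',
    'ma', 'mi', 'mn', 'ms', 'mo', 'mt', 'ne', 'nv', 'nh', 'nj',
    'nm', 'ny', 'nc', 'nd', 'oh', 'ok', 'or', 'pa', 'ri', 'sc',
    'sd', 'tn', 'tx', 'ut', 'vt', 'va', 'wa', 'wv', 'wi', 'wy',
]
```

The download loop could then iterate over STATE_ABBRS directly, passing each abbreviation to getData, instead of reading state.abbr from us.STATES.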

In [ ]:
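#Create a function that returns a data frame of industrial water-use data for a given state
def getData(state_abbr):
    
    #Insert the lowercase state code into the URL
    theURL = 'https://waterdata.usgs.gov/{}/nwis/water_use?format=rdb&rdb_compression=value&wu_area=County&wu_year=ALL&wu_county=ALL&wu_category=IN&wu_county_nms=--ALL%2BCounties--&wu_category_nms=Industrial'
    theURL = theURL.format(state_abbr.lower())
    
    #Read the tab-delimited RDB file into a data frame, skipping the 49 comment
    #lines at the top (rows 0-48) and the column-definition line that follows
    #the column names (row 50). In Python 3, range() must be converted to a
    #list before it can be concatenated with another list.
    df = pd.read_csv(theURL, delimiter='\t', skiprows=list(range(49)) + [50])
    
    #Return the data frame
    return df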
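The hard-coded skiprows value assumes every state's file begins with exactly 49 comment lines. A more defensive sketch, assuming the RDB layout implied by that value (a block of lines starting with '#', then one line of column names, then one column-definition line), counts the comment lines instead. The helper name rdb_skiprows is hypothetical.

```python
def rdb_skiprows(lines):
    """Return the row indices pd.read_csv should skip for an RDB file.

    Assumed layout: a block of lines starting with '#', then the
    column-name header (kept), then a column-definition line (skipped).
    """
    n_comments = 0
    for line in lines:
        if line.startswith('#'):
            n_comments += 1
        else:
            break
    #Skip rows 0..n_comments-1 (comments) and row n_comments+1 (column
    #definitions); row n_comments holds the column names, so it is kept
    return list(range(n_comments)) + [n_comments + 1]

#For a file with 49 comment lines this reproduces the original value
print(rdb_skiprows(['#'] * 49 + ['col_a\tcol_b', '5s\t15s']) == list(range(49)) + [50])  # prints True
```

To use it, getData could download the file once as text, then call pd.read_csv(io.StringIO(text), delimiter='\t', skiprows=rdb_skiprows(text.splitlines())), so the parse no longer depends on every state's comment block having the same length.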

In [ ]:
#Create a folder to hold all the downloads (no error if it already exists)
outFolder = "WaterData"
os.makedirs(outFolder, exist_ok=True)

In [ ]:
#Loop through each state, download its data, and save it to a local CSV file
for state in us.STATES:
    stateAbbr = state.abbr.lower()
    dfState = getData(stateAbbr)
    dfState.to_csv(os.path.join(outFolder, "{}.csv".format(stateAbbr)), index=False)
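As written, a single failed request (a timeout, say, or a state with no data) stops the loop partway through. One way to make the bulk download more forgiving is sketched below. download_all is a hypothetical helper that takes any fetch function returning CSV text, records failures instead of raising, and pauses between requests to go easy on the server; the one-second default delay is an arbitrary assumption, not a documented USGS limit.

```python
import os
import time

def download_all(abbrs, fetch, out_folder, delay=1.0):
    """Save fetch(abbr) to <out_folder>/<abbr>.csv for each abbreviation.

    Failures are collected rather than aborting the run. Returns a dict
    mapping each failed abbreviation to the exception it raised.
    """
    os.makedirs(out_folder, exist_ok=True)
    failed = {}
    for abbr in abbrs:
        try:
            text = fetch(abbr)
        except Exception as exc:  #broad on purpose: record it and keep going
            failed[abbr] = exc
        else:
            path = os.path.join(out_folder, "{}.csv".format(abbr))
            #newline='' writes the text's line endings unchanged
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(text)
        #Pause between requests whether or not this one succeeded
        time.sleep(delay)
    return failed
```

In this notebook it might be called as failed = download_all([s.abbr.lower() for s in us.STATES], lambda a: getData(a).to_csv(index=False), outFolder), since to_csv returns a string when no path is given. Afterwards, failed lists exactly which states need a retry.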