DOE Grants: field renames (legacy -> new):

  • fundingrequestingofficeid -> funding_office_name
  • contractingofficeid -> awarding_office_name
  • dollarsobligated -> federal_action_obligation
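
These renames can be applied with a simple mapping. A minimal sketch, assuming the data is loaded into a pandas DataFrame (the function name is mine; the column names come from the mapping above):

```python
import pandas as pd

# Legacy -> new (USASpending) field names for DOE grants, per the mapping above.
DOE_GRANT_RENAMES = {
    "fundingrequestingofficeid": "funding_office_name",
    "contractingofficeid": "awarding_office_name",
    "dollarsobligated": "federal_action_obligation",
}

def standardize_doe_grants(df: pd.DataFrame) -> pd.DataFrame:
    """Rename legacy DOE grant columns to the new USASpending names."""
    return df.rename(columns=DOE_GRANT_RENAMES)
```

Keeping the mapping as a module-level dict makes it easy to add the NSF and contract mappings alongside it later.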

Strategy

Every year, I would need to download a very specific set of files (the updated grants/contracts from that year). What if I instead downloaded the complete dataset every year? It would be really big, but I could serve it from an Amazon server and practice my cloud skills. Yeah, it's over 1TB...

A simpler/cheaper option would be to use the USASpending API directly, now that (hopefully!) they won't be changing it anymore. Then each year's update just needs a specific API call to refresh the DOE/NSF data only, then proceed as usual.
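
A sketch of what that yearly refresh call could look like, using USASpending's `spending_by_award` endpoint. The endpoint path, filter keys, award-type codes, and field names below reflect my best understanding of their v2 API and should be checked against the official docs before relying on them:

```python
import json
import urllib.request

USASPENDING_URL = "https://api.usaspending.gov/api/v2/search/spending_by_award/"

def build_refresh_payload(agency: str, start_date: str, end_date: str,
                          page: int = 1) -> dict:
    """Build a spending_by_award request for one agency's grants + contracts."""
    return {
        "filters": {
            "agencies": [{"type": "awarding", "tier": "toptier", "name": agency}],
            "time_period": [{"start_date": start_date, "end_date": end_date}],
            # 02-05 = grants, A-D = contracts (USASpending award type codes)
            "award_type_codes": ["02", "03", "04", "05", "A", "B", "C", "D"],
        },
        "fields": ["Award ID", "Recipient Name", "Award Amount", "Awarding Agency"],
        "page": page,
        "limit": 100,
    }

def fetch_awards(agency: str, start_date: str, end_date: str, page: int = 1) -> dict:
    """POST the payload and return the decoded JSON response."""
    payload = build_refresh_payload(agency, start_date, end_date, page)
    req = urllib.request.Request(
        USASPENDING_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The yearly refresh would then loop over `["Department of Energy", "National Science Foundation"]`, incrementing `page` until the results come back empty.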

I should just switch to the new datasets entirely -- keeping two sets of code (one for new and one for legacy data) would be too complicated. Even if there are differences (I've shown that the recipients overlap substantially, at least), it's the public data that is available, so I should just go with it. So I need to:

  • Create a method that performs a USASpending API call to get last N years of DOE/NSF grants+contracts.
  • Update cleaning methods to use the new variable names for each award type (grants and contracts).
  • Add this data into a database (key = district code)
  • Work with Justin to integrate this data into WHIPS database/rendering system.

SULI Student strategy

Using the hardcoded address table with fuzzy matching didn't work (all SUNY schools mapped to one address!). I need to find an open-source replacement for the Google Places API, or re-read Google's terms to see whether caching the lat/lons, getting the congressional districts from Geocodio, and then deleting the lat/lons complies with them.
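
The "geocode, look up the district, then discard the coordinates" idea could be structured like this. Both lookup callables are hypothetical stand-ins (a real version would call a geocoder and the Geocodio API); the point of the sketch is that only the district ever gets persisted:

```python
from typing import Callable, Tuple

def district_for_address(
    address: str,
    geocode: Callable[[str], Tuple[float, float]],   # address -> (lat, lon)
    district_lookup: Callable[[float, float], str],  # (lat, lon) -> e.g. 'NY-01'
    cache: dict,
) -> str:
    """Resolve an address to a congressional district, caching ONLY the
    district string so no lat/lon is ever stored."""
    if address in cache:
        return cache[address]
    lat, lon = geocode(address)           # transient: used then discarded
    district = district_lookup(lat, lon)
    cache[address] = district             # persist district only
    return district
```

Whether even transient use of the coordinates is allowed is exactly the terms-of-service question above; this structure just makes it easy to audit what is stored.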
