Need to loop over all of the legislation (10,000s) by 1,000 at a time. Extract the bill IDs, and then extract the bill text one-by-one. After retrieving the bill text, store it to a database on AWS with some associated metadata.

Then, I will need to figure out how to do that for the U.S. Congress. Use the U.S. Congress bill text as the training data, with the given subject terms, and use that to train. See how well that predicts other bills in the U.S. congress and use that model for the New York legislation. Go through a subset of the new york data and see if there are keywords or other information that can be used to hand label. Also, use the given terms to use as a broader base of keywords for labeling the U.S. data. Also, try running in an unsupervised setting to see how the data clusters.


In [1]:
import requests
my_key = open('/Users/Joel/Documents/insight/ny_bill_keys.txt', 'r').readline().strip()

In [2]:
import time

In [3]:
# Set up the database to save the results of the new york bill table
# There will be one table for the New York bills and one for U.S. bills
## Python packages - you may have to pip install sqlalchemy, sqlalchemy_utils, and psycopg2.
from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists, create_database
import psycopg2
import pandas as pd

In [4]:
#In Python: Define a database name
dbname = 'bills_db'
username = 'Joel'
## 'engine' is a connection to a database
## Here, we're using postgres, but sqlalchemy can connect to other things too.
engine = create_engine('postgres://%s@localhost/%s'%(username,dbname))
print engine.url

## create a database (if it doesn't exist)
if not database_exists(engine.url):
    create_database(engine.url)
print(database_exists(engine.url))


postgres://Joel@localhost/bills_db
True

In [5]:
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

In [6]:
from sqlalchemy import Column, Integer, String
class New_York_Bill(Base):
    __tablename__ = 'ny_bills'
    bill_num = Column(String, primary_key=True)
    bill_name = Column(String)
    bill_text = Column(String)

    def __repr__(self):
        return "<New_York_Bill(bill_num='%s', bill_name='%s', bill_text='%s')>" % (
            self.bill_num, self.bill_name, self.bill_text)

In [7]:
ny_bills_table = New_York_Bill.__table__

In [8]:
# Actually create the table
Base.metadata.create_all(engine)

In [9]:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
session = Session()

In [10]:
#ny_bills_table.drop(engine)
# This seems painful. Drop the table from the command line before running the command below.

In [10]:
#requests.get('http://legislation.nysenate.gov/api/3/bills/2015/A02257?view=only_fullText&key=' + my_key).json()

In [11]:
# Run through a loop getting files 1,000 at a time until we receive all files
offset = 0
year = 2015
limit = 1000
#limit = 10
key = my_key
my_max = 50000
#my_max = 50
request_string = 'http://legislation.nysenate.gov/api/3/bills/{0}?limit={1}&offset={2}&key={3}'.format(year, 
                                                                                                        limit, 
                                                                                                        offset,
                                                                                                        key)
all_bills = requests.get(request_string).json()

while ((all_bills['responseType'] == 'bill-info list') and offset < my_max):
    print all_bills['offsetStart']
    offset += limit
    request_string = 'http://legislation.nysenate.gov/api/3/bills/{0}?limit={1}&offset={2}&key={3}'.format(year, 
                                                                                                        limit, 
                                                                                                        offset,
                                                                                                        key)
    all_bills = requests.get(request_string).json()
    
    if (all_bills['responseType'] == 'bill-info list'):
        for bill in all_bills['result']['items']:
            bill_num = bill['printNo']
            single_request = 'http://legislation.nysenate.gov/api/3/bills/{0}/{1}?view=only_fullText&key={2}'.format(
            year, bill_num, my_key)
            bill_data = requests.get(single_request).json()
            bill_text = bill_data['result']['fullText']
            #print bill_num
            #print bill['title']
            #print bill
            one_bill = New_York_Bill(bill_num=bill_num, bill_name=bill['title'], bill_text=bill_text)
            session.add(one_bill)
            time.sleep(1)
            
    time.sleep(2)
session.commit()


1
1000
2000
3000
4000
5000
---------------------------------------------------------------------------
ChunkedEncodingError                      Traceback (most recent call last)
<ipython-input-11-ff72f1fa376e> in <module>()
     20                                                                                                         offset,
     21                                                                                                         key)
---> 22     all_bills = requests.get(request_string).json()
     23 
     24     if (all_bills['responseType'] == 'bill-info list'):

/Users/Joel/anaconda/envs/insight/lib/python2.7/site-packages/requests/api.pyc in get(url, params, **kwargs)
     69 
     70     kwargs.setdefault('allow_redirects', True)
---> 71     return request('get', url, params=params, **kwargs)
     72 
     73 

/Users/Joel/anaconda/envs/insight/lib/python2.7/site-packages/requests/api.pyc in request(method, url, **kwargs)
     55     # cases, and look like a memory leak in others.
     56     with sessions.Session() as session:
---> 57         return session.request(method=method, url=url, **kwargs)
     58 
     59 

/Users/Joel/anaconda/envs/insight/lib/python2.7/site-packages/requests/sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    473         }
    474         send_kwargs.update(settings)
--> 475         resp = self.send(prep, **send_kwargs)
    476 
    477         return resp

/Users/Joel/anaconda/envs/insight/lib/python2.7/site-packages/requests/sessions.pyc in send(self, request, **kwargs)
    615 
    616         if not stream:
--> 617             r.content
    618 
    619         return r

/Users/Joel/anaconda/envs/insight/lib/python2.7/site-packages/requests/models.pyc in content(self)
    739                     self._content = None
    740                 else:
--> 741                     self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
    742 
    743             except AttributeError:

/Users/Joel/anaconda/envs/insight/lib/python2.7/site-packages/requests/models.pyc in generate()
    665                         yield chunk
    666                 except ProtocolError as e:
--> 667                     raise ChunkedEncodingError(e)
    668                 except DecodeError as e:
    669                     raise ContentDecodingError(e)

ChunkedEncodingError: ('Connection broken: IncompleteRead(186 bytes read, 326 more expected)', IncompleteRead(186 bytes read, 326 more expected))

In [ ]:
from sqlalchemy import text
result = session.query(New_York_Bill).from_statement(text("SELECT * FROM ny_bills"))

In [13]:
all_bills = result.all()

In [14]:
len(all_bills)


Out[14]:
50

In [22]:
all_bills[0]


Out[22]:
<User(bill_num='A5244A', bill_name='Permits retail farm operations as an accessory use to agricultural lands pursuant to the Peconic Bay region community preservation fund', bill_text='
                           S T A T E   O F   N E W   Y O R K
       ________________________________________________________________________

           S. 3689                                                  A. 5244

                              2015-2016 Regular Sessions

                             S E N A T E - A S S E M B L Y

                                   February 13, 2015
                                      ___________

       IN SENATE -- Introduced by Sen. LAVALLE -- read twice and ordered print-
         ed, and when printed to be committed to the Committee on Local Govern-
         ment

       IN  ASSEMBLY  -- Introduced by M. of A. THIELE -- read once and referred
         to the Committee on Local Governments

       AN ACT to amend the town law, in  relation  to  permitting  retail  farm
         operations  as  an accessory use to agricultural lands pursuant to the
         Peconic Bay region community preservation fund

         THE PEOPLE OF THE STATE OF NEW YORK, REPRESENTED IN SENATE AND  ASSEM-
       BLY, DO ENACT AS FOLLOWS:

    1    Section 1. Subdivision 1 of section 64-e of the town law is amended by
    2  adding a new paragraph (e) to read as follows:
    3    (E) "RETAIL FARM OPERATION" MEANS A SEASONAL OR ANNUAL ENTERPRISE WITH
    4  EITHER  PERMANENT  OR  NONPERMANENT STRUCTURES THAT ARE OPERATED FOR THE
    5  PURPOSES OF SELLING PREDOMINATELY FARM AND FOOD PRODUCTS IN  CONJUNCTION
    6  WITH OR IN SUPPORT OF LAND USED IN AGRICULTURAL PRODUCTION AS DEFINED IN
    7  SUBDIVISION  FOUR  OF  SECTION  TWO  OF THE AGRICULTURE AND MARKETS LAW.
    8  SUCH PORTION OF THE FARM AND FOOD PRODUCTS SHALL EXCEED FIFTY PERCENT OF
    9  THE GROSS ANNUAL INCOME OF SUCH RETAIL FARM  OPERATION.  FARM  AND  FOOD
   10  PRODUCTS  SHALL  MEAN  ANY  AGRICULTURAL  PRODUCT  OF THE SOIL OR WATER,
   11  INCLUDING BUT NOT LIMITED TO  FRESH  OR  PROCESSED  FRUITS,  VEGETABLES,
   12  EGGS,  DAIRY  PRODUCTS,  MEAT  AND  MEAT  PRODUCTS,  POULTRY AND POULTRY
   13  PRODUCTS, FISH AND FISH PRODUCTS, APPLE CIDER, FRUIT JUICE, WINE,  ORNA-
   14  MENTAL PLANTS, NURSERY PRODUCTS, FLOWERS, AND CHRISTMAS TREES.
   15    S  2.  Subdivision  4  of  section 64-e of the town law, as amended by
   16  chapter 423 of the laws of 2013, is amended to read as follows:
   17    4. Preservation of community character shall involve one  or  more  of
   18  the  following:  (a) establishment of parks, nature preserves, or recre-
   19  ation areas; (b) preservation  of  open  space,  including  agricultural
   20  lands  AND  RETAIL  FARM  OPERATIONS AS AN ACCESSORY USE TO AGRICULTURAL

        EXPLANATION--Matter in ITALICS (underscored) is new; matter in brackets
                             [ ] is old law to be omitted.
                                                                  LBD07099-01-5

       S. 3689                             2                            A. 5244

    1  LANDS; (c) preservation of lands of exceptional scenic value; (d)  pres-
    2  ervation of fresh and saltwater marshes or other wetlands; (e) preserva-
    3  tion  of  aquifer recharge areas; (f) preservation of undeveloped beach-
    4  lands  or  shoreline  including  those  at  significant  risk of coastal
    5  flooding due to projected sea level rise and future storms;  (g)  estab-
    6  lishment  of  wildlife  refuges  for  the  purpose of maintaining native
    7  animal species diversity, including the protection of habitat  essential
    8  to the recovery of rare, threatened or endangered species; (h) preserva-
    9  tion  of  pine barrens consisting of such biota as pitch pine, and scrub
   10  oak; (i) preservation of unique  or  threatened  ecological  areas;  (j)
   11  preservation of rivers and river areas in a natural, free-flowing condi-
   12  tion;  (k)  preservation  of  forested  land; (l) preservation of public
   13  access to lands for public use including stream  rights  and  waterways;
   14  (m)  preservation  of  historic  places and properties listed on the New
   15  York state register of historic places and/or protected under a  munici-
   16  pal  historic  preservation ordinance or law; and (n) undertaking any of
   17  the aforementioned in furtherance of the establishment of a greenbelt.
   18    S 3. This act shall take effect immediately.
')>

In [16]:
all_bills[-1]


Out[16]:
<User(bill_num='S2251A', bill_name='Elevates assault of a utility worker to the class D felony of assault in the second degree', bill_text='
                           S T A T E   O F   N E W   Y O R K
       ________________________________________________________________________

                                         2251

                              2015-2016 Regular Sessions

                                   I N  S E N A T E

                                   January 22, 2015
                                      ___________

       Introduced  by  Sens. LARKIN, ADDABBO -- read twice and ordered printed,
         and when printed to be committed to the Committee on Codes

       AN ACT to amend the penal law, in relation to elevating an assault of  a
         utility worker to the class D felony of assault in the second degree

         THE  PEOPLE OF THE STATE OF NEW YORK, REPRESENTED IN SENATE AND ASSEM-
       BLY, DO ENACT AS FOLLOWS:

    1    Section 1. Subdivision 3 of  section  120.05  of  the  penal  law,  as
    2  amended  by  chapter  196  of  the  laws  of 2014, is amended to read as
    3  follows:
    4    3. With intent to prevent a peace officer, a police officer,  prosecu-
    5  tor as defined in subdivision thirty-one of section 1.20 of the criminal
    6  procedure  law,  registered  nurse, licensed practical nurse, sanitation
    7  enforcement agent, New  York  city  sanitation  worker,  a  firefighter,
    8  including a firefighter acting as a paramedic or emergency medical tech-
    9  nician  administering  first aid in the course of performance of duty as
   10  such firefighter, an emergency medical service  paramedic  or  emergency
   11  medical  service technician, or medical or related personnel in a hospi-
   12  tal emergency department,  a  city  marshal,  a  school  crossing  guard
   13  appointed pursuant to section two hundred eight-a of the general munici-
   14  pal  law,  a traffic enforcement officer [or], traffic enforcement agent
   15  OR EMPLOYEE OF ANY ENTITY GOVERNED BY THE  PUBLIC  SERVICE  LAW  IN  THE
   16  COURSE  OF  PERFORMING  AN  ESSENTIAL  SERVICE, from performing a lawful
   17  duty, by means including releasing or failing to control an animal under
   18  circumstances evincing the actor's intent that the animal  obstruct  the
   19  lawful  activity  of  such  peace officer, police officer, prosecutor as
   20  defined in subdivision thirty-one of section 1.20 of the criminal proce-
   21  dure  law,  registered  nurse,  licensed  practical  nurse,   sanitation
   22  enforcement   agent,  New  York  city  sanitation  worker,  firefighter,
   23  paramedic, technician, city marshal,  school  crossing  guard  appointed
   24  pursuant  to  section  two hundred eight-a of the general municipal law,
   25  traffic enforcement officer [or], traffic enforcement agent OR  EMPLOYEE

        EXPLANATION--Matter in ITALICS (underscored) is new; matter in brackets
                             [ ] is old law to be omitted.
                                                                  LBD04667-01-5

       S. 2251                             2

    1  OF  AN ENTITY GOVERNED BY THE PUBLIC SERVICE LAW, he or she causes phys-
    2  ical injury to such peace officer, police officer, prosecutor as defined
    3  in subdivision thirty-one of section 1.20 of the criminal procedure law,
    4  registered  nurse,  licensed  practical  nurse,  sanitation  enforcement
    5  agent, New York city sanitation worker, firefighter, paramedic,  techni-
    6  cian or medical or related personnel in a hospital emergency department,
    7  city  marshal,  school crossing guard, traffic enforcement officer [or],
    8  traffic enforcement agent OR EMPLOYEE  OF  AN  ENTITY  GOVERNED  BY  THE
    9  PUBLIC SERVICE LAW; or
   10    S 2. This act shall take effect on the first of November next succeed-
   11  ing the date on which it shall have become a law.
')>

In [17]:
session.close()