This workbook shows many of the core features of Tables in a fairly large example that involves real-world data wrangling. It takes two large data sets from the Berkeley Open Data Portal - the citywide parcel database and the business license database. It begins by doing some visualization of the parcel data on maps. Digging in reveals the wrangling challenge of working with raw data - Use Codes are literally all over the map. It illustrates some structured techniques for doing the wrangling that leave behind a clear definition of how the raw data is transformed into workable parcel data. It then joins the residential portion of the parcel data with the business license data to answer a simple question - do people with business licenses live in larger homes? Yes, larger by just about enough for a study.
First, a couple of preliminaries that will be in all our notebooks.
In [1]:
from datascience import *
import numpy as np
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
%matplotlib inline
# datascience version number of last run of this notebook
version.__version__
Out[1]:
We've grabbed a big chunk of data from the recently launched Berkeley Open Data portal, so let's read it in as a Table.
In [2]:
raw_parcels = Table.read_table("./data/BerkeleyData/Parcels.csv")
Cool, what does that look like? Tables print themselves in a nice HTML format.
In [3]:
raw_parcels
Out[3]:
How many parcels are we talking about here?
In [4]:
raw_parcels.num_rows
Out[4]:
Tables are ordered collections of labeled columns.
How many columns and what are their names? You can tell by inspection above, but you also need to be able to get at it programmatically. (Be sure to try command completion with tab.)
In [5]:
raw_parcels.column_labels
Out[5]:
In [6]:
# Len of a table is the number of columns
len(raw_parcels)
Out[6]:
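As a quick sketch of getting at the structure programmatically (using the same column_labels attribute seen above), we can walk the labels and peek at a sample value from each column:
# Sketch: pair each column label with the first value in that column
for label in raw_parcels.column_labels:
    print(label, raw_parcels[label][0])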
This table seems to contain geocoded data, since it has columns called latitude and longitude. Let's just assume that is what's going on and throw all 28,000 points on a map. We can even label the points in case you click on them.
In [7]:
raw_parcels.points('latitude','longitude')
Out[7]:
We can also easily do some descriptive statistics on the Table to try to figure out what we are looking at. We might want to select some useful columns and plot that.
First, select. This returns a new table with the selected columns.
In [8]:
parcels = raw_parcels.select(('APN', 'BldgSqft','LotSqft'))
In [9]:
# Table.hist shows a histogram of the data in each of the columns
parcels.select(('BldgSqft','LotSqft')).hist(bins=50)
Well, that's not terribly useful. It seems we have a very long tail on this distribution. We might want to look at some of this data more closely; we get a better sense by sorting and plotting.
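As an aside before we sort, a purely illustrative sketch: clipping at a high percentile is another quick way to tame the tail for display.
# Sketch: histogram of building sizes below the 99th percentile
cutoff = np.percentile(parcels['BldgSqft'], 99)
parcels.where(parcels['BldgSqft'] < cutoff).select(('BldgSqft',)).hist(bins=50)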
Sort also returns a new table.
In [10]:
sorted_parcels = parcels.sort('BldgSqft', descending=True)
sorted_parcels
Out[10]:
Yep, big industrial properties with half a million sq ft. Here's a bit more.
In [11]:
sorted_parcels.select(('BldgSqft','LotSqft')).plot()
In [12]:
parcels.select(('BldgSqft','LotSqft')).stats()
Out[12]:
If we are going to make much sense of the data, we will have to break it down into useful categories and do analysis within those.
Each of the columns in a Table can be selected to get a column of values.
In [13]:
raw_parcels['UseCodeDes']
Out[13]:
OK, so how many use codes and descriptions do we have?
In [14]:
use_codes = np.unique(raw_parcels['UseCode'])
len(use_codes)
Out[14]:
In [15]:
use_desc = np.unique(raw_parcels['UseCodeDes'])
len(use_desc)
Out[15]:
Isn't that interesting? There must be some places where the same description has multiple different codes! We could already start grouping things by use code, but 100 codes is still a lot to deal with. Guess we get to do some wrangling. Grouping may still be handy, because it lets us find the most common uses.
In [16]:
parcels_by_use = raw_parcels.select(('APN','UseCode','UseCodeDes')).group('UseCode')
parcels_by_use
Out[16]:
Now that's interesting. Much more information than 'GROUP BY' in SQL and the like, because the entries in the table are lists of the values in each group. Notice that 1100 and 1101 are both "SINGLE FAMILY RESIDENTIAL".
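A short sketch over the two raw columns makes the duplicates explicit (plain numpy, nothing Table-specific):
# Sketch: descriptions that appear with more than one distinct use code
for des in np.unique(raw_parcels['UseCodeDes']):
    codes = np.unique(raw_parcels['UseCode'][raw_parcels['UseCodeDes'] == des])
    if len(codes) > 1:
        print(des, codes)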
We can provide a collection function to combine these values into one.
In [17]:
use_codes = raw_parcels.select(('APN','UseCodeDes')).group('UseCodeDes', len)
use_codes
Out[17]:
Now we can sort that. Notice that the label of the collected column reflects how we got here.
In [18]:
common_uses = use_codes.sort('APN len', descending=True)
common_uses.max_str_rows = 20
common_uses
Out[18]:
Let's pull out the 16,568 'SINGLE FAMILY RESIDENTIAL' parcels.
Unlike select, which picks columns, here we want the rows of the table WHERE some particular property holds.
In [19]:
residential = raw_parcels.where(raw_parcels['UseCodeDes'] == 'SINGLE FAMILY RESIDENTIAL')
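By the way, the same rows can be pulled out with where's two-argument form - a column label and a value to match. Assuming that form of the API, this sketch should give an identical result:
# Sketch: equivalent selection with the label/value form of where
raw_parcels.where('UseCodeDes', 'SINGLE FAMILY RESIDENTIAL').num_rows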
In [20]:
residential.points('latitude','longitude')
Out[20]:
Well, that is more useful. We might like to understand what portion of the parcels are 'SINGLE FAMILY RESIDENTIAL'. It would be nice to visualize that too. This brings us to one of the key observations: we have a real programming language around tables, and we are learning how to use it. For example, we might define a simple function.
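As a minimal first sketch of that idea, using only tables we already have, a one-off function can report the share of parcels matching a given description:
# Sketch: fraction of parcels carrying a particular use description
def use_fraction(table, description):
    return table.where(table['UseCodeDes'] == description).num_rows / table.num_rows

use_fraction(raw_parcels, 'SINGLE FAMILY RESIDENTIAL')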
In [21]:
parcels = raw_parcels.select(['APN','UseCode','UseCodeDes','BldgSqft','LotSqft','latitude','longitude'])
Not only can we use the columns of a table by indexing them with their labels, we can also introduce new columns. Tables themselves can serve as handy data tools for doing operations on other tables. Here we build a small table that maps keywords found in a use description to a category, along with a display color for each category; below we apply a categorizing function to every description, and later join the colors back in.
In [22]:
categories = Table.from_rows(
    [("Residential", ["RESIDENTIAL", "SFR", "SINGLE RESDL", "SINGLE FAM"], "blue"),
     ("Multi-unit",  ["UNIT", "FRAT", "SORO"], "orange"),
     ("Industrial",  ["INDUSTRIAL"], "red"),
     ("Commercial",  ["COMMERCIAL", "DEALERSHIP", "BANK", "RESTAURANT", "STORE", "MARKET",
                      "SHOP", "OFFICE", "FUNERAL", "THEATER"], "green"),
     ("Public",      ["PUBLIC", "SCHOOL"], "yellow")],
    ("Category", "Keys", "Color"))
categories
Out[22]:
Lots of other opportunities here to talk about defining functions, scoping, and higher-order functions. This one is in the yucky 70s style of referencing a global, but we don't have lambda yet.
In [23]:
def categorize(title):
    # Return the first category whose keywords appear in the description
    for category, keywords, _ in categories.rows:
        for word in keywords:
            if title.find(word) >= 0:
                return category
    return 'Other'
Try it out.
In [24]:
categorize("SINGLE FAMILY RESIDENTIAL")
Out[24]:
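For contrast, a sketch of the same keyword scan as a single expression, once lambda and comprehensions are in the toolbox:
# Sketch: the same lookup as a one-liner
categorize2 = lambda title: next((cat for cat, keywords, _ in categories.rows
                                  if any(word in title for word in keywords)),
                                 'Other')
categorize2("SINGLE FAMILY RESIDENTIAL")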
In [25]:
parcels['Category'] = parcels.apply(categorize, 'UseCodeDes')
parcels
Out[25]:
Let's try to get a sense of how our categorization did.
In [26]:
# For each category, list the distinct descriptions that landed in it
for cat, des in parcels.select(["UseCodeDes","Category"]).group('Category', np.unique).rows:
    print(cat, des)
In [27]:
colored = parcels.join('Category', categories)
colored.sort('BldgSqft', descending=True)
Out[27]:
In [28]:
colored.points('latitude', 'longitude', labels='UseCodeDes', colors='Color')
Out[28]:
Ok, so let's do some more detailed analysis on the residential part, now that we see where it is.
In [29]:
residential = parcels.where(parcels['Category'] == 'Residential')
In [30]:
residential.select(['BldgSqft']).hist(bins=40)
In [31]:
residential.select(['BldgSqft','LotSqft']).stats()
Out[31]:
In [32]:
def firstQtile(x): return np.percentile(x, 25)
def thirdQtile(x): return np.percentile(x, 75)
summary_ops = (min, firstQtile, np.median, np.mean, thirdQtile, max)
In [33]:
residential.select(['BldgSqft','LotSqft']).stats(summary_ops)
Out[33]:
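A quick sanity check of these summary functions on a tiny known input (illustrative only):
# Sketch: on 1..9 the quartiles should come out 3, 5, 7
demo = np.arange(1, 10)
firstQtile(demo), np.median(demo), thirdQtile(demo)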
In [34]:
# Tag each residence as 'small' or 'large', splitting at 3,000 sq ft
residential['Size'] = ['small' if x < 3000 else 'large' for x in residential['BldgSqft']]
residential
Out[34]:
In [35]:
biz = Table.read_table("./data/BerkeleyData/Business_Licenses.csv")
In [36]:
biz
Out[36]:
In [37]:
biz_residential = residential.join('APN', biz)
In [38]:
biz_residential
Out[38]:
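Note that join keeps only the APNs that appear in both tables, so biz_residential contains just the residential parcels that carry a business license.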
In [39]:
biz_residential.select(['latitude', 'longitude']).points('latitude','longitude')
Out[39]:
In [40]:
biz_residential.num_rows
Out[40]:
What fraction of residential properties have business licenses?
In [41]:
biz_residential.num_rows / residential.num_rows
Out[41]:
In [42]:
residential.select(['BldgSqft','LotSqft']).stats(summary_ops)
Out[42]:
In [43]:
biz_residential.select(['BldgSqft','LotSqft']).stats(summary_ops)
Out[43]:
In [44]:
# 100 sq ft bins from 500 to 5,900, with a catch-all first bin starting at 0
bins = np.insert(np.arange(500,6000,100), 0, 0)
bins
Out[44]:
In [45]:
residential.select(['BldgSqft']).hist(bins=bins)
In [46]:
biz_residential.select(['BldgSqft']).hist(bins=bins)
In [47]:
res_dist = residential.select(['BldgSqft']).bin(bins=bins, normed=True)
res_dist.hist(counts='bin', bins='bin')
In [48]:
biz_dist = biz_residential.select(['BldgSqft']).bin(bins=bins, normed=True)
biz_dist.hist(counts='bin', bins='bin')
In [49]:
biz_dist
Out[49]:
In [50]:
res_dist['Biz Bldg Sqft'] = biz_dist['BldgSqft density']
res_dist.relabel('BldgSqft density','Res Bldg Sqft')
res_dist
Out[50]:
In [51]:
res_dist['Var Dist'] = res_dist['Res Bldg Sqft']-res_dist['Biz Bldg Sqft']
In [52]:
res_dist
Out[52]:
In [53]:
res_dist.bar('bin')
It sure looks like non-biz-license residences tend to be smaller and biz-license residences tend to be larger.
A statistic based entirely on the differences between these normalized histograms is the total variation distance: half the sum of the absolute differences.
In [54]:
# Total variation distance: half the sum of the absolute differences
TVD = np.sum(np.abs(res_dist['Var Dist'])) / 2
TVD
Out[54]:
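As a tiny worked check on made-up distributions (purely illustrative):
# Sketch: TVD of two toy distributions; 0 means identical, 1 means disjoint
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
np.sum(np.abs(p - q)) / 2   # 0.3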
In [55]:
res_size = residential.select(['APN','BldgSqft'])
res_size
Out[55]:
In [56]:
biz_residential.select(['APN','BldgSqft'])
Out[56]:
In [57]:
"016 141404600" in biz_residential['APN']
Out[57]:
In [58]:
def is_biz(apn):
    # Does this parcel number appear among the licensed residences?
    return apn in biz_residential['APN']
In [59]:
is_biz("052 136400300")
Out[59]:
In [60]:
is_biz("052 136400301")
Out[60]:
In [61]:
res_size['Biz']=res_size.apply(is_biz,'APN')
In [62]:
res_size
Out[62]:
In [63]:
# Compute the statistic: mean building size overall, with a license, and without
print(np.mean(res_size['BldgSqft']))
print(np.mean(res_size.where('Biz')['BldgSqft']))
print(np.mean(res_size.where('Biz',False)['BldgSqft']))
In [64]:
np.mean(res_size.where('Biz',True)['BldgSqft'])-np.mean(res_size.where('Biz',False)['BldgSqft'])
Out[64]:
In [65]:
# Permuted version of the biz column
permuted_biz = res_size.select(['Biz']).sample()
res_size['pBiz']=permuted_biz['Biz']
res_size
Out[65]:
In [66]:
num_permutations = 300
for i in range(num_permutations):
    permuted_biz = res_size.select(['Biz']).sample()
    res_size['pBiz-' + str(i)] = permuted_biz['Biz']
res_size
Out[66]:
In [67]:
# Under the null hypothesis (holding a business license is unrelated to
# building size), compute the same statistic for each permuted labeling
null_hypothesis = Table.empty(['diff mean'])
for i in range(num_permutations):
    p = 'pBiz-' + str(i)
    stat = np.mean(res_size.where(p, True)['BldgSqft']) - np.mean(res_size.where(p, False)['BldgSqft'])
    null_hypothesis.append([stat])
null_hypothesis.hist(bins=20)
In [68]:
observation = np.mean(res_size.where('Biz',True)['BldgSqft'])-np.mean(res_size.where('Biz',False)['BldgSqft'])
In [69]:
observation
Out[69]:
In [70]:
def ptest(x):
    # Is a permuted difference larger than the observed one?
    return x > observation
In [71]:
null_hypothesis['ptest'] = null_hypothesis.apply(ptest,'diff mean')
null_hypothesis
Out[71]:
In [72]:
p_value = null_hypothesis.where('ptest').num_rows/num_permutations
p_value
Out[72]:
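A small p-value means that few of the permuted labelings produce a difference as large as the one we observed, so the size gap is unlikely to be chance - the answer promised at the top: license holders really do live in larger homes, by about the size of a study.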