Exploratory Data Analysis

Today we will review some basic exploratory data analysis techniques.

For this, and many other exercises in the future, we will be importing the following libraries:



In [39]:

    
from matplotlib import pyplot as plt
from matplotlib import rcParams
import numpy as np
%matplotlib inline

OK. Now let's load some data to explore.

First, let's find the dataset on my hard drive:



In [40]:

    
pwd









    Out[40]:





u'/Users/mberges/Documents/courses/2015/Fall/12-752/data'



In [41]:

    
!ls









    



ac.csv              fc_7.txt            source.txt
campusDemand.csv    fridge.csv          temp.txt
fc_14.txt           public_layout.csv
fc_28.txt           recs2009_public.csv
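The `pwd` and `!ls` commands above are IPython conveniences. The same information is available from plain Python through the standard `os` module. A minimal sketch (the `.csv` filter is only an illustration and is not part of the original session):

```python
import os

# Equivalent of `pwd`: the directory the interpreter is currently working in.
current_dir = os.getcwd()
print(current_dir)

# Equivalent of `ls`: the entries in that directory, sorted by name.
entries = sorted(os.listdir(current_dir))
print(entries)

# Keep only the CSV files (illustrative filter).
csv_files = [name for name in entries if name.endswith('.csv')]
print(csv_files)
```

This form is handy when the listing has to be used later in the code, for example to loop over several data files.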

There are a variety of ways to load the data into memory, so I will focus on one of the simplest ones:



In [42]:

    
file = open('recs2009_public.csv','r')

So far I have only opened the file for reading. Now I need to load its contents into memory, and for that I will use Python's csv module (CSV stands for Comma-Separated Values).



In [43]:

    
import csv

The csv module has a reader function, which returns an iterator over the rows of the file.



In [44]:

    
help(csv.reader)









    



Help on built-in function reader in module _csv:

reader(...)
    csv_reader = reader(iterable [, dialect='excel']
                            [optional keyword args])
        for row in csv_reader:
            process(row)
    
    The "iterable" argument can be any object that returns a line
    of input for each iteration, such as a file object or a list.  The
    optional "dialect" parameter is discussed below.  The function
    also accepts optional keyword arguments which override settings
    provided by the dialect.
    
    The returned object is an iterator.  Each iteration returns a row
    of the CSV file (which can span multiple input lines):



In [45]:

    
reader = csv.reader(file, delimiter=',')
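To make the behaviour of `csv.reader` concrete, here is a small sketch that runs on an in-memory string instead of the real file. The values are made up, the variable names are hypothetical, and the code is written for Python 3 (the session in this notebook is Python 2):

```python
import csv
import io

# A tiny in-memory stand-in for a CSV file (made-up values, not the RECS data).
sample = io.StringIO('DOEID,KWH\n1,18466\n2,5148\n')

sample_reader = csv.reader(sample, delimiter=',')

# Each iteration yields one row as a list of strings.
rows_seen = []
for sample_row in sample_reader:
    rows_seen.append(sample_row)
    print(sample_row)
```

Note that every field comes back as a string, including the numbers, so numeric columns need an explicit conversion such as `float(...)` before any arithmetic.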

Let's check the type of the resulting iterator object, reader:



In [46]:

    
type(reader)









    Out[46]:





_csv.reader

We can also ask for help on the object, to see its methods and attributes:



In [47]:

    
help(reader)









    



Help on reader object:

class reader(__builtin__.object)
 |  CSV reader
 |  
 |  Reader objects are responsible for reading and parsing tabular data
 |  in CSV format.
 |  
 |  Methods defined here:
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  next(...)
 |      x.next() -> the next value, or raise StopIteration
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  dialect
 |  
 |  line_num

Now, we can create a list out of the reader object, and it will iterate through the whole file to generate that list:



In [48]:

    
fullcsv = list(reader)
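One consequence of `reader` being an iterator is that it can only be consumed once. The sketch below illustrates this on a made-up in-memory file (hypothetical names, Python 3):

```python
import csv
import io

# In-memory stand-in for an opened file (made-up values).
handle = io.StringIO('DOEID,KWH\n1,18466\n2,5148\n')

demo_reader = csv.reader(handle)
all_rows = list(demo_reader)   # consumes the iterator completely
leftover = list(demo_reader)   # nothing is left to read the second time

print(len(all_rows))
print(leftover)
```

With a real file, doing this work inside a `with open(...) as f:` block has the added benefit of closing the file automatically when the block ends.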

At this point we have finally loaded all the data into memory. Let's look at the first two rows:



In [49]:

    
for row in range(2):
    print(fullcsv[row])









    



['DOEID', 'REGIONC', 'DIVISION', 'REPORTABLE_DOMAIN', 'TYPEHUQ', 'NWEIGHT', 'HDD65', 'CDD65', 'HDD30YR', 'CDD30YR', 'Climate_Region_Pub', 'AIA_Zone', 'METROMICRO', 'UR', 'KOWNRENT', 'CONDCOOP', 'YEARMADE', 'YEARMADERANGE', 'OCCUPYYRANGE', 'CONVERSION', 'ORIG1FAM', 'LOOKLIKE', 'NUMFLRS', 'NUMAPTS', 'WALLTYPE', 'ROOFTYPE', 'STUDIO', 'NAPTFLRS', 'STORIES', 'TYPEHUQ4', 'BEDROOMS', 'NCOMBATH', 'NHAFBATH', 'OTHROOMS', 'TOTROOMS', 'CELLAR', 'CRAWL', 'CONCRETE', 'BASEFIN', 'FINBASERMS', 'BASEHEAT', 'BASEHT2', 'PCTBSTHT', 'BASECOOL', 'BASECL2', 'PCTBSTCL', 'BASEUSE', 'ATTIC', 'ATTICFIN', 'FINATTRMS', 'ATTCHEAT', 'ATTCHT2', 'PCTATTHT', 'ATTCCOOL', 'ATTCCL2', 'PCTATTCL', 'ATTICUSE', 'PRKGPLC1', 'SIZEOFGARAGE', 'GARGLOC', 'GARGHEAT', 'GARGCOOL', 'PRKGPLC2', 'SIZEOFDETACH', 'OUTLET', 'ZKOWNRENT', 'ZCONDCOOP', 'ZYEARMADE', 'ZYEARMADERANGE', 'ZOCCUPYYRANGE', 'ZCONVERSION', 'ZORIG1FAM', 'ZLOOKLIKE', 'ZNUMFLRS', 'ZNUMAPTS', 'ZWALLTYPE', 'ZROOFTYPE', 'ZSTUDIO', 'ZNAPTFLRS', 'ZSTORIES', 'ZTYPEHUQ4', 'ZBEDROOMS', 'ZNCOMBATH', 'ZNHAFBATH', 'ZOTHROOMS', 'ZCELLAR', 'ZCRAWL', 'ZCONCRETE', 'ZBASEFIN', 'ZFINBASERMS', 'ZBASEHEAT', 'ZBASEHT2', 'ZPCTBSTHT', 'ZBASECOOL', 'ZBASECL2', 'ZPCTBSTCL', 'ZBASEUSE', 'ZATTIC', 'ZATTICFIN', 'ZFINATTRMS', 'ZATTCHEAT', 'ZATTCHT2', 'ZPCTATTHT', 'ZATTCCOOL', 'ZPCTATTCL', 'ZATTCCL2', 'ZATTICUSE', 'ZPRKGPLC1', 'ZSIZEOFGARAGE', 'ZGARGLOC', 'ZGARGHEAT', 'ZGARGCOOL', 'ZPRKGPLC2', 'ZSIZEOFDETACH', 'STOVEN', 'STOVENFUEL', 'STOVE', 'STOVEFUEL', 'OVEN', 'OVENFUEL', 'OVENUSE', 'OVENCLN', 'TYPECLN', 'MICRO', 'AMTMICRO', 'DEFROST', 'OUTGRILL', 'OUTGRILLFUEL', 'TOPGRILL', 'STGRILA', 'TOASTER', 'NUMMEAL', 'FUELFOOD', 'COFFEE', 'NUMFRIG', 'TYPERFR1', 'SIZRFRI1', 'REFRIGT1', 'ICE', 'AGERFRI1', 'ESFRIG', 'REPLCFRI', 'HELPFRI', 'HELPFRIY', 'TYPERFR2', 'SIZRFRI2', 'REFRIGT2', 'MONRFRI2', 'AGERFRI2', 'ESFRIG2', 'TYPERFR3', 'SIZRFRI3', 'REFRIGT3', 'MONRFRI3', 'AGERFRI3', 'ESFRIG3', 'SEPFREEZ', 'NUMFREEZ', 'UPRTFRZR', 'SIZFREEZ', 'FREEZER', 'AGEFRZR', 'REPLCFRZ', 'HELPFRZ', 
'HELPFRZY', 'UPRTFRZR2', 'SIZFREEZ2', 'FREEZER2', 'AGEFRZR2', 'DISHWASH', 'DWASHUSE', 'AGEDW', 'ESDISHW', 'REPLCDW', 'HELPDW', 'HELPDWY', 'ZSTOVEN', 'ZSTOVENFUEL', 'ZSTOVE', 'ZSTOVEFUEL', 'ZOVEN', 'ZOVENFUEL', 'ZOVENUSE', 'ZOVENCLN', 'ZTYPECLN', 'ZMICRO', 'ZAMTMICRO', 'ZDEFROST', 'ZOUTGRILL', 'ZOUTGRILLFUEL', 'ZTOPGRILL', 'ZSTGRILA', 'ZTOASTER', 'ZNUMMEAL', 'ZFUELFOOD', 'ZCOFFEE', 'ZNUMFRIG', 'ZTYPERFR1', 'ZSIZRFRI1', 'ZREFRIGT1', 'ZICE', 'ZAGERFRI1', 'ZTYPERFR2', 'ZSIZRFRI2', 'ZREFRIGT2', 'ZMONRFRI2', 'ZAGERFRI2', 'ZTYPERFR3', 'ZSIZRFRI3', 'ZREFRIGT3', 'ZMONRFRI3', 'ZAGERFRI3', 'ZSEPFREEZ', 'ZNUMFREEZ', 'ZUPRTFRZR', 'ZSIZFREEZ', 'ZFREEZER', 'ZAGEFRZR', 'ZUPRTFRZR2', 'ZSIZFREEZ2', 'ZFREEZER2', 'ZAGEFRZR2', 'ZDISHWASH', 'ZDWASHUSE', 'ZAGEDW', 'CWASHER', 'TOPFRONT', 'WASHLOAD', 'WASHTEMP', 'RNSETEMP', 'AGECWASH', 'ESCWASH', 'REPLCCW', 'HELPCW', 'HELPCWY', 'DRYER', 'DRYRFUEL', 'DRYRUSE', 'AGECDRYER', 'TVCOLOR', 'TVSIZE1', 'TVTYPE1', 'CABLESAT1', 'COMBODVR1', 'DVR1', 'DIGITSTB1', 'PLAYSTA1', 'COMBOVCRDVD1', 'VCR1', 'DVD1', 'TVAUDIOSYS1', 'OTHERSTB1', 'TVONWD1', 'TVONWDWATCH1', 'TVONWE1', 'TVONWEWATCH1', 'TVSIZE2', 'TVTYPE2', 'CABLESAT2', 'COMBODVR2', 'DVR2', 'DIGITSTB2', 'PLAYSTA2', 'COMBOVCRDVD2', 'VCR2', 'DVD2', 'TVAUDIOSYS2', 'OTHERSTB2', 'TVONWD2', 'TVONWDWATCH2', 'TVONWE2', 'TVONWEWATCH2', 'TVSIZE3', 'TVTYPE3', 'CABLESAT3', 'COMBODVR3', 'DVR3', 'DIGITSTB3', 'PLAYSTA3', 'COMBOVCRDVD3', 'VCR3', 'DVD3', 'TVAUDIOSYS3', 'OTHERSTB3', 'TVONWD3', 'TVONWDWATCH3', 'TVONWE3', 'TVONWEWATCH3', 'COMPUTER', 'NUMPC', 'PCTYPE1', 'MONITOR1', 'TIMEON1', 'PCONOFF1', 'PCSLEEP1', 'PCTYPE2', 'MONITOR2', 'TIMEON2', 'PCONOFF2', 'PCSLEEP2', 'PCTYPE3', 'MONITOR3', 'TIMEON3', 'PCONOFF3', 'PCSLEEP3', 'INTERNET', 'INDIALUP', 'INDSL', 'INCABLE', 'INSATEL', 'INWIRELESS', 'PCPRINT', 'FAX', 'COPIER', 'WELLPUMP', 'DIPSTICK', 'SWAMPCOL', 'AQUARIUM', 'STEREO', 'NOCORD', 'ANSMACH', 'BATTOOLS', 'BATCHRG', 'CHRGPLGT', 'ELECDEV', 'ELECCHRG', 'CHRGPLGE', 'ZCWASHER', 'ZTOPFRONT', 'ZWASHLOAD', 'ZWASHTEMP', 
'ZRNSETEMP', 'ZAGECWASH', 'ZDRYER', 'ZDRYRFUEL', 'ZDRYRUSE', 'ZAGECDRYER', 'ZTVCOLOR', 'ZTVSIZE1', 'ZTVTYPE1', 'ZCABLESAT1', 'ZCOMBODVR1', 'ZDVR1', 'ZDIGITSTB1', 'ZPLAYSTA1', 'ZCOMBOVCRDVD1', 'ZVCR1', 'ZDVD1', 'ZTVAUDIOSYS1', 'ZOTHERSTB1', 'ZTVONWD1', 'ZTVONWDWATCH1', 'ZTVONWE1', 'ZTVONWEWATCH1', 'ZTVSIZE2', 'ZTVTYPE2', 'ZCABLESAT2', 'ZCOMBODVR2', 'ZDVR2', 'ZDIGITSTB2', 'ZPLAYSTA2', 'ZCOMBOVCRDVD2', 'ZVCR2', 'ZDVD2', 'ZTVAUDIOSYS2', 'ZOTHERSTB2', 'ZTVONWD2', 'ZTVONWDWATCH2', 'ZTVONWE2', 'ZTVONWEWATCH2', 'ZTVSIZE3', 'ZTVTYPE3', 'ZCABLESAT3', 'ZCOMBODVR3', 'ZDVR3', 'ZDIGITSTB3', 'ZPLAYSTA3', 'ZCOMBOVCRDVD3', 'ZVCR3', 'ZDVD3', 'ZTVAUDIOSYS3', 'ZOTHERSTB3', 'ZTVONWD3', 'ZTVONWDWATCH3', 'ZTVONWE3', 'ZTVONWEWATCH3', 'ZCOMPUTER', 'ZNUMPC', 'ZPCTYPE1', 'ZMONITOR1', 'ZTIMEON1', 'ZPCONOFF1', 'ZPCSLEEP1', 'ZPCTYPE2', 'ZMONITOR2', 'ZTIMEON2', 'ZPCONOFF2', 'ZPCSLEEP2', 'ZPCTYPE3', 'ZMONITOR3', 'ZTIMEON3', 'ZPCONOFF3', 'ZPCSLEEP3', 'ZINTERNET', 'ZINDIALUP', 'ZINDSL', 'ZINCABLE', 'ZINSATEL', 'ZINWIRELESS', 'ZPCPRINT', 'ZFAX', 'ZCOPIER', 'ZWELLPUMP', 'ZDIPSTICK', 'ZSWAMPCOL', 'ZAQUARIUM', 'ZSTEREO', 'ZNOCORD', 'ZANSMACH', 'ZBATTOOLS', 'ZBATCHRG', 'ZCHRGPLGT', 'ZELECDEV', 'ZELECCHRG', 'ZCHRGPLGE', 'HEATHOME', 'DNTHEAT', 'EQUIPNOHEAT', 'FUELNOHEAT', 'EQUIPM', 'FUELHEAT', 'MAINTHT', 'EQUIPAGE', 'REPLCHT', 'HELPHT', 'HELPHTY', 'HEATOTH', 'EQUIPAUX', 'REVERSE', 'WARMAIR', 'FURNFUEL', 'STEAMR', 'RADFUEL', 'PERMELEC', 'PIPELESS', 'PIPEFUEL', 'ROOMHEAT', 'RMHTFUEL', 'WOODKILN', 'HSFUEL', 'CARRYEL', 'CARRYKER', 'CHIMNEY', 'FPFUEL', 'NGFPFLUE', 'USENGFP', 'RANGE', 'RNGFUEL', 'DIFEQUIP', 'DIFFUEL', 'EQMAMT', 'HEATROOM', 'THERMAIN', 'NUMTHERM', 'PROTHERM', 'AUTOHEATNITE', 'AUTOHEATDAY', 'TEMPHOME', 'TEMPGONE', 'TEMPNITE', 'MOISTURE', 'USEMOISTURE', 'ZHEATHOME', 'ZDNTHEAT', 'ZEQUIPNOHEAT', 'ZFUELNOHEAT', 'ZEQUIPM', 'ZFUELHEAT', 'ZMAINTHT', 'ZEQUIPAGE', 'ZHEATOTH', 'ZFURNFUEL', 'ZRADFUEL', 'ZPIPEFUEL', 'ZRMHTFUEL', 'ZHSFUEL', 'ZFPFUEL', 'ZNGFPFLUE', 'ZUSENGFP', 'ZRNGFUEL', 'ZDIFFUEL', 
'ZEQMAMT', 'ZHEATROOM', 'ZTHERMAIN', 'ZNUMTHERM', 'ZPROTHERM', 'ZAUTOHEATNITE', 'ZAUTOHEATDAY', 'ZTEMPHOME', 'ZTEMPGONE', 'ZTEMPNITE', 'ZMOISTURE', 'ZUSEMOISTURE', 'NUMH2ONOTNK', 'NUMH2OHTRS', 'H2OTYPE1', 'FUELH2O', 'WHEATOTH', 'WHEATSIZ', 'WHEATAGE', 'WHEATBKT', 'HELPWH', 'HELPWHY', 'H2OTYPE2', 'FUELH2O2', 'WHEATSIZ2', 'WHEATAGE2', 'ZNUMH2OHTRS', 'ZNUMH2ONOTNK', 'ZH2OTYPE1', 'ZFUELH2O', 'ZWHEATOTH', 'ZWHEATSIZ', 'ZWHEATAGE', 'ZWHEATBKT', 'ZH2OTYPE2', 'ZFUELH2O2', 'ZWHEATSIZ2', 'ZWHEATAGE2', 'AIRCOND', 'DNTAC', 'COOLTYPENOAC', 'COOLTYPE', 'DUCTS', 'CENACHP', 'ACOTHERS', 'MAINTAC', 'AGECENAC', 'REPLCCAC', 'HELPCAC', 'HELPCACY', 'ACROOMS', 'USECENAC', 'THERMAINAC', 'PROTHERMAC', 'AUTOCOOLNITE', 'AUTOCOOLDAY', 'TEMPHOMEAC', 'TEMPGONEAC', 'TEMPNITEAC', 'NUMBERAC', 'WWACAGE', 'ESWWAC', 'REPLCWWAC', 'HELPWWAC', 'HELPWWACY', 'USEWWAC', 'NUMCFAN', 'USECFAN', 'TREESHAD', 'NOTMOIST', 'USENOTMOIST', 'ZAIRCOND', 'ZDNTAC', 'ZCOOLTYPENOAC', 'ZCOOLTYPE', 'ZDUCTS', 'ZCENACHP', 'ZACOTHERS', 'ZMAINTAC', 'ZAGECENAC', 'ZUSECENAC', 'ZACROOMS', 'ZTHERMAINAC', 'ZPROTHERMAC', 'ZAUTOCOOLNITE', 'ZAUTOCOOLDAY', 'ZTEMPHOMEAC', 'ZTEMPGONEAC', 'ZTEMPNITEAC', 'ZNUMBERAC', 'ZWWACAGE', 'ZUSEWWAC', 'ZNUMCFAN', 'ZUSECFAN', 'ZTREESHAD', 'ZNOTMOIST', 'ZUSENOTMOIST', 'HIGHCEIL', 'CATHCEIL', 'SWIMPOOL', 'POOL', 'FUELPOOL', 'RECBATH', 'FUELTUB', 'LGT12', 'LGT12EE', 'LGT4', 'LGT4EE', 'LGT1', 'LGT1EE', 'NOUTLGTNT', 'LGTOEE', 'NGASLIGHT', 'INSTLCFL', 'HELPCFL', 'HELPCFLY', 'SLDDRS', 'DOOR1SUM', 'WINDOWS', 'TYPEGLASS', 'NEWGLASS', 'HELPWIN', 'HELPWINY', 'ADQINSUL', 'INSTLINS', 'AGEINS', 'HELPINS', 'HELPINSY', 'DRAFTY', 'INSTLWS', 'AGEWS', 'HELPWS', 'HELPWSY', 'AUDIT', 'AGEAUD', 'HELPAUD', 'HELPAUDY', 'ZHIGHCEIL', 'ZCATHCEIL', 'ZSWIMPOOL', 'ZPOOL', 'ZFUELPOOL', 'ZRECBATH', 'ZFUELTUB', 'ZLGT12', 'ZLGT4', 'ZLGT1', 'ZNOUTLGTNT', 'ZNGASLIGHT', 'ZSLDDRS', 'ZDOOR1SUM', 'ZWINDOWS', 'ZTYPEGLASS', 'ZNEWGLASS', 'ZADQINSUL', 'ZINSTLINS', 'ZAGEINS', 'ZDRAFTY', 'ZINSTLWS', 'ZAGEWS', 'ZAUDIT', 'ZAGEAUD', 'USEEL', 'USENG', 
'USELP', 'USEFO', 'USEKERO', 'USEWOOD', 'USESOLAR', 'USEOTH', 'ELWARM', 'ELECAUX', 'ELCOOL', 'ELWATER', 'ELFOOD', 'ELOTHER', 'UGWARM', 'UGASAUX', 'UGWATER', 'UGCOOK', 'UGOTH', 'LPWARM', 'LPGAUX', 'LPWATER', 'LPCOOK', 'LPOTHER', 'FOWARM', 'FOILAUX', 'FOWATER', 'FOOTHER', 'KRWARM', 'KEROAUX', 'KRWATER', 'KROTHER', 'WDWARM', 'WOODAUX', 'WDWATER', 'WDOTHUSE', 'SOLWARM', 'SOLARAUX', 'SOLWATER', 'SOLOTHER', 'OTHWARM', 'OTHERAUX', 'OTHWATER', 'OTHCOOK', 'ONSITE', 'ONSITEGRID', 'PELHEAT', 'PELHOTWA', 'PELCOOK', 'PELAC', 'PELLIGHT', 'OTHERWAYEL', 'PGASHEAT', 'PGASHTWA', 'PUGCOOK', 'PUGOTH', 'OTHERWAYNG', 'FOPAY', 'OTHERWAYFO', 'LPGPAY', 'OTHERWAYLPG', 'LPGDELV', 'KERODEL', 'KEROCASH', 'NOCRCASH', 'NKRGALNC', 'WOODLOGS', 'WDSCRAP', 'WDPELLET', 'WDOTHER', 'WOODAMT', 'NUMCORDS', 'ZONSITE', 'ZONSITEGRID', 'ZPELHEAT', 'ZPELHOTWA', 'ZPELCOOK', 'ZPELAC', 'ZPELLIGHT', 'ZOTHERWAYEL', 'ZPGASHEAT', 'ZPGASHTWA', 'ZPUGCOOK', 'ZPUGOTH', 'ZOTHERWAYNG', 'ZFOPAY', 'ZOTHERWAYFO', 'ZLPGPAY', 'ZOTHERWAYLPG', 'ZKERODEL', 'ZKEROCASH', 'ZNOCRCASH', 'ZNKRGALNC', 'ZWOODLOGS', 'ZWDSCRAP', 'ZWDPELLET', 'ZWDOTHER', 'ZWOODAMT', 'ZNUMCORDS', 'KFUELOT', 'HHSEX', 'EMPLOYHH', 'SPOUSE', 'SDESCENT', 'Householder_Race', 'EDUCATION', 'NHSLDMEM', 'HHAGE', 'AGEHHMEMCAT2', 'AGEHHMEMCAT3', 'AGEHHMEMCAT4', 'AGEHHMEMCAT5', 'AGEHHMEMCAT6', 'AGEHHMEMCAT7', 'AGEHHMEMCAT8', 'AGEHHMEMCAT9', 'AGEHHMEMCAT10', 'AGEHHMEMCAT11', 'AGEHHMEMCAT12', 'AGEHHMEMCAT13', 'AGEHHMEMCAT14', 'HBUSNESS', 'ATHOME', 'TELLWORK', 'TELLDAYS', 'OTHWORK', 'WORKPAY', 'RETIREPY', 'SSINCOME', 'CASHBEN', 'INVESTMT', 'RGLRPAY', 'MONEYPY', 'POVERTY100', 'POVERTY150', 'HUPROJ', 'RENTHELP', 'FOODASST', 'ZHHSEX', 'ZHHAGE', 'ZEMPLOYHH', 'ZSPOUSE', 'ZSDESCENT', 'ZHouseholder_Race', 'ZEDUCATION', 'ZNHSLDMEM', 'ZAGEHHMEMCAT2', 'ZAGEHHMEMCAT3', 'ZAGEHHMEMCAT4', 'ZAGEHHMEMCAT5', 'ZAGEHHMEMCAT6', 'ZAGEHHMEMCAT7', 'ZAGEHHMEMCAT8', 'ZAGEHHMEMCAT9', 'ZAGEHHMEMCAT10', 'ZAGEHHMEMCAT11', 'ZAGEHHMEMCAT12', 'ZAGEHHMEMCAT13', 'ZAGEHHMEMCAT14', 'ZHBUSNESS', 'ZATHOME', 
'ZTELLWORK', 'ZTELLDAYS', 'ZOTHWORK', 'ZWORKPAY', 'ZRETIREPY', 'ZSSINCOME', 'ZCASHBEN', 'ZINVESTMT', 'ZRGLRPAY', 'ZMONEYPY', 'ZHUPROJ', 'ZRENTHELP', 'ZFOODASST', 'TOTSQFT', 'TOTSQFT_EN', 'TOTHSQFT', 'TOTUSQFT', 'TOTCSQFT', 'TOTUCSQFT', 'ZTOTSQFT', 'ZTOTSQFT_EN', 'ZTOTHSQFT', 'ZTOTUSQFT', 'ZTOTCSQFT', 'ZTOTUCSQFT', 'KWH', 'KWHSPH', 'KWHCOL', 'KWHWTH', 'KWHRFG', 'KWHOTH', 'BTUEL', 'BTUELSPH', 'BTUELCOL', 'BTUELWTH', 'BTUELRFG', 'BTUELOTH', 'DOLLAREL', 'DOLELSPH', 'DOLELCOL', 'DOLELWTH', 'DOLELRFG', 'DOLELOTH', 'CUFEETNG', 'CUFEETNGSPH', 'CUFEETNGWTH', 'CUFEETNGOTH', 'BTUNG', 'BTUNGSPH', 'BTUNGWTH', 'BTUNGOTH', 'DOLLARNG', 'DOLNGSPH', 'DOLNGWTH', 'DOLNGOTH', 'GALLONLP', 'GALLONLPSPH', 'GALLONLPWTH', 'GALLONLPOTH', 'BTULP', 'BTULPSPH', 'BTULPWTH', 'BTULPOTH', 'DOLLARLP', 'DOLLPSPH', 'DOLLPWTH', 'DOLLPOTH', 'GALLONFO', 'GALLONFOSPH', 'GALLONFOWTH', 'GALLONFOOTH', 'BTUFO', 'BTUFOSPH', 'BTUFOWTH', 'BTUFOOTH', 'DOLLARFO', 'DOLFOSPH', 'DOLFOWTH', 'DOLFOOTH', 'GALLONKER', 'GALLONKERSPH', 'GALLONKERWTH', 'GALLONKEROTH', 'BTUKER', 'BTUKERSPH', 'BTUKERWTH', 'BTUKEROTH', 'DOLLARKER', 'DOLKERSPH', 'DOLKERWTH', 'DOLKEROTH', 'BTUWOOD', 'CORDSWD', 'TOTALBTU', 'TOTALBTUSPH', 'TOTALBTUCOL', 'TOTALBTUWTH', 'TOTALBTURFG', 'TOTALBTUOTH', 'TOTALDOL', 'TOTALDOLSPH', 'TOTALDOLCOL', 'TOTALDOLWTH', 'TOTALDOLRFG', 'TOTALDOLOTH', 'KAVALEL', 'PERIODEL', 'SCALEEL', 'KAVALNG', 'PERIODNG', 'SCALENG', 'PERIODLP', 'SCALELP', 'PERIODFO', 'SCALEFO', 'PERIODKR', 'SCALEKER']
['1', '2', '4', '12', '2', '2471.679705', '4742', '1080', '4953', '1271', '4', '3', 'METRO', 'U', '1', '-2', '2004', '7', '8', '-2', '-2', '-2', '-2', '-2', '1', '5', '-2', '-2', '20', '-2', '4', '1', '2', '5', '9', '1', '0', '0', '1', '1', '1', '2', '3', '1', '2', '3', '1', '0', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '1', '2', '2', '0', '0', '-2', '-2', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '5', '0', '-2', '0', '-2', '3', '1', '1', '1', '3', '1', '1', '2', '0', '-2', '0', '4', '5', '1', '2', '21', '4', '2', '1', '3', '1', '-2', '-2', '-2', '21', '4', '2', '12', '3', '1', '-2', '-2', '-2', '-2', '-2', '-2', '0', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '1', '13', '3', '1', '-2', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '2', '2', '3', '3', '2', '1', '1', '0', '-2', '1', '5', '1', '1', '4', '3', '2', '1', '1', '-2', '0', '1', '0', '0', '1', '1', '0', '3', '4', '3', '4', '3', '4', '1', '1', '-2', '0', '0', '0', '0', '0', '0', '0', '2', '-2', '2', '-2', '3', '3', '0', '-2', '0', '0', '0', '0', '0', '1', '0', '0', '3', '-2', '2', '-2', '1', '2', '1', '1', '3', '0', '1', '1', '1', '3', '0', '1', '-2', '-2', '-2', '-2', '-2', '1', '0', '1', '0', '0', '1', '1', '0', '0', '0', '-2', '-2', '0', '0', '1', '1', '1', '2', '0', '1', '2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', 
'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '-2', '-2', '-2', '3', '5', '0', '3', '-2', '-2', '-2', '0', '0', '0', '0', '-2', '0', '-2', '0', '0', '-2', '0', '-2', '0', '-2', '0', '0', '0', '-2', '-2', '-2', '0', '-2', '0', '-2', '-2', '9', '1', '2', '1', '1', '1', '68', '66', '68', '0', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '5', '0', '3', '3', '0', '-2', '-2', '-2', '-2', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '-2', '-2', '1', '-2', '0', '0', '1', '3', '-2', '0', '-2', '9', '3', '1', '1', '1', '1', '74', '78', '73', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '3', '3', '0', '0', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '-2', '0', '-2', '-2', '0', '-2', '3', '3', '2', '2', '2', '2', '0', '-2', '-2', '1', '0', '-2', '1', '1', '41', '2', '3', '-2', '-2', '1', '1', '2', '0', '-2', '4', '0', '-2', '-2', '-2', '0', '-2', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '1', '0', '0', '0', '0', '0', '1', '0', '1', '1', '1', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '-2', '1', '1', '1', '1', '1', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '1', '0', '1', '5', '4', '35', '8', '1', 
'1', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '0', '0', '0', '-2', '0', '1', '0', '0', '0', '0', '0', '23', '0', '0', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '5075', '4675', '3958', '1118', '3958', '1118', '0', '0', '0', '0', '0', '0', '18466', '3186.707', '3068.795', '2968.45', '1515.504', '7726.545', '63006', '10873.045', '10470.729', '10128.354', '5170.899', '26362.973', '1315', '226.932', '218.535', '211.389', '107.922', '550.222', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '63006', '10873', '10471', '10128', '5171', '26363', '1315', '227', '219', '211', '108', '550', '1', '1', '0', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2']
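With more than 900 columns, reading these two rows by eye is not practical. A small sketch of how the header row can be used to look values up by name, using a shortened stand-in for `fullcsv[0]` and `fullcsv[1]` (the four column names and the first row's values are taken from the output above; the second row is partly made up):

```python
# Shortened stand-ins for fullcsv[0] and fullcsv[1].
header = ['DOEID', 'KWH', 'KWHSPH', 'KWHCOL']
first_row = ['1', '18466', '3186.707', '3068.795']

# Map each column name to its value for one household.
record = dict(zip(header, first_row))
print(record['KWHCOL'])   # still a string at this point

# Position of a column, useful for selecting columns by index later on.
col_index = header.index('KWHCOL')
print(col_index)

# Convert one column to floats across several rows.
rows = [first_row, ['2', '5148', '0', '181.998']]
kwhcol = [float(r[col_index]) for r in rows]
print(kwhcol)
```

On the full dataset the same `header.index(...)` call would give the column position to pass to other loading functions.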

This is a very simple way to load files, but not necessarily the most convenient one. A more versatile alternative is the numpy function genfromtxt:



In [50]:

    
help(np.genfromtxt)









    



Help on function genfromtxt in module numpy.lib.npyio:

genfromtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, skiprows=0, skip_header=0, skip_footer=0, converters=None, missing='', missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True)
    Load data from a text file, with missing values handled as specified.
    
    Each line past the first `skip_header` lines is split at the `delimiter`
    character, and characters following the `comments` character are discarded.
    
    Parameters
    ----------
    fname : file or str
        File, filename, or generator to read.  If the filename extension is
        `.gz` or `.bz2`, the file is first decompressed. Note that
        generators must return byte strings in Python 3k.
    dtype : dtype, optional
        Data type of the resulting array.
        If None, the dtypes will be determined by the contents of each
        column, individually.
    comments : str, optional
        The character used to indicate the start of a comment.
        All the characters occurring on a line after a comment are discarded
    delimiter : str, int, or sequence, optional
        The string used to separate values.  By default, any consecutive
        whitespaces act as delimiter.  An integer or sequence of integers
        can also be provided as width(s) of each field.
    skip_rows : int, optional
        `skip_rows` was deprecated in numpy 1.5, and will be removed in
        numpy 2.0. Please use `skip_header` instead.
    skip_header : int, optional
        The number of lines to skip at the beginning of the file.
    skip_footer : int, optional
        The number of lines to skip at the end of the file.
    converters : variable, optional
        The set of functions that convert the data of a column to a value.
        The converters can also be used to provide a default value
        for missing data: ``converters = {3: lambda s: float(s or 0)}``.
    missing : variable, optional
        `missing` was deprecated in numpy 1.5, and will be removed in
        numpy 2.0. Please use `missing_values` instead.
    missing_values : variable, optional
        The set of strings corresponding to missing data.
    filling_values : variable, optional
        The set of values to be used as default when the data are missing.
    usecols : sequence, optional
        Which columns to read, with 0 being the first.  For example,
        ``usecols = (1, 4, 5)`` will extract the 2nd, 5th and 6th columns.
    names : {None, True, str, sequence}, optional
        If `names` is True, the field names are read from the first valid line
        after the first `skip_header` lines.
        If `names` is a sequence or a single-string of comma-separated names,
        the names will be used to define the field names in a structured dtype.
        If `names` is None, the names of the dtype fields will be used, if any.
    excludelist : sequence, optional
        A list of names to exclude. This list is appended to the default list
        ['return','file','print']. Excluded names are appended an underscore:
        for example, `file` would become `file_`.
    deletechars : str, optional
        A string combining invalid characters that must be deleted from the
        names.
    defaultfmt : str, optional
        A format used to define default field names, such as "f%i" or "f_%02i".
    autostrip : bool, optional
        Whether to automatically strip white spaces from the variables.
    replace_space : char, optional
        Character(s) used in replacement of white spaces in the variables
        names. By default, use a '_'.
    case_sensitive : {True, False, 'upper', 'lower'}, optional
        If True, field names are case sensitive.
        If False or 'upper', field names are converted to upper case.
        If 'lower', field names are converted to lower case.
    unpack : bool, optional
        If True, the returned array is transposed, so that arguments may be
        unpacked using ``x, y, z = loadtxt(...)``
    usemask : bool, optional
        If True, return a masked array.
        If False, return a regular array.
    loose : bool, optional
        If True, do not raise errors for invalid values.
    invalid_raise : bool, optional
        If True, an exception is raised if an inconsistency is detected in the
        number of columns.
        If False, a warning is emitted and the offending lines are skipped.
    
    Returns
    -------
    out : ndarray
        Data read from the text file. If `usemask` is True, this is a
        masked array.
    
    See Also
    --------
    numpy.loadtxt : equivalent function when no data is missing.
    
    Notes
    -----
    * When spaces are used as delimiters, or when no delimiter has been given
      as input, there should not be any missing data between two fields.
    * When the variables are named (either by a flexible dtype or with `names`,
      there must not be any header in the file (else a ValueError
      exception is raised).
    * Individual values are not stripped of spaces by default.
      When using a custom converter, make sure the function does remove spaces.
    
    References
    ----------
    .. [1] Numpy User Guide, section `I/O with Numpy
           <http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html>`_.
    
    Examples
    ---------
    >>> from StringIO import StringIO
    >>> import numpy as np
    
    Comma delimited file with mixed dtype
    
    >>> s = StringIO("1,1.3,abcde")
    >>> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
    ... ('mystring','S5')], delimiter=",")
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
    
    Using dtype = None
    
    >>> s.seek(0) # needed for StringIO example only
    >>> data = np.genfromtxt(s, dtype=None,
    ... names = ['myint','myfloat','mystring'], delimiter=",")
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
    
    Specifying dtype and names
    
    >>> s.seek(0)
    >>> data = np.genfromtxt(s, dtype="i8,f8,S5",
    ... names=['myint','myfloat','mystring'], delimiter=",")
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
    
    An example with fixed-width columns
    
    >>> s = StringIO("11.3abcde")
    >>> data = np.genfromtxt(s, dtype=None, names=['intvar','fltvar','strvar'],
    ...     delimiter=[1,3,5])
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('strvar', '|S5')])

Now we can use that function to load the data much more easily. Here's an example that only scratches the surface of what's possible with this function:



In [51]:

    
data = np.genfromtxt('recs2009_public.csv', delimiter=',', skip_header=1, usecols=(841, 842, 843))
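The following sketch shows the same kind of call on a tiny in-memory file, so the effect of each argument is easy to see. The data are made up and the code is written for Python 3. It uses `skip_header`, which the help text above names as the replacement for the deprecated `skiprows`:

```python
import io
import numpy as np

# In-memory stand-in for a CSV file with a header line (made-up values).
text = io.StringIO('ID,A,B,C\n1,10.5,20.0,30.0\n2,11.5,,31.0\n')

# skip_header=1 drops the header line; usecols picks columns by 0-based index.
arr = np.genfromtxt(text, delimiter=',', skip_header=1, usecols=(1, 2, 3))

print(arr.shape)
print(arr)   # the empty field in the second row is read as nan
```

The result is an ordinary two-dimensional float array, so `arr[:, 0]` selects the first of the requested columns.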

Let's look at the first column of that dataset:



In [54]:

    
print(data[:, 0])









    



[ 3068.795   181.998   184.459 ...,   847.734   135.687     0.   ]

We can also plot it against the row number:



In [59]:

    
plt.plot(data[:,0], 'rd')









    Out[59]:





[<matplotlib.lines.Line2D at 0x10a8e52d0>]
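When `plt.plot` receives a single array, it uses the element positions 0, 1, 2, ... as the x values, which is why the plot is "against the row number". The format string `'rd'` asks for red (`r`) thin-diamond (`d`) markers without a connecting line. A minimal sketch that compares the implicit and explicit forms, using the first three values printed above and the non-interactive `Agg` backend so that it can run outside a notebook:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window needed
from matplotlib import pyplot as plt
import numpy as np

y = np.array([3068.795, 181.998, 184.459])

fig, ax = plt.subplots()
line_implicit, = ax.plot(y, 'rd')                      # x is implied
line_explicit, = ax.plot(np.arange(len(y)), y, 'rd')   # x given explicitly

print(line_implicit.get_xdata())
print(line_explicit.get_xdata())
```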

The look and feel may not be very good, but thankfully we can customize that. For much more detailed information on how to do that, please see this link.

For now, let me just show you how to do it using the rcParams object we imported earlier from the matplotlib library. It behaves like a dictionary of default settings:



In [ ]:

    
help(plt.rcParams)



In [62]:

    
rcParams['font.size'] = 20
rcParams['lines.linewidth'] = 3
rcParams['figure.figsize'] = (10, 6)
plt.plot(data[:,0], 'o', markersize=20, alpha=0.25)









    Out[62]:





[<matplotlib.lines.Line2D at 0x10b2fab50>]
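Assignments to `rcParams` stay in effect for the rest of the session. If a setting should apply to only one figure, matplotlib also offers `rc_context`, which restores the previous values when its block ends. A minimal sketch, assuming the `Agg` backend so it runs outside a notebook:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window needed
from matplotlib import rcParams, rc_context

# rcParams behaves like a dictionary: read and assign by key.
previous_size = rcParams['font.size']
rcParams['font.size'] = 20
print(rcParams['font.size'])

# rc_context applies settings only inside the with-block.
with rc_context({'lines.linewidth': 3}):
    inside = rcParams['lines.linewidth']
outside = rcParams['lines.linewidth']
print(inside, outside)

# Restore the value changed above.
rcParams['font.size'] = previous_size
```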



In [67]:

    
plt.scatter(data[:,1], data[:,2])









    Out[67]:





<matplotlib.collections.PathCollection at 0x10bbd2350>
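A scatter plot is easier to read once its axes are labelled. Judging from the header row printed earlier, columns 842 and 843 appear to be `KWHWTH` and `KWHRFG`; treat that mapping as an assumption. The sketch below uses synthetic numbers in place of `data[:, 1]` and `data[:, 2]`:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window needed
from matplotlib import pyplot as plt
import numpy as np

# Synthetic stand-ins for data[:, 1] and data[:, 2].
rng = np.random.RandomState(0)
x = rng.uniform(0, 5000, size=50)
y = rng.uniform(0, 3000, size=50)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.5)   # transparency helps where points overlap
ax.set_xlabel('KWHWTH')       # assumed column name, see the note above
ax.set_ylabel('KWHRFG')       # assumed column name, see the note above
```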



In [68]:

    
!ls -l









    



total 275648
-rw-r--r--+ 1 mberges  staff  48864668 Oct 27 11:02 ac.csv
-rw-r--r--+ 1 mberges  staff  64609139 Oct 27 11:02 campusDemand.csv
-rw-r--r--+ 1 mberges  staff       326 Oct 27 11:02 fc_14.txt
-rw-r--r--+ 1 mberges  staff       326 Oct 27 11:02 fc_28.txt
-rw-r--r--+ 1 mberges  staff       341 Oct 27 11:02 fc_7.txt
-rw-r--r--+ 1 mberges  staff    108574 Oct 27 11:02 fridge.csv
-rw-r--r--@ 1 mberges  staff     59883 Oct 27 11:02 public_layout.csv
-rw-r--r--@ 1 mberges  staff  27460827 Oct 27 11:02 recs2009_public.csv
-rw-r--r--@ 1 mberges  staff      1032 Oct 27 11:02 source.txt
-rw-r--r--+ 1 mberges  staff        63 Oct 27 11:02 temp.txt



In [69]:

    
!head public_layout.csv



In [77]:

    
names = np.genfromtxt('public_layout.csv', delimiter=',', skip_header=1, usecols=(1,), dtype=str)



In [80]:

    
names[840]









    Out[80]:





'"Electricity usage for space heating'
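The stray opening quote and the cut-off text in this output suggest that the label contains a comma inside a quoted field: `genfromtxt` splits on every delimiter and does not interpret quotes. The csv module used earlier does handle quoted fields. A sketch on made-up lines (the column names and label text are invented for illustration, and the code is written for Python 3):

```python
import csv
import io

# One made-up data line whose second field contains a comma inside quotes.
line = 'VAR1,"Some label, with a comma"'

# A plain split on commas breaks the quoted field into two pieces.
naive = line.split(',')
print(naive)

# csv.reader keeps the quoted field together and strips the quotes.
layout_text = io.StringIO('name,label\n' + line + '\n')
layout_reader = csv.reader(layout_text)
next(layout_reader)                        # skip the header line
labels = [row[1] for row in layout_reader]
print(labels[0])
```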



    * Individual values are not stripped of spaces by default.
      When using a custom converter, make sure the function does remove spaces.
    
    References
    ----------
    .. [1] Numpy User Guide, section `I/O with Numpy
           <http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html>`_.
    
    Examples
    ---------
    >>> from StringIO import StringIO
    >>> import numpy as np
    
    Comma delimited file with mixed dtype
    
    >>> s = StringIO("1,1.3,abcde")
    >>> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
    ... ('mystring','S5')], delimiter=",")
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
    
    Using dtype = None
    
    >>> s.seek(0) # needed for StringIO example only
    >>> data = np.genfromtxt(s, dtype=None,
    ... names = ['myint','myfloat','mystring'], delimiter=",")
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
    
    Specifying dtype and names
    
    >>> s.seek(0)
    >>> data = np.genfromtxt(s, dtype="i8,f8,S5",
    ... names=['myint','myfloat','mystring'], delimiter=",")
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
    
    An example with fixed-width columns
    
    >>> s = StringIO("11.3abcde")
    >>> data = np.genfromtxt(s, dtype=None, names=['intvar','fltvar','strvar'],
    ...     delimiter=[1,3,5])
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('strvar', '|S5')])
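
The help text above is long, so here is a minimal sketch of the handful of arguments that matter most for a comma-separated file with a header row. The column names and values below are made up for illustration (they are not taken from the RECS file), and the code is written for Python 3, so it imports StringIO from io rather than from the StringIO module used in the help examples.

```python
from io import StringIO
import numpy as np

# Hypothetical three-column file with a header row and one empty field.
text = StringIO(
    "id,kwh,btu\n"
    "1,1200.5,4096\n"
    "2,,5120\n"
    "3,980.0,3344\n"
)

data = np.genfromtxt(
    text,
    delimiter=",",      # fields are separated by commas
    names=True,         # take the field names from the first line
    usecols=(1, 2),     # keep only the 2nd and 3rd columns
    filling_values=0,   # value substituted where a field is empty
)

# The result is a structured array, so columns are accessed by name.
print(data.dtype.names)
print(data["kwh"])
```

Because names=True produces a structured array, a column is selected with data["kwh"] rather than with data[:, 0]. Leaving names out would return a plain two-dimensional float array instead.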



In [83]:

    
names[843]









    Out[83]:





'"Electricity usage for refrigerators'
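
Notice the stray double quote at the start of that label. One likely explanation (an assumption on my part, since it depends on how names was built) is that the line was split on every comma, including commas that sit inside a quoted field. The sketch below uses a made-up line to show the difference between a naive split and csv.reader, which respects the quotes.

```python
import csv
from io import StringIO

# Hypothetical line: the middle field contains a comma, so it is quoted.
line = 'COL_A,"Electricity usage for refrigerators, in kWh",COL_C\n'

# Naive approach: split on every comma, ignoring the quotes.
naive = line.strip().split(',')

# csv.reader treats the quoted text as a single field and drops the quotes.
parsed = next(csv.reader(StringIO(line)))

print(naive)
print(parsed)
```

The naive split yields four pieces, one of which starts with a leftover quote, while csv.reader returns three clean fields.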



In [90]:

    
plt.hist(data[:,0],bins=100,range=(0,20000))









    Out[90]:





(array([  3.44300000e+03,   1.20900000e+03,   8.41000000e+02,
          6.95000000e+02,   5.80000000e+02,   5.26000000e+02,
          4.51000000e+02,   3.54000000e+02,   3.10000000e+02,
          2.85000000e+02,   2.69000000e+02,   2.24000000e+02,
          2.25000000e+02,   2.00000000e+02,   1.76000000e+02,
          1.57000000e+02,   1.81000000e+02,   1.46000000e+02,
          1.40000000e+02,   1.33000000e+02,   1.04000000e+02,
          1.16000000e+02,   1.05000000e+02,   9.90000000e+01,
          9.60000000e+01,   8.00000000e+01,   7.00000000e+01,
          6.70000000e+01,   6.50000000e+01,   6.10000000e+01,
          4.50000000e+01,   5.30000000e+01,   3.60000000e+01,
          3.50000000e+01,   4.20000000e+01,   2.80000000e+01,
          3.00000000e+01,   2.20000000e+01,   2.70000000e+01,
          2.00000000e+01,   3.20000000e+01,   2.10000000e+01,
          2.40000000e+01,   1.70000000e+01,   1.40000000e+01,
          9.00000000e+00,   8.00000000e+00,   1.60000000e+01,
          1.80000000e+01,   1.40000000e+01,   1.30000000e+01,
          8.00000000e+00,   6.00000000e+00,   7.00000000e+00,
          1.00000000e+01,   3.00000000e+00,   9.00000000e+00,
          6.00000000e+00,   2.00000000e+00,   2.00000000e+00,
          5.00000000e+00,   2.00000000e+00,   5.00000000e+00,
          5.00000000e+00,   6.00000000e+00,   2.00000000e+00,
          4.00000000e+00,   4.00000000e+00,   4.00000000e+00,
          3.00000000e+00,   1.00000000e+00,   2.00000000e+00,
          2.00000000e+00,   5.00000000e+00,   1.00000000e+00,
          5.00000000e+00,   6.00000000e+00,   3.00000000e+00,
          1.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          1.00000000e+00,   2.00000000e+00,   3.00000000e+00,
          1.00000000e+00,   2.00000000e+00,   1.00000000e+00,
          1.00000000e+00,   0.00000000e+00,   1.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   1.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   1.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          1.00000000e+00]),
 array([     0.,    200.,    400.,    600.,    800.,   1000.,   1200.,
          1400.,   1600.,   1800.,   2000.,   2200.,   2400.,   2600.,
          2800.,   3000.,   3200.,   3400.,   3600.,   3800.,   4000.,
          4200.,   4400.,   4600.,   4800.,   5000.,   5200.,   5400.,
          5600.,   5800.,   6000.,   6200.,   6400.,   6600.,   6800.,
          7000.,   7200.,   7400.,   7600.,   7800.,   8000.,   8200.,
          8400.,   8600.,   8800.,   9000.,   9200.,   9400.,   9600.,
          9800.,  10000.,  10200.,  10400.,  10600.,  10800.,  11000.,
         11200.,  11400.,  11600.,  11800.,  12000.,  12200.,  12400.,
         12600.,  12800.,  13000.,  13200.,  13400.,  13600.,  13800.,
         14000.,  14200.,  14400.,  14600.,  14800.,  15000.,  15200.,
         15400.,  15600.,  15800.,  16000.,  16200.,  16400.,  16600.,
         16800.,  17000.,  17200.,  17400.,  17600.,  17800.,  18000.,
         18200.,  18400.,  18600.,  18800.,  19000.,  19200.,  19400.,
         19600.,  19800.,  20000.]),
 <a list of 100 Patch objects>)
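
The long output above is the return value of plt.hist: an array of counts, an array of bin edges, and the list of drawn patches. If only the numbers are needed, np.histogram computes the same counts and edges without drawing anything. Here is a minimal sketch on synthetic data (the exponential sample is just a stand-in, not the RECS column).

```python
import numpy as np

# Synthetic stand-in for a skewed consumption column.
rng = np.random.RandomState(0)
values = rng.exponential(scale=2000.0, size=1000)

# Same binning as the plt.hist call above: 100 bins between 0 and 20000.
counts, edges = np.histogram(values, bins=100, range=(0, 20000))

bin_width = edges[1] - edges[0]
print(len(counts), len(edges), bin_width)

# Values outside the requested range are simply not counted.
print(counts.sum(), values.size)
```

There is always one more edge than there are bins, which is why the second array above runs from 0 to 20000 in steps of 200. In a notebook, assigning the result of plt.hist to variables (or ending the line with a semicolon) keeps the arrays from being printed under the figure.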



In [91]:

    
help(plt.boxplot)









    



Help on function boxplot in module matplotlib.pyplot:

boxplot(x, notch=False, sym='b+', vert=True, whis=1.5, positions=None, widths=None, patch_artist=False, bootstrap=None, usermedians=None, conf_intervals=None, hold=None)
    Make a box and whisker plot.
    
    Call signature::
    
      boxplot(x, notch=False, sym='+', vert=True, whis=1.5,
              positions=None, widths=None, patch_artist=False,
              bootstrap=None, usermedians=None, conf_intervals=None)
    
    Make a box and whisker plot for each column of *x* or each
    vector in sequence *x*.  The box extends from the lower to
    upper quartile values of the data, with a line at the median.
    The whiskers extend from the box to show the range of the
    data.  Flier points are those past the end of the whiskers.
    
    Function Arguments:
    
      *x* :
        Array or a sequence of vectors.
    
      *notch* : [ False (default) | True ]
        If False (default), produces a rectangular box plot.
        If True, will produce a notched box plot
    
      *sym* : [ default 'b+' ]
        The default symbol for flier points.
        Enter an empty string ('') if you don't want to show fliers.
    
      *vert* : [ False | True (default) ]
        If True (default), makes the boxes vertical.
        If False, makes horizontal boxes.
    
      *whis* : [ default 1.5 ]
        Defines the length of the whiskers as a function of the inner
        quartile range.  They extend to the most extreme data point
        within ( ``whis*(75%-25%)`` ) data range.
    
      *bootstrap* : [ *None* (default) | integer ]
        Specifies whether to bootstrap the confidence intervals
        around the median for notched boxplots. If bootstrap==None,
        no bootstrapping is performed, and notches are calculated
        using a Gaussian-based asymptotic approximation  (see McGill, R.,
        Tukey, J.W., and Larsen, W.A., 1978, and Kendall and Stuart,
        1967). Otherwise, bootstrap specifies the number of times to
        bootstrap the median to determine it's 95% confidence intervals.
        Values between 1000 and 10000 are recommended.
    
      *usermedians* : [ default None ]
        An array or sequence whose first dimension (or length) is
        compatible with *x*. This overrides the medians computed by
        matplotlib for each element of *usermedians* that is not None.
        When an element of *usermedians* == None, the median will be
        computed directly as normal.
    
      *conf_intervals* : [ default None ]
        Array or sequence whose first dimension (or length) is compatible
        with *x* and whose second dimension is 2. When the current element
        of *conf_intervals* is not None, the notch locations computed by
        matplotlib are overridden (assuming notch is True). When an
        element of *conf_intervals* is None, boxplot compute notches the
        method specified by the other kwargs (e.g., *bootstrap*).
    
      *positions* : [ default 1,2,...,n ]
        Sets the horizontal positions of the boxes. The ticks and limits
        are automatically set to match the positions.
    
      *widths* : [ default 0.5 ]
        Either a scalar or a vector and sets the width of each box. The
        default is 0.5, or ``0.15*(distance between extreme positions)``
        if that is smaller.
    
      *patch_artist* : [ False (default) | True ]
        If False produces boxes with the Line2D artist
        If True produces boxes with the Patch artist
    
    Returns a dictionary mapping each component of the boxplot
    to a list of the :class:`matplotlib.lines.Line2D`
    instances created. That dictionary has the following keys
    (assuming vertical boxplots):
    
        - boxes: the main body of the boxplot showing the quartiles
          and the median's confidence intervals if enabled.
        - medians: horizonal lines at the median of each box.
        - whiskers: the vertical lines extending to the most extreme,
          n-outlier data points.
        - caps: the horizontal lines at the ends of the whiskers.
        - fliers: points representing data that extend beyone the
          whiskers (outliers).
    
    **Example:**
    
    .. plot:: pyplots/boxplot_demo.py
    
    Additional kwargs: hold = [True|False] overrides default hold state
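
To make the whis argument concrete, here is a small sketch that works out by hand where the whiskers end and which points count as fliers, following the rule described in the help text (most extreme data point within whis times the interquartile range of the box). The ten numbers are made up, and the quartiles use NumPy's default linear interpolation, which may differ slightly from what a given matplotlib version does internally.

```python
import numpy as np

# Made-up sample with one obvious outlier.
x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 50.])

q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1          # interquartile range: the height of the box
whis = 1.5             # same default as plt.boxplot

# Limits beyond which a point is drawn as a flier.
lower_limit = q1 - whis * iqr
upper_limit = q3 + whis * iqr

# Whiskers stop at the most extreme data points inside the limits.
lower_whisker = x[x >= lower_limit].min()
upper_whisker = x[x <= upper_limit].max()
fliers = x[(x < lower_limit) | (x > upper_limit)]

print(q1, median, q3, iqr)
print(lower_whisker, upper_whisker, fliers)
```

Here the box spans 3.25 to 7.75, the whiskers reach 1 and 9, and the value 50 is the only flier.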



In [105]:

    
plt.boxplot(data[:,0:35], notch=True, sym='bd', vert=False)









    Out[105]:





{'boxes': [<matplotlib.lines.Line2D at 0x10e156490>,
  <matplotlib.lines.Line2D at 0x10e1786d0>,
  <matplotlib.lines.Line2D at 0x10e19a910>],
 'caps': [<matplotlib.lines.Line2D at 0x10e1497d0>,
  <matplotlib.lines.Line2D at 0x10e149e10>,
  <matplotlib.lines.Line2D at 0x10e16ca10>,
  <matplotlib.lines.Line2D at 0x10e178090>,
  <matplotlib.lines.Line2D at 0x10e191c50>,
  <matplotlib.lines.Line2D at 0x10e19a2d0>],
 'fliers': [<matplotlib.lines.Line2D at 0x10e162150>,
  <matplotlib.lines.Line2D at 0x10e162750>,
  <matplotlib.lines.Line2D at 0x10e184390>,
  <matplotlib.lines.Line2D at 0x10e184990>,
  <matplotlib.lines.Line2D at 0x10e1a65d0>,
  <matplotlib.lines.Line2D at 0x10e1a6bd0>],
 'medians': [<matplotlib.lines.Line2D at 0x10e156ad0>,
  <matplotlib.lines.Line2D at 0x10e178d10>,
  <matplotlib.lines.Line2D at 0x10e19af50>],
 'whiskers': [<matplotlib.lines.Line2D at 0x10e13ee50>,
  <matplotlib.lines.Line2D at 0x10e149110>,
  <matplotlib.lines.Line2D at 0x10e16c150>,
  <matplotlib.lines.Line2D at 0x10e16c3d0>,
  <matplotlib.lines.Line2D at 0x10e191390>,
  <matplotlib.lines.Line2D at 0x10e191610>]}
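
The dictionary above has three boxes and three medians, so three columns were plotted, each with two whiskers and two caps. The sketch below reproduces that structure on synthetic data and shows that the returned artists can be restyled afterwards. It is written as a standalone script, so it selects the non-interactive Agg backend; inside the notebook that line is unnecessary. The number of flier entries per box has changed between matplotlib versions, so it is not counted here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run as a script with no display attached
from matplotlib import pyplot as plt
import numpy as np

# Synthetic stand-in: 500 rows, 3 columns of skewed data.
rng = np.random.RandomState(0)
sample = rng.exponential(scale=2000.0, size=(500, 3))

fig, ax = plt.subplots()
bp = ax.boxplot(sample, sym='bd')  # one box per column

# Count the artists returned for each component.
counts = {key: len(bp[key]) for key in ('boxes', 'medians', 'whiskers', 'caps')}
print(counts)

# Each entry is a regular matplotlib artist and can be restyled.
for median_line in bp['medians']:
    median_line.set_color('red')
```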



In [98]:

    
list([1:3])









    



  File "<ipython-input-98-d204121a220f>", line 1
    list([1:3])
           ^
SyntaxError: invalid syntax
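
The SyntaxError above arises because the colon notation for slices is only valid inside the square brackets of an indexing expression, not as a free-standing argument or list literal. Here is a small sketch of the usual alternatives; the variable names are only for illustration.

```python
a, b = 1, 3

# To get the integers from a up to (but not including) b, use range.
indices = list(range(a, b))
print(indices)

# Slice notation works when it is applied to a sequence.
items = ['w', 'x', 'y', 'z']
print(items[a:b])

# A slice can also be built explicitly and passed around as an object.
s = slice(a, b)
print(items[s])
```

The last form is handy when the same column range needs to be reused, for instance when selecting columns of a NumPy array.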


