Today we will review some basic exploratory data analysis techniques.
For this, and for many later exercises, we will import the following libraries:
In [39]:
from matplotlib import pyplot as plt
from matplotlib import rcParams
import numpy as np
%matplotlib inline
In [40]:
pwd
Out[40]:
u'/Users/mberges/Documents/courses/2015/Fall/12-752/data'
In [41]:
!ls
ac.csv fc_7.txt source.txt
campusDemand.csv fridge.csv temp.txt
fc_14.txt public_layout.csv
fc_28.txt recs2009_public.csv
There are a variety of ways to load the data into memory, so I will focus on one of the simplest ones:
In [42]:
file = open('recs2009_public.csv','r')
So far I have only opened the file for reading. Now I need to load it into memory, and for that I will use a CSV package (CSV stands for Comma Separated Values).
In [43]:
import csv
The csv module has a reader function, which returns an iterator over the rows of the file.
In [44]:
help(csv.reader)
Help on built-in function reader in module _csv:
reader(...)
csv_reader = reader(iterable [, dialect='excel']
[optional keyword args])
for row in csv_reader:
process(row)
The "iterable" argument can be any object that returns a line
of input for each iteration, such as a file object or a list. The
optional "dialect" parameter is discussed below. The function
also accepts optional keyword arguments which override settings
provided by the dialect.
The returned object is an iterator. Each iteration returns a row
of the CSV file (which can span multiple input lines):
In [45]:
reader = csv.reader(file, delimiter=',')
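As a minimal sketch of the same open-then-read pattern, the block below wraps the read in a `with` statement so the file object is closed automatically. It reads from a small in-memory string instead of recs2009_public.csv so that it runs anywhere; `sample_text`, `sample_file`, `sample_reader`, and `sample_rows` are hypothetical names, and the values are made up (only the column names are borrowed from the real file).

```python
import csv
import io

# Made-up sample standing in for the real file; only the column names
# are borrowed from recs2009_public.csv.
sample_text = "DOEID,REGIONC,KWH\n1,2,18466\n2,4,5148\n"

# io.StringIO behaves like a text file opened for reading.
with io.StringIO(sample_text) as sample_file:
    sample_reader = csv.reader(sample_file, delimiter=',')
    # Consume the iterator while the file is still open.
    sample_rows = list(sample_reader)

print(sample_rows[0])  # header row
print(sample_rows[1])  # first data row
```

With a real file, `open('recs2009_public.csv', 'r')` would take the place of `io.StringIO(sample_text)`.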
Let's check the type of the resulting iterator object, reader:
In [46]:
type(reader)
Out[46]:
_csv.reader
We can also ask for help on the object, to see its methods and attributes:
In [47]:
help(reader)
Help on reader object:
class reader(__builtin__.object)
| CSV reader
|
| Reader objects are responsible for reading and parsing tabular data
| in CSV format.
|
| Methods defined here:
|
| __iter__(...)
| x.__iter__() <==> iter(x)
|
| next(...)
| x.next() -> the next value, or raise StopIteration
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| dialect
|
| line_num
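To make the iterator behaviour concrete, here is a small sketch using a list of made-up lines (the help text above notes that a list is an acceptable iterable). The name `demo_reader` is hypothetical. The built-in `next()` advances the reader one row at a time, and once the rows are consumed the reader yields nothing more.

```python
import csv

# Made-up lines; csv.reader accepts any iterable of strings, including a list.
lines = ["a,b,c", "1,2,3", "4,5,6"]
demo_reader = csv.reader(lines)

header = next(demo_reader)      # advance the iterator by one row
print(header)                   # ['a', 'b', 'c']
print(demo_reader.line_num)     # source lines read so far: 1

remaining = list(demo_reader)   # consumes whatever is left
print(remaining)                # [['1', '2', '3'], ['4', '5', '6']]
print(list(demo_reader))        # [] because the iterator is now exhausted
```

This is why a reader can only be turned into a list once: a second pass requires reopening the file (or rewinding it) and creating a new reader.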
Now, we can create a list out of the reader object, and it will iterate through the whole file to generate that list:
In [48]:
fullcsv = list(reader)
At this point all the data has finally been loaded into memory. Let's look at the first two rows:
In [49]:
for row in range(2):
print(fullcsv[row])
['DOEID', 'REGIONC', 'DIVISION', 'REPORTABLE_DOMAIN', 'TYPEHUQ', 'NWEIGHT', 'HDD65', 'CDD65', 'HDD30YR', 'CDD30YR', 'Climate_Region_Pub', 'AIA_Zone', 'METROMICRO', 'UR', 'KOWNRENT', 'CONDCOOP', 'YEARMADE', 'YEARMADERANGE', 'OCCUPYYRANGE', 'CONVERSION', 'ORIG1FAM', 'LOOKLIKE', 'NUMFLRS', 'NUMAPTS', 'WALLTYPE', 'ROOFTYPE', 'STUDIO', 'NAPTFLRS', 'STORIES', 'TYPEHUQ4', 'BEDROOMS', 'NCOMBATH', 'NHAFBATH', 'OTHROOMS', 'TOTROOMS', 'CELLAR', 'CRAWL', 'CONCRETE', 'BASEFIN', 'FINBASERMS', 'BASEHEAT', 'BASEHT2', 'PCTBSTHT', 'BASECOOL', 'BASECL2', 'PCTBSTCL', 'BASEUSE', 'ATTIC', 'ATTICFIN', 'FINATTRMS', 'ATTCHEAT', 'ATTCHT2', 'PCTATTHT', 'ATTCCOOL', 'ATTCCL2', 'PCTATTCL', 'ATTICUSE', 'PRKGPLC1', 'SIZEOFGARAGE', 'GARGLOC', 'GARGHEAT', 'GARGCOOL', 'PRKGPLC2', 'SIZEOFDETACH', 'OUTLET', 'ZKOWNRENT', 'ZCONDCOOP', 'ZYEARMADE', 'ZYEARMADERANGE', 'ZOCCUPYYRANGE', 'ZCONVERSION', 'ZORIG1FAM', 'ZLOOKLIKE', 'ZNUMFLRS', 'ZNUMAPTS', 'ZWALLTYPE', 'ZROOFTYPE', 'ZSTUDIO', 'ZNAPTFLRS', 'ZSTORIES', 'ZTYPEHUQ4', 'ZBEDROOMS', 'ZNCOMBATH', 'ZNHAFBATH', 'ZOTHROOMS', 'ZCELLAR', 'ZCRAWL', 'ZCONCRETE', 'ZBASEFIN', 'ZFINBASERMS', 'ZBASEHEAT', 'ZBASEHT2', 'ZPCTBSTHT', 'ZBASECOOL', 'ZBASECL2', 'ZPCTBSTCL', 'ZBASEUSE', 'ZATTIC', 'ZATTICFIN', 'ZFINATTRMS', 'ZATTCHEAT', 'ZATTCHT2', 'ZPCTATTHT', 'ZATTCCOOL', 'ZPCTATTCL', 'ZATTCCL2', 'ZATTICUSE', 'ZPRKGPLC1', 'ZSIZEOFGARAGE', 'ZGARGLOC', 'ZGARGHEAT', 'ZGARGCOOL', 'ZPRKGPLC2', 'ZSIZEOFDETACH', 'STOVEN', 'STOVENFUEL', 'STOVE', 'STOVEFUEL', 'OVEN', 'OVENFUEL', 'OVENUSE', 'OVENCLN', 'TYPECLN', 'MICRO', 'AMTMICRO', 'DEFROST', 'OUTGRILL', 'OUTGRILLFUEL', 'TOPGRILL', 'STGRILA', 'TOASTER', 'NUMMEAL', 'FUELFOOD', 'COFFEE', 'NUMFRIG', 'TYPERFR1', 'SIZRFRI1', 'REFRIGT1', 'ICE', 'AGERFRI1', 'ESFRIG', 'REPLCFRI', 'HELPFRI', 'HELPFRIY', 'TYPERFR2', 'SIZRFRI2', 'REFRIGT2', 'MONRFRI2', 'AGERFRI2', 'ESFRIG2', 'TYPERFR3', 'SIZRFRI3', 'REFRIGT3', 'MONRFRI3', 'AGERFRI3', 'ESFRIG3', 'SEPFREEZ', 'NUMFREEZ', 'UPRTFRZR', 'SIZFREEZ', 'FREEZER', 'AGEFRZR', 'REPLCFRZ', 'HELPFRZ', 
'HELPFRZY', 'UPRTFRZR2', 'SIZFREEZ2', 'FREEZER2', 'AGEFRZR2', 'DISHWASH', 'DWASHUSE', 'AGEDW', 'ESDISHW', 'REPLCDW', 'HELPDW', 'HELPDWY', 'ZSTOVEN', 'ZSTOVENFUEL', 'ZSTOVE', 'ZSTOVEFUEL', 'ZOVEN', 'ZOVENFUEL', 'ZOVENUSE', 'ZOVENCLN', 'ZTYPECLN', 'ZMICRO', 'ZAMTMICRO', 'ZDEFROST', 'ZOUTGRILL', 'ZOUTGRILLFUEL', 'ZTOPGRILL', 'ZSTGRILA', 'ZTOASTER', 'ZNUMMEAL', 'ZFUELFOOD', 'ZCOFFEE', 'ZNUMFRIG', 'ZTYPERFR1', 'ZSIZRFRI1', 'ZREFRIGT1', 'ZICE', 'ZAGERFRI1', 'ZTYPERFR2', 'ZSIZRFRI2', 'ZREFRIGT2', 'ZMONRFRI2', 'ZAGERFRI2', 'ZTYPERFR3', 'ZSIZRFRI3', 'ZREFRIGT3', 'ZMONRFRI3', 'ZAGERFRI3', 'ZSEPFREEZ', 'ZNUMFREEZ', 'ZUPRTFRZR', 'ZSIZFREEZ', 'ZFREEZER', 'ZAGEFRZR', 'ZUPRTFRZR2', 'ZSIZFREEZ2', 'ZFREEZER2', 'ZAGEFRZR2', 'ZDISHWASH', 'ZDWASHUSE', 'ZAGEDW', 'CWASHER', 'TOPFRONT', 'WASHLOAD', 'WASHTEMP', 'RNSETEMP', 'AGECWASH', 'ESCWASH', 'REPLCCW', 'HELPCW', 'HELPCWY', 'DRYER', 'DRYRFUEL', 'DRYRUSE', 'AGECDRYER', 'TVCOLOR', 'TVSIZE1', 'TVTYPE1', 'CABLESAT1', 'COMBODVR1', 'DVR1', 'DIGITSTB1', 'PLAYSTA1', 'COMBOVCRDVD1', 'VCR1', 'DVD1', 'TVAUDIOSYS1', 'OTHERSTB1', 'TVONWD1', 'TVONWDWATCH1', 'TVONWE1', 'TVONWEWATCH1', 'TVSIZE2', 'TVTYPE2', 'CABLESAT2', 'COMBODVR2', 'DVR2', 'DIGITSTB2', 'PLAYSTA2', 'COMBOVCRDVD2', 'VCR2', 'DVD2', 'TVAUDIOSYS2', 'OTHERSTB2', 'TVONWD2', 'TVONWDWATCH2', 'TVONWE2', 'TVONWEWATCH2', 'TVSIZE3', 'TVTYPE3', 'CABLESAT3', 'COMBODVR3', 'DVR3', 'DIGITSTB3', 'PLAYSTA3', 'COMBOVCRDVD3', 'VCR3', 'DVD3', 'TVAUDIOSYS3', 'OTHERSTB3', 'TVONWD3', 'TVONWDWATCH3', 'TVONWE3', 'TVONWEWATCH3', 'COMPUTER', 'NUMPC', 'PCTYPE1', 'MONITOR1', 'TIMEON1', 'PCONOFF1', 'PCSLEEP1', 'PCTYPE2', 'MONITOR2', 'TIMEON2', 'PCONOFF2', 'PCSLEEP2', 'PCTYPE3', 'MONITOR3', 'TIMEON3', 'PCONOFF3', 'PCSLEEP3', 'INTERNET', 'INDIALUP', 'INDSL', 'INCABLE', 'INSATEL', 'INWIRELESS', 'PCPRINT', 'FAX', 'COPIER', 'WELLPUMP', 'DIPSTICK', 'SWAMPCOL', 'AQUARIUM', 'STEREO', 'NOCORD', 'ANSMACH', 'BATTOOLS', 'BATCHRG', 'CHRGPLGT', 'ELECDEV', 'ELECCHRG', 'CHRGPLGE', 'ZCWASHER', 'ZTOPFRONT', 'ZWASHLOAD', 'ZWASHTEMP', 
'ZRNSETEMP', 'ZAGECWASH', 'ZDRYER', 'ZDRYRFUEL', 'ZDRYRUSE', 'ZAGECDRYER', 'ZTVCOLOR', 'ZTVSIZE1', 'ZTVTYPE1', 'ZCABLESAT1', 'ZCOMBODVR1', 'ZDVR1', 'ZDIGITSTB1', 'ZPLAYSTA1', 'ZCOMBOVCRDVD1', 'ZVCR1', 'ZDVD1', 'ZTVAUDIOSYS1', 'ZOTHERSTB1', 'ZTVONWD1', 'ZTVONWDWATCH1', 'ZTVONWE1', 'ZTVONWEWATCH1', 'ZTVSIZE2', 'ZTVTYPE2', 'ZCABLESAT2', 'ZCOMBODVR2', 'ZDVR2', 'ZDIGITSTB2', 'ZPLAYSTA2', 'ZCOMBOVCRDVD2', 'ZVCR2', 'ZDVD2', 'ZTVAUDIOSYS2', 'ZOTHERSTB2', 'ZTVONWD2', 'ZTVONWDWATCH2', 'ZTVONWE2', 'ZTVONWEWATCH2', 'ZTVSIZE3', 'ZTVTYPE3', 'ZCABLESAT3', 'ZCOMBODVR3', 'ZDVR3', 'ZDIGITSTB3', 'ZPLAYSTA3', 'ZCOMBOVCRDVD3', 'ZVCR3', 'ZDVD3', 'ZTVAUDIOSYS3', 'ZOTHERSTB3', 'ZTVONWD3', 'ZTVONWDWATCH3', 'ZTVONWE3', 'ZTVONWEWATCH3', 'ZCOMPUTER', 'ZNUMPC', 'ZPCTYPE1', 'ZMONITOR1', 'ZTIMEON1', 'ZPCONOFF1', 'ZPCSLEEP1', 'ZPCTYPE2', 'ZMONITOR2', 'ZTIMEON2', 'ZPCONOFF2', 'ZPCSLEEP2', 'ZPCTYPE3', 'ZMONITOR3', 'ZTIMEON3', 'ZPCONOFF3', 'ZPCSLEEP3', 'ZINTERNET', 'ZINDIALUP', 'ZINDSL', 'ZINCABLE', 'ZINSATEL', 'ZINWIRELESS', 'ZPCPRINT', 'ZFAX', 'ZCOPIER', 'ZWELLPUMP', 'ZDIPSTICK', 'ZSWAMPCOL', 'ZAQUARIUM', 'ZSTEREO', 'ZNOCORD', 'ZANSMACH', 'ZBATTOOLS', 'ZBATCHRG', 'ZCHRGPLGT', 'ZELECDEV', 'ZELECCHRG', 'ZCHRGPLGE', 'HEATHOME', 'DNTHEAT', 'EQUIPNOHEAT', 'FUELNOHEAT', 'EQUIPM', 'FUELHEAT', 'MAINTHT', 'EQUIPAGE', 'REPLCHT', 'HELPHT', 'HELPHTY', 'HEATOTH', 'EQUIPAUX', 'REVERSE', 'WARMAIR', 'FURNFUEL', 'STEAMR', 'RADFUEL', 'PERMELEC', 'PIPELESS', 'PIPEFUEL', 'ROOMHEAT', 'RMHTFUEL', 'WOODKILN', 'HSFUEL', 'CARRYEL', 'CARRYKER', 'CHIMNEY', 'FPFUEL', 'NGFPFLUE', 'USENGFP', 'RANGE', 'RNGFUEL', 'DIFEQUIP', 'DIFFUEL', 'EQMAMT', 'HEATROOM', 'THERMAIN', 'NUMTHERM', 'PROTHERM', 'AUTOHEATNITE', 'AUTOHEATDAY', 'TEMPHOME', 'TEMPGONE', 'TEMPNITE', 'MOISTURE', 'USEMOISTURE', 'ZHEATHOME', 'ZDNTHEAT', 'ZEQUIPNOHEAT', 'ZFUELNOHEAT', 'ZEQUIPM', 'ZFUELHEAT', 'ZMAINTHT', 'ZEQUIPAGE', 'ZHEATOTH', 'ZFURNFUEL', 'ZRADFUEL', 'ZPIPEFUEL', 'ZRMHTFUEL', 'ZHSFUEL', 'ZFPFUEL', 'ZNGFPFLUE', 'ZUSENGFP', 'ZRNGFUEL', 'ZDIFFUEL', 
'ZEQMAMT', 'ZHEATROOM', 'ZTHERMAIN', 'ZNUMTHERM', 'ZPROTHERM', 'ZAUTOHEATNITE', 'ZAUTOHEATDAY', 'ZTEMPHOME', 'ZTEMPGONE', 'ZTEMPNITE', 'ZMOISTURE', 'ZUSEMOISTURE', 'NUMH2ONOTNK', 'NUMH2OHTRS', 'H2OTYPE1', 'FUELH2O', 'WHEATOTH', 'WHEATSIZ', 'WHEATAGE', 'WHEATBKT', 'HELPWH', 'HELPWHY', 'H2OTYPE2', 'FUELH2O2', 'WHEATSIZ2', 'WHEATAGE2', 'ZNUMH2OHTRS', 'ZNUMH2ONOTNK', 'ZH2OTYPE1', 'ZFUELH2O', 'ZWHEATOTH', 'ZWHEATSIZ', 'ZWHEATAGE', 'ZWHEATBKT', 'ZH2OTYPE2', 'ZFUELH2O2', 'ZWHEATSIZ2', 'ZWHEATAGE2', 'AIRCOND', 'DNTAC', 'COOLTYPENOAC', 'COOLTYPE', 'DUCTS', 'CENACHP', 'ACOTHERS', 'MAINTAC', 'AGECENAC', 'REPLCCAC', 'HELPCAC', 'HELPCACY', 'ACROOMS', 'USECENAC', 'THERMAINAC', 'PROTHERMAC', 'AUTOCOOLNITE', 'AUTOCOOLDAY', 'TEMPHOMEAC', 'TEMPGONEAC', 'TEMPNITEAC', 'NUMBERAC', 'WWACAGE', 'ESWWAC', 'REPLCWWAC', 'HELPWWAC', 'HELPWWACY', 'USEWWAC', 'NUMCFAN', 'USECFAN', 'TREESHAD', 'NOTMOIST', 'USENOTMOIST', 'ZAIRCOND', 'ZDNTAC', 'ZCOOLTYPENOAC', 'ZCOOLTYPE', 'ZDUCTS', 'ZCENACHP', 'ZACOTHERS', 'ZMAINTAC', 'ZAGECENAC', 'ZUSECENAC', 'ZACROOMS', 'ZTHERMAINAC', 'ZPROTHERMAC', 'ZAUTOCOOLNITE', 'ZAUTOCOOLDAY', 'ZTEMPHOMEAC', 'ZTEMPGONEAC', 'ZTEMPNITEAC', 'ZNUMBERAC', 'ZWWACAGE', 'ZUSEWWAC', 'ZNUMCFAN', 'ZUSECFAN', 'ZTREESHAD', 'ZNOTMOIST', 'ZUSENOTMOIST', 'HIGHCEIL', 'CATHCEIL', 'SWIMPOOL', 'POOL', 'FUELPOOL', 'RECBATH', 'FUELTUB', 'LGT12', 'LGT12EE', 'LGT4', 'LGT4EE', 'LGT1', 'LGT1EE', 'NOUTLGTNT', 'LGTOEE', 'NGASLIGHT', 'INSTLCFL', 'HELPCFL', 'HELPCFLY', 'SLDDRS', 'DOOR1SUM', 'WINDOWS', 'TYPEGLASS', 'NEWGLASS', 'HELPWIN', 'HELPWINY', 'ADQINSUL', 'INSTLINS', 'AGEINS', 'HELPINS', 'HELPINSY', 'DRAFTY', 'INSTLWS', 'AGEWS', 'HELPWS', 'HELPWSY', 'AUDIT', 'AGEAUD', 'HELPAUD', 'HELPAUDY', 'ZHIGHCEIL', 'ZCATHCEIL', 'ZSWIMPOOL', 'ZPOOL', 'ZFUELPOOL', 'ZRECBATH', 'ZFUELTUB', 'ZLGT12', 'ZLGT4', 'ZLGT1', 'ZNOUTLGTNT', 'ZNGASLIGHT', 'ZSLDDRS', 'ZDOOR1SUM', 'ZWINDOWS', 'ZTYPEGLASS', 'ZNEWGLASS', 'ZADQINSUL', 'ZINSTLINS', 'ZAGEINS', 'ZDRAFTY', 'ZINSTLWS', 'ZAGEWS', 'ZAUDIT', 'ZAGEAUD', 'USEEL', 'USENG', 
'USELP', 'USEFO', 'USEKERO', 'USEWOOD', 'USESOLAR', 'USEOTH', 'ELWARM', 'ELECAUX', 'ELCOOL', 'ELWATER', 'ELFOOD', 'ELOTHER', 'UGWARM', 'UGASAUX', 'UGWATER', 'UGCOOK', 'UGOTH', 'LPWARM', 'LPGAUX', 'LPWATER', 'LPCOOK', 'LPOTHER', 'FOWARM', 'FOILAUX', 'FOWATER', 'FOOTHER', 'KRWARM', 'KEROAUX', 'KRWATER', 'KROTHER', 'WDWARM', 'WOODAUX', 'WDWATER', 'WDOTHUSE', 'SOLWARM', 'SOLARAUX', 'SOLWATER', 'SOLOTHER', 'OTHWARM', 'OTHERAUX', 'OTHWATER', 'OTHCOOK', 'ONSITE', 'ONSITEGRID', 'PELHEAT', 'PELHOTWA', 'PELCOOK', 'PELAC', 'PELLIGHT', 'OTHERWAYEL', 'PGASHEAT', 'PGASHTWA', 'PUGCOOK', 'PUGOTH', 'OTHERWAYNG', 'FOPAY', 'OTHERWAYFO', 'LPGPAY', 'OTHERWAYLPG', 'LPGDELV', 'KERODEL', 'KEROCASH', 'NOCRCASH', 'NKRGALNC', 'WOODLOGS', 'WDSCRAP', 'WDPELLET', 'WDOTHER', 'WOODAMT', 'NUMCORDS', 'ZONSITE', 'ZONSITEGRID', 'ZPELHEAT', 'ZPELHOTWA', 'ZPELCOOK', 'ZPELAC', 'ZPELLIGHT', 'ZOTHERWAYEL', 'ZPGASHEAT', 'ZPGASHTWA', 'ZPUGCOOK', 'ZPUGOTH', 'ZOTHERWAYNG', 'ZFOPAY', 'ZOTHERWAYFO', 'ZLPGPAY', 'ZOTHERWAYLPG', 'ZKERODEL', 'ZKEROCASH', 'ZNOCRCASH', 'ZNKRGALNC', 'ZWOODLOGS', 'ZWDSCRAP', 'ZWDPELLET', 'ZWDOTHER', 'ZWOODAMT', 'ZNUMCORDS', 'KFUELOT', 'HHSEX', 'EMPLOYHH', 'SPOUSE', 'SDESCENT', 'Householder_Race', 'EDUCATION', 'NHSLDMEM', 'HHAGE', 'AGEHHMEMCAT2', 'AGEHHMEMCAT3', 'AGEHHMEMCAT4', 'AGEHHMEMCAT5', 'AGEHHMEMCAT6', 'AGEHHMEMCAT7', 'AGEHHMEMCAT8', 'AGEHHMEMCAT9', 'AGEHHMEMCAT10', 'AGEHHMEMCAT11', 'AGEHHMEMCAT12', 'AGEHHMEMCAT13', 'AGEHHMEMCAT14', 'HBUSNESS', 'ATHOME', 'TELLWORK', 'TELLDAYS', 'OTHWORK', 'WORKPAY', 'RETIREPY', 'SSINCOME', 'CASHBEN', 'INVESTMT', 'RGLRPAY', 'MONEYPY', 'POVERTY100', 'POVERTY150', 'HUPROJ', 'RENTHELP', 'FOODASST', 'ZHHSEX', 'ZHHAGE', 'ZEMPLOYHH', 'ZSPOUSE', 'ZSDESCENT', 'ZHouseholder_Race', 'ZEDUCATION', 'ZNHSLDMEM', 'ZAGEHHMEMCAT2', 'ZAGEHHMEMCAT3', 'ZAGEHHMEMCAT4', 'ZAGEHHMEMCAT5', 'ZAGEHHMEMCAT6', 'ZAGEHHMEMCAT7', 'ZAGEHHMEMCAT8', 'ZAGEHHMEMCAT9', 'ZAGEHHMEMCAT10', 'ZAGEHHMEMCAT11', 'ZAGEHHMEMCAT12', 'ZAGEHHMEMCAT13', 'ZAGEHHMEMCAT14', 'ZHBUSNESS', 'ZATHOME', 
'ZTELLWORK', 'ZTELLDAYS', 'ZOTHWORK', 'ZWORKPAY', 'ZRETIREPY', 'ZSSINCOME', 'ZCASHBEN', 'ZINVESTMT', 'ZRGLRPAY', 'ZMONEYPY', 'ZHUPROJ', 'ZRENTHELP', 'ZFOODASST', 'TOTSQFT', 'TOTSQFT_EN', 'TOTHSQFT', 'TOTUSQFT', 'TOTCSQFT', 'TOTUCSQFT', 'ZTOTSQFT', 'ZTOTSQFT_EN', 'ZTOTHSQFT', 'ZTOTUSQFT', 'ZTOTCSQFT', 'ZTOTUCSQFT', 'KWH', 'KWHSPH', 'KWHCOL', 'KWHWTH', 'KWHRFG', 'KWHOTH', 'BTUEL', 'BTUELSPH', 'BTUELCOL', 'BTUELWTH', 'BTUELRFG', 'BTUELOTH', 'DOLLAREL', 'DOLELSPH', 'DOLELCOL', 'DOLELWTH', 'DOLELRFG', 'DOLELOTH', 'CUFEETNG', 'CUFEETNGSPH', 'CUFEETNGWTH', 'CUFEETNGOTH', 'BTUNG', 'BTUNGSPH', 'BTUNGWTH', 'BTUNGOTH', 'DOLLARNG', 'DOLNGSPH', 'DOLNGWTH', 'DOLNGOTH', 'GALLONLP', 'GALLONLPSPH', 'GALLONLPWTH', 'GALLONLPOTH', 'BTULP', 'BTULPSPH', 'BTULPWTH', 'BTULPOTH', 'DOLLARLP', 'DOLLPSPH', 'DOLLPWTH', 'DOLLPOTH', 'GALLONFO', 'GALLONFOSPH', 'GALLONFOWTH', 'GALLONFOOTH', 'BTUFO', 'BTUFOSPH', 'BTUFOWTH', 'BTUFOOTH', 'DOLLARFO', 'DOLFOSPH', 'DOLFOWTH', 'DOLFOOTH', 'GALLONKER', 'GALLONKERSPH', 'GALLONKERWTH', 'GALLONKEROTH', 'BTUKER', 'BTUKERSPH', 'BTUKERWTH', 'BTUKEROTH', 'DOLLARKER', 'DOLKERSPH', 'DOLKERWTH', 'DOLKEROTH', 'BTUWOOD', 'CORDSWD', 'TOTALBTU', 'TOTALBTUSPH', 'TOTALBTUCOL', 'TOTALBTUWTH', 'TOTALBTURFG', 'TOTALBTUOTH', 'TOTALDOL', 'TOTALDOLSPH', 'TOTALDOLCOL', 'TOTALDOLWTH', 'TOTALDOLRFG', 'TOTALDOLOTH', 'KAVALEL', 'PERIODEL', 'SCALEEL', 'KAVALNG', 'PERIODNG', 'SCALENG', 'PERIODLP', 'SCALELP', 'PERIODFO', 'SCALEFO', 'PERIODKR', 'SCALEKER']
['1', '2', '4', '12', '2', '2471.679705', '4742', '1080', '4953', '1271', '4', '3', 'METRO', 'U', '1', '-2', '2004', '7', '8', '-2', '-2', '-2', '-2', '-2', '1', '5', '-2', '-2', '20', '-2', '4', '1', '2', '5', '9', '1', '0', '0', '1', '1', '1', '2', '3', '1', '2', '3', '1', '0', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '1', '2', '2', '0', '0', '-2', '-2', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '5', '0', '-2', '0', '-2', '3', '1', '1', '1', '3', '1', '1', '2', '0', '-2', '0', '4', '5', '1', '2', '21', '4', '2', '1', '3', '1', '-2', '-2', '-2', '21', '4', '2', '12', '3', '1', '-2', '-2', '-2', '-2', '-2', '-2', '0', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '1', '13', '3', '1', '-2', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '2', '2', '3', '3', '2', '1', '1', '0', '-2', '1', '5', '1', '1', '4', '3', '2', '1', '1', '-2', '0', '1', '0', '0', '1', '1', '0', '3', '4', '3', '4', '3', '4', '1', '1', '-2', '0', '0', '0', '0', '0', '0', '0', '2', '-2', '2', '-2', '3', '3', '0', '-2', '0', '0', '0', '0', '0', '1', '0', '0', '3', '-2', '2', '-2', '1', '2', '1', '1', '3', '0', '1', '1', '1', '3', '0', '1', '-2', '-2', '-2', '-2', '-2', '1', '0', '1', '0', '0', '1', '1', '0', '0', '0', '-2', '-2', '0', '0', '1', '1', '1', '2', '0', '1', '2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', 
'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '-2', '-2', '-2', '3', '5', '0', '3', '-2', '-2', '-2', '0', '0', '0', '0', '-2', '0', '-2', '0', '0', '-2', '0', '-2', '0', '-2', '0', '0', '0', '-2', '-2', '-2', '0', '-2', '0', '-2', '-2', '9', '1', '2', '1', '1', '1', '68', '66', '68', '0', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '5', '0', '3', '3', '0', '-2', '-2', '-2', '-2', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '-2', '-2', '1', '-2', '0', '0', '1', '3', '-2', '0', '-2', '9', '3', '1', '1', '1', '1', '74', '78', '73', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '3', '3', '0', '0', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '-2', '0', '-2', '-2', '0', '-2', '3', '3', '2', '2', '2', '2', '0', '-2', '-2', '1', '0', '-2', '1', '1', '41', '2', '3', '-2', '-2', '1', '1', '2', '0', '-2', '4', '0', '-2', '-2', '-2', '0', '-2', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '1', '0', '0', '0', '0', '0', '1', '0', '1', '1', '1', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '-2', '1', '1', '1', '1', '1', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '1', '0', '1', '5', '4', '35', '8', '1', 
'1', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '0', '0', '0', '-2', '0', '1', '0', '0', '0', '0', '0', '23', '0', '0', '-2', '-2', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '5075', '4675', '3958', '1118', '3958', '1118', '0', '0', '0', '0', '0', '0', '18466', '3186.707', '3068.795', '2968.45', '1515.504', '7726.545', '63006', '10873.045', '10470.729', '10128.354', '5170.899', '26362.973', '1315', '226.932', '218.535', '211.389', '107.922', '550.222', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '63006', '10873', '10471', '10128', '5171', '26363', '1315', '227', '219', '211', '108', '550', '1', '1', '0', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2', '-2']
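Every field in the output above is a string, and the first row holds the column names. A minimal sketch of how one might separate the header from the data and pull out one column by name is shown below. `fullcsv_demo` is a hypothetical, heavily shortened stand-in for `fullcsv`, and its values are illustrative only.

```python
# Shortened stand-in for fullcsv: a header row followed by rows of strings.
fullcsv_demo = [
    ['DOEID', 'KWH', 'KWHCOL'],
    ['1', '18466', '3068.795'],
    ['2', '5148', '181.998'],
]

header, rows = fullcsv_demo[0], fullcsv_demo[1:]

# Find the column position from its name instead of counting by hand.
col = header.index('KWHCOL')

# csv.reader returns every field as a string, so convert explicitly.
kwhcol = [float(row[col]) for row in rows]
print(kwhcol)  # [3068.795, 181.998]
```

The same `header.index(...)` lookup applied to `fullcsv[0]` would give the position of any named column in the full file.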
This is a very simple way to load files, but not necessarily the most convenient one. A more versatile alternative is the numpy function genfromtxt:
In [50]:
help(np.genfromtxt)
Help on function genfromtxt in module numpy.lib.npyio:
genfromtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, skiprows=0, skip_header=0, skip_footer=0, converters=None, missing='', missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True)
Load data from a text file, with missing values handled as specified.
Each line past the first `skip_header` lines is split at the `delimiter`
character, and characters following the `comments` character are discarded.
Parameters
----------
fname : file or str
File, filename, or generator to read. If the filename extension is
`.gz` or `.bz2`, the file is first decompressed. Note that
generators must return byte strings in Python 3k.
dtype : dtype, optional
Data type of the resulting array.
If None, the dtypes will be determined by the contents of each
column, individually.
comments : str, optional
The character used to indicate the start of a comment.
All the characters occurring on a line after a comment are discarded
delimiter : str, int, or sequence, optional
The string used to separate values. By default, any consecutive
whitespaces act as delimiter. An integer or sequence of integers
can also be provided as width(s) of each field.
skip_rows : int, optional
`skip_rows` was deprecated in numpy 1.5, and will be removed in
numpy 2.0. Please use `skip_header` instead.
skip_header : int, optional
The number of lines to skip at the beginning of the file.
skip_footer : int, optional
The number of lines to skip at the end of the file.
converters : variable, optional
The set of functions that convert the data of a column to a value.
The converters can also be used to provide a default value
for missing data: ``converters = {3: lambda s: float(s or 0)}``.
missing : variable, optional
`missing` was deprecated in numpy 1.5, and will be removed in
numpy 2.0. Please use `missing_values` instead.
missing_values : variable, optional
The set of strings corresponding to missing data.
filling_values : variable, optional
The set of values to be used as default when the data are missing.
usecols : sequence, optional
Which columns to read, with 0 being the first. For example,
``usecols = (1, 4, 5)`` will extract the 2nd, 5th and 6th columns.
names : {None, True, str, sequence}, optional
If `names` is True, the field names are read from the first valid line
after the first `skip_header` lines.
If `names` is a sequence or a single-string of comma-separated names,
the names will be used to define the field names in a structured dtype.
If `names` is None, the names of the dtype fields will be used, if any.
excludelist : sequence, optional
A list of names to exclude. This list is appended to the default list
['return','file','print']. Excluded names are appended an underscore:
for example, `file` would become `file_`.
deletechars : str, optional
A string combining invalid characters that must be deleted from the
names.
defaultfmt : str, optional
A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
Whether to automatically strip white spaces from the variables.
replace_space : char, optional
Character(s) used in replacement of white spaces in the variables
names. By default, use a '_'.
case_sensitive : {True, False, 'upper', 'lower'}, optional
If True, field names are case sensitive.
If False or 'upper', field names are converted to upper case.
If 'lower', field names are converted to lower case.
unpack : bool, optional
If True, the returned array is transposed, so that arguments may be
unpacked using ``x, y, z = loadtxt(...)``
usemask : bool, optional
If True, return a masked array.
If False, return a regular array.
loose : bool, optional
If True, do not raise errors for invalid values.
invalid_raise : bool, optional
If True, an exception is raised if an inconsistency is detected in the
number of columns.
If False, a warning is emitted and the offending lines are skipped.
Returns
-------
out : ndarray
Data read from the text file. If `usemask` is True, this is a
masked array.
See Also
--------
numpy.loadtxt : equivalent function when no data is missing.
Notes
-----
* When spaces are used as delimiters, or when no delimiter has been given
as input, there should not be any missing data between two fields.
* When the variables are named (either by a flexible dtype or with `names`,
there must not be any header in the file (else a ValueError
exception is raised).
* Individual values are not stripped of spaces by default.
When using a custom converter, make sure the function does remove spaces.
References
----------
.. [1] Numpy User Guide, section `I/O with Numpy
<http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html>`_.
Examples
---------
>>> from StringIO import StringIO
>>> import numpy as np
Comma delimited file with mixed dtype
>>> s = StringIO("1,1.3,abcde")
>>> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
... ('mystring','S5')], delimiter=",")
>>> data
array((1, 1.3, 'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
Using dtype = None
>>> s.seek(0) # needed for StringIO example only
>>> data = np.genfromtxt(s, dtype=None,
... names = ['myint','myfloat','mystring'], delimiter=",")
>>> data
array((1, 1.3, 'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
Specifying dtype and names
>>> s.seek(0)
>>> data = np.genfromtxt(s, dtype="i8,f8,S5",
... names=['myint','myfloat','mystring'], delimiter=",")
>>> data
array((1, 1.3, 'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
An example with fixed-width columns
>>> s = StringIO("11.3abcde")
>>> data = np.genfromtxt(s, dtype=None, names=['intvar','fltvar','strvar'],
... delimiter=[1,3,5])
>>> data
array((1, 1.3, 'abcde'),
dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('strvar', '|S5')])
Now we can use that function to load the data much more easily. Here's an example that only scratches the surface of what's possible with this function:
In [51]:
data = np.genfromtxt('recs2009_public.csv', delimiter=',', skip_header=1, usecols=(841, 842, 843))
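The two arguments doing most of the work in that call are `skip_header` and `usecols`. Here is a small self-contained sketch of their effect on made-up CSV text; the names `text` and `arr` and all the values are hypothetical, and `io.StringIO` is used only so the example does not depend on a file on disk.

```python
import io
import numpy as np

# Made-up CSV text: one header line, then numeric rows.
text = "id,a,b,c\n1,10.0,20.0,30.0\n2,11.0,21.0,31.0\n3,12.0,22.0,32.0\n"

# skip_header=1 drops the header line; usecols picks columns by 0-based position.
arr = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1, usecols=(1, 3))

print(arr.shape)   # (3, 2): three rows, two selected columns
print(arr[:, 0])   # values from column "a"
print(arr[:, 1])   # values from column "c"
```

Note that the columns of the resulting array are renumbered from 0, so column 3 of the file becomes column 1 of `arr`.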
Let's look at the first column of that dataset:
In [54]:
print(data[:,0])
[ 3068.795 181.998 184.459 ..., 847.734 135.687 0. ]
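Before plotting, a few summary statistics are often a useful first look at a column. The sketch below computes them with numpy on just the six values visible in the printed output above, so the numbers it prints describe that tiny sample and not the full column. `sample` is a hypothetical name.

```python
import numpy as np

# The six values visible in the printed output; the real column has many more.
sample = np.array([3068.795, 181.998, 184.459, 847.734, 135.687, 0.0])

print("min:", sample.min())
print("max:", sample.max())
print("mean:", sample.mean())
print("median:", np.median(sample))
print("zeros:", int(np.sum(sample == 0)))  # count of entries equal to zero
```

Applying the same calls to `data[:,0]` would summarize the whole column.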
We can also plot it against the row number:
In [59]:
plt.plot(data[:,0], 'rd')
Out[59]:
[<matplotlib.lines.Line2D at 0x10a8e52d0>]
The look and feel may not be very good, but thankfully we can customize that. For much more detailed information on how to do that, please see this link.
For now, let me just show you how to do it using rcParams, the dictionary-like object of default settings that we imported earlier from the matplotlib library:
In [ ]:
help(plt.rcParams)
In [62]:
rcParams['font.size'] = 20
rcParams['lines.linewidth'] = 3
rcParams['figure.figsize'] = (10, 6)
plt.plot(data[:,0], 'o', markersize=20, alpha=0.25)
Out[62]:
[<matplotlib.lines.Line2D at 0x10b2fab50>]
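Assigning to rcParams changes the defaults for every later plot in the session. If that is not wanted, one alternative is matplotlib's `rc_context`, which applies settings only inside a `with` block. This is a sketch of that variant, not what the cell above does; the plotted values are the first three from the printed column, the "Agg" backend is chosen only so the code runs without a display, and names such as `size_before` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt
from matplotlib import rcParams

size_before = rcParams['font.size']

# Settings passed to rc_context apply only inside the with-block.
with plt.rc_context({'font.size': 20, 'lines.linewidth': 3}):
    size_inside = rcParams['font.size']
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot([0, 1, 2], [3068.795, 181.998, 184.459], 'o', markersize=20, alpha=0.25)
    ax.set_xlabel('row number')
    plt.close(fig)  # release the figure once it is no longer needed

size_after = rcParams['font.size']  # restored to the earlier value
print(size_before, size_inside, size_after)
```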
In [67]:
plt.scatter(data[:,1], data[:,2])
Out[67]:
<matplotlib.collections.PathCollection at 0x10bbd2350>
In [68]:
!ls -l
total 275648
-rw-r--r--+ 1 mberges staff 48864668 Oct 27 11:02 ac.csv
-rw-r--r--+ 1 mberges staff 64609139 Oct 27 11:02 campusDemand.csv
-rw-r--r--+ 1 mberges staff 326 Oct 27 11:02 fc_14.txt
-rw-r--r--+ 1 mberges staff 326 Oct 27 11:02 fc_28.txt
-rw-r--r--+ 1 mberges staff 341 Oct 27 11:02 fc_7.txt
-rw-r--r--+ 1 mberges staff 108574 Oct 27 11:02 fridge.csv
-rw-r--r--@ 1 mberges staff 59883 Oct 27 11:02 public_layout.csv
-rw-r--r--@ 1 mberges staff 27460827 Oct 27 11:02 recs2009_public.csv
-rw-r--r--@ 1 mberges staff 1032 Oct 27 11:02 source.txt
-rw-r--r--+ 1 mberges staff 63 Oct 27 11:02 temp.txt
In [69]:
!head public_layout.csv
In [77]:
names = np.genfromtxt('public_layout.csv', delimiter=',', skip_header=1, usecols=1, dtype=str)
In [80]:
names[840]
Out[80]:
'"Electricity usage for space heating'
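The stray leading double quote in that output suggests that the label sits in a quoted field containing a comma, and that genfromtxt split it at the comma without interpreting the quotes. The csv module does interpret quoting, so it is one way to recover the full label. The sketch below uses two hypothetical layout lines (the column titles and the label text after the comma are assumptions, not copied from public_layout.csv).

```python
import csv

# Hypothetical layout lines: the label in the second line contains a comma,
# so it is wrapped in double quotes, as CSV writers normally do.
layout_lines = [
    'Variable Name,Variable Label',
    'KWHSPH,"Electricity usage for space heating, in kilowatt-hours"',
]

# Naive splitting on commas cuts the quoted label in two.
naive = layout_lines[1].split(',')
print(naive[1])      # "Electricity usage for space heating  (with the stray quote)

# csv.reader understands the quoting and returns the whole label.
parsed = list(csv.reader(layout_lines))
print(parsed[1][1])  # Electricity usage for space heating, in kilowatt-hours
```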
In [74]:
help(np.genfromtxt)
Help on function genfromtxt in module numpy.lib.npyio:
genfromtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, skiprows=0, skip_header=0, skip_footer=0, converters=None, missing='', missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True)
Load data from a text file, with missing values handled as specified.
Each line past the first `skip_header` lines is split at the `delimiter`
character, and characters following the `comments` character are discarded.
Parameters
----------
fname : file or str
File, filename, or generator to read. If the filename extension is
`.gz` or `.bz2`, the file is first decompressed. Note that
generators must return byte strings in Python 3k.
dtype : dtype, optional
Data type of the resulting array.
If None, the dtypes will be determined by the contents of each
column, individually.
comments : str, optional
The character used to indicate the start of a comment.
All the characters occurring on a line after a comment are discarded
delimiter : str, int, or sequence, optional
The string used to separate values. By default, any consecutive
whitespaces act as delimiter. An integer or sequence of integers
can also be provided as width(s) of each field.
skip_rows : int, optional
`skip_rows` was deprecated in numpy 1.5, and will be removed in
numpy 2.0. Please use `skip_header` instead.
skip_header : int, optional
The number of lines to skip at the beginning of the file.
skip_footer : int, optional
The number of lines to skip at the end of the file.
converters : variable, optional
The set of functions that convert the data of a column to a value.
The converters can also be used to provide a default value
for missing data: ``converters = {3: lambda s: float(s or 0)}``.
missing : variable, optional
`missing` was deprecated in numpy 1.5, and will be removed in
numpy 2.0. Please use `missing_values` instead.
missing_values : variable, optional
The set of strings corresponding to missing data.
filling_values : variable, optional
The set of values to be used as default when the data are missing.
usecols : sequence, optional
Which columns to read, with 0 being the first. For example,
``usecols = (1, 4, 5)`` will extract the 2nd, 5th and 6th columns.
names : {None, True, str, sequence}, optional
If `names` is True, the field names are read from the first valid line
after the first `skip_header` lines.
If `names` is a sequence or a single-string of comma-separated names,
the names will be used to define the field names in a structured dtype.
If `names` is None, the names of the dtype fields will be used, if any.
excludelist : sequence, optional
A list of names to exclude. This list is appended to the default list
['return','file','print']. Excluded names are appended an underscore:
for example, `file` would become `file_`.
deletechars : str, optional
A string combining invalid characters that must be deleted from the
names.
defaultfmt : str, optional
A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
Whether to automatically strip white spaces from the variables.
replace_space : char, optional
Character(s) used in replacement of white spaces in the variables
names. By default, use a '_'.
case_sensitive : {True, False, 'upper', 'lower'}, optional
If True, field names are case sensitive.
If False or 'upper', field names are converted to upper case.
If 'lower', field names are converted to lower case.
unpack : bool, optional
If True, the returned array is transposed, so that arguments may be
unpacked using ``x, y, z = loadtxt(...)``
usemask : bool, optional
If True, return a masked array.
If False, return a regular array.
loose : bool, optional
If True, do not raise errors for invalid values.
invalid_raise : bool, optional
If True, an exception is raised if an inconsistency is detected in the
number of columns.
If False, a warning is emitted and the offending lines are skipped.
Returns
-------
out : ndarray
Data read from the text file. If `usemask` is True, this is a
masked array.
See Also
--------
numpy.loadtxt : equivalent function when no data is missing.
Notes
-----
* When spaces are used as delimiters, or when no delimiter has been given
as input, there should not be any missing data between two fields.
* When the variables are named (either by a flexible dtype or with `names`,
there must not be any header in the file (else a ValueError
exception is raised).
* Individual values are not stripped of spaces by default.
When using a custom converter, make sure the function does remove spaces.
References
----------
.. [1] Numpy User Guide, section `I/O with Numpy
<http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html>`_.
Examples
---------
>>> from StringIO import StringIO
>>> import numpy as np
Comma delimited file with mixed dtype
>>> s = StringIO("1,1.3,abcde")
>>> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
... ('mystring','S5')], delimiter=",")
>>> data
array((1, 1.3, 'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
Using dtype = None
>>> s.seek(0) # needed for StringIO example only
>>> data = np.genfromtxt(s, dtype=None,
... names = ['myint','myfloat','mystring'], delimiter=",")
>>> data
array((1, 1.3, 'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
Specifying dtype and names
>>> s.seek(0)
>>> data = np.genfromtxt(s, dtype="i8,f8,S5",
... names=['myint','myfloat','mystring'], delimiter=",")
>>> data
array((1, 1.3, 'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
An example with fixed-width columns
>>> s = StringIO("11.3abcde")
>>> data = np.genfromtxt(s, dtype=None, names=['intvar','fltvar','strvar'],
... delimiter=[1,3,5])
>>> data
array((1, 1.3, 'abcde'),
dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('strvar', '|S5')])
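The notes on missing data and the `usemask` option in the help text above are easier to see on a small input. The following is a minimal sketch, not part of the original notebook: the three-row CSV text and its column names are made up, and it uses `io.StringIO` so that it also runs on Python 3 (the help text above imports the Python 2 `StringIO` module).

```python
from io import StringIO
import numpy as np

# Hypothetical CSV text: one header row, and a missing kwh value in the second data row.
csv_text = u"id,kwh\n1,350.5\n2,\n3,410.0\n"

# With the default float dtype, the empty field is read as nan.
table = np.genfromtxt(StringIO(csv_text), delimiter=",", skip_header=1)

# usemask=True returns a masked array that flags the missing entry instead.
masked = np.genfromtxt(StringIO(csv_text), delimiter=",", skip_header=1,
                       usemask=True)

print(table.shape)            # rows x columns
print(np.isnan(table[:, 1]))  # True where kwh was missing
print(masked.mask[:, 1])      # the same information, stored as a mask
```

Skipping the header with `skip_header=1` keeps the array purely numeric. Whether nan or a mask is more convenient depends on what the later analysis does with the gaps.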
In [83]:
names[843]
Out[83]:
'"Electricity usage for refrigerators'
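The stray double quote at the start of this name suggests that the field was quoted in the file and was split on a comma inside the quotes. That is a guess, since the header line itself is not shown here. Assuming it is the case, the sketch below contrasts a plain `split(',')` with `csv.reader`, which respects quoted fields. The header line is invented for illustration and is not the real RECS header.

```python
import csv

# Hypothetical header line with a comma inside a quoted field.
line = 'ID,"Electricity usage for refrigerators, in kWh",Region'

# Splitting on every comma breaks the quoted field in two and keeps the quotes.
naive = line.split(',')

# csv.reader accepts any iterable of lines and honours the quoting.
parsed = next(csv.reader([line]))

print(naive)
print(parsed)
```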
In [90]:
plt.hist(data[:,0],bins=100,range=(0,20000))
Out[90]:
(array([ 3.44300000e+03, 1.20900000e+03, 8.41000000e+02,
6.95000000e+02, 5.80000000e+02, 5.26000000e+02,
4.51000000e+02, 3.54000000e+02, 3.10000000e+02,
2.85000000e+02, 2.69000000e+02, 2.24000000e+02,
2.25000000e+02, 2.00000000e+02, 1.76000000e+02,
1.57000000e+02, 1.81000000e+02, 1.46000000e+02,
1.40000000e+02, 1.33000000e+02, 1.04000000e+02,
1.16000000e+02, 1.05000000e+02, 9.90000000e+01,
9.60000000e+01, 8.00000000e+01, 7.00000000e+01,
6.70000000e+01, 6.50000000e+01, 6.10000000e+01,
4.50000000e+01, 5.30000000e+01, 3.60000000e+01,
3.50000000e+01, 4.20000000e+01, 2.80000000e+01,
3.00000000e+01, 2.20000000e+01, 2.70000000e+01,
2.00000000e+01, 3.20000000e+01, 2.10000000e+01,
2.40000000e+01, 1.70000000e+01, 1.40000000e+01,
9.00000000e+00, 8.00000000e+00, 1.60000000e+01,
1.80000000e+01, 1.40000000e+01, 1.30000000e+01,
8.00000000e+00, 6.00000000e+00, 7.00000000e+00,
1.00000000e+01, 3.00000000e+00, 9.00000000e+00,
6.00000000e+00, 2.00000000e+00, 2.00000000e+00,
5.00000000e+00, 2.00000000e+00, 5.00000000e+00,
5.00000000e+00, 6.00000000e+00, 2.00000000e+00,
4.00000000e+00, 4.00000000e+00, 4.00000000e+00,
3.00000000e+00, 1.00000000e+00, 2.00000000e+00,
2.00000000e+00, 5.00000000e+00, 1.00000000e+00,
5.00000000e+00, 6.00000000e+00, 3.00000000e+00,
1.00000000e+00, 1.00000000e+00, 0.00000000e+00,
1.00000000e+00, 2.00000000e+00, 3.00000000e+00,
1.00000000e+00, 2.00000000e+00, 1.00000000e+00,
1.00000000e+00, 0.00000000e+00, 1.00000000e+00,
0.00000000e+00, 0.00000000e+00, 1.00000000e+00,
0.00000000e+00, 1.00000000e+00, 1.00000000e+00,
0.00000000e+00, 1.00000000e+00, 0.00000000e+00,
1.00000000e+00]),
array([ 0., 200., 400., 600., 800., 1000., 1200.,
1400., 1600., 1800., 2000., 2200., 2400., 2600.,
2800., 3000., 3200., 3400., 3600., 3800., 4000.,
4200., 4400., 4600., 4800., 5000., 5200., 5400.,
5600., 5800., 6000., 6200., 6400., 6600., 6800.,
7000., 7200., 7400., 7600., 7800., 8000., 8200.,
8400., 8600., 8800., 9000., 9200., 9400., 9600.,
9800., 10000., 10200., 10400., 10600., 10800., 11000.,
11200., 11400., 11600., 11800., 12000., 12200., 12400.,
12600., 12800., 13000., 13200., 13400., 13600., 13800.,
14000., 14200., 14400., 14600., 14800., 15000., 15200.,
15400., 15600., 15800., 16000., 16200., 16400., 16600.,
16800., 17000., 17200., 17400., 17600., 17800., 18000.,
18200., 18400., 18600., 18800., 19000., 19200., 19400.,
19600., 19800., 20000.]),
<a list of 100 Patch objects>)
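The long output above is the return value of `plt.hist`: a tuple holding the bin counts, the bin edges, and the list of drawn patches. With 100 bins over the range (0, 20000) there are 101 edges spaced 200 apart. In a notebook, assigning the result to variables or ending the line with a semicolon keeps this echo out of the output. As a rough sketch, the same counts and edges can be computed without drawing anything by calling `np.histogram`; the `values` array below is synthetic and only stands in for `data[:,0]`.

```python
import numpy as np

# Synthetic stand-in for data[:,0]; the real column is not reproduced here.
rng = np.random.RandomState(0)
values = rng.exponential(scale=2000.0, size=1000)

# Same binning as the plt.hist call above: 100 bins between 0 and 20000.
counts, edges = np.histogram(values, bins=100, range=(0, 20000))

bin_width = edges[1] - edges[0]
# Values outside the range are not counted in any bin.
in_range = np.sum((values >= 0) & (values <= 20000))

print(len(counts))
print(len(edges))
print(bin_width)
```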
In [91]:
help(plt.boxplot)
Help on function boxplot in module matplotlib.pyplot:
boxplot(x, notch=False, sym='b+', vert=True, whis=1.5, positions=None, widths=None, patch_artist=False, bootstrap=None, usermedians=None, conf_intervals=None, hold=None)
Make a box and whisker plot.
Call signature::
boxplot(x, notch=False, sym='+', vert=True, whis=1.5,
positions=None, widths=None, patch_artist=False,
bootstrap=None, usermedians=None, conf_intervals=None)
Make a box and whisker plot for each column of *x* or each
vector in sequence *x*. The box extends from the lower to
upper quartile values of the data, with a line at the median.
The whiskers extend from the box to show the range of the
data. Flier points are those past the end of the whiskers.
Function Arguments:
*x* :
Array or a sequence of vectors.
*notch* : [ False (default) | True ]
If False (default), produces a rectangular box plot.
If True, will produce a notched box plot
*sym* : [ default 'b+' ]
The default symbol for flier points.
Enter an empty string ('') if you don't want to show fliers.
*vert* : [ False | True (default) ]
If True (default), makes the boxes vertical.
If False, makes horizontal boxes.
*whis* : [ default 1.5 ]
Defines the length of the whiskers as a function of the inner
quartile range. They extend to the most extreme data point
within ( ``whis*(75%-25%)`` ) data range.
*bootstrap* : [ *None* (default) | integer ]
Specifies whether to bootstrap the confidence intervals
around the median for notched boxplots. If bootstrap==None,
no bootstrapping is performed, and notches are calculated
using a Gaussian-based asymptotic approximation (see McGill, R.,
Tukey, J.W., and Larsen, W.A., 1978, and Kendall and Stuart,
1967). Otherwise, bootstrap specifies the number of times to
bootstrap the median to determine its 95% confidence intervals.
Values between 1000 and 10000 are recommended.
*usermedians* : [ default None ]
An array or sequence whose first dimension (or length) is
compatible with *x*. This overrides the medians computed by
matplotlib for each element of *usermedians* that is not None.
When an element of *usermedians* == None, the median will be
computed directly as normal.
*conf_intervals* : [ default None ]
Array or sequence whose first dimension (or length) is compatible
with *x* and whose second dimension is 2. When the current element
of *conf_intervals* is not None, the notch locations computed by
matplotlib are overridden (assuming notch is True). When an
element of *conf_intervals* is None, boxplot computes notches using the
method specified by the other kwargs (e.g., *bootstrap*).
*positions* : [ default 1,2,...,n ]
Sets the horizontal positions of the boxes. The ticks and limits
are automatically set to match the positions.
*widths* : [ default 0.5 ]
Either a scalar or a vector and sets the width of each box. The
default is 0.5, or ``0.15*(distance between extreme positions)``
if that is smaller.
*patch_artist* : [ False (default) | True ]
If False produces boxes with the Line2D artist
If True produces boxes with the Patch artist
Returns a dictionary mapping each component of the boxplot
to a list of the :class:`matplotlib.lines.Line2D`
instances created. That dictionary has the following keys
(assuming vertical boxplots):
- boxes: the main body of the boxplot showing the quartiles
and the median's confidence intervals if enabled.
- medians: horizontal lines at the median of each box.
- whiskers: the vertical lines extending to the most extreme,
non-outlier data points.
- caps: the horizontal lines at the ends of the whiskers.
- fliers: points representing data that extend beyond the
whiskers (outliers).
**Example:**
.. plot:: pyplots/boxplot_demo.py
Additional kwargs: hold = [True|False] overrides default hold state
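The *whis* rule described in the help text can be worked through by hand. The sketch below is my own illustration of that rule, not matplotlib's code: `box_stats` is a hypothetical helper, and it uses `np.percentile` for the quartiles, which may differ slightly from the method a given matplotlib version uses.

```python
import numpy as np

def box_stats(x, whis=1.5):
    """Quartiles, whisker ends, and fliers for one vector (illustrative only)."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers may reach at most whis * IQR beyond the box...
    low_limit = q1 - whis * iqr
    high_limit = q3 + whis * iqr
    inside = x[(x >= low_limit) & (x <= high_limit)]
    # ...and they stop at the most extreme data point within those limits.
    return {
        'q1': q1, 'median': median, 'q3': q3,
        'whisker_low': inside.min(),
        'whisker_high': inside.max(),
        'fliers': x[(x < low_limit) | (x > high_limit)],
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats['q1'])
print(stats['q3'])
print(stats['whisker_high'])
print(stats['fliers'])
```

In this made-up sample the value 50 lies beyond `q3 + 1.5 * IQR`, so it is reported as a flier and the upper whisker stops at 9.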
In [105]:
plt.boxplot(data[:,0:35], notch=True, sym='bd', vert=False)
Out[105]:
{'boxes': [<matplotlib.lines.Line2D at 0x10e156490>,
<matplotlib.lines.Line2D at 0x10e1786d0>,
<matplotlib.lines.Line2D at 0x10e19a910>],
'caps': [<matplotlib.lines.Line2D at 0x10e1497d0>,
<matplotlib.lines.Line2D at 0x10e149e10>,
<matplotlib.lines.Line2D at 0x10e16ca10>,
<matplotlib.lines.Line2D at 0x10e178090>,
<matplotlib.lines.Line2D at 0x10e191c50>,
<matplotlib.lines.Line2D at 0x10e19a2d0>],
'fliers': [<matplotlib.lines.Line2D at 0x10e162150>,
<matplotlib.lines.Line2D at 0x10e162750>,
<matplotlib.lines.Line2D at 0x10e184390>,
<matplotlib.lines.Line2D at 0x10e184990>,
<matplotlib.lines.Line2D at 0x10e1a65d0>,
<matplotlib.lines.Line2D at 0x10e1a6bd0>],
'medians': [<matplotlib.lines.Line2D at 0x10e156ad0>,
<matplotlib.lines.Line2D at 0x10e178d10>,
<matplotlib.lines.Line2D at 0x10e19af50>],
'whiskers': [<matplotlib.lines.Line2D at 0x10e13ee50>,
<matplotlib.lines.Line2D at 0x10e149110>,
<matplotlib.lines.Line2D at 0x10e16c150>,
<matplotlib.lines.Line2D at 0x10e16c3d0>,
<matplotlib.lines.Line2D at 0x10e191390>,
<matplotlib.lines.Line2D at 0x10e191610>]}
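The dictionary echoed above holds the line objects that make up the plot: one entry per box under 'boxes' and 'medians', and two per box under 'whiskers' and 'caps'. That is why three boxes come with six whiskers and six caps. The sketch below shows one way to read values back out of it. The `sample` array is synthetic, the `Agg` backend is selected only so that the code also runs outside a notebook, and the default vertical orientation is used, so each median is read from the line's y data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

# Synthetic three-column stand-in for the notebook's data array.
rng = np.random.RandomState(0)
sample = rng.exponential(scale=2000.0, size=(200, 3))

fig, ax = plt.subplots()
parts = ax.boxplot(sample, notch=True, sym='bd')

n_boxes = len(parts['boxes'])
n_whiskers = len(parts['whiskers'])
n_caps = len(parts['caps'])

# Each median is a short horizontal line, so both of its y values are the median.
medians = [line.get_ydata()[0] for line in parts['medians']]

print((n_boxes, n_whiskers, n_caps))
print(medians)
```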
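Slice notation such as 1:3 is only valid inside square brackets when indexing, so it cannot be passed to list() directly. To see which indices a slice like 1:3 covers, build the equivalent range:
In [98]:
print(list(range(1, 3)))
[1, 2]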
Content source: keylime1/courses_12-752