Spartanburg Breakout Data Set

Analysis with Machine Learning, scikit-learn, and Python

Data Preparation

Our data is in a file called "breakout_data.csv". We'll use Python to load it and process it into a usable NumPy array.


In [1]:
import numpy as np  # get numpy package
data = np.genfromtxt(fname='breakout_data.csv',  # data filename
                     dtype=None,  # infer the data type of each column
                     delimiter=',',  # split fields on commas
                     names=True,  # first line contains the column names
                     )

Now our data is in a NumPy structured array (an ndarray whose columns are named fields), so we can work with it using the usual NumPy methods. For example, we can print the column headers and the number of columns...


In [7]:
column_headers = data.dtype.names
print(column_headers)  # print the column headers
print('Number of columns: {}'.format(len(column_headers)))


('DateTime', 'SPT_003_CHP_FI214_PV', 'SPT_003_CHP_LIC213_PV', 'SPT_003_CHP_MI201_PV', 'SPT_003_CHP_SIC101_PV', 'SPT_003_CHP_TI202_PV', 'SPT_003_CHP_TI203_PV', 'SPT_003_CHP_TI204_PV', 'SPT_003_CHP_TI206_PV', 'SPT_003_CHP_TI212_PV', 'SPT_003_CHP_TIC208_PV', 'SPT_003_CHP_TIC210_PV', 'SPT_003_CHP_WI205_PV', 'SPT_003_EXT_PI202_PV', 'SPT_003_EXT_PIC101_PV', 'SPT_003_EXT_TI201_PV', 'SPT_003_EXT_TIC203_PV', 'SPT_003_EXT_TIC204_PV', 'SPT_003_EXT_TIC205_PV', 'SPT_003_EXT_TIC206_PV', 'SPT_003_EXT_TIC207_PV', 'SPT_003_FIB_FI201_PV', 'SPT_003_FIB_MIC203_PV', 'SPT_003_FIB_PIC202_PV', 'SPT_003_FIB_TIC204_PV', 'SPT_003_FIB_TIC205_PV', 'SPT_003_SPB_PI101_PV', 'SPT_003_SPB_PI201_PV', 'SPT_003_SPB_PI203_PV', 'SPT_003_SPB_PI205_PV', 'SPT_003_SPB_PI226_PV', 'SPT_003_SPB_PI227_PV', 'SPT_003_SPB_PI229_PV', 'SPT_003_SPB_PI230_PV', 'SPT_003_SPB_PI231_PV', 'SPT_003_SPB_PI232_PV', 'SPT_003_SPB_PIC217_PV', 'SPT_003_SPB_PIC243_PV', 'SPT_003_SPB_TI202_PV', 'SPT_003_SPB_TI204_PV', 'SPT_003_SPB_TI206_PV', 'SPT_003_SPB_TI207_PV', 'SPT_003_SPB_TI208_PV', 'SPT_003_SPB_TI209_PV', 'SPT_003_SPB_TI210_PV', 'SPT_003_SPB_TI211_PV', 'SPT_003_SPB_TI212_PV', 'SPT_003_SPB_TI213_PV', 'SPT_003_SPB_TI214_PV', 'SPT_003_SPB_TI215_PV', 'SPT_003_SPB_TI216_PV', 'SPT_003_SPB_TIC233_PV', 'SPT_003_SPB_TIC234_PV', 'SPT_003_SPB_TIC235_PV', 'SPT_003_SPB_TIC236_PV', 'SPT_003_SPB_TIC237_PV', 'SPT_003_SPB_TIC238_PV', 'SPT_003_SPB_TIC239_PV', 'SPT_003_UTIL_LI215_PV', 'SPT_003_UTIL_LI216_PV', 'SPT_003_UTIL_LI221_PV', 'SPT_003_UTIL_TIC218_PV', 'Breakout')
Number of columns: 63
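
Because the columns are named fields, a single column can also be selected by its header. The sketch below shows the idea on a tiny stand-in array rather than the real file: the field names are taken from the header list above, but the two rows and their values are made up for illustration.

```python
import numpy as np

# Hypothetical two-row stand-in for `data`; the values are illustrative only.
sample = np.array(
    [(b'9/16/14 7:00', 531.4, 0),
     (b'9/16/14 7:01', 532.0, 1)],
    dtype=[('DateTime', 'S16'),
           ('SPT_003_CHP_FI214_PV', 'f8'),
           ('Breakout', 'i8')],
)

flow = sample['SPT_003_CHP_FI214_PV']  # one column, all rows
labels = sample['Breakout']            # the target column

print(flow)          # every value of that sensor tag
print(labels.sum())  # number of rows flagged as a breakout
```

The same indexing should work on the real array, e.g. data['Breakout'].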

We can also print specific rows of data...


In [3]:
print('The first row of data is: \n{}'.format(data[0]))  # print the first row
print('\n')  # print blank lines as a separator
print('and the last row of data is: \n{}'.format(data[-1]))  # print the last row


The first row of data is: 
(b'9/16/14 7:00', b'531.3883057', b'61.74238586', b'-16.28650665', b'0.389999986', b'24.20000076', b'40.12931442', b'40.12931442', b'27.70000076', b'143.9000092', b'154.5', b'71.59999847', b'8.891676903', b'98.31745148', b'61.57915115', b'869.9000244', b'320.2000122', b'317.3000183', b'314.5', b'312.5', b'303.8000183', b'198.2268677', b'85.19852448', b'18.00164795', b'75.59999847', b'65', b'61.2305069', b'40.75392914', b'35.78575897', b'36.70095062', b'92.82631683', b'101.7447433', b'107.6747971', b'106.4234161', b'125.5615082', b'83.99194336', b'0', b'2.799654722', b'-210', b'-210', b'-20.39999962', b'276.6000061', b'283.5', b'283.2000122', b'282.7999878', b'282.3000183', b'283.3999939', b'283.3000183', b'282.8000183', b'282.3000183', b'229.1999969', b'284.7000122', b'285.3999939', b'285.1000061', b'285.5', b'284.6000061', b'284.3999939', b'0', b'98.63887787', b'37.45536804', b'0', b'286.9850159', 0)


and the last row of data is: 
(b'2/23/15 9:47', b'566.2526245', b'60.72130966', b'-13.29569244', b'0.50999999', b'8.300000191', b'40.12931442', b'40.12931442', b'23.5', b'142.6000061', b'159.1999969', b'60.70000076', b'9.652906418', b'108.1790848', b'61.16824722', b'869.9000244', b'323.2000122', b'319.3000183', b'316.6000061', b'315.5', b'306.3000183', b'193.7589722', b'84.29212189', b'18.98861694', b'72', b'64.09999847', b'60.86318588', b'31.44016838', b'32.7413559', b'-4.64442873', b'155.7502289', b'88.61457825', b'142.5733795', b'126.7910385', b'116.7868118', b'122.0469971', b'0', b'2.703630447', b'-210', b'-210', b'-210', b'275.3000183', b'285.8999939', b'285.6000061', b'285.5999756', b'284.8000183', b'285.8000183', b'285.8000183', b'286.1000061', b'285.7000122', b'231.5', b'285.1000061', b'284.2000122', b'284.8999939', b'284.7000122', b'284.8000183', b'285.1000061', b'0', b'98.98678589', b'31.73314667', b'0', b'289.3500061', 0)
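
Notice that every sensor reading above is a byte string (for example b'531.3883057') rather than a number, presumably because some entries in the file could not be parsed as numbers. Before the data can be used for modelling, those columns need to be converted to floats. The following is a minimal sketch of one way to do that, not necessarily the approach used later in this analysis. It runs on a small stand-in array built from the first two sensor values of the two rows printed above; the helper name to_float and the choice to map unparseable entries to NaN are assumptions made for illustration.

```python
import numpy as np

# Stand-in for `data`: first two sensor columns of the rows shown above.
rows = np.array(
    [(b'9/16/14 7:00', b'531.3883057', b'61.74238586', 0),
     (b'2/23/15 9:47', b'566.2526245', b'60.72130966', 0)],
    dtype=[('DateTime', 'S16'),
           ('SPT_003_CHP_FI214_PV', 'S16'),
           ('SPT_003_CHP_LIC213_PV', 'S16'),
           ('Breakout', 'i8')],
)


def to_float(column):
    """Convert a column of byte strings to floats (hypothetical helper).

    Entries that cannot be parsed become NaN instead of raising an error.
    """
    out = np.full(column.shape, np.nan)
    for i, value in enumerate(column):
        try:
            out[i] = float(value.decode())
        except ValueError:
            pass  # leave NaN in place for unparseable entries
    return out


# Every field except the timestamp and the label is treated as a sensor.
sensor_names = [name for name in rows.dtype.names
                if name not in ('DateTime', 'Breakout')]

X = np.column_stack([to_float(rows[name]) for name in sensor_names])
y = rows['Breakout']

print(X.shape)  # (number of rows, number of sensor columns)
print(y)
```

This gives a plain 2-D float array X and a label vector y. Any NaN entries would still have to be dropped or filled before the data is passed to a model.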