Assignment #2 from 2014 - ReDo

This is an attempt to complete the tasks laid out in Assignment #2 from this class in 2014.

We begin by importing all of the libraries that are necessary, and setting up the plotting environment:


In [4]:
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt

%matplotlib inline

Task #1

Then, for this first task, we import the csv file into a variable called data. We use a lambda function that allows the importer to convert the timestamp strings into datetime objects:


In [101]:
anewdate = '2014/11/10 17:34:28'

dateConverter = lambda d : dt.datetime.strptime(d,'%Y/%m/%d %H:%M:%S')

# the second column holds datetime objects, so it is stored with the generic object dtype
data = np.genfromtxt('../../../data/campusDemand.csv',delimiter=",",names=True,dtype=('a255',object,float,),converters={1: dateConverter})
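
Before running the full import, it can help to check the converter on a single sample string. The sketch below is self-contained and reuses the format string from the cell above; date_converter, sample_timestamp and parsed are illustrative names, not part of the assignment.

```python
import datetime as dt

# Same format string as in the import cell
date_converter = lambda d: dt.datetime.strptime(d, '%Y/%m/%d %H:%M:%S')

sample_timestamp = '2014/11/10 17:34:28'  # same value as anewdate above
parsed = date_converter(sample_timestamp)
print(parsed)  # 2014-11-10 17:34:28
```

If the format string did not match the timestamps in the file, strptime would raise a ValueError here rather than in the middle of the import.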

In [42]:
data[0]


Out[42]:
('Porter Hall Electric (Shark 30) - Watts', datetime.datetime(2014, 9, 10, 0, 0, 50), 80635.421875)

In [53]:
data['Point_name']


Out[53]:
array(['Porter Hall Electric (Shark 30) - Watts',
       'Porter Hall Electric (Shark 30) - Watts',
       'Porter Hall Electric (Shark 30) - Watts', ...,
       'Baker Hall Electric (Shark 29) - Demand Watts ',
       'Baker Hall Electric (Shark 29) - Demand Watts ',
       'Baker Hall Electric (Shark 29) - Demand Watts '], 
      dtype='|S255')

To make sure that the import succeeded, we print the contents of the variable. Also, because we want to make sure the full meter names appear in the printed output, we modify Numpy's print options by using the function np.set_printoptions:


In [58]:
np.set_printoptions(threshold=8) # make sure all the power meter names will be printed

Task #2

To find the number of unique point names, we use the unique function from Numpy, and apply it to the 'Point_name' column in data:


In [59]:
pointNames = np.unique(data['Point_name'])
print "There are {} unique meters.".format(pointNames.shape[0])


There are 7 unique meters.

Task #3

We now print the contents of the pointNames array:


In [60]:
print pointNames


['Baker Hall Electric (Shark 29) - Demand Watts '
 'Baker Hall Electric (Shark 29) - Watts'
 'Doherty Apts Electric (Shark 11) - Demand Watts'
 'Electric kW Calculations - Main Campus kW'
 'Porter Hall Electric (Shark 30) - Watts'
 'Scaife Hall Electric (Shark 21) - Watts'
 'University Center Electric (Shark 34) - Watts']

In [92]:
#extractedData = np.extract(data['Point_name']==pointNames[6],data)
plt.plot(data['Time'][np.where(data['Point_name']==pointNames[0])],'rd')


Out[92]:
[<matplotlib.lines.Line2D at 0x119ca6f10>]

Task #4

There are many ways to count the number of samples present for each power meter. For instance, we can loop over all pointNames and build a list of tuples in the process (this construct is called a list comprehension). Every tuple will then contain two elements: the meter name, and the number of samples for that meter:
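
A minimal sketch of this counting step follows. It is not the original solution to the task: it runs on a small made-up array (point_name, with hypothetical meter names) standing in for data['Point_name'], so that it is self-contained.

```python
import numpy as np

# Hypothetical stand-in for the data['Point_name'] column
point_name = np.array(['Meter A', 'Meter B', 'Meter A',
                       'Meter C', 'Meter A', 'Meter B'])
point_names = np.unique(point_name)

# One (meter name, number of samples) tuple per meter
sample_counts = [(str(meter), int(np.sum(point_name == meter)))
                 for meter in point_names]
print(sample_counts)

# Alternative: np.unique can return the counts directly
names, counts = np.unique(point_name, return_counts=True)
```

The list comprehension scans the column once per meter, while the return_counts variant gets all counts in a single call; for seven meters either approach is fine.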


In [ ]:

Task #5

First, we can use another list comprehension to iterate over the point names and create a new list whose elements are in turn tuples with the indices of the samples corresponding to each meter:


In [87]:
idx = [np.where(data['Point_name']==meter) for meter in pointNames]

print "idx is now a {0:s} of {1:d} items.".format(type(idx),len(idx))
print "Each item in idx is of {0:s}.".format(type(idx[0]))

[(meter,(data[idxItem]['Time'][-1]-data[idxItem]['Time'][0]).days) for meter,idxItem in zip(pointNames,idx)]


idx is now a <type 'list'> of 7 items.
Each item in idx is of <type 'tuple'>.
Out[87]:
[('Baker Hall Electric (Shark 29) - Demand Watts ', 271),
 ('Baker Hall Electric (Shark 29) - Watts', 7),
 ('Doherty Apts Electric (Shark 11) - Demand Watts', 31),
 ('Electric kW Calculations - Main Campus kW', 365),
 ('Porter Hall Electric (Shark 30) - Watts', 61),
 ('Scaife Hall Electric (Shark 21) - Watts', 31),
 ('University Center Electric (Shark 34) - Watts', 7)]

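The second list comprehension in the cell above then calculates the difference, in days, between the first and last timestamp of each meter. It pairs each meter name with its indices using zip, which is documented as follows: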


In [76]:
help(zip)


Help on built-in function zip in module __builtin__:

zip(...)
    zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
    
    Return a list of tuples, where each tuple contains the i-th element
    from each of the argument sequences.  The returned list is truncated
    in length to the length of the shortest argument sequence.

Task #6

For this task, we are going to directly take the difference between any two consecutive datetime objects and display the result in terms of, say, number of seconds elapsed between these timestamps.

Before we do this, though, it is useful to plot the timestamps to figure out if there are discontinuities that we can visually see:


In [ ]:
fig = plt.figure(figsize=(20,30)) # A 20 inch x 30 inch figure box

### What else?
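
One possible way to complete such a plot is sketched below, under stated assumptions: the timestamps are made up (meter_times is a hypothetical dictionary with two meters, one of which has a one-hour gap), and the Agg backend is selected only so the sketch runs outside a notebook. Each meter gets its own subplot of timestamp against sample index.

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt

start = dt.datetime(2014, 9, 10)
# Hypothetical meters: A samples every minute, B skips one hour after its 5th sample
meter_times = {
    'Meter A': [start + dt.timedelta(minutes=k) for k in range(10)],
    'Meter B': [start + dt.timedelta(minutes=k if k < 5 else k + 60)
                for k in range(10)],
}

fig, axes = plt.subplots(len(meter_times), 1, figsize=(8, 6))
for ax, (meter, times) in zip(axes, sorted(meter_times.items())):
    # Evenly spaced samples fall on a straight line; a gap shows up as a jump
    ax.plot(range(len(times)), times, '.')
    ax.set_title(meter)
    ax.set_xlabel('sample index')
    ax.set_ylabel('timestamp')
fig.tight_layout()
```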

As you may have seen, gaps were easily identifiable as discontinuities in the lines that were plotted. If no gaps existed, the plot would be a straight line.

But now let's get back to solving this using exact numbers...

First, you need to know that applying the difference operator (-) to two datetime objects results in a timedelta object. These objects (timedelta) describe time differences in terms of a number of days, seconds and microseconds (see the link above for more details). Because of this, we can quickly convert any timedelta object (say td, to avoid shadowing the datetime module that we imported as dt) into a number of seconds by doing:

td.days*3600*24+td.seconds+td.microseconds/1000000.0

In this case, however, our timestamps do not contain information about the microseconds, so we will skip that part of the conversion.
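
As a quick check of this conversion, the sketch below applies it to two made-up timestamps (first and second are illustrative values, not taken from the data set) and compares the result with the timedelta method total_seconds(), which performs the same calculation.

```python
import datetime as dt

first = dt.datetime(2014, 9, 10, 0, 0, 50)
second = dt.datetime(2014, 9, 11, 0, 1, 50)

td = second - first  # a timedelta of 1 day and 60 seconds
manual = td.days*3600*24 + td.seconds + td.microseconds/1000000.0

print(manual)              # 86460.0
print(td.total_seconds())  # 86460.0
```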

Using this knowledge, we can create a list of lists (a nested list) in a similar manner as we've done before (i.e. using list comprehensions), and in it store the timedeltas in seconds for each meter. In other words, the outer list is a list of the same length as pointNames, and each element is a list of timedeltas for the corresponding meter.

One more thing comes in handy for this task: the np.diff function, which takes an array (or a list) and returns the difference between any two consecutive items of the list.

Now, in a single line of code we can get the nested list we talked about:


In [ ]:
delta_t =
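
One way this line could be completed is sketched below. It is an assumption about the intended solution, not the original one, and it runs on a made-up miniature of the data (names and times are hypothetical stand-ins for data['Point_name'] and data['Time']).

```python
import datetime as dt
import numpy as np

t0 = dt.datetime(2014, 9, 10)
# Hypothetical data: Meter A samples every minute; Meter B every
# five minutes, with one gap of an hour at the end
names = np.array(['Meter A'] * 3 + ['Meter B'] * 4)
times = np.array([t0 + dt.timedelta(minutes=m)
                  for m in (0, 1, 2, 0, 5, 10, 70)], dtype=object)

point_names = np.unique(names)
idx = [np.where(names == meter) for meter in point_names]

# Outer list: one entry per meter. Inner list: seconds between
# consecutive timestamps of that meter (microseconds ignored).
delta_t = [[td.days*24*3600 + td.seconds for td in np.diff(times[i])]
           for i in idx]
print(delta_t)  # [[60, 60], [300, 300, 3600]]
```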

Now we need to be able to print out the exact times during which there are gaps. We will define gaps to be any timedelta that is longer than the median timedelta for a meter.

We will achieve this as follows:

  • first we will create a for loop to iterate over every item in the list delta_t (which means we will iterate over all meters).
  • then, inside the loop, we will calculate the median value for the delta_t that corresponds to each meter
  • following this, we will find the indices of delta_t where its value is greater than the median
  • lastly, we will iterate over all the indices found in the previous step and print out their values
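
The steps above can be sketched as follows. The delta_t values and meter names here are made up so that the example is self-contained; in the notebook, delta_t would be the nested list computed earlier and the names would come from pointNames.

```python
import numpy as np

# Hypothetical per-meter timedeltas, in seconds
meters = ['Meter A', 'Meter B']
delta_t = [[60, 60, 60, 3600, 60], [300, 300, 900, 300]]

gaps = []
for meter, deltas in zip(meters, delta_t):  # iterate over all meters
    deltas = np.asarray(deltas)
    median = np.median(deltas)              # typical sampling interval
    gap_idx = np.where(deltas > median)[0]  # indices of the gaps
    for k in gap_idx:
        # delta k is the time between samples k and k+1 of this meter
        print("{}: gap of {} s after sample {}".format(meter, deltas[k], k))
    gaps.append([int(k) for k in gap_idx])
```

To print the exact times of the gaps rather than their positions, the same indices can be used to look up the timestamps of samples k and k+1 for the meter in question.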

In [ ]:
import sys
np.set_printoptions(threshold=sys.maxsize) # print arrays in full; recent Numpy versions reject threshold=np.nan

Task #7

First, we will define a new variable containing the weekday for each of the timestamps.


In [102]:
wd = lambda d : d.weekday()
weekDays = np.array(map(wd,data['Time']))

Monday = data[np.where(weekDays==0)]
Tuesday = data[np.where(weekDays==1)]
Wednesday = data[np.where(weekDays==2)]
Thursday = data[np.where(weekDays==3)]
Friday = data[np.where(weekDays==4)]
Saturday = data[np.where(weekDays==5)]
Sunday = data[np.where(weekDays==6)]
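
The seven assignments above can also be written as a single list indexed by weekday number. The sketch below shows the idea on made-up timestamps (times and values are hypothetical stand-ins for the 'Time' and 'Value' columns) and uses a boolean mask directly instead of np.where.

```python
import datetime as dt
import numpy as np

# Ten consecutive days starting on Monday, 2014-09-08
times = np.array([dt.datetime(2014, 9, 8) + dt.timedelta(days=d)
                  for d in range(10)], dtype=object)
values = np.arange(10.0)

# weekday() returns 0 for Monday through 6 for Sunday
week_days = np.array([t.weekday() for t in times])

# by_day[0] holds the Monday samples, by_day[6] the Sunday samples
by_day = [values[week_days == d] for d in range(7)]
print(by_day[0])  # samples taken on the two Mondays
```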

With the data segmented by weekday through logical indexing, we can plot, for example, the Sunday measurements of the first meter:


In [107]:
plt.plot(Sunday['Time'][np.where(Sunday['Point_name']==pointNames[0])],Sunday['Value'][np.where(Sunday['Point_name']==pointNames[0])],'rd')


Out[107]:
[<matplotlib.lines.Line2D at 0x12e189690>]

Task #8

In this task we use two for loops and the subplot functionality of PyPlot to visualize the data contained in the variables we declared above.

The main trick is that we need to create a time index that only contains information about the hours, minutes and seconds (i.e. it completely disregards the exact day of the measurement) so that all of the measurements can be displayed within a single 24-hour period.
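
As a small illustration of this time index, the sketch below converts a few made-up timestamps (illustrative values only) into seconds since midnight, and then into fractional hours, discarding the date entirely.

```python
import datetime as dt

timestamps = [dt.datetime(2014, 9, 10, 0, 0, 50),
              dt.datetime(2014, 9, 11, 13, 30, 0),
              dt.datetime(2014, 11, 10, 17, 34, 28)]

# Seconds elapsed since midnight, regardless of the day
seconds_of_day = [t.hour*3600 + t.minute*60 + t.second for t in timestamps]

# The same index expressed in hours, convenient for the x axis
hours_of_day = [s/3600.0 for s in seconds_of_day]
print(seconds_of_day)  # [50, 48600, 63268]
```

Measurements taken on different days but at the same time of day end up with the same index, so they overlap on a single 24-hour axis.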


In [ ]:
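Days = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
# All[j] holds the data for day j (0 = Monday), using the per-day variables from Task #7
All = [Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday]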

fig = plt.figure(figsize=(20,20))
for i in range(len(pointNames)): # iterate over meters
    for j in range(7): # iterate over days of the week
        plt.subplot(7,7,i*7+j+1)
        # Data from the day being plotted = All[j]
        # Data from the meter being plotted = All[j][All[j]['Point_name']==pointNames[i]]
        time = np.array([t.hour*3600+t.minute*60+t.second for t in All[j][All[j]['Point_name']==pointNames[i]]['Time']])
        # plot the power vs the hours in a day
        plt.plot(time/3600.,All[j][All[j]['Point_name']==pointNames[i]]['Value'],'.')
        if i==6:
            plt.xlabel('hours in a day')
        if j==0: 
            plt.ylabel(pointNames[i].split('-')[0]+'\n'+pointNames[i].split('-')[1])
        if i==0:
            plt.title(Days[j])
fig.tight_layout()
plt.show()

Task #9

Several findings: (more to be added)

  • The campus consumes more energy on weekdays than on weekends.
  • Higher energy consumption during working hours.
  • Many meters report a bi-modal distribution of the measurements, possibly due to seasonal effects.
  • Some meters (e.g., Porter Hall) show more erratic behavior during weekends.
