This is an attempt to complete the tasks laid out in Assignment #2 from this class in 2014.
We begin by importing all of the libraries that are necessary, and setting up the plotting environment:
In [4]:
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
%matplotlib inline
Then, for this first task, we import the CSV file into a variable called data. We use a lambda function that lets the importer convert the timestamp strings into datetime objects:
In [101]:
anewdate = '2014/11/10 17:34:28'
dateConverter = lambda d : dt.datetime.strptime(d,'%Y/%m/%d %H:%M:%S')
data = np.genfromtxt('../../../data/campusDemand.csv',delimiter=",",names=True,dtype=('S255',object,float),converters={1: dateConverter}) # the Time column holds datetime objects, so it is stored with dtype object
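To see what the converter does on its own, here is a minimal sketch that applies the same format string to the sample timestamp defined above. The name `parse_timestamp` is a hypothetical stand-in for `dateConverter`, and the bytes handling is an assumption: depending on the NumPy version, `np.genfromtxt` may hand converters bytes rather than text.

```python
import datetime as dt

TIMESTAMP_FORMAT = '%Y/%m/%d %H:%M:%S'  # same format string as dateConverter

def parse_timestamp(text):
    """Hypothetical named equivalent of the dateConverter lambda."""
    if isinstance(text, bytes):  # assumption: some NumPy versions pass bytes
        text = text.decode('ascii')
    return dt.datetime.strptime(text, TIMESTAMP_FORMAT)

parsed = parse_timestamp('2014/11/10 17:34:28')
print(parsed.year, parsed.hour, parsed.second)
```

A named function is easier to extend than a lambda if the input needs extra handling, which is the only reason it is used here.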
In [42]:
data[0]
Out[42]:
In [53]:
data['Point_name']
Out[53]:
To make sure that the import succeeded, we print the contents of the variable. Also, because we want to make sure the full meter names appear in the printed output, we modify NumPy's print options using np.set_printoptions:
In [58]:
np.set_printoptions(threshold=8) # make sure all the power meter names will be printed
To find the number of unique point names, we use the unique function from NumPy, and apply it to the 'Point_name' column in data:
In [59]:
pointNames = np.unique(data['Point_name'])
print("There are {} unique meters.".format(pointNames.shape[0]))
We now print the contents of the pointNames array:
In [60]:
print(pointNames)
In [92]:
#extractedData = np.extract(data['Point_name']==pointNames[6],data)
plt.plot(data['Time'][np.where(data['Point_name']==pointNames[0])],'rd')
Out[92]:
There are many ways to count the number of samples recorded by each power meter. For instance, we can loop over all pointNames and build a list of tuples in the process (this is formally called a list comprehension). Every tuple will then contain two elements: the meter name, and the number of samples for that meter:
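One possible way to write this comprehension is sketched below. It runs on a small invented array rather than the real file: `point_name` stands in for `data['Point_name']`, and the meter names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for data['Point_name']; the meter names are invented.
point_name = np.array(['meter A', 'meter B', 'meter A', 'meter C', 'meter A', 'meter B'])
point_names = np.unique(point_name)

# One (meter name, sample count) tuple per meter: the boolean mask is True
# wherever a sample belongs to the meter, and summing it counts the matches.
samples_per_meter = [(str(meter), int(np.sum(point_name == meter))) for meter in point_names]
print(samples_per_meter)
```

Summing a boolean mask is only one option; taking the length of the index array returned by `np.where` would give the same counts.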
In [ ]:
First, we can use another list comprehension to iterate over the point names and create a new list whose elements are in turn tuples with the indices of the samples corresponding to each meter:
In [87]:
idx = [np.where(data['Point_name']==meter) for meter in pointNames]
print("idx is now a {0} of {1:d} items.".format(type(idx).__name__,len(idx))) # a type object cannot be formatted with the 's' format code
print("Each item in idx is a {0}.".format(type(idx[0]).__name__))
[(meter,(data[idxItem]['Time'][-1]-data[idxItem]['Time'][0]).days) for meter,idxItem in zip(pointNames,idx)]
Out[87]:
And then use yet another list comprehension to calculate the differences between the first and last timestamp:
In [76]:
help(zip)
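The role of zip in the comprehension above can be sketched on a small example. The meter names and timestamps are hypothetical, and the sketch assumes, as the original comprehension does, that each meter's samples are already sorted in time.

```python
import datetime as dt

# Hypothetical meters and timestamps, already sorted in time for each meter.
meters = ['meter A', 'meter B']
timestamps = [
    [dt.datetime(2014, 1, 1), dt.datetime(2014, 1, 15), dt.datetime(2014, 3, 2)],
    [dt.datetime(2014, 2, 1), dt.datetime(2014, 2, 11)],
]

# zip pairs each meter name with its own timestamp list, so one comprehension
# can report the span between the first and last sample of every meter.
span_days = [(meter, (times[-1] - times[0]).days) for meter, times in zip(meters, timestamps)]
print(span_days)
```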
For this task, we are going to directly take the difference between any two consecutive datetime objects and display the result in terms of, say, number of seconds elapsed between these timestamps.
Before we do this, though, it is useful to plot the timestamps to figure out if there are discontinuities that we can visually see:
In [ ]:
fig = plt.figure(figsize=(20,30)) # A 20 inch x 30 inch figure box
### What else?
As you may have seen, gaps were easily identifiable as discontinuities in the lines that were plotted. If no gaps existed, the plot would be a straight line.
But now let's get back to solving this using exact numbers...
First, you need to know that applying the difference operator (-) to two datetime objects results in a timedelta object. These objects (timedelta) describe time differences in terms of days, seconds and microseconds (see the link above for more details). Because of this, we can quickly convert any timedelta object (say td, a name chosen so that it does not clash with the dt alias of the datetime module) into a number of seconds by doing:
td.days*3600*24+td.seconds+td.microseconds/1000000.0
In this case, however, our timestamps do not contain information about microseconds, so we will skip that part of the conversion.
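As a quick check of this conversion, the following sketch subtracts two hypothetical timestamps and compares the manual formula with the `total_seconds()` method that timedelta objects provide in the standard library.

```python
import datetime as dt

# Hypothetical timestamps, two days and about half an hour apart.
start = dt.datetime(2014, 11, 10, 17, 34, 28)
end = dt.datetime(2014, 11, 12, 18, 4, 58)

td = end - start  # subtracting two datetime objects yields a timedelta

# Manual conversion, mirroring the expression in the text.
seconds_manual = td.days * 3600 * 24 + td.seconds + td.microseconds / 1000000.0

# The standard library offers the same conversion as a method.
seconds_builtin = td.total_seconds()
print(seconds_manual, seconds_builtin)
```

Both values should agree, so `total_seconds()` can be used wherever the manual expression feels too verbose.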
Using this knowledge, we can create a list of lists (a nested list) in a similar manner as we've done before (i.e. using list comprehensions), and in it store the timedeltas in seconds for each meter. In other words, the outer list is a list of the same length as pointNames, and each element is a list of timedeltas for the corresponding meter.
One more thing comes in handy for this task: the np.diff function, which takes an array (or a list) and returns the difference between any two consecutive items of the list.
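To make the behaviour of np.diff on timestamps concrete, here is a small sketch on invented data. The array `times` is a hypothetical stand-in for the 'Time' values of a single meter.

```python
import datetime as dt
import numpy as np

# Hypothetical timestamps for one meter; the jump from 17:02 to 17:10 is a gap.
times = np.array([
    dt.datetime(2014, 11, 10, 17, 0, 0),
    dt.datetime(2014, 11, 10, 17, 1, 0),
    dt.datetime(2014, 11, 10, 17, 2, 0),
    dt.datetime(2014, 11, 10, 17, 10, 0),
], dtype=object)

# Object array of timedelta values, one element shorter than times.
deltas = np.diff(times)

# Convert each timedelta to whole seconds (microseconds are ignored).
seconds = [d.days * 3600 * 24 + d.seconds for d in deltas]
print(seconds)
```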
Now, in a single line of code we can get the nested list we talked about:
In [ ]:
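delta_t = [[d.days*3600*24+d.seconds for d in np.diff(data['Time'][i])] for i in idx] # one list of timedeltas (in seconds) per meter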
Now we need to be able to print out the exact times during which there are gaps. We will define gaps to be any timedelta that is longer than the median timedelta for a meter.
We will achieve this as follows:
In [ ]:
import sys
np.set_printoptions(threshold=sys.maxsize) # print arrays in full; recent NumPy versions reject threshold=np.nan
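The gap definition above (any interval longer than the meter's median interval) could be implemented along the following lines. This is only a sketch on hypothetical timestamps for a single meter. It uses `statistics.median`, which assumes Python 3.4 or later; `np.median` would serve equally well.

```python
import datetime as dt
from statistics import median

# Hypothetical timestamps for one meter, sampled every minute with one outage.
times = [dt.datetime(2014, 11, 10, 17, m) for m in (0, 1, 2, 3, 10, 11)]

# Seconds elapsed between consecutive samples.
deltas = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
typical = median(deltas)

# A gap is any interval longer than the median; keep the timestamps on both sides.
gaps = [(times[i], times[i + 1], delta) for i, delta in enumerate(deltas) if delta > typical]
for gap_start, gap_end, delta in gaps:
    print("gap of {:.0f} s between {} and {}".format(delta, gap_start, gap_end))
```

For the real data, the same steps would be repeated for each meter, for example inside a loop over `idx`.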
First, we will define a new variable containing the weekday for each of the timestamps.
In [102]:
wd = lambda d : d.weekday()
weekDays = np.array([wd(t) for t in data['Time']]) # a list comprehension also works under Python 3, where map returns an iterator
Monday = data[np.where(weekDays==0)]
Tuesday = data[np.where(weekDays==1)]
Wednesday = data[np.where(weekDays==2)]
Thursday = data[np.where(weekDays==3)]
Friday = data[np.where(weekDays==4)]
Saturday = data[np.where(weekDays==5)]
Sunday = data[np.where(weekDays==6)]
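The same weekday segmentation can be tried on a small invented dataset. In this sketch `times` and `values` are hypothetical stand-ins for the 'Time' and 'Value' columns, and a boolean mask is used directly instead of `np.where`.

```python
import datetime as dt
import numpy as np

# Hypothetical timestamps: 2014-11-10 was a Monday, 2014-11-16 a Sunday.
times = np.array([dt.datetime(2014, 11, day, 12, 0, 0) for day in (10, 11, 16, 17)], dtype=object)
values = np.array([1.0, 2.0, 3.0, 4.0])

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
week_days = np.array([t.weekday() for t in times])

monday_values = values[week_days == 0]  # boolean mask selects the Monday samples
sunday_values = values[week_days == 6]  # and here the Sunday samples
print(monday_values, sunday_values)
```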
Then we can do logical indexing to segment the data:
In [107]:
plt.plot(Sunday['Time'][np.where(Sunday['Point_name']==pointNames[0])],Sunday['Value'][np.where(Sunday['Point_name']==pointNames[0])],'rd')
Out[107]:
In this task we use two for loops and the subplot functionality of PyPlot to visualize the data contained in the variables we declared above.
The main trick is that we need to create a time index that only contains information about the hours, minutes and seconds (i.e. it completely disregards the exact day of the measurement) so that all of the measurements can be displayed within a single 24-hour period.
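A minimal sketch of such a time-of-day index, using hypothetical timestamps taken on different days, might look like this:

```python
import datetime as dt

# Hypothetical measurements taken on different days.
timestamps = [
    dt.datetime(2014, 11, 10, 0, 0, 0),
    dt.datetime(2014, 11, 11, 6, 30, 0),
    dt.datetime(2014, 11, 17, 23, 59, 59),
]

# Seconds since midnight: the date is discarded, only the time of day is kept.
seconds_of_day = [t.hour * 3600 + t.minute * 60 + t.second for t in timestamps]

# Dividing by 3600 gives fractional hours, convenient for a 0-24 x-axis.
hours_of_day = [s / 3600.0 for s in seconds_of_day]
print(hours_of_day)
```

Because every value falls between 0 and 24, measurements from different dates overlap on the same horizontal axis.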
In [ ]:
Days = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
All = [Monday,Tuesday,Wednesday,Thursday,Friday,Saturday,Sunday] # one data subset per weekday, in the same order as Days
fig = plt.figure(figsize=(20,20))
for i in range(len(pointNames)): # iterate over meters
    for j in range(7): # iterate over days of the week
        plt.subplot(len(pointNames),7,i*7+j+1) # one row per meter, one column per weekday
        # Data from the day being plotted = All[j]
        # Data from the meter being plotted = All[j][All[j]['Point_name']==pointNames[i]]
        time = np.array([t.hour*3600+t.minute*60+t.second for t in All[j][All[j]['Point_name']==pointNames[i]]['Time']])
        # plot the power vs the hours in a day
        plt.plot(time/3600.,All[j][All[j]['Point_name']==pointNames[i]]['Value'],'.')
        if i==len(pointNames)-1: # label the x axis on the bottom row only
            plt.xlabel('hours in a day')
        if j==0: # label the y axis on the first column only
            plt.ylabel(pointNames[i].split('-')[0]+'\n'+pointNames[i].split('-')[1])
        if i==0: # title the top row with the day name
            plt.title(Days[j])
fig.tight_layout()
plt.show()
Several findings: (more to be added)
In [ ]: