12-752: Data-Driven Building Energy Management

Fall 2016, Carnegie Mellon University

Assignment #1

Task 1 [0%]: Making sure everything is installed correctly

Please double click the code cell below and click 'Cell' -> 'Run Cells' in the drop-down bar above.

You should get an output similar to this (the version numbers should be the same):

Python version:
3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Numpy version:
1.11.1
Sklearn version:
0.17.1

In [1]:
%matplotlib inline 
import numpy as np
import matplotlib.pyplot as plt

import sys
print('Python version:')
print(sys.version)
print('Numpy version:')
print(np.__version__)
import sklearn
print('Sklearn version:')
print(sklearn.__version__)


Python version:
3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Numpy version:
1.11.1
Sklearn version:
0.17.1

Short Introduction to Python and Jupyter

Jupyter notebooks consist of cells. This cell is a Markdown cell. Try double-clicking this cell. You can write pretty text, and even LaTeX is supported!

A short LaTeX example: $\sum_i x_i = 42$

Task #2 [10%]: Quick exercise

Create a new cell under this cell. Change the cell type to 'Markdown'. Make the cell display Euler's formula (https://en.wikipedia.org/wiki/Euler%27s_formula) in LaTeX. Do not forget to run the cell you just created.


In [ ]:
#This is a code cell
#Jupyter allows you to run code within the browser

#try running this cell
x = 5+15+1000
print(x)

Task #3 [10%]: Another quick exercise

Create a cell under this one. Make sure the type of the cell is 'Code'. Compute the sum of the integers from 1 to 20 and print it to the console. Hint: np.sum and np.arange will be your friends.

Plotting

Jupyter notebooks allow you to interactively explore data. Let's plot a sine wave.


In [2]:
x = np.arange(0,2*np.pi,2*np.pi/80.0)
y = np.sin(x)
plt.plot(x,y)


Out[2]:
[<matplotlib.lines.Line2D at 0x10e5b5470>]

Task #4 [10%]: Plotting exercise

Write numpy code that plots two periods of a cosine-wave in the cell below.


In [ ]:
#Your code goes here

Data structures

The two most widely used data structures in Python are lists and dictionaries.

Lists

Here are some simple examples of how to use lists. If you want to learn more about Python lists, check out https://www.tutorialspoint.com/python/python_lists.htm

Adding data to lists:


In [3]:
l = [] #creating an empty list
print('Empty list:')
print(l)

l.append(5) #appending 5 to the end of the list
print('List containing 5:')
print(l)

l = [1,2,3,'hello','world'] #creating a list containing 5 items
print('List with items:')
print(l)

l.extend([4,5,6]) #appending elements from another list to l
print('List with more items:')
print(l)


Empty list:
[]
List containing 5:
[5]
List with items:
[1, 2, 3, 'hello', 'world']
List with more items:
[1, 2, 3, 'hello', 'world', 4, 5, 6]

Accessing data in a list:


In [4]:
print('Printing fourth element in list:')
print(l[3]) #counting starts at 0, so index 3 is the fourth element

print('Printing the first three elements in list:')
print(l[:3]) #the slice stops before index 3

print('Print the last 3 elements in list:')
print(l[-3:])


Printing fourth element in list:
hello
Printing the first three elements in list:
[1, 2, 3]
Print the last 3 elements in list:
[4, 5, 6]
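
As a further illustration of the indexing rules above, here is a minimal sketch using a hypothetical list called letters (it is not part of the assignment data). It shows zero-based indexing, negative indices, and the fact that a slice excludes its end index.

```python
# Hypothetical example list, used only for illustration
letters = ['a', 'b', 'c', 'd', 'e']

first = letters[0]          # index 0 is the first element
last = letters[-1]          # negative indices count from the end
middle = letters[1:4]       # from index 1 up to, but not including, index 4
every_other = letters[::2]  # a third slice value sets the step size

print(first, last)
print(middle)
print(every_other)
```

Slicing always returns a new list, so the original list is left unchanged.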

Dictionaries

Dictionaries store key-value pairs. We will give some short examples of how to use dictionaries. For a more thorough introduction, see https://www.tutorialspoint.com/python/python_dictionary.htm


In [5]:
d = {} #creating empty dictionary
print('Empty dictionary:')
print(d)

d['author'] = 'Shakespeare' #adding an item to the dictionary
print('Dictionary with one element')
print(d)

#adding more items:
d['year'] = 1596
d['title'] = 'The merchant of Venice'

#Accessing items in dictionary:
print_string = d['title'] + ' was written by ' + d['author'] + ' in the year ' + str(d['year'])
print(print_string)


Empty dictionary:
{}
Dictionary with one element
{'author': 'Shakespeare'}
The merchant of Venice was written by Shakespeare in the year 1596
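
A few other dictionary operations come up often. The sketch below uses a hypothetical dictionary called room with made-up values (an assumption for illustration only) to show a membership test, a lookup with a default, overwriting a value, and deleting a key.

```python
# Hypothetical dictionary with made-up values, used only for illustration
room = {'name': 'Lab A', 'temperature_c': 21.5}

has_humidity = 'humidity' in room       # the in operator checks the keys
humidity = room.get('humidity', None)   # get() returns a default instead of raising KeyError
room['temperature_c'] = 22.0            # assigning to an existing key overwrites its value
del room['name']                        # del removes a key-value pair

print(has_humidity, humidity)
print(room)
```

Accessing a missing key with square brackets, such as room['humidity'], would raise a KeyError, which is why get() is handy when a key may be absent.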

Loops

A couple of examples of how to use loops. For more info, see https://www.tutorialspoint.com/python/python_for_loop.htm or Google.


In [7]:
list_of_numbers = [1.,2.,3.,4.,5.,4.,3.,2.,1.]

incremented_list_of_numbers = []
for i in range(len(list_of_numbers)):
    number = list_of_numbers[i]
    incremented_list_of_numbers.append(number+1)
print('Incremented list:')
print(incremented_list_of_numbers)

#More elegantly
incremented_list_of_numbers2 = []
for number in list_of_numbers:
    incremented_list_of_numbers2.append(number+1)
print('Second incremented list:')
print(incremented_list_of_numbers2)

#Most elegantly, the for-loop above can also be written in-line
#(Python calls this a list comprehension):
incremented_list_of_numbers3 = [number + 1 for number in list_of_numbers]
print('Third incremented list:')
print(incremented_list_of_numbers3)

#looping over dictionaries
for key in d:
    value = d[key]
    print(key,value)


Incremented list:
[2.0, 3.0, 4.0, 5.0, 6.0, 5.0, 4.0, 3.0, 2.0]
Second incremented list:
[2.0, 3.0, 4.0, 5.0, 6.0, 5.0, 4.0, 3.0, 2.0]
Third incremented list:
[2.0, 3.0, 4.0, 5.0, 6.0, 5.0, 4.0, 3.0, 2.0]
year 1596
title The merchant of Venice
author Shakespeare
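
Two related patterns are worth knowing alongside the loops above. The sketch below uses a hypothetical list called readings (made-up values) to show enumerate(), which yields the index together with each item, and an in-line for-loop that filters with a trailing if.

```python
# Hypothetical readings with made-up values, used only for illustration
readings = [3.0, 7.5, 1.2, 9.8]

# enumerate() yields (index, value) pairs, so range(len(...)) is not needed
indexed = []
for i, value in enumerate(readings):
    indexed.append((i, value))

# An in-line for-loop can keep only the items that pass a condition
large_readings = [value for value in readings if value > 5.0]

print(indexed)
print(large_readings)
```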

Quick exercises:

In the cell below, complete the following tasks:

  • Task #4 [5%]: Using a for-loop and len(), compute the mean of $list\_of\_numbers$
  • Task #5 [10%]: Using an in-line for-loop, create a list that contains each number squared
  • Task #6 [5%]: Using an in-line for-loop, create a list containing all keys of $d$
  • Task #7 [10%]: Using an in-line for-loop, create a list containing all values of $d$

In [ ]:
# your code goes here

Loading data

Since you will be dealing with data, you need to know how to read and parse data. Numpy can automatically parse some CSV files. Let's assume, however, that we need to parse a file that numpy cannot parse out of the box.


In [8]:
parsed_lines = []
with open('testdata.txt') as f: #the with-block closes the file automatically
    for line in f:
        values = line.split(',') #create a list by splitting the string line at every ','
        values = [float(x) for x in values] #in-line for-loop that converts strings to floats
        parsed_lines.append(values)

plt.plot(parsed_lines[0])


Out[8]:
[<matplotlib.lines.Line2D at 0x10eb7d5c0>]

In [9]:
plt.imshow(np.array(parsed_lines).T)


Out[9]:
<matplotlib.image.AxesImage at 0x10eca6898>
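
For files that really are plain comma-separated numbers, numpy's np.loadtxt can often do the parsing directly. The sketch below is only an illustration under stated assumptions: it reads two made-up rows from an in-memory io.StringIO object (named fake_file here) rather than from testdata.txt, whose exact contents are not shown in this notebook.

```python
import io
import numpy as np

# Made-up file contents standing in for a real file on disk
fake_file = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")

# delimiter=',' tells loadtxt to split each line at commas
parsed = np.loadtxt(fake_file, delimiter=',')

print(parsed.shape)
print(parsed)
```

When a file mixes in headers, text fields, or irregular rows, a hand-written loop like the one above gives more control.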

Numpy arrays

A numpy array is a data structure well suited to storing matrices. A list of lists (like parsed_lines) can be converted to a numpy array by:


In [10]:
data_matrix = np.array(parsed_lines)
print(data_matrix[:12,:10]) #print the first 10 columns of the first 12 rows


[[ 4.36559707  4.88034702  5.169525    5.26668172  5.38248543  5.3576269
   5.72086446  6.09339502  5.93644615  6.07683626]
 [ 4.36799013  4.88636065  5.14163203  5.25218793  5.36726373  5.35659734
   5.7017407   6.08372304  5.95090016  6.06070116]
 [ 4.32805413  4.86963551  5.1210082   5.2658921   5.34133794  5.35772862
   5.70563373  6.04926641  5.9607355   6.05007442]
 [ 4.3284308   4.81398112  5.10502867  5.21498186  5.32893938  5.3285473
   5.64840264  6.04807406  5.90623961  6.05525157]
 [ 4.26442105  4.7935739   5.07532694  5.19326554  5.31396529  5.26360997
   5.62107491  5.97614089  5.85022783  5.97167786]
 [ 4.27164542  4.80519772  5.05345579  5.21941057  5.29634089  5.31713707
   5.6134232   5.99325825  5.86648859  5.9367593 ]
 [ 4.22088235  4.78683913  5.07930488  5.15733008  5.32425634  5.21638357
   5.64916784  5.97341979  5.83253904  6.01113749]
 [ 4.32942662  4.77238746  5.06954109  5.17771221  5.29508506  5.26575149
   5.58287255  6.04661992  5.83316824  6.01204404]
 [ 4.32199552  4.79374654  5.03198176  5.18673157  5.25968982  5.30443263
   5.54423131  6.01265025  5.85381385  5.93669634]
 [ 4.30304392  4.83480927  5.03384962  5.19752434  5.28253246  5.31804991
   5.58918799  5.97903744  5.86899296  5.94286336]
 [ 4.36323349  4.80872625  5.08384078  5.22878608  5.33359282  5.30104261
   5.54137447  6.05043411  5.78706942  5.99287622]
 [ 4.34130987  4.7695852   5.09186478  5.20758836  5.35904939  5.27004736
   5.55376734  6.0913206   5.74918738  6.03926767]]

In [11]:
plt.plot(data_matrix[0,:] - data_matrix[-1,:]) #plots the difference between the first and last row


Out[11]:
[<matplotlib.lines.Line2D at 0x10ee0a6a0>]

In [34]:
print(data_matrix.shape) #shows the dimensions of the data_matrix, 200 rows, 80 columns


(200, 80)
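
The same indexing works on any two-dimensional array. Here is a minimal sketch on a small made-up array called small (a stand-in for data_matrix, chosen so the results are easy to check by eye).

```python
import numpy as np

# Small made-up list of lists standing in for parsed_lines
small = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

n_rows, n_cols = small.shape   # shape is (rows, columns)
first_row = small[0, :]        # all columns of row 0
last_column = small[:, -1]     # all rows of the last column
doubled = small * 2            # arithmetic applies element by element

print(n_rows, n_cols)
print(first_row, last_column)
print(doubled)
```

The index before the comma selects rows and the index after it selects columns; a bare colon means "everything along this axis".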

Quick exercises:

  • Task #8 [10%]: Plot the 25th row of data_matrix in the cell below. (The rows are very similar to each other)
  • Task #9 [10%]: Plot the 25th column of data_matrix in the cell below. (Note: the columns don't have as much structure as the rows, so don't expect a pretty plot.)

Aggregating over an axis


In [12]:
plt.plot(np.mean(data_matrix, axis=0)) #mean row: averages over the 200 rows, giving 80 values


Out[12]:
[<matplotlib.lines.Line2D at 0x10ee724a8>]

In [13]:
plt.plot(np.mean(data_matrix, axis=1)) #mean column: averages over the 80 columns, giving 200 values


Out[13]:
[<matplotlib.lines.Line2D at 0x10eed6b70>]
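
The axis argument is easiest to see on a tiny array. The sketch below uses a made-up 2-by-3 array called small (an assumption for illustration, not the assignment data) to show which dimension each choice of axis collapses.

```python
import numpy as np

# Made-up array with 2 rows and 3 columns
small = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

mean_over_rows = np.mean(small, axis=0)     # collapses the rows: one value per column
mean_over_columns = np.mean(small, axis=1)  # collapses the columns: one value per row

print(mean_over_rows.shape, mean_over_rows)
print(mean_over_columns.shape, mean_over_columns)
```

A useful rule of thumb: the axis you pass is the one that disappears from the shape of the result.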

Quick exercises:

  • Task #10 [10%]: Plot the maximum value (np.max) over columns
  • Task #11 [10%]: Plot the minimum value (np.min) over rows

In [2]:
# your code goes here

DON'T FORGET: Save the Jupyter notebook as a Notebook (.ipynb) and submit it via Blackboard.


In [ ]: