JITcode 1, lesson 2

The first Just-in-Time (JIT) module for teaching computing to engineers, in context, lays the foundations for building computational skills.

Did you complete lesson 1? If so, you now know how to read a data file, make some plots to study the data, and do some basic data analysis. These are all very useful skills! You will likely soon have a lab report to do where you can apply them.

What next? There are so many choices! Let's do something creative. In this lesson, you will learn to create your own functions!

What can you do with functions that you create? Anything, really. Watch this funky video for an introduction to creating functions.



In [1]:

    
from IPython.display import YouTubeVideo
YouTubeVideo('gTwU8JPgu5E')










Context -- Turbine Placement

In the first module, we investigated global surface temperature anomalies and discovered a pretty distinct upward trend in the data. In this module, our goal is to examine wind speed data collected at a few different places and determine which of these locations is the best candidate for our wind turbine farm.

Placement criteria

There are a lot of factors that go into choosing a good turbine site, even after a bunch of wind data has been collected.

Check out the slideshow on energy.gov that details the inner workings of a standard wind turbine setup.

Some of the factors listed on the energy.gov site that we have to keep in mind include:

Average site wind speed

The higher the average wind speed, the more power we can extract.

Minimum wind speed

The turbine controller won't release the blade to turn unless the wind speed is at least around 8 mph (or around 3.6 m/s).

Maximum wind speed

Turbine blades have certain limits and if the wind is too strong or too unpredictable, it can damage the whole turbine assembly. The energy.gov site states that the turbine controller will stop the turbine from spinning if the wind speed exceeds 55 mph.

Potential issues

Our data was collected at several sites by a few different contractors and has, unfortunately, been delivered in several different unit systems. Our data files are either in meters-per-second (which we want), in miles-per-hour (which we don't want) or in knots (which we also don't want).

To get all of our data in m/s, we're going to need the appropriate conversion factors.

$$1\ \mathrm{mph} = 0.447\ \mathrm{m/s}$$

$$1\ \mathrm{knot} = 0.514\ \mathrm{m/s}$$

If we want to convert things back into mph or knots, we can use

$$1\ \mathrm{m/s} = 2.236\ \mathrm{mph}$$

$$1\ \mathrm{m/s} = 1.943\ \mathrm{knots}$$
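As a quick check, the rounded factors can be recovered from the definitions of the units. This is only a sketch: it assumes the international mile (1609.344 m) and the nautical mile (1852 m), which this lesson does not state, and the variable names are our own.

```python
# Unit definitions assumed for this check (not given in the lesson):
# 1 international mile = 1609.344 m, 1 nautical mile = 1852 m, 1 hour = 3600 s
METERS_PER_MILE = 1609.344
METERS_PER_NAUTICAL_MILE = 1852.0
SECONDS_PER_HOUR = 3600.0

mph_in_ms = METERS_PER_MILE / SECONDS_PER_HOUR            # about 0.447 m/s
knot_in_ms = METERS_PER_NAUTICAL_MILE / SECONDS_PER_HOUR  # about 0.514 m/s

# The reverse factors are simply the reciprocals
ms_in_mph = 1.0 / mph_in_ms     # about 2.237 mph
ms_in_knot = 1.0 / knot_in_ms   # about 1.944 knots

print(round(mph_in_ms, 3))
print(round(knot_in_ms, 3))
```

The reciprocals come out a hair above the 2.236 and 1.943 used in this lesson, so expect small differences in the last digits when converting back and forth.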

Create a function

Can we create a function that will make the conversion from miles per hour to meters per second? Sure! First, decide what you will call it. How about MilesPerHourtoMetersPerSecond? That's a bit long. mph2ms? That's a little bit hard to parse -- maybe we can make it a little easier to read. Let's try mph_to_ms.

Next, we decide what the input and output of our function will be, and we use the def keyword to create it. The input goes in parentheses next to the function name, the definition line has to end with a colon, and the return statement sends the output back:



In [2]:

    
def mph_to_ms(speed):
    return speed*.447



In [3]:

    
def knot_to_ms(speed):
    return speed*.514

Let's try these functions out. What's 65 mph in meters per second?



In [4]:

    
mph_to_ms(65)









    Out[4]:





29.055

We can also create two functions to "undo" the conversion.



In [5]:

    
def ms_to_mph(speed):
    return speed*2.236

def ms_to_knot(speed):
    return speed*1.943

If we import NumPy, we can try sending an array of values to our conversion function. Let's try it.



In [6]:

    
import numpy

speedarray = numpy.array([5, 10, 15, 20, 25])
mph_to_ms(speedarray)









    Out[6]:





array([  2.235,   4.47 ,   6.705,   8.94 ,  11.175])

Ok! We put in an array of values and we got back an array of converted values.
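This works because NumPy multiplies an array element by element. A plain Python list does not behave the same way, as the small sketch below illustrates (the function is redefined here so the example stands on its own).

```python
import numpy

def mph_to_ms(speed):
    return speed * 0.447

# A NumPy array is multiplied element by element
converted = mph_to_ms(numpy.array([5, 10, 15]))

# A plain list is not: multiplying a list by a float raises a TypeError
try:
    mph_to_ms([5, 10, 15])
    list_worked = True
except TypeError:
    list_worked = False

# Turning the list into an array first avoids the problem
from_list = mph_to_ms(numpy.asarray([5, 10, 15]))
```

If your speeds start out in a list, wrapping them with numpy.array or numpy.asarray before converting is one simple way around this.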

Function Composition

Like in mathematics, we can compose functions. Let's see if we get the expected result when we compose our mph_to_ms and ms_to_mph functions.



In [7]:

    
ms_to_mph(mph_to_ms(65))









    Out[7]:





64.96698

Ok! It's not perfect, because our conversion factors are rounded, but we get back approximately what we put in, which is what should happen when we compose two functions that are inverses of each other.
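To see where the small discrepancy comes from, we can multiply the two rounded factors together. The sketch below redefines the two functions so it runs on its own; the variable names are ours.

```python
def mph_to_ms(speed):
    return speed * 0.447

def ms_to_mph(speed):
    return speed * 2.236

# Product of the two rounded factors; an exact inverse pair would give 1
round_trip_factor = 0.447 * 2.236

original = 65
recovered = ms_to_mph(mph_to_ms(original))

# Relative error introduced by one round trip (roughly 0.05%)
relative_error = abs(recovered - original) / original
```

The product is slightly less than 1, so every round trip shrinks the value by about 0.05%. More digits in the conversion factors would shrink that error.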

Of course, we don't have to compose only functions that undo each other. We can also convert from miles per hour to knots using the functions we already have.



In [8]:

    
ms_to_knot(mph_to_ms(65))









    Out[8]:





56.453865

That's a little bit roundabout, converting from mph to m/s, then from m/s to knots, but it works!

It probably makes more sense to just create an mph_to_knot function if we really need to perform that conversion, but this should give you a taste of how you might combine some of the tools you create moving forward!
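One way such an mph_to_knot function might look is sketched below. It simply wraps the composition in a new function, using the 0.447 and 1.943 factors listed earlier; the helper functions are redefined so the example is self-contained.

```python
def mph_to_ms(speed):
    return speed * 0.447

def ms_to_knot(speed):
    return speed * 1.943

def mph_to_knot(speed):
    """Convert miles per hour to knots by going through m/s."""
    return ms_to_knot(mph_to_ms(speed))

speed_in_knots = mph_to_knot(65)   # roughly 56.45 knots
```

Building the new function out of the two existing ones means that if a conversion factor is ever corrected, the fix carries over automatically.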

Working with multiple files

In the first unit, we told you the name and location of the file to load, but many times engineers and scientists will be working with many different data files. You could look at the contents of a folder on your computer and then manually type in the names of every data file you find, but why not let Python do some of the lifting for us?

There's a directory with all of our wind data and we want to use Python to get a list of the files available to us. To do this, we're going to import a library called glob that's purpose-built for just this task.



In [9]:

    
import glob

Our data files are all CSV (comma-separated values) files, so we're going to ask glob for a list of all CSV files in the ./resources/winddata/ directory. Note in the cell below that we start the directory path with a single dot (.), which is terminal shorthand for the current directory.

We search for all files that match *.csv, where the * is a wildcard symbol that matches anything, so the code below returns every file whose name ends in .csv.



In [10]:

    
filenames = glob.glob('./resources/winddata/*.csv')
print filenames









    



['./resources/winddata/site2mph.csv', './resources/winddata/site1ms.csv', './resources/winddata/site3knot.csv']

Hmm, that's a little bit annoying. The site 2 file is first on our list because glob returns files in whatever order the operating system lists them, which isn't necessarily alphabetical. Let's sort that list to make it more intuitive.



In [11]:

    
filenames = sorted(filenames)
print filenames









    



['./resources/winddata/site1ms.csv', './resources/winddata/site2mph.csv', './resources/winddata/site3knot.csv']
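If you'd like to experiment with wildcards without touching the real data, the sketch below builds a throwaway folder with a few empty files and searches it. The folder comes from Python's tempfile module and the extra notes.txt file is made up for illustration.

```python
import glob
import os
import shutil
import tempfile

# Build a throwaway directory with a few empty files (notes.txt is made up)
tmpdir = tempfile.mkdtemp()
for name in ['site2mph.csv', 'site1ms.csv', 'site3knot.csv', 'notes.txt']:
    open(os.path.join(tmpdir, name), 'w').close()

# '*' matches any run of characters, so this picks up every .csv file
matches = glob.glob(os.path.join(tmpdir, '*.csv'))

# glob makes no promise about order, so sort before relying on positions
basenames = sorted(os.path.basename(path) for path in matches)
print(basenames)

# Remove the throwaway directory again
shutil.rmtree(tmpdir)
```

The notes.txt file is left out because it doesn't end in .csv, and sorting puts the three site files in order.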

If we want to use a specific file, we can access its filename the same way we look at NumPy arrays, by using square brackets to pick out the file we want. Remember that counting starts at zero, so if we want to deal with site 2, we can type:



In [12]:

    
filenames[1]









    Out[12]:





'./resources/winddata/site2mph.csv'

and that's the location of our site 2 data file, ready to be opened in Python.

If statements

Now that we have these three files to examine, we can use if statements to help automate our data analysis. We're going to create a new function called check_and_convert, but instead of just performing one arithmetic operation, it's going to check which units the data file uses and then perform the appropriate conversion.

The syntax for an if statement is pretty straightforward. We decide on a condition and if that condition is true, then the indented code that follows the if statement is executed. Otherwise it gets skipped. Just like defining a function, the first line is terminated with a colon.

if condition:
    do something

Let's take a look at a simple example. From the energy.gov website, we know that a windspeed of less than 8 mph is too slow and that a windspeed greater than 55 mph is too fast.

First, let's convert those values into meters-per-second.



In [13]:

    
mph_to_ms(numpy.array([8, 55]))









    Out[13]:





array([  3.576,  24.585])



In [14]:

    
windspeed = 2

if windspeed < 3.576:
    print "Wind is too slow for the turbine"
    
if windspeed > 24.585:
    print "Shut it down!"









    



Wind is too slow for the turbine

See? Only the code under the first if statement was executed because the second condition is clearly false.



In [15]:

    
windspeed = 50

if windspeed < 3.576:
    print "Wind is too slow for the turbine"
    
if windspeed > 24.585:
    print "Shut it down!"









    



Shut it down!

And this time only the second statement executes. What if neither of the conditions is met?



In [16]:

    
windspeed = 20

if windspeed < 3.576:
    print "Wind is too slow for the turbine"
    
if windspeed > 24.585:
    print "Shut it down!"

Nothing happens, which is exactly what we expect.
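If we would rather get a message in every case, Python's elif and else keywords let us chain the checks together. The sketch below is one possible way to do that; the function name turbine_status and the "Turbine can run" message are our own inventions.

```python
MIN_SPEED = 3.576    # 8 mph expressed in m/s
MAX_SPEED = 24.585   # 55 mph expressed in m/s

def turbine_status(windspeed):
    """Return a short message describing what the turbine should do."""
    if windspeed < MIN_SPEED:
        return "Wind is too slow for the turbine"
    elif windspeed > MAX_SPEED:
        return "Shut it down!"
    else:
        # Reached only when neither condition above is true
        return "Turbine can run"

print(turbine_status(2))
print(turbine_status(20))
print(turbine_status(50))
```

Only one branch of an if/elif/else chain ever runs, so each wind speed produces exactly one message.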

Now let's see if we can't apply an if statement to help with converting our different data files.

Since the filenames have the units of measurement in them, let's ask Python to check if the file is in miles per hour, knots or meters per second and then convert everything to meters per second.

How?

Python is very good at comparing strings and numbers. If we want to check if a certain word, phrase or random letter combination is contained within another string, it's very straightforward.



In [17]:

    
'car' in 'scare'









    Out[17]:





True

Since the string 'car' is contained within the string 'scare', the line

'car' in 'scare'

evaluates as True. Since the filenames of our data files contain the units used, we can create a function to check if the strings 'knot', 'mph' or 'ms' appear in the filenames and then perform the appropriate conversions for each unit.
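As a rough sketch of that idea, the hypothetical helper below (detect_units is our own name) reports which unit string appears in a filename. It looks only at the file name itself, an extra precaution we are adding because a short string like 'ms' could also turn up in a folder name.

```python
import os

def detect_units(fname):
    """Guess the units of a data file from its name (hypothetical helper)."""
    # Look only at the file name so that folder names can't cause false matches
    base = os.path.basename(fname)
    if 'mph' in base:
        return 'mph'
    elif 'knot' in base:
        return 'knot'
    elif 'ms' in base:
        return 'ms'
    else:
        # None of the expected unit strings was found
        return None

units = detect_units('./resources/winddata/site3knot.csv')
```

Returning None when nothing matches gives us a way to notice files that don't follow the naming scheme.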

Importing data

We'll use the same code we used in Lesson 1 to load our data files. Specifically, we want to use the NumPy function loadtxt. One difference between our wind data files and the temperature data files is that the wind data files have column titles in the first line, and loadtxt can't convert those titles to numbers. So we're going to add one more instruction to our file loading command: we'll tell NumPy to ignore the first line of the file using the skiprows parameter.
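To see skiprows on its own, here is a small sketch that reads a made-up, three-line CSV from memory instead of from disk. The header names and numbers are invented for illustration and are not taken from the real wind data files.

```python
import io
import numpy

# A tiny made-up CSV with one header line, standing in for a wind data file
csv_text = u"AWND,WSF2,WSF5\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

# skiprows=1 skips the header so only the numeric rows are parsed
data = numpy.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)
print(data.shape)
```

The result has two rows and three columns, because the header line was skipped rather than read as data.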



In [18]:

    
def check_and_convert(fname):
    if 'mph' in fname:
        speeddata = numpy.loadtxt(fname,delimiter=',', skiprows=1)
        print "Converting from mph to ms"
        return mph_to_ms(speeddata)
    
    if 'knot' in fname:
        speeddata = numpy.loadtxt(fname,delimiter=',', skiprows=1)
        print "Converting from knot to ms"
        return knot_to_ms(speeddata)
    
    if 'ms' in fname:
        speeddata = numpy.loadtxt(fname,delimiter=',', skiprows=1)
        print "No conversion needed"
        return speeddata

Nothing happened...

We've created our function, but we have to call it to actually run it. Let's just try a quick test using one of our files. Let's try to open the site 2 data file, the location of which should be in our filenames list, under filenames[1].



In [19]:

    
check_and_convert(filenames[1])









    



Converting from mph to ms






    Out[19]:





array([[  5.0974092,  14.2927356,  17.8909068],
       [  3.5981712,   7.1963424,   8.0958852],
       [  4.5976632,   8.8954788,  10.2947676],
       ..., 
       [  7.1963424,  16.0918212,  19.6899924],
       [  5.2973076,  13.3931928,  16.991364 ],
       [  4.1978664,  10.6945644,  12.993396 ]])

Ok! We successfully loaded a file, and it detected that it had mph in the filename and so it used our mph_to_ms function. But we didn't assign the data to a variable, so let's do that now for all three of the data files.



In [20]:

    
site1 = check_and_convert(filenames[0])
site2 = check_and_convert(filenames[1])
site3 = check_and_convert(filenames[2])









    



No conversion needed
Converting from mph to ms
Converting from knot to ms

Ok it works. Now let's try comparing two sites to see which is better suited to wind turbine placement. Let's compare site 1 and site 3.

Data to work with

How big are these data sets? Let's take a look!



In [21]:

    
print site1.shape, site2.shape, site3.shape









    



(731, 3) (731, 3) (731, 3)

The data cover January 1st, 2012 through December 31st, 2013, which is 731 days because 2012 was a leap year. The three columns hold data on

AWND - Average daily wind speed
WSF2 - Fastest 2-minute wind speed
WSF5 - Fastest 5-second wind speed

Let's start by looking at the average daily wind speed. First, to plot the data, we'll need to set up matplotlib and import our plotting function, just like we did in Lesson 1.



In [22]:

    
%matplotlib inline
import matplotlib.pyplot as plt

Now let's just plot the first column of our site 1 and site 3 datasets (the column of the average daily wind speed).



In [23]:

    
plt.figure(figsize=(11,8))
plt.plot(site1[:,0]);
plt.plot(site3[:,0]);
plt.legend(["Site 1","Site 3"]);

Alright! We've plotted our average windspeed data and we can use that to compare which site is best suited for a turbine farm, but the plot is a little hard to decipher. Let's use another trick from Lesson 1, specifically, data smoothing using the numpy convolve function.



In [24]:

    
N = 30
window = numpy.ones(N)/N
site1smooth = numpy.convolve(site1[:,0],window,'same')
site3smooth = numpy.convolve(site3[:,0],window,'same')
plt.figure(figsize=(11,8))
plt.ylabel('Wind Speed (m/s)')

plt.plot(site1smooth, 'k-')
plt.plot(site3smooth, 'b-');
plt.legend(["Site 1","Site 3"]);

Ok, with our data smoothed, a quick look at the plot shows us that Site 1 appears to have the stronger winds overall and so should generate more power, but let's also check to make sure the wind there isn't too fast.
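To get a feel for what convolve does with a window of equal weights, here is a tiny sketch using five made-up numbers and a window of 3 instead of 30.

```python
import numpy

data = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])

N = 3
window = numpy.ones(N) / N   # equal weights that add up to 1

# Each output value is the average of a point and its neighbors;
# 'same' keeps the output the same length as the input
smooth = numpy.convolve(data, window, 'same')
# smooth is approximately [1, 2, 3, 4, 3]
```

The middle values are true three-point averages, but the first and last are pulled toward zero because the window hangs off the end of the data there. With a 30-day window, the same effect shows up as a dip over roughly the first and last 15 days of the smoothed curves.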

Operating conditions

Even if the average windspeed is in the "safe" range, the maximum wind speed on a given day can still exceed the maximum allowable windspeed and damage the wind turbine.

If we consider stopping and starting the turbine to be "expensive" in terms of the energy lost and the inconvenience, then a turbine that has to stop more frequently might not be worth it, even if the average wind speed is generally high.

The energy.gov site gave us a maximum windspeed of 55 mph, so let's get that into m/s and then check to make sure Site 1 is actually the best candidate.



In [25]:

    
mph_to_ms(55)









    Out[25]:





24.585

Ok, let's start by looking at the second column of our data, the fastest 2-minute wind speed. We can also create a 'danger line' that's equal to our maximum wind speed so we can see if the wind speed goes over our allowable maximum.



In [26]:

    
plt.figure(figsize=(11,8))
plt.plot(numpy.ones(731)*mph_to_ms(55))
plt.plot(site1[:,1])
plt.plot(site3[:,1])
plt.title('Fastest 2-minute Wind Speed')
plt.legend(['Max Wind Speed','Site1 2min', 'Site3 2min']);

Ok, so there are definitely one or two days where Site 1 will have to be shut down due to high winds, but it looks like Site 3 crosses the line, too. Given that Site 1 has a faster average wind speed, it's probably still our best bet.

How about for the 5 second bursts? Do you think those will also cause trouble if it's only for 5 seconds? If you invested millions of dollars in a wind farm, would you want to risk damaging your equipment? We should probably take a look, then.



In [27]:

    
plt.figure(figsize=(11,8))
plt.plot(numpy.ones(731)*mph_to_ms(55))
plt.plot(site1[:,2])
plt.plot(site3[:,2])
plt.title('Fastest 5-second Wind Speed')
plt.legend(['Max Wind Speed','Site1 5sec', 'Site3 5sec']);

Wow! Despite the high peaks at site 1, it looks like site 3 has more problems with the 5 second maximum wind speed measurements. Between these two sites, which do you think is the better candidate?

What about site 2? How much work would it be to include site 2 in these calculations?

Dig Deeper

If the turbines have to be stopped whenever the wind speed is greater than 55 mph, you might want to stop them if the wind speed was getting close to that level, just to be on the safe side.

Can you create a function to move the "danger line" based on what percentage of the maximum speed is acceptable? (e.g. 95% of 55mph, 90% of 55mph, etc...)

What about winds that aren't fast enough? We don't really have measurements for "slowest" wind speed, but if the average wind speed for a day is close to the minimum speed that the turbine can use, is it worth starting up the turbine? (Keep in mind that in low winds, turbines can be kept spinning by their own motors, but this uses electricity)

More concrete decisions

We eyeballed which sites were exceeding our maximum wind speed values, but how could we get a detailed count of the days when any of our three wind speed columns exceeded the maximum?

Here are two little Python snippets that might help you on your way to answering this. See if you can come up with a function that can count the number of days that the turbine might have to shut down.



In [28]:

    
myspeed = numpy.array([1, 2, 3, 4, 5])
myspeed > 2









    Out[28]:





array([False, False,  True,  True,  True], dtype=bool)



In [29]:

    
sum([False, True, True])









    Out[29]:





2
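As a hint, the two snippets can be chained together. The sketch below does this on the same toy array; applying the idea to the wind data, and wrapping it in a function, is left to you.

```python
import numpy

myspeed = numpy.array([1, 2, 3, 4, 5])

# The comparison gives an array of True/False values...
over_limit = myspeed > 2

# ...and summing it counts the True entries, since True counts as 1
count = over_limit.sum()
```

Here count is 3, because three of the five values are greater than 2.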
