Understand: Do your pseudo-code and comments show evidence that you recall and understand technical concepts?
Apply: Are you able to execute code (using the supplied examples) that performs the required functionality on supplied or generated data sets?
By the end of this notebook you will be expected to:
- Identify basic concepts of programming;
- Understand Python syntax;
- Distinguish between the different native data types used in Python;
- Use flow control structures to implement conditional and repetitive logic;
- Understand functions as a block of organized, reusable code that is used to perform a single, related action, as being necessary for improving the modularity of an application, as well as allowing for code reuse within and across applications; and
- Understand a Python package as a file consisting of Python code that defines functions, classes, variables, and may also include runnable code. You should be able to import a Python package or specific functions from a package.
- Exercise 1: Printing strings.
- Exercise 2: Rolling dice.
- Exercise 3: Coin flip distribution.
In the Orientation Module, you were given a number of links that provided additional Python documentation and tutorials. This section will serve as a summarized version of useful commands that are needed to get non-technical users started, and equip them with the basics required to complete this course.
This course is not intended to be a Python training course, but rather to showcase how Python can be utilized in analysis projects and, more specifically, in analyzing social data. You will be provided with links that you can explore in your own time in order to further your expertise. You are welcome to offer additional suggestions to your peers in the online forums.
You should execute the cells with sample code and write your own code in the indicated cells in this notebook. When complete, ensure that you save and checkpoint the notebook, download a copy to your local workstation, and submit a copy to the Online Campus.
You can also visit the Python Language Reference for a more complete reference manual that describes the syntax and "core semantics" of the language.
Note to new Python users:
This notebook will introduce a significant amount of new content. You do not need to be able to master all of the components, but you are urged to work through the examples below, and to attempt following the logic. In this notebook, you will be introduced to Python syntax. The second notebook in this module will start to introduce various components of data analysis basics, and the third will guide you through a basic example from beginning to end. The focus will shift from programming to subject-related examples in subsequent modules.
Python is generally defined as a high-level, general-purpose, interpreted, dynamic computer programming language.
Let’s try to break this statement down, for those new to computer programming. High-level means that we can express instructions that computers are able to execute in a language that is closer to a human language, and not the machine language that a computer requires to run the instructions. The interpreter is a program that converts the high-level programming instructions into low-level programming instructions that a computer understands. Dynamic means that the language does not enforce or check type-safety at “compile-time”, and instead defers such checks until “run-time”. This has many advantages, including lower development costs, rapid prototyping, and flexibility required by specific domains, such as data processing and analysis. This makes Python a perfect language for this course, as it is easy to understand (the expressions used are very similar to day-to-day conversations), it has a simple syntax, and provides objects that are appropriate for data processing and analysis.
Because it’s a programming language, there are some basic computer science concepts that are important to understand before you start coding.
Variables are the backbone of any program and, therefore, the backbone of any programming language. A variable is used to store information to be referenced and manipulated in a set of instructions commonly referred to as a computer program. Variables also provide a way of labeling data with a descriptive name, in order for programs to be understood more clearly by the reader and ourselves (recall high-level from above). It is helpful to think of variables as containers that hold information. Their sole purpose is to label and store data in the computer's memory. This data can then be used throughout a program.
Variables can store various pieces of information such as the name of a person, that person's age or weight, or their date of birth. Each of these pieces of information has a different type. The data type of the variable containing the name of a person would be a string, while the age is stored as an integer, and the date of birth as a datetime object. Fortunately, in Python, unlike in Java or C, you do not need to tell the computer what type a variable is before you assign an object to it. This means that you can begin your program by referring to your name, say, as “X”.
>>> X = 'Mary'
In the middle of your program, you might change your mind, and reassign your “X” variable to your age.
>>> X = 21
This provides you with flexibility that you can exploit to quickly test ideas without worrying that your code will break when executed. As with anything, "with great flexibility, comes great responsibility!" It is best practice to use names that are intuitive for the data that the variable stores. The only exception is using names which are reserved by the programming language, called keywords. Keywords define the language's rules and structure, and they cannot be used as variable names. Inside Jupyter, a Python reserved name is highlighted differently to other names, as will shortly become clear.
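As a small illustration of reserved names, the sketch below uses the standard library's keyword module to check a few candidate variable names. The names in the list are made up for this example.

```python
import keyword

# Hypothetical candidate names, chosen only for illustration.
candidate_names = ['age', 'for', 'class', 'first_name']

for name in candidate_names:
    # keyword.iskeyword() returns True if the name is reserved by Python.
    if keyword.iskeyword(name):
        print('{} is reserved and cannot be used as a variable name.'.format(name))
    else:
        print('{} can be used as a variable name.'.format(name))

# Collect only the reserved names.
reserved = [name for name in candidate_names if keyword.iskeyword(name)]
print(reserved)
```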
A variable’s type also defines what you can do, and the behaviors you can expect, when you perform certain operations on it. This is illustrated later in this notebook.
When we put together a set of instructions for the computer to execute, the computer reads the instructions sequentially. This is known as the "flow of the code". However, it is often the case that an instruction in the code requires a decision to be made. For example, say you have written a program to accept interns into your organization. To comply with the labor regulations governing your organization, you ask each candidate to enter their age amongst other details. Before accepting them as an intern, your program checks if the person's age is within the prescribed limits. If not, the application of a potential intern is rejected. Such decision points that affect the flow of the code occur frequently in computer programming and are known as flow control structures. Specifically:
A control structure is a block of programming that analyzes variables and chooses a direction in which to go based on given parameters. The term flow control details the direction the program takes (which way program control “flows”). Hence it is the basic decision-making process in computing; flow control determines how a computer will respond when given certain conditions and parameters.
Control structures make the programs function properly and, without control structures, your program’s code would only flow linearly from top to bottom. Changing what action a program takes, based on a variable's value, is what makes programming useful. Below, you will be introduced to the different control structures available in Python.
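The intern example described above could be sketched as follows. The age limits and the list of candidate ages are assumptions invented for illustration; they are not taken from any real regulation.

```python
# Hypothetical age limits, chosen only for illustration.
MIN_AGE = 18
MAX_AGE = 30

# Hypothetical ages entered by three candidates.
candidate_ages = [16, 21, 35]

decisions = []
for age in candidate_ages:
    # The decision point: the flow of the code depends on the value of 'age'.
    if MIN_AGE <= age <= MAX_AGE:
        decisions.append('accepted')
    else:
        decisions.append('rejected')

print(decisions)
```

Only the second candidate falls within the assumed limits, so only that application is accepted.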
A data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently. Previously, you were introduced to the concept of variables using a person's name, age, and date of birth as examples. Imagine doing that for everyone who is enrolled in the MIT Big Data and Social Analytics certificate course. That would be a lot of variables that you would struggle to keep track of. A data structure is a way to get around having to create millions of variables. Python implements many data structures that you will be using, such as lists, dicts, tuples, and sequences. Moreover, other data structures can be built on top of these for efficiently working with data. In Python, the NumPy and Pandas modules provide data structures that are intuitive and expressive enough for many of our needs in data analysis.
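To make the idea concrete, here is a minimal sketch that keeps several people in one list of dictionaries instead of creating separate variables for each person. The records are invented for illustration.

```python
import datetime

# Hypothetical records: one dictionary per person, all held in a single list.
people = [
    {'name': 'Mary', 'age': 21, 'date_of_birth': datetime.date(2003, 5, 14)},
    {'name': 'Bob', 'age': 40, 'date_of_birth': datetime.date(1984, 1, 30)},
]

# One variable ('people') now gives access to every record.
names = [person['name'] for person in people]
print(names)
print('Number of people stored: {}'.format(len(people)))
```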
Each programming language is designed and implemented differently, and using it requires adherence to its syntax, which is the set of rules that define the combinations of symbols that are considered to be correctly-structured programs in that language. When these rules are followed, the programming language can understand your instructions and, therefore, you are able to create software that can be run. If you do not adhere to the rules of programming languages’ syntax, you will unfortunately run into errors and your programs will not be executed. As it turns out, the syntax of any programming language tends to be the biggest obstacle many people face when being introduced to programming for the first time, or when learning a new programming language. (Note that syntax errors are only one type of error among many that are possible.) Fortunately, Python will correctly identify syntax errors as it encounters them. This will be useful when trying to find errors in your code. Some key things to remember when using Python are highlighted below.
Case sensitivity
All variables are case-sensitive. Python treats “number” and “Number” as separate, unrelated entities.
Comments
Commented lines start with the hashtag symbol # and are ignored when interpreting the code at run-time.
Indentation
Code blocks are demarcated by their indentation. In particular, control structures end with a colon : at the decision point. Code blocks indicating action that must be taken when the decision evaluates to "True" are demarcated by indentation.
Note: Spaces and tabs do not mix
Because whitespace is significant, remember that spaces and tabs don't mix, so use only one or the other when indenting your programs. A common error is to conflate them. While they may look the same during editing, the interpreter will read them differently, and it will result in either an error or unexpected behavior. Most decent text editors can be configured to let the tab key emit spaces instead.
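The three syntax points above (case sensitivity, comments, and indentation) can be seen together in this short sketch. The variable names are chosen only for illustration.

```python
# Case sensitivity: 'number' and 'Number' are two separate variables.
number = 1
Number = 2
print(number == Number)

# Indentation: the indented line belongs to the "if" block and runs
# only when the condition before the colon evaluates to True.
if number < Number:
    print('number is smaller than Number')

# This line is not indented, so it is outside the block and always runs.
print('Done')
```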
In summary, this section described what a variable is, how you can store information in it, and then retrieve that information at a later stage. The variable can have a name, and this name is usually informed by the kind of content you’ll be storing in the variable. So, if you’re storing your name in the variable, you would name the variable “yourName”. You would not have to give it that name, you could name the variable “WowImProgramming”, but that wouldn’t make much sense considering you are trying to store a person’s name. Finally, variables have types, and these types are used to help us organize what can and cannot be stored in the variable.
Hint: Having a type will help to open up what kinds of things we can do with the information inside the variable.
Example: If you have two integers (let’s say 50 and 32), you would be able to subtract one variable from the other (i.e., 50 – 32 = 18). But, if you had two variables that stored names (e.g., “Trevor” and “Geoff”) it wouldn’t make sense to subtract one from the other (i.e., “Trevor” – “Geoff”), because that does not mean anything.
Therefore, types are a powerful thing, and they help you make sense of what you can and cannot do with your variables.
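The example above can be tried directly. In this sketch, subtracting two integers works, while subtracting two strings raises a TypeError, which is caught here so that the rest of the code keeps running.

```python
# Subtracting two integers is a valid operation.
difference = 50 - 32
print(difference)

# Subtracting two strings is not defined, so Python raises a TypeError.
try:
    'Trevor' - 'Geoff'
except TypeError as error:
    message = str(error)
    print('TypeError:', message)
```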
Now that the basic concepts have been introduced, let’s make these concrete using illustrative examples. This notebook uses Jupyter's code cells (in light gray background) to demonstrate these ideas. As you learned in the Orientation Module, you can execute instructions in a code cell and the output will be displayed below the cell, where applicable.
In most cases, a comment is included in the very first line of the cell to help in following the material. A comment is any line in the set of instructions that is preceded by a hashtag symbol. Inside Jupyter, once you have used the comment indicator, the line is highlighted in a different color and the font italicized, which is a very helpful visual aid for both readers and developers. See the example below:
Note:
The code cell below will not produce any output when executed.
In [ ]:
# I am a comment and I'm ignored by the interpreter.
Python works with "values" that can take different forms depending on the data type. For example, both a person's first name and age constitute different, valid values inside Python.
In [ ]:
# A valid value of type string that can be manipulated.
"Bob"
In [ ]:
# Alternatively, you can also use single quotes to specify the value.
'Bob'
You can use the function “type()” and include the name of the value in between the parentheses to find out what type the value is. (Functions in Python are explained in more detail later in this notebook.)
In [ ]:
# Find the type of a valid value that can be manipulated.
type('Bob')
From the above, you can see that our value 'Bob' has type str. The value of a string data type is a sequence of characters. Strings are typically used for text processing.
You can perform concatenation and a number of other functions on strings.
In [ ]:
# String concatenation using the '+' operator.
'He' + ' ' + 'is' + ' ' + 'Bob'
In [ ]:
# How long is the string 'Bob'.
len('Bob')
In [ ]:
# Repeat 'Bob' 3 times with no space in between.
'Bob'*3
In [ ]:
# Get the 1st element of the string 'Bob'.
'Bob'[0]
In [ ]:
# Get the second element of the string 'Bob'.
'Bob'[1]
In [ ]:
# Get the third element of the string 'Bob'.
'Bob'[2]
In [ ]:
# Get the last element of the string 'Bob'.
'Bob'[-1]
In [ ]:
# Get the second last element of the string 'Bob'.
'Bob'[-2]
In [ ]:
# Slicing: Remove the last element of the string 'BobBobBob' and return the remainder of the string.
('Bob'*3)[0:-1]
Another example of a value is Bob’s age. Assuming Bob is 40 years old, the value can be specified as follows:
In [ ]:
# A valid value of type integer.
40
In [ ]:
# Type of a valid value for Bob's age.
type(40)
Note the difference in how the two values are specified: for string values the name is included between single or double quotes, whereas this is not needed for the age value. Had quotes been used in the latter, it would have expressed a different value of a different type.
In [ ]:
# A valid value but not of type integer.
'40'
Let's introduce three new ideas with this example: how to compare two values, another valid data type in Python, and the concept of casting. First, let’s compare the two values (40 and '40') to see if they are of the same type.
In [ ]:
type(40) == type('40')
What is happening here? We have used an operator (==) that accepts one value on either side, and compares these values to see if they are the same. The values being compared in this case are not 40 and '40', but their corresponding value types (i.e., integer, string, etc.). (If our intention had been to compare the original values, we would have used 40 == '40'.)
What type is the resulting False value?
In [ ]:
# Type of value from a comparison.
type(type(40) == type('40'))
The "False" value is of the “Boolean” type, which is another fundamental data type in programming. Boolean values can only take two values: "True" or "False".
It is often the case that, instead of working with the native data type, such as the Boolean data type that you have just seen, you want to use a different but equivalent data type. For example, if you accept that a "False" value represents the lack of something in a given context, you may want to represent that as an integer with the value 0. This is achieved using the concept of casting, which allows certain data types to be cast as a different type. In Python, you cast a value by calling the type that you want to cast it to, and including the value as an argument to that call. Thus, in this case, you achieve the following.
In [ ]:
# Casting a bool to an int.
int(type(40) == type('40'))
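Casting is not limited to Boolean values. The sketch below, using Bob's age from earlier, shows a few other casts between the types introduced so far.

```python
# Casting a string that contains digits to an int.
age_as_text = '40'
age_as_number = int(age_as_text)
print(age_as_number + 1)

# Casting an int back to a string so that it can be concatenated.
age_description = str(age_as_number) + ' years'
print(age_description)

# Casting both Boolean values to integers.
true_as_int = int(True)
false_as_int = int(False)
print(true_as_int, false_as_int)
```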
Floating-point numbers are numbers that have a fractional part and are written with a decimal point.
In [ ]:
# Example of a floating number.
2.6
In [ ]:
# Find the type of a floating number.
type(2.6)
Remember that data types define what operations are allowed between values of the same type. However, certain data types, such as integers and floating numbers, can be used together in expressions. In this case, casting occurs without the need for the user to do it manually. The resulting value takes the type of the more encompassing data type of the involved values, as can be seen in the example below:
In [ ]:
# Adding an integer to a float results in a value of type float.
40 + 2.6
In [ ]:
type(40 + 2.6)
What happens when dividing two integer values?
In [ ]:
# Divide two integers.
11 / 40
In [ ]:
# Type of resulting value from dividing two integers.
type(11 / 40)
In [ ]:
# Example of casting integer prior to performing math operations required for Python 2.7 and earlier.
# 11/float(40)
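In Python 3, the / operator always returns a float, even when both values are integers. If a whole-number result is wanted, the // operator (floor division) and the % operator (remainder) can be used instead, as in this short sketch.

```python
# True division: the result is a float.
true_quotient = 11 / 40
print(true_quotient, type(true_quotient))

# Floor division: the result is rounded down to a whole number.
floor_quotient = 11 // 40
print(floor_quotient, type(floor_quotient))

# Remainder of the division.
remainder = 11 % 40
print(remainder)
```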
Complex numbers are important in many scientific and engineering disciplines. Python can also represent and perform calculations with complex numbers.
In [ ]:
# Complex numbers are represented by a real + imaginary part. The imaginary part is indicated by adding a suffix j.
1.5 + 2j
In [ ]:
type((1.5 + 2j))
Our expressions, for example "type(40) == type('40')", can quickly become unwieldy and a source of errors that are difficult to debug. To help in manipulating values elegantly (and programmatically), it is common to use variables to store the value of interest. The variable is then used to reference the value of interest. To assign a value to a variable, we use the assignment operator "=" as in the following example:
In [ ]:
a_boolean_value = (type(40) == type('40'))
Here, the value on the right-hand side of the assignment operator “=” has been assigned to the variable that has been cleverly named “a_boolean_value”. You can now use the variable in other parts of your code.
Let’s introduce another built-in function, called “print”, which prints the value associated with your variable.
In [ ]:
print(a_boolean_value)
You can also use print for other expressions you have met before:
In [ ]:
print(11/40)
The print function allows you to perform other formatting and interesting value referencing.
In [ ]:
print('{} is {} years old and spent {} in school, that is {}% of his life.'.format('Bob',40, 11,100*11/40))
In this print statement, the str.format() form has been used as the argument, where “placeholders” or format fields, indicated by {}, are replaced by the arguments passed to format. The arguments are used in the order in which they appear inside the parentheses, and there must be at least as many arguments as there are {} fields in the str part. You can also specify which positional argument should go where by including numbering in the format fields.
In [ ]:
print('{0} is {1} years old and spent {2} in school, that is {3}% of his life.'.format('Bob',40, 11,100*11/40))
By using numbered fields, the arguments are used depending on which number they are, with the first argument in format numbered as 0. Using print in this way is especially useful when you want to combine different data types in your print statement as illustrated above. You can also control how the formatting should be done instead of relying on Python's default behavior. For example, if you don't care about the decimal in the percentage, you can call print as follows:
In [ ]:
print('{0} is {1} years old and spent {2} in school, that is {3:.0f}% of his life.'.format('Bob',40,11,100*11/40))
Notice the value has only been rounded in the print output but not its representation in memory.
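A quick way to confirm this is to store the percentage in a variable, format it, and then print the variable again. The variable names below are chosen for illustration.

```python
# The value stored in memory.
percentage = 100 * 11 / 40

# The formatted text shows no decimals.
formatted = '{:.0f}'.format(percentage)
print(formatted)

# The stored value itself is unchanged by the formatting.
print(percentage)
```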
Exercise 1: Print your first and last name using the "str.format()" form discussed above. Include a whitespace character between your first and last name in the print output.
In [ ]:
# Your code here
Exercise complete:
This is a good time to "Save and Checkpoint".
Variables store information that may change over time. When you need to store longer lists of information, there are additional options available. You will be introduced to lists and tuples as basic elements, and can read more about native Python data structures in the Python documentation. During this course, you will be exposed to examples using the Pandas and NumPy libraries, which offer additional options for data analysis and manipulation.
Lists are changeable sequences of values. They are stored within square brackets, and items within a list are separated by commas. Lists can also be modified, appended to, and sorted, among other operations.
In [ ]:
# Lists.
lst = ['this', 'is', 'a', 'list']
print(lst)
# Print the number of elements in the list.
print('The list contains {} elements.'.format(len(lst)))
In [ ]:
# Print the first element in the list.
print(lst[0])
# Print the third element in the list.
print(lst[2])
# Print the last element in the list.
print(lst[-1])
In [ ]:
# Appending to a list.
print(lst)
lst.append('with')
lst.append('appended')
lst.append('elements')
print(lst)
# Print the number of elements in the list.
print('The updated list contains {} elements.'.format(len(lst)))
Note:
When selecting and executing the cell again, you will continue to add values to the list.
Try this: Select the cell above again and execute it to see how the input and output content changes.
This can come in handy when working with loops.
Changing a list
This course will not cover string methods in detail. You can read more about string methods in the Python documentation, if you are interested.
In [ ]:
# Changing a list.
# Note: Remember that Python starts with index 0.
lst[0] = 'THIS'
lst[3] = lst[3].upper()
print(lst)
In [ ]:
# Define a list of numbers.
numlist = [0, 10, 2, 7, 8, 5, 6, 3, 4, 1, 9]
print(numlist)
In [ ]:
# Sort and filter list.
sorted_and_filtered_numlist = sorted(i for i in numlist if i >= 5)
print(sorted_and_filtered_numlist)
In [ ]:
# Remove the last element from the list.
sorted_and_filtered_numlist.pop()
print(sorted_and_filtered_numlist)
Tuples are similar to lists, except that they are defined in parentheses and are unchangeable, which means that their values cannot be modified.
In [ ]:
tup = ('this', 'is', 'a', 'bigger', 'tuple')
print(tup)
In [ ]:
tup[3]
In [ ]:
# Tuples cannot be changed and will fail with an error if you try to change an element.
tup[3] = 'new'
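If you want to see the error without stopping the notebook, or need a changed version of a tuple, the following sketch may help. It catches the error and then builds a new tuple from slices of the original, which is one possible approach rather than the only one.

```python
tup = ('this', 'is', 'a', 'bigger', 'tuple')

# Trying to change an element raises a TypeError, which is caught here.
try:
    tup[3] = 'new'
except TypeError as error:
    print('TypeError:', error)

# The original tuple is unchanged; a new tuple can be built from slices.
new_tup = tup[:3] + ('new',) + tup[4:]
print(tup)
print(new_tup)
```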
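range(start, stop, step) is used to create arithmetic progressions. In Python 3, it returns a range object that produces its values on demand, which is why the examples below wrap it in list() to display the values. If you call range with only one argument specified, it will use the value as the stop value (which is excluded from the result) and default to zero as the start value. The step argument is optional and can be a positive or negative integer.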
In [ ]:
# Generate a list of 10 values.
myrange = list(range(10))
myrange
In [ ]:
# Generate a list with start value equal to one, stop value equal to ten that increments by three.
myrange2 = list(range(1, 10, 3))
myrange2
In [ ]:
# Generate a list of decreasing values from 0 down to -9.
myrange3 = list(range(0, -10, -1))
myrange3
Python uses for and while loops to repeat sections of statements. A for loop runs for a set number of times, whereas a while loop repeats a statement until a certain condition is met. The statements to be repeated must be indented for the interpreter to recognise them as such.
The example below demonstrates how to do the following:
In [ ]:
# An example of a loop that uses the "for" construct.
# You can specify the list manually or use a variable containing the list as input.
# The syntax for manual input is: `for item in [1, 2, 3]:`
for item in myrange:
    print(item)
    print(item * 'X')
print('End of loop (not included in the loop)')
In [ ]:
# An example of a loop that uses the "while" construct.
# Firstly, set a counter variable to check whether the end of the list has been reached.
i = 0
while i < len(myrange):
    print(myrange[i])
    print(myrange[i] * 'X')
    i = i + 1
print('End of loop (not included in the loop)')
Conditionals are used to determine which statements are executed. In the example below, we import the "random" library and generate a random number smaller than 2. Assume 0 means heads and 1 means tails. The conditional statement is then used to print the result of the coin flip, as well as "Heads" or "Tails".
In [ ]:
# Flip a coin.
import random
coin_result = random.randrange(0,2)
print(coin_result)
if coin_result == 0:
    print('Heads')
else:
    print('Tails')
Code reuse is an integral component in programming, and is critical for extending the basic functionality of an underlying programming environment. In Python, modules exist that contain functions, classes, variables, and other code that we can reuse in our own code. In most cases, these modules are provided under a permissive free software license such as the MIT license. Assuming you know the name of the module you want to use, there are a number of ways you can tell Python that you want to access that module.
import X
This imports the module X, and creates a reference to that module in the current namespace. After running this statement, you can use X.name to refer to things defined in module X. In the example above, we imported the random module and used random.randrange(0,2) to call a function called randrange in that module. It is also common to use an alias for X by importing the module as import X as alias. You can then use alias.name to refer to things defined in module X.
from X import *
This statement imports the module X, and creates references in the current namespace to all public objects defined by that module (that is, everything that doesn’t have a name starting with "_"). After you’ve run this statement, you can simply use a name to refer to things defined in module X without prepending X. to it. Although there are cases where this is necessary, it is best to avoid this in the majority of cases as it can lead to unexpected behaviour, for example when imported names silently replace names you have already defined.
from X import a, b, c
This works like the previous statement by importing the module X, but creates references in the current namespace only to the objects listed (implying you should know that these are defined in the module). Thus, you can now use a, b, and c in your program.
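The import forms described above can be compared side by side using the random module from the coin flip example. The alias rnd is an arbitrary choice made for this sketch.

```python
# Form 1: import the module and refer to its contents as random.name.
import random

# Form 2: import the same module under an alias.
import random as rnd

# Form 3: import only specific names from the module.
from random import randrange

# All three calls below use the same function from the same module.
flip_a = random.randrange(0, 2)
flip_b = rnd.randrange(0, 2)
flip_c = randrange(0, 2)
print(flip_a, flip_b, flip_c)
```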
Functions allow for code reuse within and across applications. To define a function, you use the “def” statement. The desired function code is then placed into the body of the statement. A function usually takes in an argument and returns a value as determined by the code in the body of the function. Using the function by name, outside of the function body itself, is known as a function call.
In [ ]:
# Function 'myfirstfunction' with argument 'x'.
def myfirstfunction(x):
    y = x * 6
    return y
You can now call the function as in the next example.
In [ ]:
# Call your function.
z = myfirstfunction(6)
print(z)
Function definitions, loops, and conditionals can be combined to produce something useful. The example below will simulate a variable number of coin flips and then produce the summary of results as output.
In [ ]:
import random
In [ ]:
def coinflip(total_number_of_flips):
    '''
    Function 'coinflip' with argument 'total_number_of_flips' for the number of repetitions
    that returns the number of 'heads' and 'tails' occurrences as a list [heads, tails].
    '''
    # Set all starting variables to 0.
    heads = 0
    tails = 0
    current_number_of_flips = 0
    # Start a loop that executes statements while the conditional specified results in 'True'.
    while current_number_of_flips < total_number_of_flips:
        # Generate a random number smaller than 2.
        current_flip = random.randrange(0,2)
        # Increment heads by 1 if the generated number is 0.
        if current_flip == 0:
            heads = heads + 1
        # Increment tails by 1 if the generated number is 1.
        if current_flip == 1:
            tails = tails + 1
        # Increment the flip variable by 1.
        current_number_of_flips += 1
    return [heads, tails]
In the above function definition, there is a “docstring” that describes what the function does, in place of the comment used in the earlier example. This is considered best practice when defining functions in Python. The docstring is included just after the def statement, and is enclosed between a pair of three single or double quotation marks.
In the "coinflip" function, you are simulating the number of times an unbiased coin returns heads or tails on being flipped. To track these values, initialize the counts to zero for both the heads and tails variables. Moreover, it is also important to include a variable (“current_number_of_flips”) that tracks the number of flips in the simulation. The simulation ends when the number of flips required, as specified in the function argument, is reached. To simulate the actual coin flip or flip event, call the function “randrange” from the Python "random" module, which is called here with two parameters: the start of the range from which you want to simulate, and the end of the range (stop), which is itself excluded from the possible results. You can see what this "randrange" function does by calling it for the given parameters.
In [ ]:
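import datetime
now = datetime.datetime.now()
# Seed with a number, because recent Python versions do not accept a datetime object as a seed.
random.seed(now.timestamp())
for k in range(0,10):
    print(random.randrange(0,2))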
Every time you call "randrange", either a 0 or 1 is returned with equal probability. Hence, it is a good approximation to flipping an unbiased coin in our example. This example also does something that is slightly complex to describe. Before entering the loop, the random number generator is seeded with the current time. This ensures you do not always get the same sequence, as random number generation inside the program is not truly random, but follows a deterministic algorithm.
Note:
Later modules will exploit this determinism by setting the random seed for reproducibility of expected outcomes.
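The determinism mentioned above can be observed with a small sketch: seeding the generator with the same value twice produces the same sequence of flips. The seed value 123 is arbitrary and chosen only for illustration.

```python
import random

# Seed the generator and simulate ten flips.
random.seed(123)
first_run = [random.randrange(0, 2) for _ in range(10)]

# Reset the generator to the same seed and simulate ten flips again.
random.seed(123)
second_run = [random.randrange(0, 2) for _ in range(10)]

print(first_run)
print(second_run)
print(first_run == second_run)
```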
Going back to the coin flip function, when you flip the coin, a 0 or 1 is obtained as a value. You check whether this is 0, and if so, you increment your count of heads by 1. If it is a 1, you increment the count of tails by 1. Recall that these two states are mutually exclusive, and only one can occur on each flip. After these checks, you also increase your count of flips tracked in “current_number_of_flips”. You repeat this until you have reached the “total_number_of_flips”.
Finally, the function returns the counts of both heads and tails obtained from your simulation as a list, where the first entry is the number of heads. Let’s invoke the function for 100 flips.
In [ ]:
# Set the number of repetitions.
num_flips = 100
# Call the function and set the output to the variable 'coinflip_results'.
coinflip_results = coinflip(num_flips)
# Output the values returned.
print(coinflip_results)
Is this coin balanced? You would need to run more simulations for a more definitive answer. Try 10000 simulations. Since you have a function already written, you can simply reuse it, as long as you specify a different value for the argument. This is the power of writing functions for tasks that follow the same design pattern.
In [ ]:
num_flips = 10000
coinflip_results = coinflip(num_flips)
print('Heads returned: {:.0f}%'.format(100*float(coinflip_results[0])/sum(coinflip_results)))
Notice how this example has used a number of the ideas introduced in this notebook in computing the proportion of heads.
Use the "coinflip" example above and change it to simulate rolling dice. Your output should contain summary statistics for the number of times you rolled the dice, and occurrences of each of the 6 sides.
Hints:
- Replace "Heads" and "Tails" with six new variables: "Side_1", "Side_2", "Side_3", etc.
- Replace random.randrange(0,2) with random.randrange(1,7).
- Test for each state of the random variable (which side comes up after a die throw), and increase the counter for the relevant variable.
- Return the final states of each variable as a list (i.e., [Side_1, Side_2, Side_3, etc.]).
In [ ]:
# Your code here.
Exercise complete:
This is a good time to "Save and Checkpoint".
The basic random function was used in the example above, but you can visit Big Data Examiner to see an example of implementing other probability distributions.
The examples below come from the Matplotlib Screenshots page.
Note:
One of the options that you will need to remember to set when you use Matplotlib is "%matplotlib inline", which instructs the notebook to plot inline instead of opening a separate window for your graph.
You can find additional information on Matplotlib through the following resources:
In [ ]:
# Import matplotlib library and set notebook plotting options.
import matplotlib.pyplot as plt
import numpy as np
# Instruct the notebook to plot inline.
%matplotlib inline
plt.rcParams['figure.figsize'] = (15, 9)
plt.rcParams['axes.titlesize'] = 'large'
# Generate data.
t = np.arange(0.0, 2.0, 0.01)
s = np.sin(2*np.pi*t)
# Create plot.
plt.plot(t, s)
# Set plot options.
plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')
plt.grid(True)
# Saving as file can be achieved by uncommenting the line below.
# plt.savefig("test.png")
# Display the plot in the notebook.
# The '%matplotlib inline' option set earlier ensures that the plot is displayed inline.
plt.show()
In [ ]:
"""
Simple demo with multiple subplots.
"""
import numpy as np
import matplotlib.pyplot as plt
x1 = np.linspace(0.0, 5.0)
x2 = np.linspace(0.0, 2.0)
y1 = np.cos(2 * np.pi * x1) * np.exp(-x1)
y2 = np.cos(2 * np.pi * x2)
plt.subplot(2, 1, 1)
plt.plot(x1, y1, 'yo-')
plt.title('A tale of 2 subplots')
plt.ylabel('Damped oscillation')
plt.subplot(2, 1, 2)
plt.plot(x2, y2, 'r.-')
plt.xlabel('time (s)')
plt.ylabel('Undamped')
plt.show()
In [ ]:
"""
Demo of the histogram (hist) function with a few features.
In addition to the basic histogram, this demo shows a few optional features:
* Setting the number of data bins
* The ``density`` flag, which normalizes bin heights so that the integral of
the histogram is 1. The resulting histogram is a probability density.
* Setting the face color of the bars
* Setting the opacity (alpha value).
"""
import numpy as np
import matplotlib.pyplot as plt
# Example data.
mu = 100 # Mean of distribution.
sigma = 15 # Standard deviation of distribution.
x = mu + sigma * np.random.randn(10000)
num_bins = 50
# The histogram of the data.
# Note: 'density=True' replaces the older 'normed' argument, which recent Matplotlib releases no longer accept.
n, bins, patches = plt.hist(x, num_bins, density=True, facecolor='green', alpha=0.5)
# Add a 'best fit' line using the normal probability density function.
# (The older helper mlab.normpdf is no longer available in recent Matplotlib releases, so NumPy is used here.)
y = np.exp(-0.5 * ((bins - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')
# Tweak spacing to prevent clipping of ylabel.
plt.subplots_adjust(left=0.15)
plt.show()
This is a continuation of your coin flip simulation from the previous section. Although you already have a fairly accurate idea of whether or not the coin is biased, the typical approach in statistics is to sample from the underlying distribution multiple times, and then plot the sampling distribution. To do this, you need a value between 0 and 1 for each flip that can be compared against the probability of landing heads (the probability of landing tails is then 1 minus that probability). Such a value can be generated using the "random()" method from the random module, that is, the call "random.random()". In the case of a fair coin, a flip is designated as heads any time the value generated is below 0.5; otherwise, it is designated as tails. An unfair coin can be simulated by adjusting this threshold: the closer the threshold is to 0 (and below 0.5), the lower the chance of getting heads in multiple trials. The full statistical procedure is more involved, and is captured here for illustrative purposes only.
In the next code cell, you simulate flipping a fair coin by specifying the threshold as 0.5. You flip the coin 10 times and note the number of times it lands heads in those flips. This is repeated 1000 times, and the resulting counts are used to plot the sampling distribution, which approximates the underlying distribution (known as the binomial distribution).
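Before running the full simulation, the thresholding of a single flip can be sketched on its own. The helper name "flip_is_heads" and the biased threshold of 0.3 are assumptions chosen for illustration; they do not appear elsewhere in this notebook.

```python
import random

def flip_is_heads(threshold=0.5):
    """Return True (heads) if a uniform draw from [0, 1) falls below the threshold."""
    return random.random() < threshold

# The threshold acts as the probability of heads on a single flip.
n = 10000
fair_heads = sum(flip_is_heads(0.5) for _ in range(n))
biased_heads = sum(flip_is_heads(0.3) for _ in range(n))

print('Fair coin heads:   {:.1f}%'.format(100.0 * fair_heads / n))
print('Biased coin heads: {:.1f}%'.format(100.0 * biased_heads / n))
```

The printed percentages vary from run to run, but they should land close to 50% and 30%, respectively.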
In [ ]:
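import random
threshold = 0.5 # Probability of heads on a single flip.
ntrials = 10 # Flips per trial.
size = 1000 # Repetitions.
# M[i] holds the number of heads observed in repetition i.
M = [0 for x in range(0,size)]
for i in range(0,size):
    M[i] = sum([random.random() < threshold for x in range(0,ntrials)])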
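As an aside, the same kind of sample can likely be drawn in a single call, because the number of heads in a fixed number of independent flips follows the binomial distribution. The sketch below uses NumPy's np.random.binomial as an alternative to the loop above; the variable name "M_np" is chosen here for illustration, and this is not the approach the notebook itself uses.

```python
import numpy as np

threshold = 0.5   # Probability of heads on a single flip.
ntrials = 10      # Flips per trial.
size = 1000       # Repetitions.

# Each entry is the number of heads observed in ntrials flips.
M_np = np.random.binomial(ntrials, threshold, size)

print(M_np[:10])     # First ten simulated head counts.
print(M_np.mean())   # Should be close to ntrials * threshold.
```

The loop version makes each individual flip explicit, which is useful for learning; the NumPy version is more compact and usually faster for large simulations.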
In [ ]:
# Cross tabulation for frequency of occurrence of values.
# M_xtab[i] is the number of repetitions in which exactly i heads were observed.
M_xtab = [0 for i in range(0, ntrials+1)]
for i in range(0,ntrials+1):
    M_xtab[i] = sum([x == i for x in M])
In [ ]:
print(M_xtab)
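The same cross tabulation can also be sketched with collections.Counter from the standard library. The small list "sample_M" below is made-up data for illustration, so that the result is easy to verify by hand.

```python
from collections import Counter

# Hypothetical heads-per-trial counts, for illustration only.
sample_M = [5, 4, 5, 6, 5, 7, 4]
sample_ntrials = 10

counts = Counter(sample_M)

# A Counter returns 0 for values that never occurred,
# so every possible outcome from 0 to sample_ntrials gets an entry.
sample_xtab = [counts[i] for i in range(sample_ntrials + 1)]
print(sample_xtab)
```

Here the value 4 occurs twice, 5 occurs three times, and 6 and 7 occur once each, so the list contains those counts at positions 4 to 7 and zeros elsewhere.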
In [ ]:
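# The histogram of the data.
# Use one bin per possible outcome (0 to ntrials heads), centered on each integer,
# so that the counts for ntrials-1 and ntrials heads are not merged into one bar.
plt.hist(M, range=(-0.5, ntrials+0.5), bins=ntrials+1)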
plt.xlabel('Number of heads')
plt.ylabel('Frequency')
plt.title('Coin flip distribution')
As before, the chance of our coin landing heads is as one would expect from an unbiased coin.
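One way to check this claim is to compare the simulated frequencies with the counts that the binomial distribution predicts for a fair coin. The sketch below computes those expected counts; it assumes Python 3.8 or later for math.comb, and the names "probs" and "expected" are chosen here for illustration.

```python
from math import comb  # Available in Python 3.8 and later.

threshold = 0.5   # Probability of heads on a single flip.
ntrials = 10      # Flips per trial.
size = 1000       # Repetitions.

# Binomial probability of observing exactly k heads in ntrials flips.
probs = [comb(ntrials, k) * threshold**k * (1 - threshold)**(ntrials - k)
         for k in range(ntrials + 1)]

# Expected number of repetitions (out of size) that show k heads.
expected = [size * p for p in probs]

for k, e in enumerate(expected):
    print('{:2d} heads: {:6.1f}'.format(k, e))
```

For a fair coin, the expected counts are symmetric around 5 heads, with roughly 246 of the 1000 repetitions expected to show exactly 5 heads. The simulated values will differ somewhat from run to run.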
Using the coin flip simulation code above, simulate the distribution one would get for a biased coin that has an 80% probability of landing tails. Run 1000 simulations, with the coin flipped 20 times in each simulation. Plot the histogram using Matplotlib.
Hints:
- Change "ntrials", "size", and "threshold" in the coin simulation code from above.
- The probability of getting tails is given. You need to compute the probability of getting heads from this value, which you will then use as the threshold value.
In [ ]:
# Your code here.
Python Software Foundation. 2016. “General Python FAQ - Python 3.6.2 documentation.” Last accessed August 20, 2017. https://docs.python.org/3/faq/general.html#what-is-python.
MIT OpenCourseWare. 2011. “6.096 Introduction to C++.” Accessed September 20. https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-096-introduction-to-c-january-iap-2011/lecture-notes/MIT6_096IAP11_lec02.pdf.