This is a 4 week (8 hour) course that will introduce you to the basics of handling, manipulating, exploring, and modeling data with Python.
I'm a data scientist at Automated Insights. Previously, I was a PhD student in Physics at Duke, doing research in machine learning and complex systems. I like running, cooking, and curling.
The goal of this class is to introduce you to some concepts that form the foundations of modern data science, and to put those concepts to use with the Python data science ecosystem. I'm expecting that you know the basics of programming, but not necessarily that you've programmed in Python before. In other words, I'm going to introduce how to write a for loop in Python, but I won't explain what a for loop is.
This class is going to:
This class is not going to:
This class is meant to be interactive. Instead of me lecturing for the full 8 hours, we'll alternate between walking through materials together and working in small groups on prompts that will solidify concepts. At the end of each week, there will be a few take-home prompts for you to work on before the next class.
The environment you're in right now is called a Jupyter notebook. Project Jupyter is an interactive environment that data scientists use for collaboration and communication. Each cell in the notebook can either contain text or code (often Python, but R, Julia, and lots of other languages are supported). This allows you to seamlessly weave explanations and plots into your code.
The Jupyter front-end is called the notebook or dashboard. This is the part that you interact with directly. The back-end, where your code is actually run, is called the kernel. In particular, this notebook uses a kernel for executing Python code, but kernels for many other languages also exist. Since Jupyter is an open-source project, anyone with the time and dedication can make a kernel for executing code in their favorite language.
Each cell in a notebook can be executed independently, but declarations persist across cells. For example, I can define a variable in one cell...
In [ ]:
my_variable = 10
... and then access that variable in a later cell:
In [ ]:
print(my_variable)
Jupyter has two fundamental modes: command mode and edit mode. In edit mode, we can make changes to the content of specific cells. When you're in edit mode, the cell you're currently working in will be surrounded by a green box. Press Enter or double click on a cell to enter edit mode.
Command mode is used to switch between cells, or to make changes to the notebook structure. For example, if you want to add a new cell to your notebook, you do this from command mode. Press Esc to leave edit mode and enter command mode.
As I mentioned above, there are two fundamental types of cells in a notebook - text (i.e. markdown) and code. When you click on a code cell, you should see a cursor appear in the cell that allows you to edit the code in that cell. A cell can have multiple lines - to begin a new line, press Enter. When you want to run the cell's code, press Shift+Enter.
Try changing the values of the numbers that are added together in the cell below, and observe how the output changes:
In [ ]:
a = 11
b = 19
print(a + b)
You can also edit the text in markdown cells. To display the editable, raw markdown in a text cell, double click on the cell. You can now put your cursor in the cell and edit it directly. When you're done editing, press Shift+Enter to render the cell into a more readable format.
Try editing the text cell below with your name:
Make some edits here -> Hello, my name is Nick Haynes!
To change whether a cell contains text or code, use the drop-down in the toolbar. When you're in a code cell, it will look like this:
and when you're in a text cell, it will look like this:
Good programmers are efficient programmers! Jupyter has a large number of idiomatic keyboard shortcuts that are helpful to know. A few of my favorites are:
Command mode
a, b: insert a cell above or below the current one, respectively
Esc: exit cell editor mode
dd: delete the current cell
m: change cell type to markdown
y: change cell type to code
Edit mode
Tab: code completion
Shift+Tab: documentation tool tip
There's a full list of Jupyter's keyboard shortcuts here.
In [ ]:
Your turn
Without using your mouse:
In [ ]:
%lsmagic
Jupyter gives you access to so-called "magic" commands that aren't part of official Python syntax, but can make your life a lot easier. All magic commands are preceded with a % (a single % for single-line expressions, double %% for multi-line expressions). For example, many of the common bash commands are built in:
In [ ]:
%ls # list the files and folders in the current directory
In [ ]:
%cd images
In [ ]:
%ls
Another very helpful magic command we'll use quite a bit is the %timeit command:
In [ ]:
%cd ..
In [ ]:
%%timeit
my_sum = 0
for i in range(100000):
    my_sum += i
This is actually a surprisingly tricky question! There are (at least) two answers:
Python is an open source programming language that is extremely popular in the data science and web development communities. The roots of its current popularity in data science and scientific computing have an interesting history, but suffice to say that it's darn near impossible to be a practicing data scientist these days without at least being familiar with Python.
The guiding principles behind the design of the Python language specification are described in "The Zen of Python", which you can find here or by executing:
In [ ]:
import this
Python syntax should be easy to write, but most importantly, well-written Python code should be easy to read. Code that follows these norms is called Pythonic. We'll touch a bit more on what it means to write Pythonic code in class.
A unique feature of Python is that whitespace matters, because it defines scope. Many other programming languages use braces or begin/end keywords to define scope. For example, in JavaScript, you write a for loop like this:
var count;
for(count = 0; count < 10; count++){
    console.log(count);
    console.log("<br />");
}
The curly braces here define the code executed in each iteration of the for loop. Similarly, in Ruby you write a for loop like this:
for count in 0..9
  puts "#{count}"
end
In this snippet, the code executed in each iteration of the for loop is whatever comes between the first line and the end keyword.
In Python, for loops look a bit different:
In [ ]:
print('Entering the for loop:\n')
a = 0
for count in range(10):
    print(count)
    a += count
    print('Still in the for loop.')
print("\nNow I'm done with the for loop.")
print(a)
Note that there is no explicit symbol or keyword that defines the scope of code executed during each iteration - it's the indentation that defines the scope of the loop. When you define a function or class, or write a control structure like a for loop or if statement, you should indent the next line (4 spaces is customary). Each subsequent line at that same level of indentation is considered part of the scope. You only escape the scope when you return to the previous level of indentation.
If you open up the terminal on your computer and type python, it runs a program that looks something like this:
This is a program called CPython (written in C, hence the name) that parses, interprets, and executes code written to the Python language standard. CPython is known as the "reference implementation" of Python - it is an open source project (you can download and build the source code yourself if you're feeling adventurous) run by the Python Software Foundation and led by Guido van Rossum, the original creator and "Benevolent Dictator for Life" of Python.
When you type just python into the command line, CPython brings up a REPL (Read-Eval-Print Loop, pronounced "repple"), which is essentially an infinite loop that takes lines as you write them, interprets and executes the code, and prints the result.
For example, try typing
>>> x = "Hello world"
>>> print(x)
in the REPL. After you hit Enter on the first line, the interpreter assigns the value "Hello world" to a string variable x. After you hit Enter on the second line, it prints the value of x.
We can accomplish the same result by typing the same code
x = "Hello world"
print(x)
into a file called test.py and running python test.py from the command line. The only difference is that when you provide the argument test.py to the python command, the REPL doesn't appear. Instead, the CPython interpreter interprets the contents of test.py line-by-line until it reaches the end of the file, then exits. We won't use the REPL much in this course, but it's good to be aware that it exists. In fact, behind the pretty front end, this Jupyter notebook is essentially just wrapping the CPython interpreter, executing commands line by line as we enter them.
So to review, "Python" sometimes refers to a language specification and sometimes refers to an interpreter that's installed on your computer. We will use the two definitions interchangeably in this course; hopefully, it should be obvious from context which definition we're referring to.
Above, we talked about the concept of Pythonic code, which emphasizes an explicit, readable coding style. In practice, there are also a number of conventions codified in a document called PEP 8 (PEP = Python Enhancement Proposal, a community suggestion for possible additions to the Python language). These conventions make Python code written by millions of developers easier to read and comprehend, so sticking to them as closely as is practical is a very good idea.
A few useful conventions that we'll see in this class are:
Variable and function names are written in snake_case (all lowercase letters, words separated by underscores).
I'll introduce other conventions as they arise.
As you may have heard, there's a bit of a rift in the Python community between Python 2 and Python 3.
Python 3.0 was released in 2008, introducing a few new features that were not backwards compatible with Python 2.X. Since then, the core Python developers have released several new versions of 3.X (3.6 is the most recent, released in December 2016), and they have announced that Python 2.X will no longer be officially supported after 2020. We'll be using Python 3.5 for this class:
Long story short - I firmly believe that 3.X is the clear choice for anyone who isn't supporting a legacy project.
One fundamental idea in Python is that everything is an object. This is different than some other languages like C and Java, which have fundamental, primitive data types like int and char. Because everything is an object, even things like integers and strings have attributes and methods that you can access. For example, if you want to read some documentation about an object my_thing, you can access its __doc__ attribute like this:
In [ ]:
thing_1 = 47 # define an int object
print(thing_1.__doc__)
In [ ]:
thing_1 = 'blah' # reassign thing_1 to a string object
print(thing_1.__doc__)
In [ ]:
print(thing_1)
To learn more about what attributes and methods a given object has, you can call dir(my_object):
In [ ]:
dir(thing_1)
That's interesting - it looks like the string object has a method called __add__. Let's see what it does -
In [ ]:
thing_2 = 'abcd'
thing_3 = thing_1.__add__(thing_2)
print(thing_3)
So calling __add__ with two strings creates a new string that is the concatenation of the two originals. As an aside, there are a lot more methods we can call on strings - split, upper, find, etc. We'll come back to this.
The + operator in Python is just syntactic sugar for the __add__ method:
In [ ]:
thing_4 = thing_1 + thing_2
print(thing_4)
print(thing_3 == thing_4)
Any object you can add to another object in Python has an __add__ method. With integer addition, this works exactly as we would expect:
In [ ]:
thing_1 = '1'
thing_2 = '2'
int(thing_1) + int(thing_2)
In [ ]:
int_1 = 11
int_2 = 22
sum_1 = int_1.__add__(int_2)
sum_2 = int_1 + int_2
print(sum_1)
print(sum_2)
print(sum_1 == sum_2)
But it's unclear what to do when someone tries to add an int to a str:
In [ ]:
thing_1 + int_1
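Python refuses to guess here and raises a TypeError. As a rough sketch of how you might handle this (the variable names below are made up for illustration), you can catch the error and then convert one of the operands explicitly, depending on which kind of "addition" you actually meant:

```python
# Hypothetical names for illustration; not part of the cells above.
text_value = '1'
int_value = 11

try:
    text_value + int_value      # str.__add__ does not accept an int
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# Resolve the ambiguity by converting explicitly:
as_number = int(text_value) + int_value   # numeric addition
as_text = text_value + str(int_value)     # string concatenation
print(as_number)
print(as_text)
```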
There are a few native Python data types, each of which we'll use quite a bit. The properties of these types work largely the same way as they do in other languages. If you're ever confused about what type a variable my_var is, you can always call type(my_var).
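Here's a minimal sketch of checking a type at runtime. The value assigned to my_var is arbitrary, and isinstance is a related built-in that isn't otherwise covered in this notebook:

```python
my_var = 3.0

print(type(my_var))                 # the class that the object belongs to
print(type(my_var) is float)        # compare against a specific type
print(isinstance(my_var, float))    # isinstance asks the same question more directly
```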
Just like in other languages, bools take values of either True or False. All of the traditional Boolean operations are present:
In [ ]:
bool_1 = True
type(bool_1)
In [ ]:
dir(bool_1)
In [ ]:
bool_2 = False
In [ ]:
bool_1 == bool_2
In [ ]:
type(bool_1 + bool_2)
In [ ]:
type(bool_1 and bool_2)
In [ ]:
bool_1 * bool_2
In [ ]:
int_1 = 2
type(int_1)
In [ ]:
dir(int_1)
In [ ]:
int_2 = 3
print(int_1 - int_2)
In [ ]:
int_1.__pow__(int_2)
In [ ]:
int_1 ** int_2
One change from Python 2 to Python 3 is the default way that integers are divided. In Python 2, the result of 2/3 is 0, the result of 4/3 is 1, etc. In other words, dividing integers in Python 2 always returned an integer with any remainder truncated. In Python 3, the result of the division of integers is always a float, with a decimal approximation of the remainder included. For example:
In [ ]:
int_1 / int_2
In [ ]:
type(int_1 / int_2)
In [ ]:
int_1.__truediv__(int_2)
In [ ]:
int_1.__divmod__(int_2)
In [ ]:
int_1 % int_2
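If you do want an integer result in Python 3, the // operator performs floor division. The sketch below reuses the values of int_1 and int_2 from the cells above; the other variable names are just for illustration. One detail worth knowing is that floor division rounds down (toward negative infinity), which only differs from simple truncation for negative results.

```python
int_1 = 2
int_2 = 3

true_quotient = int_1 / int_2      # always a float in Python 3
floor_quotient = int_1 // int_2    # integer result, rounded down
remainder = int_1 % int_2
print(true_quotient, floor_quotient, remainder)

# divmod returns the floor quotient and the remainder together
print(divmod(7, 2))

# Floor division rounds toward negative infinity rather than toward zero
print(-7 // 2)
```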
In [ ]:
float_1 = 23.46
type(float_1)
In [ ]:
dir(float_1)
In [ ]:
float_2 = 3.0
type(float_2)
In [ ]:
float_1 / float_2
With ints and floats, we can also use comparison operators, just like in other languages:
In [ ]:
int_1 < int_2
In [ ]:
float_1 >= int_2
In [ ]:
float_1 == float_2
In [ ]:
int_1 = 1
float_1 = 1.0
In [ ]:
str_1 = 'hello'
type(str_1)
In [ ]:
dir(str_1)
We already saw that the + operator concatenates two strings. Generalizing from this, what do you expect the * operator to do?
In [ ]:
a = 'Hi'
print(a*5)
There are a number of very useful methods built into Python str objects. A few that you might find yourself needing to use when dealing with text data include:
In [ ]:
# count the number of occurrences of a sub-string
"Hi there I'm Nick".count('i')
In [ ]:
# Find the index of the first occurrence of a sub-string
"Hi there I'm Nick".find('i')
In [ ]:
"Hi there I'm Nick".find('i', 2)
In [ ]:
# Insert variables into a string
digit = 7
'The digit "7" should appear at the end of this sentence: {}.'.format(digit)
In [ ]:
another_digit = 15
'This sentence will have two digits at the end: {} and {}.'.format(digit, another_digit)
In [ ]:
# Replace a sub-string with another sub-string
my_sentence = "Hi there I'm Nick"
my_sentence.replace('e', 'E')
In [ ]:
my_sentence.replace('N', '')
There are plenty more useful string functions - use either the dir() function or Google to learn more about what's available.
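A few more string methods came up by name earlier (split, upper, and find). Here's a short sketch of what they do, using the same example sentence; the result variable names are only for illustration:

```python
my_sentence = "Hi there I'm Nick"

words = my_sentence.split()             # split on whitespace into a list of strings
shouted = my_sentence.upper()           # new string with all letters upper-cased
position = my_sentence.find('there')    # index where the sub-string starts
missing = my_sentence.find('xyz')       # find returns -1 when nothing matches

print(words)
print(shouted)
print(position, missing)
print(my_sentence)   # strings are immutable, so the original is unchanged
```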
In [ ]:
missing_val = None
type(missing_val)
In [ ]:
missing_val is None
In [ ]:
print(missing_val and True)
In [ ]:
missing_val + 1
None is helpful for passing optional values in function arguments, or to make it explicitly clear that you're not passing data that has any value. None is a different concept than NaN, which we'll see next week.
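As a small sketch of the optional-argument use case (the function greet and its arguments are hypothetical, made up for this example), None can serve as a "nothing was provided" marker that the function checks with is None:

```python
def greet(name, greeting=None):
    """Return a greeting string; falls back to 'Hello' if no greeting is given."""
    if greeting is None:          # the caller did not supply a greeting
        greeting = 'Hello'
    return '{}, {}!'.format(greeting, name)

print(greet('Nick'))
print(greet('Nick', greeting='Welcome'))
```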
So, to sum it up - basic data types like bool, int, float, and str are all objects in Python. The methods in each of these object classes define what operations can be done on them and how those operations are performed. For the sake of readability, however, many of the common operations like + and < are provided as syntactic sugar.
When we were looking at the methods in the various data type classes above, we saw a bunch of methods like __add__ and __pow__ with double leading underscores and double trailing underscores (sometimes shortened to "dunders"). As it turns out, underscores are a bit of a thing in Python. Idiomatic use dictates a few important uses of underscores in variable and function names:
Variable and function names use snake_case (my_variable) rather than camelCase (myVariable).
A single leading underscore (_my_function or _my_variable) denotes a function or variable that is not meant for end users to access directly. Python doesn't have a sense of strong encapsulation, i.e. there are no strictly "private" methods or variables like in Java, but a leading underscore is a way of "weakly" signaling that the entity is for private use only.
A single trailing underscore (type_) is used to avoid conflict with Python built-in functions or keywords. In my opinion, this is often poor style. Try to come up with a more descriptive name instead.
Double leading and trailing underscores (__init__, __add__) mark special variables or methods that correspond to some sort of "magic" syntax. As we saw above, the __add__ method of an object describes what the result of some_object + another_object is.
For lots more detail on the use of underscores in Python, check out this post.
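To make the dunder idea concrete, here's a minimal sketch of a class that defines its own __add__ and __eq__. The class Money and its _cents attribute are hypothetical, invented only to show how the + and == operators get routed to these methods; we haven't covered classes yet, so don't worry about the details.

```python
class Money:
    """Hypothetical class, used only to illustrate dunder methods."""

    def __init__(self, cents):
        self._cents = cents    # leading underscore: intended for internal use

    def __add__(self, other):
        # Called when Python evaluates `self + other`
        return Money(self._cents + other._cents)

    def __eq__(self, other):
        # Called when Python evaluates `self == other`
        return self._cents == other._cents

total = Money(150) + Money(275)
print(total == Money(425))
```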
Single variables can only take us so far. Eventually, we're going to want ways of storing many individual variables in a single, structured format.
The list is one of the most commonly used Python data structures. A list is an ordered collection of (potentially heterogeneous) objects. Similar structures that exist in other languages are often called arrays.
In [ ]:
my_list = ['a', 'b', 'c', 'a']
In [ ]:
len(my_list)
In [ ]:
my_list.append(1)
print(my_list)
To access individual list elements by their position, use square brackets:
In [ ]:
my_list[0] # indexing in Python starts at 0!
In [ ]:
my_list[4]
In [ ]:
my_list[-1] # negative indexes count backward from the end of the list
Lists can hold arbitrary objects!
In [ ]:
type(my_list[0])
In [ ]:
type(my_list[-1])
In [ ]:
# let's do something crazy
my_list.append(my_list)
type(my_list[-1])
In [ ]:
my_list
In [ ]:
my_list[-1]
In [ ]:
my_list[-1][-1]
In [ ]:
my_list[-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1][-1]
Lists are also mutable objects, meaning that any part of them can be changed at any time. This makes them very flexible objects for storing data in a program.
In [ ]:
my_list = ['a', 'b', 1]
In [ ]:
my_list[0] = 'c'
my_list
In [ ]:
my_list.remove(1)
my_list
In [ ]:
my_tuple = ('a', 'b', 1, 'a')
print(my_tuple)
In [ ]:
my_tuple
In [ ]:
my_tuple[2]
In [ ]:
my_tuple[0] = 'c'
In [ ]:
my_tuple.append('c')
In [ ]:
my_tuple.remove(1)
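The three cells above all fail, because tuples are immutable. Here's a sketch that catches two of those errors so you can see what they are, and then shows one common workaround: building a new tuple rather than changing the existing one. The name new_tuple is just for illustration.

```python
my_tuple = ('a', 'b', 1, 'a')

try:
    my_tuple[0] = 'c'            # item assignment is not supported
except TypeError as err:
    print('TypeError:', err)

try:
    my_tuple.append('c')         # tuples have no append method
except AttributeError as err:
    print('AttributeError:', err)

# To "change" a tuple, build a new one instead
new_tuple = ('c',) + my_tuple[1:]
print(new_tuple)
print(my_tuple)                  # the original is untouched
```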
In [ ]:
list_1 = [1, 2, 3]
list_2 = [4, 5, 6]
list_1 + list_2
In [ ]:
my_set = {'a', 'b', 1, 'a'}
print(my_set) # note the order of the items, and that the duplicate 'a' is gone
In [ ]:
my_set.add('c')
print(my_set)
Note above that the order of items in a set doesn't have the same meaning as in lists and tuples.
In [ ]:
my_set[0]
Sets are used for a couple reasons. Sometimes, finding the number of unique items in a list or tuple is important. In this case, we can convert the list/tuple to a set, then call len on the new set. For example,
In [ ]:
my_list = ['a', 'a', 'a', 'a', 'b', 'b', 'b']
my_list
In [ ]:
my_set = set(my_list)
len(my_set)
The other reason is that the in keyword for testing a collection for membership of an object is much faster for a set than for a list.
In [ ]:
my_list = list(range(1000000)) # list of numbers 0 - 999,999
my_set = set(my_list)
In [ ]:
%%timeit
999999 in my_list
In [ ]:
%%timeit
999999 in my_set
Any idea why there's such a discrepancy?
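One way to think about it: a list has to be scanned element by element until a match is found, while a set is built on a hash table and can jump almost directly to where the item would be stored. The sketch below uses the standard library's timeit module to make the same comparison outside of a notebook; the smaller collection size and the repeat count are arbitrary choices, and the exact timings will vary by machine.

```python
import timeit

my_list = list(range(100000))
my_set = set(my_list)

# Worst case for the list: the target is the very last element.
list_time = timeit.timeit(lambda: 99999 in my_list, number=100)
set_time = timeit.timeit(lambda: 99999 in my_set, number=100)

print('list: {:.6f} s'.format(list_time))
print('set:  {:.6f} s'.format(set_time))
```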
In [ ]:
my_dict = {'name': 'Nick',
'birthday': 'July 13',
'years_in_durham': 4}
my_dict
In [ ]:
my_dict['name']
In [ ]:
my_dict['years_in_durham']
In [ ]:
my_dict['favorite_restaurants'] = ['Mateo', 'Piedmont']
my_dict['favorite_restaurants']
In [ ]:
my_dict['age'] # hey, that's personal. Also, it's not a key in the dictionary.
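Asking for a key that doesn't exist raises a KeyError. If a missing key is a normal situation in your program, here's a sketch of a few ways to deal with it; the dictionary is a trimmed copy of the one above, and the default value 'unknown' is arbitrary.

```python
my_dict = {'name': 'Nick', 'birthday': 'July 13', 'years_in_durham': 4}

print('age' in my_dict)                 # membership test on the keys
print(my_dict.get('age'))               # returns None instead of raising KeyError
print(my_dict.get('age', 'unknown'))    # or a default value that you supply

try:
    my_dict['age']
except KeyError as err:
    print('KeyError:', err)
```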
In addition to accessing values by keys, you can retrieve the keys and values by themselves (in Python 3, as list-like view objects):
In [ ]:
my_dict.keys()
In [ ]:
my_dict.values()
Note that if you're using Python 3.5 or earlier, the order that you insert key/value pairs into the dictionary doesn't correspond to the order they're stored in by default (we inserted favorite_restaurants after years_in_durham!). This default behavior was just recently changed in Python 3.6 (released in December 2016).
As data scientists, we're data-driven people, and we want our code to be data-driven, too. Control structures are a way of adding a logical flow to your programs, making them reactive to different conditions. These concepts are largely the same as in other programming languages, so I'll quickly introduce the syntax here for reference without much comment.
Like most programming languages, Python provides a way of conditionally evaluating lines of code.
In [ ]:
x = 3
if x < 2:
    print('x less than 2')
elif x < 4:
    print('x less than 4, greater than or equal to 2')
else:
    print('x greater than or equal to 4')
In [ ]:
my_list = ['a', 'b', 'c']
for element in my_list:
    print(element)
To iterate a specific number of times, you can create an iterable object with the range function:
In [ ]:
for i in range(5): # iterate over all integers (starting at 0) less than 5
    print(i)
In [ ]:
for i in range(2, 6, 3): # iterate over integers (starting at 2) less than 6, increasing by 3
    print(i)
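If you're ever unsure which numbers a call to range will produce, a quick trick is to convert it to a list and look. A short sketch (the last example, with a negative step, isn't used elsewhere in this notebook):

```python
print(list(range(5)))           # start defaults to 0, stop is excluded
print(list(range(2, 6, 3)))     # start at 2, stop before 6, step by 3
print(list(range(10, 0, -2)))   # a negative step counts down
```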
In [ ]:
my_list = ['a', 'b', 'c']
idx = 0
while idx < len(my_list):
    print(my_list[idx])
    idx += 1
In [ ]:
my_list = ['a', 'b', 'c']
for element in my_list:
    print(element)
There are occasionally other reasons for using while loops (waiting for an external input, for example), but we won't make extensive use of them in this course.
Your turn
my_dict = {
'a': 3,
'b': 2,
'c': 10,
'd': 7,
'e': 9,
'f' : 12,
'g' : 13
}
Print out:
In [ ]:
my_dict = {
'a': 3,
'b': 2,
'c': 10,
'd': 7,
'e': 9,
'f' : 12,
'g' : 13
}
for key, val in my_dict.items():
    print(key, val)
In [ ]:
Of course, as data scientists, one of our most important jobs is to manipulate data in a way that provides insight. In other words, we need ways of taking raw data, doing some things to it, and returning nice, clean, processed data back. This is the job of functions!
It turns out that Python has a ton of functions built in already. When we have a task that can be accomplished by a built-in function, it's almost always a good idea to use it. This is because many of the Python built-in functions are actually written in C, not Python, and C tends to be much faster for certain tasks.
In [1]:
my_list = list(range(1000000))
In [2]:
%%timeit
sum(my_list)
In [3]:
%%timeit
my_sum = 0
for element in my_list:
    my_sum += element
my_sum
Some common mathematical functions that are built into Python:
sum
divmod
round
abs
max
min
And some other convenience functions, some of which we've already seen:
int, float, str, set, list, dict: for converting between data structures
len: for finding the number of elements in a data structure
type: for finding the type that an object belongs to
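Here's a quick sketch that exercises each of the built-ins listed above on a small, arbitrary list of numbers (the name values and the numbers themselves are made up for illustration):

```python
values = [3, -7, 2.5, 10]

print(sum(values))                 # total of all elements
print(max(values), min(values))    # largest and smallest elements
print(abs(-7))                     # absolute value
print(round(2.567, 1))             # round to one decimal place
print(divmod(17, 5))               # (quotient, remainder)
print(len(values), type(values))   # number of elements, and the type

# Converting between types
print(int('42') + 1, float('2.5'), str(42), list('abc'))
```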
In [4]:
def double_it(x):
    return x * 2
In [5]:
double_it(5)
Out[5]:
Python has dynamic typing, which (in part) means that the arguments to functions aren't assigned a specific type:
In [6]:
double_it('hello') # remember 'hello' * 2 from before?
Out[6]:
In [7]:
double_it({'a', 'b'}) # but there's no notion of multiplication for sets
When defining a function, you can add defaults to arguments that you want to be optional. When defining and providing arguments, required arguments always go first, and the order they're provided in matters. Optional arguments follow, and can be passed by their keyword in any order.
In [8]:
def multiply_them(x, y, extra_arg1=None, extra_arg2=None):
    if extra_arg1 is not None:
        print(extra_arg1)
    if extra_arg2 is not None:
        print(extra_arg2)
    print('multiplying {} and {}...'.format(x, y))
    return x * y
In [9]:
multiply_them(3, 5)
Out[9]:
In [10]:
multiply_them(3, 5, extra_arg1='hello')
Out[10]:
In [11]:
multiply_them(3, 5, extra_arg2='world', extra_arg1='hello')
Out[11]:
In [12]:
multiply_them(extra_arg2='world', extra_arg1='hello', 3, 5)
Your turn
Write a function that counts the number of elements in a list (without using the len function). Now, use %%timeit to compare the speed to len for a list of 100,000 elements.
In [16]:
my_list = [1, 2, 3]
for el in my_list:
    print(el)
In [15]:
def count_elements(my_list):
    counter = 0
    for el in my_list:
        counter += 1
    return counter
In [17]:
count_elements(my_list)
Out[17]:
In [21]:
my_list = list(range(1000))
In [22]:
%%timeit
count_elements(my_list)
In [23]:
%%timeit
len(my_list)
Write a function that finds the minimum value in a list of numbers (without using the min function). Include an optional argument that specifies whether to take the absolute values of the numbers first, with a default value of False.
In [24]:
def get_min(my_list):
    potential_min = my_list[0]
    for el in my_list[1:]:
        if el < potential_min:
            potential_min = el
    return potential_min
In [25]:
my_list = [3, 2, 1]
print(get_min(my_list))
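The version above doesn't yet handle the optional absolute-value argument from the prompt. Here's one possible sketch of how it could be added. The argument name absolute is my own choice (the prompt doesn't name it), and there are certainly other reasonable ways to write this.

```python
def get_min(my_list, absolute=False):
    """Return the smallest element of my_list.

    If `absolute` (a hypothetical argument name) is True, compare the
    absolute values of the elements instead of the elements themselves.
    """
    if absolute:
        values = []
        for el in my_list:
            values.append(abs(el))
    else:
        values = my_list

    potential_min = values[0]
    for el in values[1:]:
        if el < potential_min:
            potential_min = el
    return potential_min

print(get_min([3, -2, 1]))
print(get_min([3, -2, 1], absolute=True))
```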
Knowing how to create your own functions can be a rabbit hole - once you know that you can make Python do whatever you want it to do, it can be easy to go overboard. Good data scientists are efficient data scientists - you shouldn't reinvent the wheel by reimplementing a bunch of functionality that someone else worked hard on. Doing anything nontrivial can take a ton of time, and without spending even more time to write tests, squash bugs, and address corner cases, your code can easily end up being much less reliable than code that someone else has spent time perfecting.
Python has a very robust standard library of external modules that come with every Python installation. For even more specialized work, the Python community has also open-sourced tens of thousands of packages, any of which is a simple pip install away.
The Python standard library is a collection of packages that ships with Python itself. In other words, it contains a bunch of code that you can import into code you're writing, but that you don't have to download separately after downloading Python.
Here are a few examples -
In [30]:
import random # create (pseudo-) random numbers
random.random() # choose a float between 0 and 1 (uniformly)
Out[30]:
In [31]:
import math # common mathematical functions that aren't built into base Python
print(math.factorial(5))
In [32]:
math.log10(100)
Out[32]:
In [33]:
import statistics # some basic summary statistics
my_list = [1, 2, 3, 4, 5]
statistics.mean(my_list)
Out[33]:
In [34]:
statistics.median(my_list)
Out[34]:
In [35]:
statistics.stdev(my_list)
Out[35]:
In [36]:
dir(statistics)
Out[36]:
There are dozens of packages in the standard library, so if you find yourself writing a function for something lots of other people might want to do, it's definitely worth checking whether that function is already implemented in the Python standard library.
We'll use a handful of packages from the standard library in this course, which I'll introduce as they appear.
Nonetheless, the standard library can't contain functionality that covers everything people use Python for. For more specialized packages, the Python Software Foundation runs the Python Package Index (PyPI, pronounced pie-pee-eye). PyPI is a package server that is free to upload and download from - anyone can create a package and upload it to PyPI, and anyone can download any package from PyPI at any time.
To download and install a package from PyPI, you typically use a program called pip (pip installs packages) by running the command pip install <package name> from the command line.
How does import work?
Above, we saw some examples of importing external modules to use in our code. In general, a single Python file or a directory of files can be imported.
When Python sees the import my_module command, it first searches in the current working directory. If the working directory contains a Python script my_module.py or a directory of Python files my_module/, the functions and classes in those files are loaded into the current namespace, accessible under my_module. If nothing in the working directory is called my_module, Python checks the directory on your computer where external modules from PyPI are installed. If it doesn't find anything there, it raises an ImportError.
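If you're curious where Python actually looks on your machine, the search locations are stored in the list sys.path (which also includes the standard library's location). The sketch below prints the first few entries and then shows what a failed import looks like; the module name is deliberately made up and assumed not to exist.

```python
import sys

# sys.path is the list of locations that Python searches, in order
for location in sys.path[:3]:
    print(location)

try:
    import a_module_that_does_not_exist   # hypothetical name, assumed not installed
except ImportError as err:
    import_error = err
    print('ImportError:', err)
```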
There are several ways of arranging the namespace for imports:
In [37]:
import statistics
statistics.median(my_list)
Out[37]:
In [40]:
import statistics as nick
nick.median(my_list)
Out[40]:
In [41]:
from statistics import median
median(my_list)
Out[41]:
In [43]:
mean(my_list)
In [48]:
from statistics import *
def median(x):
    return x
median(my_list)
Out[48]:
In [45]:
mean(my_list)
Out[45]:
from * imports are almost always a bad idea and should be avoided at all costs. Can you think of why that is?
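One hint: a star import can bind a lot of names at once, and any of them can collide with names you define yourself. As a rough sketch, a module's __all__ list (when it defines one, as statistics does) tells you which names a star import would bring in, and using the qualified name keeps things unambiguous even when a local function shares a name with one in the module.

```python
import statistics

# The names that `from statistics import *` would bind in your namespace
star_names = statistics.__all__
print(len(star_names))
print('median' in star_names, 'mean' in star_names)

my_list = [1, 2, 3, 4, 5]

def median(x):                       # a local function that happens to share a name
    return x

print(median(my_list))               # our own function: returns the list unchanged
print(statistics.median(my_list))    # the qualified name is unambiguous
```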
Your turn
Write a function that calculates the median of a list of numbers (without using statistics). Use the randint function from the random module to create a list of integers to test your function.
In [ ]:
This notebook is fairly information-dense, especially if you haven't used Python before. Keep it close by for reference as the course goes along! Thankfully, Python syntax is fairly friendly toward beginners, so picking up the basics usually doesn't take too long. I hope you'll find as the course goes along that the Python syntax starts to feel more natural. Don't get discouraged; know when to ask for help, and look online for resources. And remember - the Python ecosystem is deep, and it can take years to master!
Write a function that takes as its arguments a string and a character, and for each occurrence of that character in the string, replaces the character with '#' and returns the new string. For example, replace_chars('this sentence starts with "t"', 't') should return '#his sen#ence s#ar#s wi#h "#"'. Try doing this by hand as well as using a built-in Python function.
In [ ]:
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
In [ ]:
Using Python's random module, write a program that rolls a die (i.e. generates a random integer between 1 and 6) 100,000 times. Write pure Python functions to calculate the mean and variance (look up the formulas if you can't remember them) of the rolls.
In [ ]: