The text and code are released under the CC0 license; see also the companion project, the Python Data Science Handbook.
So far, our scripts have been simple, single-use code blocks.
One way to organize our Python code and make it more readable and reusable is to factor out useful pieces into functions.
Here we'll cover two ways of creating functions: the def statement, useful for any type of function, and the lambda expression, useful for creating short anonymous functions.
In [1]:
print('abc')
Here print is the function name, and 'abc' is the function's argument.
In addition to arguments, there are keyword arguments that are specified by name.
One available keyword argument for the print() function (in Python 3) is sep, which specifies the character or characters used to separate multiple items:
In [2]:
print(1, 2, 3)
In [3]:
print(1, 2, 3, sep='--')
When non-keyword arguments are used together with keyword arguments, the keyword arguments must come at the end.
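As a minimal sketch of this ordering rule, the example below uses only print() with its sep and end keyword arguments. The name bad_call is hypothetical, and compile() is used here only so that the resulting error can be caught and displayed rather than stopping the script.

```python
# Positional arguments come first; keyword arguments follow, in any order.
print(1, 2, 3, sep='--', end='!\n')   # prints 1--2--3!
print(1, 2, 3, end='!\n', sep='--')   # same output: keyword order does not matter

# A positional argument placed after a keyword argument is rejected
# before the code ever runs. bad_call is a hypothetical example string.
bad_call = "print(1, 2, sep='--', 3)"
try:
    compile(bad_call, "<example>", "eval")
    rejected = False
except SyntaxError as err:
    rejected = True
    print("SyntaxError:", err.msg)
```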
Functions become even more useful when we begin to define our own, organizing functionality to be used in multiple places.
In Python, functions are defined with the def statement.
For example, we can encapsulate a version of our Fibonacci sequence code from the previous section as follows:
In [4]:
def fibonacci(N):
    L = []
    a, b = 0, 1
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L
Now we have a function named fibonacci which takes a single argument N, does something with this argument, and returns a value; in this case, a list of the first N Fibonacci numbers:
In [5]:
fibonacci(10)
Out[5]:
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
If you're familiar with statically typed languages like C, you'll immediately notice that there is no type information associated with the function inputs or outputs.
Python functions can return any Python object, simple or compound, which means constructs that may be difficult in other languages are straightforward in Python.
For example, multiple return values are simply put in a tuple, which is indicated by commas:
In [6]:
def real_imag_conj(val):
    return val.real, val.imag, val.conjugate()

r, i, c = real_imag_conj(3 + 4j)
print(r, i, c)
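To make the tuple explicit, the following sketch captures the return value under a single name before unpacking it. The name result is introduced here only for illustration.

```python
def real_imag_conj(val):
    # The commas build a tuple; no parentheses are required.
    return val.real, val.imag, val.conjugate()

result = real_imag_conj(3 + 4j)   # keep the whole return value together
print(type(result))               # <class 'tuple'>
print(result)                     # (3.0, 4.0, (3-4j))

r, i, c = result                  # unpacking works the same as before
```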
Often when defining a function, there are certain values that we want the function to use most of the time, but we'd also like to give the user some flexibility.
In this case, we can use default values for arguments.
Consider the fibonacci function from before.
What if we would like the user to be able to play with the starting values?
We could do that as follows:
In [7]:
def fibonacci(N, a=0, b=1):
    L = []
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L
With a single argument, the result of the function call is identical to before:
In [8]:
fibonacci(10)
Out[8]:
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
But now we can use the function to explore new things, such as the effect of new starting values:
In [9]:
fibonacci(10, 0, 2)
Out[9]:
[2, 2, 4, 6, 10, 16, 26, 42, 68, 110]
The values can also be specified by name if desired, in which case the order of the named values does not matter:
In [10]:
fibonacci(10, b=3, a=1)
Out[10]:
[3, 4, 7, 11, 18, 29, 47, 76, 123, 199]
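Sometimes we want to write a function without knowing in advance how many arguments the user will pass. In that case, the special forms *args and **kwargs can be used to catch all arguments that are passed:
In [11]:
def catch_all(*args, **kwargs):
    print("args =", args)
    print("kwargs =", kwargs)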
In [12]:
catch_all(1, 2, 3, a=4, b=5)
In [13]:
catch_all('a', keyword=2)
Here it is not the names args and kwargs that are important, but the * characters preceding them.
args and kwargs are just the variable names often used by convention, short for "arguments" and "keyword arguments".
The operative difference is the asterisk characters: a single * before a variable means "expand this as a sequence", while a double ** before a variable means "expand this as a dictionary".
In fact, this syntax can be used not only with the function definition, but with the function call as well!
In [14]:
inputs = (1, 2, 3)
keywords = {'pi': 3.14}
catch_all(*inputs, **keywords)
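As a further sketch of unpacking at the call site, the same syntax also works with functions that take ordinary named parameters. The function full_name and the variables names and options are hypothetical, introduced only for this illustration.

```python
def full_name(first, last, sep=' '):
    """Join a first and last name with a separator (hypothetical helper)."""
    return first + sep + last

names = ('Grace', 'Hopper')   # unpacked into positional arguments
options = {'sep': ', '}       # unpacked into keyword arguments

print(full_name(*names))             # same as full_name('Grace', 'Hopper')
print(full_name(*names, **options))  # same as full_name('Grace', 'Hopper', sep=', ')
```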
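Functions defined with def are the most common kind, but short, one-off functions can also be defined with the lambda syntax. It looks something like this:
In [15]: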
add = lambda x, y: x + y
add(1, 2)
Out[15]:
3
This lambda function is roughly equivalent to:
In [16]:
def add(x, y):
    return x + y
So why would you ever want to use such a thing? Primarily, it comes down to the fact that everything is an object in Python, even functions themselves! That means that functions can be passed as arguments to functions.
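A minimal sketch of what passing a function as an argument looks like in practice; apply_twice and add_three are hypothetical names used only for this illustration.

```python
def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))

def add_three(x):
    return x + 3

# A function defined with def is passed by name, without parentheses.
print(apply_twice(add_three, 10))             # 16

# A lambda can be written inline at the point where it is needed.
print(apply_twice(lambda s: s + '!', 'hi'))   # hi!!
```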
As an example of this, suppose we have some data stored in a list of dictionaries:
In [17]:
data = [{'first':'Guido', 'last':'Van Rossum', 'YOB':1956},
        {'first':'Grace', 'last':'Hopper', 'YOB':1906},
        {'first':'Alan', 'last':'Turing', 'YOB':1912}]
Now suppose we want to sort this data.
Python has a sorted function that does this:
In [18]:
sorted([2,4,3,5,1,6])
Out[18]:
[1, 2, 3, 4, 5, 6]
But dictionaries are not orderable: we need a way to tell the function how to sort our data.
We can do this by specifying the key function, a function which, given an item, returns the sorting key for that item:
In [19]:
# sort alphabetically by first name
sorted(data, key=lambda item: item['first'])
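Out[19]:
[{'first': 'Alan', 'last': 'Turing', 'YOB': 1912},
 {'first': 'Grace', 'last': 'Hopper', 'YOB': 1906},
 {'first': 'Guido', 'last': 'Van Rossum', 'YOB': 1956}]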
In [20]:
# sort by year of birth
sorted(data, key=lambda item: item['YOB'])
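Out[20]:
[{'first': 'Grace', 'last': 'Hopper', 'YOB': 1906},
 {'first': 'Alan', 'last': 'Turing', 'YOB': 1912},
 {'first': 'Guido', 'last': 'Van Rossum', 'YOB': 1956}]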
While these key functions could certainly be created with the normal def syntax, the lambda syntax is convenient for short one-off functions like these.
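For comparison, here is a sketch of the same year-of-birth sort written with a def-based key function. The name year_of_birth is hypothetical; the data list is repeated so that the example runs on its own.

```python
data = [{'first': 'Guido', 'last': 'Van Rossum', 'YOB': 1956},
        {'first': 'Grace', 'last': 'Hopper', 'YOB': 1906},
        {'first': 'Alan', 'last': 'Turing', 'YOB': 1912}]

def year_of_birth(item):
    """Return the sorting key for one record: its year of birth."""
    return item['YOB']

by_def = sorted(data, key=year_of_birth)
by_lambda = sorted(data, key=lambda item: item['YOB'])

print(by_def == by_lambda)                  # True
print([item['first'] for item in by_def])   # ['Grace', 'Alan', 'Guido']
```

The named version takes a few more lines, but it can be reused and documented, which may be worth it when the same key is needed in several places.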