Functions

Overview

  • Data basics
    • Theory of types
    • Basic types
    • Composite types: Tuples, lists and dictionaries
  • Functions
    • range() function
    • read and write files
  • Flow control, part 1
    • If
    • for

Function Basics

  • A function has 0 or more inputs and returns 0 or more outputs.
  • A python function is called (executed) by placing "(", ")" after the function name with the list of arguments.

In [1]:
# A first function. Find the length of a list.
a_list = [1, 2, 3]
len(a_list)


Out[1]:
3

In [1]:
len({"a": [1, 2, 3], "b": 4})


Out[1]:
2

Builtin Functions

Functions that are supplied in the python base.

len()

Takes an object as input and returns an integer.

range()

Takes an integer as input and outputs a list of successive integers starting with 0 to the length specified as an argument.


In [2]:
range(3)


Out[2]:
range(0, 3)

Notice that Python (in the newest versions, e.g. 3+) has an object type that is a range. This saves memory and speeds up calculations vs. an explicit representation of a range as a list - but it can be automagically converted to a list on the fly by Python. To show the contents as a list we can use the type case like with the tuple above.

Sometimes, in older Python docs, you will see xrange. This used the range object back in Python 2 and range returned an actual list. Beware of this!


In [2]:
# Experiment with the builtin function all
all([1, "first", 3.4])


Out[2]:
True

In [6]:
any([False, False])


Out[6]:
False

In [ ]:
list(range(3))

In [7]:
fd = open("t.txt", "w")

In [9]:
fd.write("a line")


Out[9]:
6

In [10]:
fd.close()

In [11]:
!ls -l


total 56
-rw-r--r-- 1 ubuntu ubuntu 26401 Mar 31 19:36 data_basics.ipynb
-rw-r--r-- 1 ubuntu ubuntu  6587 Mar 31 11:01 flow_of_control.ipynb
-rw-r--r-- 1 ubuntu ubuntu 16281 Mar 31 19:52 functions.ipynb
-rw-r--r-- 1 ubuntu ubuntu     6 Mar 31 19:52 t.txt

In [8]:
help(fd)


Help on TextIOWrapper object:

class TextIOWrapper(_TextIOBase)
 |  Character and line based layer over a BufferedIOBase object, buffer.
 |  
 |  encoding gives the name of the encoding that the stream will be
 |  decoded or encoded with. It defaults to locale.getpreferredencoding(False).
 |  
 |  errors determines the strictness of encoding and decoding (see
 |  help(codecs.Codec) or the documentation for codecs.register) and
 |  defaults to "strict".
 |  
 |  newline controls how line endings are handled. It can be None, '',
 |  '\n', '\r', and '\r\n'.  It works as follows:
 |  
 |  * On input, if newline is None, universal newlines mode is
 |    enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
 |    these are translated into '\n' before being returned to the
 |    caller. If it is '', universal newline mode is enabled, but line
 |    endings are returned to the caller untranslated. If it has any of
 |    the other legal values, input lines are only terminated by the given
 |    string, and the line ending is returned to the caller untranslated.
 |  
 |  * On output, if newline is None, any '\n' characters written are
 |    translated to the system default line separator, os.linesep. If
 |    newline is '' or '\n', no translation takes place. If newline is any
 |    of the other legal values, any '\n' characters written are translated
 |    to the given string.
 |  
 |  If line_buffering is True, a call to flush is implied when a call to
 |  write contains a newline character.
 |  
 |  Method resolution order:
 |      TextIOWrapper
 |      _TextIOBase
 |      _IOBase
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __getstate__(...)
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  close(self, /)
 |      Flush and close the IO object.
 |      
 |      This method has no effect if the file is already closed.
 |  
 |  detach(self, /)
 |      Separate the underlying buffer from the TextIOBase and return it.
 |      
 |      After the underlying buffer has been detached, the TextIO is in an
 |      unusable state.
 |  
 |  fileno(self, /)
 |      Returns underlying file descriptor if one exists.
 |      
 |      OSError is raised if the IO object does not use a file descriptor.
 |  
 |  flush(self, /)
 |      Flush write buffers, if applicable.
 |      
 |      This is not implemented for read-only and non-blocking streams.
 |  
 |  isatty(self, /)
 |      Return whether this is an 'interactive' stream.
 |      
 |      Return False if it can't be determined.
 |  
 |  read(self, size=-1, /)
 |      Read at most n characters from stream.
 |      
 |      Read from underlying buffer until we have n characters or we hit EOF.
 |      If n is negative or omitted, read until EOF.
 |  
 |  readable(self, /)
 |      Return whether object was opened for reading.
 |      
 |      If False, read() will raise OSError.
 |  
 |  readline(self, size=-1, /)
 |      Read until newline or EOF.
 |      
 |      Returns an empty string if EOF is hit immediately.
 |  
 |  seek(self, cookie, whence=0, /)
 |      Change stream position.
 |      
 |      Change the stream position to the given byte offset. The offset is
 |      interpreted relative to the position indicated by whence.  Values
 |      for whence are:
 |      
 |      * 0 -- start of stream (the default); offset should be zero or positive
 |      * 1 -- current stream position; offset may be negative
 |      * 2 -- end of stream; offset is usually negative
 |      
 |      Return the new absolute position.
 |  
 |  seekable(self, /)
 |      Return whether object supports random access.
 |      
 |      If False, seek(), tell() and truncate() will raise OSError.
 |      This method may need to do a test seek().
 |  
 |  tell(self, /)
 |      Return current stream position.
 |  
 |  truncate(self, pos=None, /)
 |      Truncate file to size bytes.
 |      
 |      File pointer is left unchanged.  Size defaults to the current IO
 |      position as reported by tell().  Returns the new size.
 |  
 |  writable(self, /)
 |      Return whether object was opened for writing.
 |      
 |      If False, write() will raise OSError.
 |  
 |  write(self, text, /)
 |      Write string to stream.
 |      Returns the number of characters written (which is always equal to
 |      the length of the string).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  buffer
 |  
 |  closed
 |  
 |  encoding
 |      Encoding of the text stream.
 |      
 |      Subclasses should override.
 |  
 |  errors
 |      The error setting of the decoder or encoder.
 |      
 |      Subclasses should override.
 |  
 |  line_buffering
 |  
 |  name
 |  
 |  newlines
 |      Line endings translated so far.
 |      
 |      Only line endings translated during reading are considered.
 |      
 |      Subclasses should override.
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from _IOBase:
 |  
 |  __del__(...)
 |  
 |  __enter__(...)
 |  
 |  __exit__(...)
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  readlines(self, hint=-1, /)
 |      Return a list of lines from the stream.
 |      
 |      hint can be specified to control the number of lines read: no more
 |      lines will be read if the total size (in bytes/characters) of all
 |      lines so far exceeds hint.
 |  
 |  writelines(self, lines, /)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from _IOBase:
 |  
 |  __dict__

Packages with functions

  • A package is a collection of reusable python codes.
  • To get access to a package you must:
    • Install the package (e.g., using pip)
    • Your code must import the package, which then becomes a python object.

numpy

Package of codes for numerical processing in python.


In [14]:
data = [1, 2, 3, 4]

In [15]:
import numpy as np
np.mean(data)


Out[15]:
2.5

In [16]:
np.median(data)


Out[16]:
2.5

In [17]:
arr = np.array(data)
arr


Out[17]:
array([1, 2, 3, 4])

In [18]:
np.reshape(arr, (2,2))


Out[18]:
array([[1, 2],
       [3, 4]])

In [7]:
np.std(data)


Out[7]:
1.118033988749895

How do you find out what's in a package?

Python's Data Science Ecosystem

In addition to Python's built-in modules like the math module we explored above, there are also many often-used third-party modules that are core tools for doing data science with Python. Some of the most important ones are: numpy: Numerical Python

Numpy is short for "Numerical Python", and contains tools for efficient manipulation of arrays of data. If you have used other computational tools like IDL or MatLab, Numpy should feel very familiar. scipy: Scientific Python

Scipy is short for "Scientific Python", and contains a wide range of functionality for accomplishing common scientific tasks, such as optimization/minimization, numerical integration, interpolation, and much more. We will not look closely at Scipy today, but we will use its functionality later in the course. pandas: Labeled Data Manipulation in Python

Pandas is short for "Panel Data", and contains tools for doing more advanced manipulation of labeled data in Python, in particular with a columnar data structure called a Data Frame. If you've used the R statistical language (and in particular the so-called "Hadley Stack"), much of the functionality in Pandas should feel very familiar. matplotlib: Visualization in Python

Matplotlib started out as a Matlab plotting clone in Python, and has grown from there in the 15 years since its creation. It is the most popular data visualization tool currently in the Python data world (though other recent packages are starting to encroach on its monopoly).

Machine Learning Exploration


In [19]:
from pandas import read_csv
from pandas.plotting import scatter_matrix
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC