# Software Engineering for Data Scientists

## *Python Basics*
## DATA 515 A

Today's Objectives

0. Cloning LectureNotes

1. Opening & Navigating the Jupyter Notebook

2. Data type basics

3. Loading data with `pandas`

4. Cleaning and Manipulating data with `pandas`

5. Visualizing data with `pandas` & `matplotlib`

0. Cloning Lecture Notes

The course materials are maintained on GitHub. The next lecture will discuss GitHub in detail. Today, you'll get the minimal instructions needed to access today's lecture materials.

- Open a terminal session
- Type `git clone https://github.com/UWSEDS/LectureNotes.git`
- Wait until the download is complete
- `cd LectureNotes`
- `cd 02_Procedural_Python`

1. Opening and Navigating the Jupyter Notebook

We will start today with the interactive environment that we will use often throughout the course: the Jupyter Notebook.

We will walk through the following steps together:

Download miniconda (be sure to get Version 3.6) and install it on your system (hopefully you have done this before coming to class).
Use the conda command-line tool to update your package listing and install the Jupyter notebook:

Update conda's listing of packages for your system:
```
$ conda update conda
```
Install the Jupyter notebook and all its requirements:
```
$ conda install jupyter notebook
```
Navigate to the directory containing the course material. For example:
```
$ cd LectureNotes/02_Procedural_Python
```
You should see a number of files in the directory, including these:
```
$ ls
```
Type `jupyter notebook` in the terminal to start the notebook:
```
$ jupyter notebook
```
If everything has worked correctly, it should automatically launch your default browser
Click on Lecture-Python-And-Data.ipynb to open the notebook containing the content for this lecture.

With that, you're set up to use the Jupyter notebook!
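
As a quick sanity check once the notebook is open, a cell along these lines reports which Python version the notebook is running. This is only a sketch; the variable names are illustrative and not part of the lecture materials.

```python
import sys

# sys.version_info holds the interpreter version as a tuple-like object.
major, minor = sys.version_info[:2]
print("Running Python %d.%d" % (major, minor))

# The lecture assumes a Python 3 environment.
is_python3 = (major == 3)
print("Python 3:", is_python3)
```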

2. Data Types Basics

2.1 Data type theory

Components with the same capabilities are of the same type.
- For example, the numbers 2 and 200 are both integers.
A type can be defined recursively, in terms of other types. Some examples:
- A list is a collection of objects that can be indexed by position.
- A list of integers contains an integer at each position.
A type has a set of supported operations. For example:
- Integers can be added.
- Strings can be concatenated.
- A table can return the names of its columns.
  - What type is returned from the operation?
In Python, members (components and operations) are accessed with a '.'
- If a is a list, then a.append(1) adds 1 to the end of the list.
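
The ideas above can be sketched with Python built-ins only. The variable names here are illustrative and do not come from the lecture code.

```python
# Values with the same capabilities share a type.
same_type = type(2) == type(200)
print(same_type, type(2))

# A list is indexed by position; each position here holds an integer.
numbers = [10, 20, 30]
print(numbers[0], type(numbers[0]))

# Each type supports its own operations.
total = 2 + 200                    # integers can be added
phrase = "data" + " science"       # strings can be concatenated

# Members are reached with '.'; append is a method of list.
numbers.append(1)
print(total, phrase, numbers)
```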

2.2 Primitive types

The primitive types are integers, floats, strings, and booleans.

2.2.1 Integers



In [1]:

    
# Integer arithmetic
1 + 1









    Out[1]:





2



In [2]:

    
# Integer division versus floating point division
print(6 // 4, 6 / 4)
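
A slightly longer sketch of the same distinction, including a negative operand, where floor division can be surprising. The variable names are illustrative.

```python
# // is floor division: the result is rounded toward negative infinity.
floor_quotient = 6 // 4      # 1
true_quotient = 6 / 4        # 1.5
negative_floor = -6 // 4     # -2, not -1

# divmod returns the floor quotient and the remainder together.
q, r = divmod(6, 4)          # (1, 2)

print(floor_quotient, true_quotient, negative_floor, q, r)
```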

2.2.2 Floats



In [3]:

    
# Floats support the full set of "calculator functions", but these need the numpy package
import numpy as np
# sin(2*pi) is not exactly 0 because of floating point rounding
print(6.0 * 3, np.sin(2 * np.pi))









    



18.0 -2.4492935982947064e-16



In [4]:

    
# Floats can have a null value called nan ("not a number")
a = np.nan
3*a









    Out[4]:





nan
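
Two properties of nan are worth seeing directly: it is not equal to itself, and it propagates through ordinary arithmetic. The sketch below assumes numpy is installed, as in the cell above; the array contents are made up for illustration.

```python
import numpy as np

a = np.nan

# nan is not equal to anything, including itself.
equal_to_itself = (a == a)
print(equal_to_itself)          # False

# Use np.isnan to test for nan instead of ==.
print(np.isnan(a))              # True

# Ordinary reductions propagate nan; the nan-aware versions skip it.
values = np.array([1.0, np.nan, 3.0])   # illustrative data
plain_mean = np.mean(values)
mean_ignoring_nan = np.nanmean(values)
print(plain_mean, mean_ignoring_nan)
```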

2.2.3 Strings



In [5]:

    
# Can concatenate, substring, find, count, ...



In [6]:

    
a = "The lazy"
b = "brown fox"
print("Concatenation: ", a + b)
print("First three letters: " + a[0:3])
print("Index of 'z': " + str(a.find('z')))









    



Concatenation:  The lazybrown fox
First three letters: The
Index of 'z': 6
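
Note that concatenation does not insert a space, which is why the output reads "lazybrown". A short sketch of a few more string operations follows; the variable names beyond a and b are illustrative.

```python
a = "The lazy"
b = "brown fox"

# Add the separating space explicitly when concatenating.
sentence = a + " " + b
print(sentence)

# count returns the number of non-overlapping occurrences.
o_count = sentence.count("o")

# find returns -1 when the substring is absent.
missing_index = sentence.find("q")

# Negative indices count from the end of the string.
last_three = sentence[-3:]

print(o_count, missing_index, last_three)
```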

2.3 Tuples

A tuple is an ordered sequence of objects. Tuples cannot be changed; they are immutable.



In [7]:

    
a_tuple = (1, 'ab', (1,2))
a_tuple









    Out[7]:





(1, 'ab', (1, 2))



In [8]:

    
a_tuple[2]









    Out[8]:





(1, 2)
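
To see immutability in practice, the sketch below tries to assign to a position in the tuple and catches the resulting error. The names first, second, third, and error_message are illustrative.

```python
a_tuple = (1, 'ab', (1, 2))

# Item assignment is not supported on tuples and raises TypeError.
try:
    a_tuple[0] = 99
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Unpacking assigns each position to its own name.
first, second, third = a_tuple
print(first, second, third)
```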

2.4 Lists

A list is an ordered sequence of objects that can be changed.



In [9]:

    
a_list = [1, 'a', [1,2]]



In [10]:

    
a_list[0]









    Out[10]:





1



In [11]:

    
a_list.append(2)
a_list









    Out[11]:





[1, 'a', [1, 2], 2]



In [12]:

    
a_list









    Out[12]:





[1, 'a', [1, 2], 2]



In [13]:

    
dir(a_list)









    Out[13]:





['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']



In [14]:

    
help(a_list)









    



Help on list object:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __mul__(self, value, /)
 |      Return self*value.n
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __reversed__(...)
 |      L.__reversed__() -- return a reverse iterator over the list
 |  
 |  __rmul__(self, value, /)
 |      Return self*value.
 |  
 |  __setitem__(self, key, value, /)
 |      Set self[key] to value.
 |  
 |  __sizeof__(...)
 |      L.__sizeof__() -- size of L in memory, in bytes
 |  
 |  append(...)
 |      L.append(object) -> None -- append object to end
 |  
 |  clear(...)
 |      L.clear() -> None -- remove all items from L
 |  
 |  copy(...)
 |      L.copy() -> list -- a shallow copy of L
 |  
 |  count(...)
 |      L.count(value) -> integer -- return number of occurrences of value
 |  
 |  extend(...)
 |      L.extend(iterable) -> None -- extend list by appending elements from the iterable
 |  
 |  index(...)
 |      L.index(value, [start, [stop]]) -> integer -- return first index of value.
 |      Raises ValueError if the value is not present.
 |  
 |  insert(...)
 |      L.insert(index, object) -- insert object before index
 |  
 |  pop(...)
 |      L.pop([index]) -> item -- remove and return item at index (default last).
 |      Raises IndexError if list is empty or index is out of range.
 |  
 |  remove(...)
 |      L.remove(value) -> None -- remove first occurrence of value.
 |      Raises ValueError if the value is not present.
 |  
 |  reverse(...)
 |      L.reverse() -- reverse *IN PLACE*
 |  
 |  sort(...)
 |      L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __hash__ = None



In [15]:

    
a_list.count(1)









    Out[15]:





1
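
Several of the methods listed by help can be tried on a small list. This sketch uses a separate, illustrative list named scores so that a_list is left untouched.

```python
scores = [3, 1, 2]

scores.insert(0, 0)      # insert 0 before index 0 -> [0, 3, 1, 2]
scores.extend([5, 4])    # append each element     -> [0, 3, 1, 2, 5, 4]
scores.sort()            # sorts in place          -> [0, 1, 2, 3, 4, 5]
last = scores.pop()      # removes and returns the last item
scores.reverse()         # reverses in place

print(scores, last)
```

Methods such as sort and reverse change the list in place and return None, so their result should not be assigned back to the list name.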

2.5 Dictionaries

A dictionary associates a key with a value. A value can be any object, even another dictionary.



In [16]:

    
dessert_dict = {}  # Empty dictionary
dessert_dict['Dave'] = "Cake"
dessert_dict["Joe"] = ["Cake", "Pie"]
print(dessert_dict)









    



{'Dave': 'Cake', 'Joe': ['Cake', 'Pie']}



In [17]:

    
dessert_dict["Dave"]









    Out[17]:





'Cake'



In [18]:

    
# A value can itself be a dictionary (here, an empty one)
dessert_dict["Bernease"] = {}
dessert_dict









    Out[18]:





{'Bernease': {}, 'Dave': 'Cake', 'Joe': ['Cake', 'Pie']}



In [19]:

    
dessert_dict["Bernease"] = {"Favorite": ["sorbet", "cobbler"], "Dislike": "Brownies"}
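
Nested values are reached by chaining lookups. The sketch below rebuilds the dictionary from the cells above so it runs on its own; the key "Alice" is hypothetical and is used only to show a missing-key lookup.

```python
dessert_dict = {
    "Dave": "Cake",
    "Joe": ["Cake", "Pie"],
    "Bernease": {"Favorite": ["sorbet", "cobbler"], "Dislike": "Brownies"},
}

# Chain the lookups: dictionary -> nested dictionary -> list position.
first_favorite = dessert_dict["Bernease"]["Favorite"][0]

# get returns a default instead of raising KeyError for a missing key.
unknown = dessert_dict.get("Alice", "no entry")

# items yields (key, value) pairs.
for name, dessert in dessert_dict.items():
    print(name, "->", dessert)

print(first_favorite, unknown)
```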

2.6 A Shakespearean Detour: "What's in a Name?"

Deep vs. Shallow Copies

A deep copy can be manipulated separately from the original. A shallow copy shares data with the original, so a change made through one name can show up in the other.



In [20]:

    
# A first name shell game
first_int = 1
second_int = first_int
second_int += 1
second_int









    Out[20]:





2



In [21]:

    
# What is first_int?
first_int









    Out[21]:





1



In [22]:

    
# A second name shell game
a_list = ['a', 'aa', 'aaa']
b_list = a_list
b_list.append('bb')
b_list









    Out[22]:





['a', 'aa', 'aaa', 'bb']



In [23]:

    
# What is a_list?
a_list









    Out[23]:





['a', 'aa', 'aaa', 'bb']



In [24]:

    
# Repeat the second name shell game, this time with a deep copy
import copy
a_list = ['a', 'aa', 'aaa']
b_list = copy.deepcopy(a_list)
b_list.append('bb')
print("b_list = %s" % str(b_list))
print("a_list = %s" % str(a_list))









    



b_list = ['a', 'aa', 'aaa', 'bb']
a_list = ['a', 'aa', 'aaa']

Key insight: Deep vs. Shallow Copies

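A deep copy can be manipulated separately from the original.
A shallow copy cannot be manipulated fully separately, because it shares data with the original.
Assignment never copies in Python; it binds a second name to the same object. For immutable objects (such as integers, strings, and tuples) this behaves like a deep copy, because an operation such as += creates a new object and rebinds the name. For mutable objects (such as lists and dictionaries), changes made through one name are visible through the other.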
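
The difference is easiest to see on a nested list. The following sketch contrasts plain assignment, copy.copy, and copy.deepcopy from the standard library; the variable names are illustrative.

```python
import copy

original = [['a'], ['aa']]

alias = original                   # no copy: a second name for the same list
shallow = copy.copy(original)      # new outer list, shared inner lists
deep = copy.deepcopy(original)     # new outer list and new inner lists

original.append(['bbb'])           # changes the outer list
original[0].append('z')            # changes a nested list

print(alias)    # [['a', 'z'], ['aa'], ['bbb']]
print(shallow)  # [['a', 'z'], ['aa']]
print(deep)     # [['a'], ['aa']]
```

The alias sees both changes, the shallow copy sees only the change to the shared nested list, and the deep copy sees neither.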

Name Resolution

The most common errors that you'll see in your Python code are:

- NameError
- AttributeError

A common error when using the bash shell is command not found.

Name resolution: Associating a name with code or data.

Resolving a name in the bash shell is done by searching the directories in the PATH environment variable. The first executable with the name is run.
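
The two Python errors named above can be triggered deliberately. In this sketch, undefined_name is a hypothetical name that is never assigned, and push is a method that lists do not have.

```python
a_list = [1, 2, 3]

# NameError: the name has not been assigned in any visible context.
try:
    undefined_name
except NameError as err:
    name_error = str(err)

# AttributeError: the object exists but has no member with that name.
try:
    a_list.push(4)   # lists provide append, not push
except AttributeError as err:
    attribute_error = str(err)

print("NameError:", name_error)
print("AttributeError:", attribute_error)
```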



In [25]:

    
# Example 1 of name resolution in python
var = 10
def func(val):
    var = val + 1
    return val



In [26]:

    
# What is returned?
print("func(2) = %d" % func(2))
# What is var?
print("var = %d" % var)









    



func(2) = 2
var = 10



In [27]:

    
# Example 2 of name resolution in python
var = 10
def func(val):
    return val + var



In [28]:

    
# What is returned?
print("func(2) = %d" % func(2))
# What is var?
print("var = %d" % var)









    



func(2) = 12
var = 10

Insights on python name resolution

Names are assigned within a context (a scope).
The context changes with the function and the module.
- Assigning a name in a function creates a new name that is local to that function.
- Referencing a name that is not assigned in the function uses the existing name from the enclosing context.
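
These two rules interact in one case that often surprises people: if a function assigns a name anywhere in its body, that name is local throughout the function, so reading it before the assignment fails. A minimal sketch, with illustrative function names:

```python
var = 10

def read_only(val):
    # var is never assigned here, so the existing outer var is used.
    return val + var

def assign_local(val):
    # Assignment creates a new local var; the outer var is untouched.
    var = val + 1
    return var

def read_then_assign(val):
    # var is assigned below, so it is local for the whole function
    # and the read on the right-hand side fails.
    try:
        var = var + val
    except UnboundLocalError as err:
        return "UnboundLocalError: " + str(err)
    return var

print(read_only(2))          # 12
print(assign_local(2))       # 3
print(var)                   # 10
print(read_then_assign(2))
```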

2.7 Object Essentials

Objects are a "packaging" of data and code. Almost all Python entities are objects.



In [29]:

    
# A list and a dict are objects.
# dict has been implemented so that you see its values when you type
# the instance name.
# This is done with many python objects, like list.
a_dict = {'a': [1, 2], 'b': [3, 4, 5]}
a_dict









    Out[29]:





{'a': [1, 2], 'b': [3, 4, 5]}



In [30]:

    
# You access the data and methods (code) associated with an object by
# using the "." operator. These are referred to collectively
# as attributes. Methods are followed by parentheses;
# values (properties) are not.
a_dict.keys()









    Out[30]:





dict_keys(['a', 'b'])



In [31]:

    
# You can discover the attributes of an object using "dir"
dir(a_dict)









    Out[31]:





['__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']
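
To make the "packaging" idea concrete, here is a minimal sketch of a user-defined object. The Dessert class and its attributes are hypothetical and are not part of the lecture code.

```python
class Dessert:
    """A hypothetical object that packages data with code."""

    def __init__(self, name):
        self.name = name      # data attribute
        self.votes = 0        # data attribute

    def vote(self):
        # Method: code attached to the object that updates its data.
        self.votes += 1
        return self.votes

pie = Dessert("Pie")
pie.vote()                    # methods are called with parentheses
pie.vote()
print(pie.name, pie.votes)    # values are accessed without parentheses

# dir lists all attributes; filter out the double-underscore names.
public = [name for name in dir(pie) if not name.startswith("_")]
print(public)                 # ['name', 'vote', 'votes']
```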

2.8 Summary

| type | description |
|------|-------------|
| primitive | int, float, string, bool |
| tuple | An immutable collection of ordered objects |
| list | A mutable collection of ordered objects |
| dictionary | A mutable collection of named objects |
| object | A packaging of code and data |