Data Basics

Overview

  • Data basics
    • Theory of types
    • Basic types
    • Composite types: Tuples, lists and dictionaries
  • Functions
    • range() function
    • read and write files
  • Flow of control
    • If
    • for

Data Basics

Core concepts

  • An object is an element used in and possibly manipulated by software.
  • Objects have two kinds of attributes
    • Properties ("values")
    • Methods ("functions")
  • To access the attributes of a Python object, use the dot (".") operator.
  • A type (or class) specifies a set of attributes supported by an object.
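
As a small illustration of the dot operator, the sketch below uses Python's built-in complex type (the variable names are chosen only for this example). Here real is a property, read without parentheses, while conjugate is a method and has to be called.

```python
# A complex number carries both properties and methods.
z = 3 + 4j

real_part = z.real        # property: read it without parentheses
flipped = z.conjugate()   # method: call it with parentheses

print(real_part)          # 3.0
print(flipped)            # (3-4j)
print(type(z))            # <class 'complex'>
```

The type reported by type() determines which attributes are available, which is why a str has count but an int does not.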

Basic types

  • int, float, string, bool
  • relating types

In [2]:
# Let's do something with strings
a_string = "Happy birthday"
type(a_string)


Out[2]:
str

In [3]:
a_string.count("p")


Out[3]:
2

In [4]:
a = 3

In [5]:
# We can't count strings in an integer
a.count("p")


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-ec2e978d8aa6> in <module>
      1 # We can't count strings in an integer
----> 2 a.count("p")

AttributeError: 'int' object has no attribute 'count'

In [6]:
x = True

In [7]:
type(x)


Out[7]:
bool

In [8]:
x = a > 4

In [9]:
x


Out[9]:
False

In [10]:
x = "3"
x + 2


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-610aa973b1c1> in <module>
      1 x = "3"
----> 2 x + 2

TypeError: must be str, not int
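
The error above occurs because Python will not guess whether "3" + 2 means concatenation or arithmetic. One way to resolve it, sketched here with the built-in int() and str() functions, is to convert one operand explicitly so both have the same type.

```python
x = "3"

as_number = int(x) + 2    # convert the string to an int, then add
as_text = x + str(2)      # convert the int to a string, then concatenate

print(as_number)          # 5
print(as_text)            # 32
```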

Advanced types

Tuples

Let's begin by creating a tuple called my_tuple that contains three elements.


In [11]:
my_tuple = ('I', 'like', 'cake')
my_tuple


Out[11]:
('I', 'like', 'cake')

In [13]:
my_tuple[1]


Out[13]:
'like'

In [15]:
my_tuple[1:3]


Out[15]:
('like', 'cake')

Tuples are simple containers for data. They are ordered, meaning that the order the elements are in when the tuple is created is preserved. We can get values from our tuple by indexing, similar to what we were doing with pandas.


In [ ]:
my_tuple[0]

Recall that Python indexes start at 0, so the first element of a tuple has index 0 and the last has index len(my_tuple) - 1. You can also count from the end towards the front by using negative (-) indexes, e.g.


In [ ]:
my_tuple[-1]

You can also access a range of elements, e.g. the first two, the first three, by using the : to expand a range. This is called slicing.


In [ ]:
my_tuple[0:2]

In [ ]:
my_tuple[0:3]

What do you notice about how the upper bound is referenced?

If you omit one bound, the slice extends to that end of the tuple. With both bounds omitted, the : expands to the entire tuple.


In [ ]:
my_tuple[1:]

In [ ]:
my_tuple[:-1]

In [ ]:
my_tuple[:]
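
The slicing rules above can be summarised in a short sketch: the lower bound is included and the upper bound is excluded, so a slice [start:stop] contains stop - start elements. The variable names other than my_tuple are chosen only for this example.

```python
my_tuple = ('I', 'like', 'cake')

first_two = my_tuple[0:2]       # indexes 0 and 1; index 2 is excluded
all_but_last = my_tuple[:-1]    # everything up to, but not including, the last element
whole_copy = my_tuple[:]        # both bounds omitted: every element

print(first_two)                # ('I', 'like')
print(all_but_last)             # ('I', 'like')
print(whole_copy == my_tuple)   # True
print(len(my_tuple[0:2]))       # 2
```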

Tuples have a key feature that distinguishes them from other types of object containers in Python. They are immutable. This means that once the values are set, they cannot change.


In [ ]:
my_tuple[2]

So what happens if I decide that I really prefer pie over cake?


In [16]:
my_tuple[2] = 'pie'


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-1c13e20368c3> in <module>
----> 1 my_tuple[2] = 'pie'

TypeError: 'tuple' object does not support item assignment

Facts about tuples:

  • You can't add elements to a tuple. Tuples have no append or extend method.
  • You can't remove elements from a tuple. Tuples have no remove or pop method.
  • You can use the in operator to check whether an element exists in a tuple.

So then, what are the use cases of tuples?

  • Speed: tuples are typically a little cheaper to create and store than lists
  • Write-protects data that other pieces of code should not alter
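
The facts above can be checked directly. The following is a minimal sketch: it tests membership with in, confirms that tuples have no append method, catches the error raised by item assignment, and shows that "changing" a tuple really means building a new one. The names message and new_tuple are chosen only for this example.

```python
my_tuple = ('I', 'like', 'cake')

print('cake' in my_tuple)            # True
print(hasattr(my_tuple, 'append'))   # False

# Item assignment fails, so the original data stays protected.
try:
    my_tuple[2] = 'pie'
except TypeError as err:
    message = str(err)
    print(message)

# To get a different value, build a new tuple from pieces of the old one.
new_tuple = my_tuple[:2] + ('pie',)
print(new_tuple)                     # ('I', 'like', 'pie')
print(my_tuple)                      # ('I', 'like', 'cake')
```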

Lists

Let's begin by creating a list called my_list that contains three elements.


In [17]:
my_list = ['I', 'like', 'cake']
a_tuple = ('I', 'like', 'cake')

In [19]:
my_list[2]


Out[19]:
'cake'

At first glance, tuples and lists look pretty similar. Notice that lists use '[' and ']' instead of '(' and ')'. Indexing still works the same way, with the first entry at 0 and the last at -1.


In [ ]:
my_list[0]

In [ ]:
my_list[-1]

In [ ]:
my_list[0:3]

Lists, however, unlike tuples, are mutable.


In [ ]:
my_list[2] = 'pie'
my_list

In [7]:
# Some methods for lists
my_list.index("cake")


Out[7]:
2

In [20]:
my_list


Out[20]:
['I', 'like', 'cake']

In [23]:
my_list.append("very")

In [24]:
my_list


Out[24]:
['I', 'like', 'cake', 'very']

In [29]:
my_list.sort()

In [30]:
my_list


Out[30]:
['I', 'cake', 'like', 'very']
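
Two details of the sort above are worth a quick sketch. First, sort() changes the list in place and returns None. Second, uppercase letters sort before lowercase ones, which is why 'I' ends up first. The built-in sorted() function with a key argument is one way to get a case-insensitive order; the name ignoring_case is chosen only for this example.

```python
my_list = ['I', 'like', 'cake', 'very']

result = my_list.sort()    # sorts in place and returns None
print(result)              # None
print(my_list)             # ['I', 'cake', 'like', 'very']

# sorted() returns a new list; key=str.lower compares items ignoring case.
ignoring_case = sorted(my_list, key=str.lower)
print(ignoring_case)       # ['cake', 'I', 'like', 'very']
```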

There are other useful methods on lists, including:

method                               description
list.append(obj)                     Appends object obj to the list
list.count(obj)                      Returns how many times obj occurs in the list
list.extend(seq)                     Appends the contents of seq to the list
list.index(obj)                      Returns the lowest index at which obj appears; raises ValueError if obj is absent
list.insert(index, obj)              Inserts object obj into the list at offset index
list.pop([index])                    Removes and returns the object at index (by default, the last object)
list.remove(obj)                     Removes the first occurrence of obj from the list
list.reverse()                       Reverses the objects of the list in place
list.sort(key=None, reverse=False)   Sorts the objects of the list in place, using the key function if given

Try some of them now.

my_list.count('I')
my_list

my_list.append('I')
my_list

my_list.count('I')
my_list

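# index() raises ValueError when the item is not in the list:
#my_list.index(42)

# Add 'puppies' first so that index() can find it
my_list.append('puppies')
my_list.index('puppies')
my_list

my_list.insert(my_list.index('puppies'), 'furry')
my_list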
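
For the methods in the table that the cells above do not exercise, here is a small sketch on a separate list (the name words and its contents are made up for this example, so my_list is left untouched). Each comment shows the state of the list after that line.

```python
words = ['I', 'like', 'cake']

words.extend(['and', 'pie'])   # ['I', 'like', 'cake', 'and', 'pie']
last = words.pop()             # removes and returns 'pie'
first = words.pop(0)           # removes and returns 'I'
words.remove('and')            # ['like', 'cake']
words.insert(0, 'We')          # ['We', 'like', 'cake']
words.reverse()                # ['cake', 'like', 'We']

print(last, first)             # pie I
print(words)                   # ['cake', 'like', 'We']

# index() raises ValueError when the item is missing.
try:
    words.index('puppies')
except ValueError:
    found = False
    print('puppies is not in the list')
```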

Dictionaries


Dictionaries are similar to tuples and lists in that they hold a collection of objects. Dictionaries, however, allow an additional indexing mode: keys. Think of a real dictionary where the elements in it are the definitions of the words and the keys to retrieve the entries are the words themselves.

word         definition
tuple        An immutable collection of ordered objects
list         A mutable collection of ordered objects
dictionary   A mutable collection of named objects

Let's create this data structure now. Dictionaries, like tuples and lists, have their own delimiters: '{' and its evil twin '}'.


In [31]:
my_dict = {"birthday": "Jan 1", "present": "vaccine"}

In [33]:
my_dict.keys()


Out[33]:
dict_keys(['birthday', 'present'])

In [34]:
my_dict.values()


Out[34]:
dict_values(['Jan 1', 'vaccine'])

In [ ]:
my_dict = { 'tuple' : 'An immutable collection of ordered objects',
            'list' : 'A mutable collection of ordered objects',
            'dictionary' : 'A mutable collection of objects' }
my_dict

We access items in the dictionary by name, e.g.


In [ ]:
my_dict['dictionary']

Since the dictionary is mutable, you can change the entries.


In [ ]:
my_dict['dictionary'] = 'A mutable collection of named objects'
my_dict

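Notice that the entries keep the order in which they were inserted. Since Python 3.7, dictionaries are guaranteed to preserve insertion order; older versions made no such guarantee.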

And we can add new items to the dictionary.


In [ ]:
my_dict['cabbage'] = 'Green leafy plant in the Brassica family'
my_dict

To delete an entry, we can't just set it to None, because the key stays in the dictionary with None as its value.


In [ ]:
my_dict['cabbage'] = None
my_dict

To delete it properly, we need to pop that specific entry.


In [ ]:
my_dict.pop('cabbage', None)
my_dict
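
The difference between setting an entry to None and removing it can be seen in a short sketch. It uses a fresh two-entry dictionary so it can run on its own, and it also shows the del statement, which is another standard way to remove a key. The names removed and missing are chosen only for this example.

```python
my_dict = {'list': 'A mutable collection of ordered objects',
           'cabbage': 'Green leafy plant in the Brassica family'}

my_dict['cabbage'] = None
print('cabbage' in my_dict)              # True: the key is still there

removed = my_dict.pop('cabbage', None)   # removes the key and returns its value
print(removed)                           # None
print('cabbage' in my_dict)              # False

# The second argument is returned when the key is absent, avoiding a KeyError.
missing = my_dict.pop('cabbage', 'not found')
print(missing)                           # not found

del my_dict['list']                      # del also removes an entry
print(my_dict)                           # {}
```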

You can use other objects as names, but that is a topic for another time. You can mix and match key types, e.g.


In [ ]:
my_new_dict = {}
my_new_dict[1] = 'One'
my_new_dict['42'] = 42
my_new_dict

You can get the keys in the dictionary by using the keys method. In Python 3 it returns a view of the keys rather than a list.


In [ ]:
my_dict.keys()

Similarly, you can get the contents of the dictionary as (key, value) pairs with the items method.


In [ ]:
my_dict.items()
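
A common use of items() is looping over keys and values together with a for loop. The sketch below uses a fresh two-entry dictionary so it can run on its own; the names word, definition and lines are chosen only for this example.

```python
my_dict = {'tuple': 'An immutable collection of ordered objects',
           'list': 'A mutable collection of ordered objects'}

lines = []
# Each item is a (key, value) tuple, unpacked here into two names.
for word, definition in my_dict.items():
    lines.append(word + ': ' + definition)

for line in lines:
    print(line)
```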

We can use the keys for fun stuff, e.g. with the in operator.


In [ ]:
'dictionary' in my_dict.keys()

This is equivalent to testing in my_dict directly.


In [ ]:
'dictionary' in my_dict

Notice that it doesn't work for values: the in operator only checks the keys.


In [ ]:
'A mutable collection of ordered objects' in my_dict
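
To check whether something appears among the values rather than the keys, one option is to test against the values() view. A minimal sketch, using a fresh dictionary so it can run on its own (the name definition is chosen only for this example):

```python
my_dict = {'tuple': 'An immutable collection of ordered objects',
           'list': 'A mutable collection of ordered objects'}

definition = 'A mutable collection of ordered objects'

print('list' in my_dict)                 # True: 'list' is a key
print(definition in my_dict)             # False: in only looks at keys
print(definition in my_dict.values())    # True: search the values instead
```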

Other dictionary methods:

method                               description
dict.clear()                         Removes all elements from dict
dict.get(key, default=None)          For key key, returns its value, or default if key doesn't exist in dict
dict.items()                         Returns a view of dict's (key, value) tuple pairs
dict.keys()                          Returns a view of the dictionary keys
dict.setdefault(key, default=None)   Similar to get, but sets the value of key to default if key doesn't exist in dict
dict.update(dict2)                   Adds the key / value pairs in dict2 to dict
dict.values()                        Returns a view of the dictionary values
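
The difference between get, setdefault and update is easiest to see side by side. The following is a small sketch with a made-up dictionary named counts, not part of the notebook's running example.

```python
counts = {'cake': 2}

print(counts.get('cake'))       # 2
print(counts.get('pie'))        # None: missing key, no error
print(counts.get('pie', 0))     # 0: explicit default
print('pie' in counts)          # False: get never adds the key

counts.setdefault('pie', 0)     # 'pie' is missing, so it is added with value 0
counts.setdefault('cake', 99)   # 'cake' already exists, so it is left unchanged
print(counts)                   # {'cake': 2, 'pie': 0}

counts.update({'pie': 5, 'tart': 1})   # overwrites 'pie' and adds 'tart'
print(counts)                   # {'cake': 2, 'pie': 5, 'tart': 1}
```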

In [36]:
help(dict)


Help on class dict in module builtins:

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, key, /)
 |      True if D has a key k, else False.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __setitem__(self, key, value, /)
 |      Set self[key] to value.
 |  
 |  __sizeof__(...)
 |      D.__sizeof__() -> size of D in memory, in bytes
 |  
 |  clear(...)
 |      D.clear() -> None.  Remove all items from D.
 |  
 |  copy(...)
 |      D.copy() -> a shallow copy of D
 |  
 |  fromkeys(iterable, value=None, /) from builtins.type
 |      Returns a new dict with keys from iterable and values equal to value.
 |  
 |  get(...)
 |      D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.
 |  
 |  items(...)
 |      D.items() -> a set-like object providing a view on D's items
 |  
 |  keys(...)
 |      D.keys() -> a set-like object providing a view on D's keys
 |  
 |  pop(...)
 |      D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
 |      If key is not found, d is returned if given, otherwise KeyError is raised
 |  
 |  popitem(...)
 |      D.popitem() -> (k, v), remove and return some (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |  
 |  setdefault(...)
 |      D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
 |  
 |  update(...)
 |      D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
 |      If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
 |      If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
 |      In either case, this is followed by: for k in F:  D[k] = F[k]
 |  
 |  values(...)
 |      D.values() -> an object providing a view on D's values
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __hash__ = None