Python for Humanists (Part I)

This workshop is licensed under a Creative Commons Attribution 4.0 International License. Download Code & Data here: git repo

What is Python?

  • General programming language
  • Open and free environment
    • Lots of community support

Why use Python?

Python is a general-purpose programming language that is good at most tasks.

  • It's free
  • Common Uses
    • Web Scraping (Text & Data Mining)
    • Web Applications
    • Repetitive tasks & task automation
    • Transforming & Manipulating data
      • Row by Row

Python Setup

Installing Python

  • Download & install manually
    • Usually through command line (shell)
  • Installed as part of an IDE or a bundled distribution (this workshop uses Anaconda)

Creating a Jupyter Notebook

  • Open the Anaconda navigator
  • Click 'launch' under Jupyter Notebook
  • Opens in your default web browser
  • Navigate to the location on your machine where you'd like the notebook to be saved
  • Click 'New'
    • Select 'Python 3' from the dropdown list

Use the Jupyter Notebook for editing and running Python.

  • While it's common to write Python scripts using a text editor, we are going to use the Jupyter Notebook for the remainder of this workshop.
  • This has several advantages:
    • You can easily type, edit, and copy and paste blocks of code.
    • Tab complete allows you to easily access the names of things you are using and learn more about them.
    • It allows you to annotate your code with links, different sized text, bullets, etc. to make it more accessible to you and your collaborators.
    • It allows you to display figures next to the code that produces them to tell a complete story of the analysis.
  • Each notebook contains one or more cells that contain code, text, or images.

Running Python code in Jupyter

  • Code vs Text
    • Jupyter mixes code and text in different types of blocks, called cells.
  • Markdown can be used to style Text cells
  • Executable code is written in Code cells
  • CTRL + Enter will run a cell; Shift + Enter will run a cell and highlight the next cell.
    • Running a Text/Markdown cell will render the markdown
    • Running a Code cell will execute the Python code

In [1]:
5 - 2


Out[1]:
3

Variables & Data Types

Use variables to store values.

  • Variables are names for values.
  • In Python the = symbol assigns the value on the right to the name on the left.
  • The variable is created when a value is assigned to it.
  • Here, Python assigns an age to a variable age and a name in quotes to a variable first_name.

In [2]:
age = 42
first_name = 'Ahmed'
  • Variable names
    • can only contain letters, digits, and underscore _ (typically used to separate words in long variable names)
    • cannot start with a digit
  • Variable names that start with underscores like __alistairs_real_age have a special meaning so we won't do that until we understand the convention.
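
The naming rules above can be checked with a small sketch. It uses the string method isidentifier(), which reports whether a piece of text is a syntactically valid name (methods are introduced later in this workshop). The candidate names are made up for illustration, and the check covers syntax only: it does not flag reserved words such as class.

```python
# Hypothetical candidate names, chosen only to illustrate the naming rules.
ok_underscore = 'first_name'.isidentifier()     # letters and an underscore
ok_digit_inside = 'age2'.isidentifier()         # a digit is fine after the first character
bad_leading_digit = '2nd_age'.isidentifier()    # cannot start with a digit
bad_hyphen = 'last-name'.isidentifier()         # a hyphen is not allowed

print(ok_underscore, ok_digit_inside, bad_leading_digit, bad_hyphen)
# True True False False
```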

Use print to display values.

  • Python has a built-in function called print that prints things as text.
  • Call the function (i.e., tell Python to run it) by using its name.
  • Provide values to the function (i.e., the things to print) in parentheses.
  • To add a string to the printout, wrap the string in single or double quotes.
  • The values passed to the function are called 'arguments'

In [3]:
print(first_name, 'is', age, 'years old')


Ahmed is 42 years old
  • print automatically puts a single space between items to separate them.
  • It also ends the output with a newline, so the next print starts on a new line.

Variables must be created before they are used.

  • If a variable doesn't exist yet, or if the name has been misspelled, Python reports an error.
    • Unlike some languages, which "guess" a default value.

In [4]:
print(last_name)


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-a637c5453549> in <module>
----> 1 print(last_name)

NameError: name 'last_name' is not defined

Variables can be used in calculations.

  • We can use variables in calculations just as if they were values.
    • Remember, we assigned 42 to age a few lines ago.

In [5]:
age = age + 3
print('Age in three years:', age)


Age in three years: 45

Every value has a type.

  • Every value in a program has a specific type.
  • Integer (int): represents positive or negative whole numbers like 3 or -512.
  • Floating point number (float): represents real numbers like 3.14159 or -2.5.
  • Character string (usually called "string", str): text.
    • Written in either single quotes or double quotes (as long as they match).
    • The quote marks aren't printed when the string is displayed.

Use the built-in function type to find the type of a value.

  • Use the built-in function type to find out what type a value has.
  • Works on variables as well.
    • But remember: the value has the type; the variable is just a label.

In [6]:
print(type(52))


<class 'int'>

In [7]:
height = 'average'
print(type(height))


<class 'str'>
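
To illustrate that the type belongs to the value rather than to the variable, here is a minimal sketch. The variable name label is hypothetical and is reused on purpose.

```python
label = 52                  # the name points at an integer value
type_before = type(label)
print(type_before)          # <class 'int'>

label = 'fifty-two'         # same name, now pointing at a string value
type_after = type(label)
print(type_after)           # <class 'str'>

print(type(3.14))           # <class 'float'>
```

Reassigning the name does not convert anything; it simply attaches the label to a different value.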

Types control what operations (or methods) can be performed on a given value.

  • A value's type determines what the program can do to it.

In [8]:
print(5 - 3)


2

In [9]:
print('hello' - 'h')


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-35e8597b28d6> in <module>
----> 1 print('hello' - 'h')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

You can use the "+" and "*" operators on strings.

  • "Adding" character strings concatenates them.

In [10]:
full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)


Ahmed Walsh
  • Multiplying a character string by an integer N creates a new string that consists of that character string repeated N times.
    • Since multiplication is repeated addition.

In [11]:
separator = '=' * 10
print(separator)


==========

Strings have a length (but numbers don't).

  • The built-in function len counts the number of characters in a string.

In [12]:
print(len(full_name))


11
  • But numbers don't have a length (not even zero).

In [13]:
print(len(52))


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-8e77a6522867> in <module>
----> 1 print(len(52))

TypeError: object of type 'int' has no len()

Must convert numbers to strings or vice versa when operating on them.

  • Cannot add numbers and strings.

In [14]:
print(1 + '2')


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-013270d67d3d> in <module>
----> 1 print(1 + '2')

TypeError: unsupported operand type(s) for +: 'int' and 'str'
  • Not allowed because it's ambiguous: should 1 + '2' be 3 or '12'?
  • Some types can be converted to other types by using the type name as a function.

In [15]:
print(1 + int('2'))
print(str(1) + '2')


3
12
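
A few more conversions, sketched under the assumption that the text being converted really does look like a number. The variable names are illustrative only.

```python
digits = len(str(52))       # convert the number to text, then count its characters
print(digits)               # 2

as_float = float('2.5')     # text -> floating-point number
print(as_float + 1)         # 3.5

truncated = int(3.9)        # float -> int drops the fractional part (it does not round)
print(truncated)            # 3
```

Note that int('2.5') raises a ValueError, because the text '2.5' does not describe a whole number.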

Can mix integers and floats freely in operations.

  • Integers and floating-point numbers can be mixed in arithmetic.
    • Python 3 automatically converts integers to floats as needed. (In Python 2, dividing two integers with / returned an integer, the floor of the result.)

In [16]:
print('half is', 1 / 2.0)
print('three squared is', 3.0 ** 2)


half is 0.5
three squared is 9.0
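
The sketch below makes the conversion visible and contrasts the two division operators available in Python 3. The variable names are made up for the example.

```python
mixed_sum = 1 + 2.0                 # int + float gives a float
print(mixed_sum, type(mixed_sum))   # 3.0 <class 'float'>

true_division = 7 / 2               # / always returns a float in Python 3
floor_division = 7 // 2             # // keeps only the whole-number part (the floor)
print(true_division)                # 3.5
print(floor_division)               # 3
```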

Variables only change value when something is assigned to them.

  • If we make one cell in a spreadsheet depend on another, and update the latter, the former updates automatically.
  • This does not happen in programming languages.

In [17]:
first = 1
second = 5 * first
first = 2
print('first is', first, 'and second is', second)


first is 2 and second is 5
  • The computer reads the value of first when doing the multiplication, creates a new value, and assigns it to second.
  • After that, second does not remember where it came from.
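
If the updated value is wanted, the calculation has to be run again. A minimal sketch, reusing the names from the cell above and adding a hypothetical stale_second to keep the old result for comparison:

```python
first = 1
second = 5 * first
first = 2

stale_second = second       # still 5: changing first did not touch second
second = 5 * first          # re-running the assignment picks up the new value

print(stale_second, second) # 5 10
```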

Built-In Functions & Libraries

Use comments to add documentation to programs.


In [18]:
# This sentence isn't executed by Python.
adjustment = 0.5   # Neither is this - anything after '#' is ignored.
print(adjustment)


0.5

A function may take zero or more arguments.

  • We have seen some functions already; now let's take a closer look.
  • An argument is a value passed into a function.
  • len takes exactly one.
  • int, str, and float create a new value from an existing one.
  • print takes zero or more.
  • print with no arguments prints a blank line.
    • Must always use parentheses, even if they're empty, so that Python knows a function is being called.

In [19]:
print('before')
print()
print('after')


before

after

Commonly-used built-in functions include max, min, and round.

  • Use max to find the largest value of one or more values.
  • Use min to find the smallest.
  • Both work on character strings as well as numbers.
    • "Larger" and "smaller" follow character order: digits (0-9) come first, then uppercase letters (A-Z), then lowercase letters (a-z).

In [20]:
print(max(1,2,3))
print(min('a', 'A', '0'))


3
0
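
The ordering comes from the numeric code behind each character. As a rough illustration, the built-in function ord shows that code; the example words are made up.

```python
code_digit = ord('0')
code_upper = ord('A')
code_lower = ord('a')
print(code_digit, code_upper, code_lower)   # 48 65 97

# Because lowercase letters have larger codes, 'apple' counts as "larger" than 'Zebra'.
largest_word = max('apple', 'Zebra')
print(largest_word)                         # apple
```

This is why sorting text that mixes capitalized and lowercase words can give surprising results.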

Functions may only work for certain (combinations of) arguments.

  • max and min must be given at least one argument.
    • "Largest of the empty set" is a meaningless question.
  • And they must be given things that can meaningfully be compared.

In [21]:
print(max(1, 'a'))


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-3f049acf3762> in <module>
----> 1 print(max(1, 'a'))

TypeError: '>' not supported between instances of 'str' and 'int'
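
One way around this error, sketched here with made-up values, is to convert the arguments to a common type first so that the comparison is meaningful.

```python
largest_number = max(1, int('7'))   # compare both values as numbers
largest_text = max(str(1), 'a')     # compare both values as text; digits sort before letters

print(largest_number)               # 7
print(largest_text)                 # a
```

Which conversion is appropriate depends on what the values mean in your data.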

Functions may have default values for some arguments.

  • round will round off a floating-point number.
  • By default, rounds to zero decimal places.

In [22]:
round(3.712)


Out[22]:
4
  • We can specify the number of decimal places we want.

In [23]:
round(3.712, 1)


Out[23]:
3.7

Use the built-in function help to get help for a function.

  • Every built-in function has online documentation.

In [24]:
help(round)


Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.
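
The help text mentions two details that are easy to miss: the result is an integer when ndigits is omitted, and ndigits may be negative. A short sketch with illustrative variable names:

```python
whole = round(3.712)            # ndigits omitted, so the result is an int
print(whole, type(whole))       # 4 <class 'int'>

two_places = round(3.712, 2)    # keep two decimal places
print(two_places)               # 3.71

hundreds = round(1234, -2)      # negative ndigits rounds to the left of the decimal point
print(hundreds)                 # 1200
```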

Python objects also have built-in methods

  • Methods are similar to functions except they are called on an object.
  • Methods are dependent on the object type.
    • str objects in Python have many built-in methods including startswith(), title(), and replace().

In [25]:
text = 'Hello there, nice to meet you!'
print(text)
print(text.startswith('hell'))
print(text.replace('Hello','Goodbye'))
print(text.title().replace('!','?'))


Hello there, nice to meet you!
False
Goodbye there, nice to meet you!
Hello There, Nice To Meet You?
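
A few more string methods, as a sketch with a made-up line of text. Each method returns a new value; the original string is left unchanged.

```python
line = 'To be, or not to be'

upper_line = line.upper()       # a new string in capitals
lower_line = line.lower()       # a new string in lowercase
be_count = line.count('be')     # how many times 'be' appears

print(upper_line)               # TO BE, OR NOT TO BE
print(lower_line)               # to be, or not to be
print(be_count)                 # 2
print(line)                     # To be, or not to be
```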

Most of the power of a programming language is in its libraries.

  • A library is a collection of files (called modules) that contains functions for use by other programs.
    • May also contain data values (e.g., numerical constants) and other things.
    • Library's contents are supposed to be related, but there's no way to enforce that.
  • The Python standard library is an extensive suite of modules that comes with Python itself.
  • Many additional libraries are available from PyPI (the Python Package Index).
  • We will see later how to write new libraries.

A program must import a library module before using it.

  • Use import to load a library module into a program's memory.
  • Then refer to things from the module as module_name.thing_name.
    • Python uses . to mean "part of".
  • Using datetime, one of the modules in the standard library:

In [26]:
import datetime

today = datetime.datetime.now()
nextYear = today + datetime.timedelta(days=365)

print(today)
print(nextYear)


2019-09-05 16:58:01.117005
2020-09-04 16:58:01.117005
  • Have to refer to each item with the module's name.
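
The same pattern applies to any module. As a second illustration, here is a sketch using math, another module from the standard library, which contains data values as well as functions.

```python
import math

print(math.pi)              # a data value stored in the module: 3.141592653589793

root = math.sqrt(16)        # a function stored in the module
print(root)                 # 4.0
```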

Use help to learn about the contents of a library module.

  • Works just like help for a function.

In [27]:
help(datetime)


Help on module datetime:

NAME
    datetime - Fast implementation of the datetime type.

MODULE REFERENCE
    https://docs.python.org/3.7/library/datetime
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

CLASSES
    builtins.object
        date
            datetime
        time
        timedelta
        tzinfo
            timezone
    
    class date(builtins.object)
     |  date(year, month, day) --> date object
     |  
     |  Methods defined here:
     |  
     |  __add__(self, value, /)
     |      Return self+value.
     |  
     |  __eq__(self, value, /)
     |      Return self==value.
     |  
     |  __format__(...)
     |      Formats self with strftime.
     |  
     |  __ge__(self, value, /)
     |      Return self>=value.
     |  
     |  __getattribute__(self, name, /)
     |      Return getattr(self, name).
     |  
     |  __gt__(self, value, /)
     |      Return self>value.
     |  
     |  __hash__(self, /)
     |      Return hash(self).
     |  
     |  __le__(self, value, /)
     |      Return self<=value.
     |  
     |  __lt__(self, value, /)
     |      Return self<value.
     |  
     |  __ne__(self, value, /)
     |      Return self!=value.
     |  
     |  __radd__(self, value, /)
     |      Return value+self.
     |  
     |  __reduce__(...)
     |      __reduce__() -> (cls, state)
     |  
     |  __repr__(self, /)
     |      Return repr(self).
     |  
     |  __rsub__(self, value, /)
     |      Return value-self.
     |  
     |  __str__(self, /)
     |      Return str(self).
     |  
     |  __sub__(self, value, /)
     |      Return self-value.
     |  
     |  ctime(...)
     |      Return ctime() style string.
     |  
     |  isocalendar(...)
     |      Return a 3-tuple containing ISO year, week number, and weekday.
     |  
     |  isoformat(...)
     |      Return string in ISO 8601 format, YYYY-MM-DD.
     |  
     |  isoweekday(...)
     |      Return the day of the week represented by the date.
     |      Monday == 1 ... Sunday == 7
     |  
     |  replace(...)
     |      Return date with new specified fields.
     |  
     |  strftime(...)
     |      format -> strftime() style string.
     |  
     |  timetuple(...)
     |      Return time tuple, compatible with time.localtime().
     |  
     |  toordinal(...)
     |      Return proleptic Gregorian ordinal.  January 1 of year 1 is day 1.
     |  
     |  weekday(...)
     |      Return the day of the week represented by the date.
     |      Monday == 0 ... Sunday == 6
     |  
     |  ----------------------------------------------------------------------
     |  Class methods defined here:
     |  
     |  fromisoformat(...) from builtins.type
     |      str -> Construct a date from the output of date.isoformat()
     |  
     |  fromordinal(...) from builtins.type
     |      int -> date corresponding to a proleptic Gregorian ordinal.
     |  
     |  fromtimestamp(...) from builtins.type
     |      timestamp -> local date from a POSIX timestamp (like time.time()).
     |  
     |  today(...) from builtins.type
     |      Current date or datetime:  same as self.__class__.fromtimestamp(time.time()).
     |  
     |  ----------------------------------------------------------------------
     |  Static methods defined here:
     |  
     |  __new__(*args, **kwargs) from builtins.type
     |      Create and return a new object.  See help(type) for accurate signature.
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  day
     |  
     |  month
     |  
     |  year
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |  
     |  max = datetime.date(9999, 12, 31)
     |  
     |  min = datetime.date(1, 1, 1)
     |  
     |  resolution = datetime.timedelta(days=1)
    
    class datetime(date)
     |  datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])
     |  
     |  The year, month and day arguments are required. tzinfo may be None, or an
     |  instance of a tzinfo subclass. The remaining arguments may be ints.
     |  
     |  Method resolution order:
     |      datetime
     |      date
     |      builtins.object
     |  
     |  Methods defined here:
     |  
     |  __add__(self, value, /)
     |      Return self+value.
     |  
     |  __eq__(self, value, /)
     |      Return self==value.
     |  
     |  __ge__(self, value, /)
     |      Return self>=value.
     |  
     |  __getattribute__(self, name, /)
     |      Return getattr(self, name).
     |  
     |  __gt__(self, value, /)
     |      Return self>value.
     |  
     |  __hash__(self, /)
     |      Return hash(self).
     |  
     |  __le__(self, value, /)
     |      Return self<=value.
     |  
     |  __lt__(self, value, /)
     |      Return self<value.
     |  
     |  __ne__(self, value, /)
     |      Return self!=value.
     |  
     |  __radd__(self, value, /)
     |      Return value+self.
     |  
     |  __reduce__(...)
     |      __reduce__() -> (cls, state)
     |  
     |  __reduce_ex__(...)
     |      __reduce_ex__(proto) -> (cls, state)
     |  
     |  __repr__(self, /)
     |      Return repr(self).
     |  
     |  __rsub__(self, value, /)
     |      Return value-self.
     |  
     |  __str__(self, /)
     |      Return str(self).
     |  
     |  __sub__(self, value, /)
     |      Return self-value.
     |  
     |  astimezone(...)
     |      tz -> convert to local time in new timezone tz
     |  
     |  ctime(...)
     |      Return ctime() style string.
     |  
     |  date(...)
     |      Return date object with same year, month and day.
     |  
     |  dst(...)
     |      Return self.tzinfo.dst(self).
     |  
     |  isoformat(...)
     |      [sep] -> string in ISO 8601 format, YYYY-MM-DDT[HH[:MM[:SS[.mmm[uuu]]]]][+HH:MM].
     |      sep is used to separate the year from the time, and defaults to 'T'.
     |      timespec specifies what components of the time to include (allowed values are 'auto', 'hours', 'minutes', 'seconds', 'milliseconds', and 'microseconds').
     |  
     |  replace(...)
     |      Return datetime with new specified fields.
     |  
     |  time(...)
     |      Return time object with same time but with tzinfo=None.
     |  
     |  timestamp(...)
     |      Return POSIX timestamp as float.
     |  
     |  timetuple(...)
     |      Return time tuple, compatible with time.localtime().
     |  
     |  timetz(...)
     |      Return time object with same time and tzinfo.
     |  
     |  tzname(...)
     |      Return self.tzinfo.tzname(self).
     |  
     |  utcoffset(...)
     |      Return self.tzinfo.utcoffset(self).
     |  
     |  utctimetuple(...)
     |      Return UTC time tuple, compatible with time.localtime().
     |  
     |  ----------------------------------------------------------------------
     |  Class methods defined here:
     |  
     |  combine(...) from builtins.type
     |      date, time -> datetime with same date and time fields
     |  
     |  fromisoformat(...) from builtins.type
     |      string -> datetime from datetime.isoformat() output
     |  
     |  fromtimestamp(...) from builtins.type
     |      timestamp[, tz] -> tz's local time from POSIX timestamp.
     |  
     |  now(tz=None) from builtins.type
     |      Returns new datetime object representing current time local to tz.
     |      
     |        tz
     |          Timezone object.
     |      
     |      If no tz is specified, uses local timezone.
     |  
     |  strptime(...) from builtins.type
     |      string, format -> new datetime parsed from a string (like time.strptime()).
     |  
     |  utcfromtimestamp(...) from builtins.type
     |      Construct a naive UTC datetime from a POSIX timestamp.
     |  
     |  utcnow(...) from builtins.type
     |      Return a new datetime representing UTC day and time.
     |  
     |  ----------------------------------------------------------------------
     |  Static methods defined here:
     |  
     |  __new__(*args, **kwargs) from builtins.type
     |      Create and return a new object.  See help(type) for accurate signature.
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  fold
     |  
     |  hour
     |  
     |  microsecond
     |  
     |  minute
     |  
     |  second
     |  
     |  tzinfo
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |  
     |  max = datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)
     |  
     |  min = datetime.datetime(1, 1, 1, 0, 0)
     |  
     |  resolution = datetime.timedelta(microseconds=1)
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from date:
     |  
     |  __format__(...)
     |      Formats self with strftime.
     |  
     |  isocalendar(...)
     |      Return a 3-tuple containing ISO year, week number, and weekday.
     |  
     |  isoweekday(...)
     |      Return the day of the week represented by the date.
     |      Monday == 1 ... Sunday == 7
     |  
     |  strftime(...)
     |      format -> strftime() style string.
     |  
     |  toordinal(...)
     |      Return proleptic Gregorian ordinal.  January 1 of year 1 is day 1.
     |  
     |  weekday(...)
     |      Return the day of the week represented by the date.
     |      Monday == 0 ... Sunday == 6
     |  
     |  ----------------------------------------------------------------------
     |  Class methods inherited from date:
     |  
     |  fromordinal(...) from builtins.type
     |      int -> date corresponding to a proleptic Gregorian ordinal.
     |  
     |  today(...) from builtins.type
     |      Current date or datetime:  same as self.__class__.fromtimestamp(time.time()).
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors inherited from date:
     |  
     |  day
     |  
     |  month
     |  
     |  year
    
    class time(builtins.object)
     |  time([hour[, minute[, second[, microsecond[, tzinfo]]]]]) --> a time object
     |  
     |  All arguments are optional. tzinfo may be None, or an instance of
     |  a tzinfo subclass. The remaining arguments may be ints.
     |  
     |  Methods defined here:
     |  
     |  __eq__(self, value, /)
     |      Return self==value.
     |  
     |  __format__(...)
     |      Formats self with strftime.
     |  
     |  __ge__(self, value, /)
     |      Return self>=value.
     |  
     |  __getattribute__(self, name, /)
     |      Return getattr(self, name).
     |  
     |  __gt__(self, value, /)
     |      Return self>value.
     |  
     |  __hash__(self, /)
     |      Return hash(self).
     |  
     |  __le__(self, value, /)
     |      Return self<=value.
     |  
     |  __lt__(self, value, /)
     |      Return self<value.
     |  
     |  __ne__(self, value, /)
     |      Return self!=value.
     |  
     |  __reduce__(...)
     |      __reduce__() -> (cls, state)
     |  
     |  __reduce_ex__(...)
     |      __reduce_ex__(proto) -> (cls, state)
     |  
     |  __repr__(self, /)
     |      Return repr(self).
     |  
     |  __str__(self, /)
     |      Return str(self).
     |  
     |  dst(...)
     |      Return self.tzinfo.dst(self).
     |  
     |  isoformat(...)
     |      Return string in ISO 8601 format, [HH[:MM[:SS[.mmm[uuu]]]]][+HH:MM].
     |      
     |      timespec specifies what components of the time to include.
     |  
     |  replace(...)
     |      Return time with new specified fields.
     |  
     |  strftime(...)
     |      format -> strftime() style string.
     |  
     |  tzname(...)
     |      Return self.tzinfo.tzname(self).
     |  
     |  utcoffset(...)
     |      Return self.tzinfo.utcoffset(self).
     |  
     |  ----------------------------------------------------------------------
     |  Class methods defined here:
     |  
     |  fromisoformat(...) from builtins.type
     |      string -> time from time.isoformat() output
     |  
     |  ----------------------------------------------------------------------
     |  Static methods defined here:
     |  
     |  __new__(*args, **kwargs) from builtins.type
     |      Create and return a new object.  See help(type) for accurate signature.
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  fold
     |  
     |  hour
     |  
     |  microsecond
     |  
     |  minute
     |  
     |  second
     |  
     |  tzinfo
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |  
     |  max = datetime.time(23, 59, 59, 999999)
     |  
     |  min = datetime.time(0, 0)
     |  
     |  resolution = datetime.timedelta(microseconds=1)
    
    class timedelta(builtins.object)
     |  Difference between two datetime values.
     |  
     |  Methods defined here:
     |  
     |  __abs__(self, /)
     |      abs(self)
     |  
     |  __add__(self, value, /)
     |      Return self+value.
     |  
     |  __bool__(self, /)
     |      self != 0
     |  
     |  __divmod__(self, value, /)
     |      Return divmod(self, value).
     |  
     |  __eq__(self, value, /)
     |      Return self==value.
     |  
     |  __floordiv__(self, value, /)
     |      Return self//value.
     |  
     |  __ge__(self, value, /)
     |      Return self>=value.
     |  
     |  __getattribute__(self, name, /)
     |      Return getattr(self, name).
     |  
     |  __gt__(self, value, /)
     |      Return self>value.
     |  
     |  __hash__(self, /)
     |      Return hash(self).
     |  
     |  __le__(self, value, /)
     |      Return self<=value.
     |  
     |  __lt__(self, value, /)
     |      Return self<value.
     |  
     |  __mod__(self, value, /)
     |      Return self%value.
     |  
     |  __mul__(self, value, /)
     |      Return self*value.
     |  
     |  __ne__(self, value, /)
     |      Return self!=value.
     |  
     |  __neg__(self, /)
     |      -self
     |  
     |  __pos__(self, /)
     |      +self
     |  
     |  __radd__(self, value, /)
     |      Return value+self.
     |  
     |  __rdivmod__(self, value, /)
     |      Return divmod(value, self).
     |  
     |  __reduce__(...)
     |      __reduce__() -> (cls, state)
     |  
     |  __repr__(self, /)
     |      Return repr(self).
     |  
     |  __rfloordiv__(self, value, /)
     |      Return value//self.
     |  
     |  __rmod__(self, value, /)
     |      Return value%self.
     |  
     |  __rmul__(self, value, /)
     |      Return value*self.
     |  
     |  __rsub__(self, value, /)
     |      Return value-self.
     |  
     |  __rtruediv__(self, value, /)
     |      Return value/self.
     |  
     |  __str__(self, /)
     |      Return str(self).
     |  
     |  __sub__(self, value, /)
     |      Return self-value.
     |  
     |  __truediv__(self, value, /)
     |      Return self/value.
     |  
     |  total_seconds(...)
     |      Total seconds in the duration.
     |  
     |  ----------------------------------------------------------------------
     |  Static methods defined here:
     |  
     |  __new__(*args, **kwargs) from builtins.type
     |      Create and return a new object.  See help(type) for accurate signature.
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  days
     |      Number of days.
     |  
     |  microseconds
     |      Number of microseconds (>= 0 and less than 1 second).
     |  
     |  seconds
     |      Number of seconds (>= 0 and less than 1 day).
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |  
     |  max = datetime.timedelta(days=999999999, seconds=86399, microseconds=9...
     |  
     |  min = datetime.timedelta(days=-999999999)
     |  
     |  resolution = datetime.timedelta(microseconds=1)
    
    class timezone(tzinfo)
     |  Fixed offset from UTC implementation of tzinfo.
     |  
     |  Method resolution order:
     |      timezone
     |      tzinfo
     |      builtins.object
     |  
     |  Methods defined here:
     |  
     |  __eq__(self, value, /)
     |      Return self==value.
     |  
     |  __ge__(self, value, /)
     |      Return self>=value.
     |  
     |  __getinitargs__(...)
     |      pickle support
     |  
     |  __gt__(self, value, /)
     |      Return self>value.
     |  
     |  __hash__(self, /)
     |      Return hash(self).
     |  
     |  __le__(self, value, /)
     |      Return self<=value.
     |  
     |  __lt__(self, value, /)
     |      Return self<value.
     |  
     |  __ne__(self, value, /)
     |      Return self!=value.
     |  
     |  __repr__(self, /)
     |      Return repr(self).
     |  
     |  __str__(self, /)
     |      Return str(self).
     |  
     |  dst(...)
     |      Return None.
     |  
     |  fromutc(...)
     |      datetime in UTC -> datetime in local time.
     |  
     |  tzname(...)
     |      If name is specified when timezone is created, returns the name.  Otherwise returns offset as 'UTC(+|-)HH:MM'.
     |  
     |  utcoffset(...)
     |      Return fixed offset.
     |  
     |  ----------------------------------------------------------------------
     |  Static methods defined here:
     |  
     |  __new__(*args, **kwargs) from builtins.type
     |      Create and return a new object.  See help(type) for accurate signature.
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |  
     |  max = datetime.timezone(datetime.timedelta(seconds=86340))
     |  
     |  min = datetime.timezone(datetime.timedelta(days=-1, seconds=60))
     |  
     |  utc = datetime.timezone.utc
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from tzinfo:
     |  
     |  __getattribute__(self, name, /)
     |      Return getattr(self, name).
     |  
     |  __reduce__(...)
     |      -> (cls, state)
    
    class tzinfo(builtins.object)
     |  Abstract base class for time zone info objects.
     |  
     |  Methods defined here:
     |  
     |  __getattribute__(self, name, /)
     |      Return getattr(self, name).
     |  
     |  __reduce__(...)
     |      -> (cls, state)
     |  
     |  dst(...)
     |      datetime -> DST offset as timedelta positive east of UTC.
     |  
     |  fromutc(...)
     |      datetime in UTC -> datetime in local time.
     |  
     |  tzname(...)
     |      datetime -> string name of time zone.
     |  
     |  utcoffset(...)
     |      datetime -> timedelta showing offset from UTC, negative values indicating West of UTC
     |  
     |  ----------------------------------------------------------------------
     |  Static methods defined here:
     |  
     |  __new__(*args, **kwargs) from builtins.type
     |      Create and return a new object.  See help(type) for accurate signature.

DATA
    MAXYEAR = 9999
    MINYEAR = 1
    datetime_CAPI = <capsule object "datetime.datetime_CAPI">

FILE
    /anaconda3/lib/python3.7/datetime.py


Import specific items from a library module to shorten programs.

  • Use from ... import ... to load only specific items from a library module.
  • Then refer to them directly without library name as prefix.

In [28]:
from datetime import datetime, timedelta

today = datetime.now()
nextYear = today + timedelta(days=365)

print(today)
print(nextYear)


2019-09-05 16:58:02.968072
2020-09-04 16:58:02.968072
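
The second date in the output above lands on September 4 rather than September 5 because 2020 is a leap year, so 365 days is one day short of a full calendar year. The sketch below reproduces this with a fixed start date (chosen here only so the result is repeatable) and shows one possible way to get the same calendar day next year using the replace method of datetime objects.

```python
from datetime import datetime, timedelta

# A fixed date is used instead of datetime.now() so the result is reproducible.
start = datetime(2019, 9, 5)

# Adding 365 days crosses 29 February 2020, so we arrive one day "early".
plus_365_days = start + timedelta(days=365)
print(plus_365_days)        # 2020-09-04 00:00:00

# replace() returns a copy with one field changed; here, the year.
# (This would raise ValueError if start were 29 February.)
same_day_next_year = start.replace(year=start.year + 1)
print(same_day_next_year)   # 2020-09-05 00:00:00
```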

Create an alias for a library module when importing it to shorten programs.

  • Use import ... as ... to give a library a short alias while importing it.
  • Then refer to items in the library using that shortened name.

In [29]:
import datetime as dt

today = dt.datetime.now()
nextYear = today + dt.timedelta(days=365)

print(today)
print(nextYear)


2019-09-05 16:58:28.395828
2020-09-04 16:58:28.395828
  • Commonly used for libraries that are frequently used or have long names.
    • E.g., the matplotlib.pyplot plotting module is often aliased as plt.
  • But can make programs harder to understand, since readers must learn your program's aliases.
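
As a small side-by-side comparison, the sketch below imports the standard math module in all three styles covered so far. The alias m is an arbitrary choice for illustration, not a community convention.

```python
import math              # refer to items as math.sqrt
import math as m         # refer to items as m.sqrt
from math import sqrt    # refer to the item as plain sqrt

print(math.sqrt(16))     # 4.0
print(m.sqrt(16))        # 4.0
print(sqrt(16))          # 4.0

# All three names point at the very same function object.
same_function = math.sqrt is m.sqrt is sqrt
print(same_function)     # True
```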

Lists

A list stores many values in a single structure.

  • Doing calculations with a hundred variables called name_001, name_002, etc., would be at least as slow as doing them by hand.
  • Use a list to store many values together.
    • Contained within square brackets [...].
    • Values separated by commas ,.
  • Use len to find out how many values are in a list.

In [30]:
names = ['Cathy','Doug','Monica','Jake','Peter']
print(type(names))
print('names:', names)
print('length:', len(names))


<class 'list'>
names: ['Cathy', 'Doug', 'Monica', 'Jake', 'Peter']
length: 5

Use an item's index to fetch it from a list.

  • Just like strings.

In [31]:
print('zeroth item of names:', names[0])
print('fourth item of names:', names[4])


zeroth item of names: Cathy
fourth item of names: Peter
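
Because list indexing works like string indexing, negative indices count from the end, and the two kinds of index can be chained. A minimal sketch, redefining names so it runs on its own:

```python
names = ['Cathy', 'Doug', 'Monica', 'Jake', 'Peter']

# Negative indices count backwards from the end of the list.
last = names[-1]
second_to_last = names[-2]
print('last item:', last)                      # last item: Peter
print('second-to-last item:', second_to_last)  # second-to-last item: Jake

# The first index picks a string from the list; the second picks a character from that string.
first_letter = names[0][0]
print('first letter of first name:', first_letter)  # first letter of first name: C
```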

Lists' values can be replaced by assigning to them.

  • Use an index expression on the left of assignment to replace a value.

In [32]:
names[0] = 'Catherine'
print('names is now:', names)


names is now: ['Catherine', 'Doug', 'Monica', 'Jake', 'Peter']

Appending items to a list lengthens it.

  • Use list_name.append to add items to the end of a list.

In [33]:
primes = [2, 3, 5]
print('primes is initially:', primes)
primes.append(7)
primes.append(9)
print('primes has become:', primes)


primes is initially: [2, 3, 5]
primes has become: [2, 3, 5, 7, 9]
  • append is a method of lists.
    • Like a function, but tied to a particular object.
  • Use object_name.method_name to call methods.
    • Deliberately resembles the way we refer to things in a library.
  • We will meet other methods of lists as we go along.
    • Use help(list) for a preview.
  • extend is similar to append, but it allows you to combine two lists. For example:

In [34]:
teen_primes = [11, 13, 17, 19]
middle_aged_primes = [37, 41, 43, 47]
print('primes is currently:', primes)
primes.extend(teen_primes)
print('primes has now become:', primes)
primes.append(middle_aged_primes)
print('primes has finally become:', primes)


primes is currently: [2, 3, 5, 7, 9]
primes has now become: [2, 3, 5, 7, 9, 11, 13, 17, 19]
primes has finally become: [2, 3, 5, 7, 9, 11, 13, 17, 19, [37, 41, 43, 47]]

Note that while extend keeps the list "flat" by adding each item of the other list individually, append adds the other list as a single item, so the result is a list nested inside a list.
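
One way to see the difference is to compare lengths. The sketch below uses two fresh lists (the names flat and nested are only for illustration):

```python
flat = [2, 3, 5]
flat.extend([7, 11])      # adds 7 and 11 as two separate items
print(flat, len(flat))    # [2, 3, 5, 7, 11] 5

nested = [2, 3, 5]
nested.append([7, 11])    # adds the whole list as ONE item
print(nested, len(nested))  # [2, 3, 5, [7, 11]] 4

# Reaching a value inside the inner list takes two indices.
inner_first = nested[3][0]
print(inner_first)        # 7
```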

Use del to remove items from a list entirely.

  • del list_name[index] removes an item from a list and shortens the list.
  • Not a function or a method, but a statement in the language.

In [35]:
print('primes before removing the item at index 4:', primes)
del primes[4]
print('primes after removing the item at index 4:', primes)


primes before removing the item at index 4: [2, 3, 5, 7, 9, 11, 13, 17, 19, [37, 41, 43, 47]]
primes after removing the item at index 4: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]
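
del also accepts a negative index or a slice. A short sketch with a throwaway list (letters is a name made up for this example):

```python
letters = ['a', 'b', 'c', 'd', 'e']

# A negative index removes counting from the end, so -1 really is the last item.
del letters[-1]
print(letters)   # ['a', 'b', 'c', 'd']

# A slice removes several neighbouring items at once.
del letters[0:2]
print(letters)   # ['c', 'd']
```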

The empty list contains no values.

  • Use [] on its own to represent a list that doesn't contain any values.
    • "The zero of lists."
  • Helpful as a starting point for collecting values
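
A common pattern is to start with an empty list and append results one at a time. The sketch below assumes a simple for loop and an example list of words, neither of which comes from the workshop data:

```python
words = ['archive', 'manuscript', 'folio']

# Start with nothing collected yet.
lengths = []

# Visit each word in turn and store its length.
for word in words:
    lengths.append(len(word))

print(lengths)   # [7, 10, 5]
```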

Lists may contain values of different types.

  • A single list may contain numbers, strings, and anything else.

In [36]:
goals = []
goals.extend([1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.'])
print(goals)


[1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']

Lists can be sliced

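  • We can slice a list to obtain a sub-section of the list.
    • Use two index numbers separated by a colon : to designate which slice to take; the slice starts at the first index and stops just before the second.
    • Leaving out an index means "from the start" or "to the end"; a negative index counts from the end.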

In [37]:
values = [1,3,4,7,9,13]
print(values[0:2])
print(values[:2])
print(values[2:])
print(values[:-1])


[1, 3]
[1, 3]
[4, 7, 9, 13]
[1, 3, 4, 7, 9]
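
A slice is a new list, so changing it leaves the original untouched. The sketch below also shows an optional third number, the step, which goes slightly beyond the cell above:

```python
values = [1, 3, 4, 7, 9, 13]

# Items at index 1, 2 and 3 (the end index 4 is not included).
middle = values[1:4]
print(middle)        # [3, 4, 7]

# Changing the slice does not change the list it was taken from.
middle[0] = 100
print(middle)        # [100, 4, 7]
print(values)        # [1, 3, 4, 7, 9, 13]

# A third number sets the step: here, every second item.
every_other = values[::2]
print(every_other)   # [1, 4, 9]
```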

Indexing beyond the end of the collection is an error.

  • Python reports an IndexError if we attempt to access a value that doesn't exist.
    • This is a kind of runtime error.
    • Cannot be detected as the code is parsed because the index might be calculated based on data.

In [38]:
print('99th element of names is:', names[99])


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-38-f178a3be0901> in <module>
----> 1 print('99th element of names is:', names[99])

IndexError: list index out of range
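
An IndexError only arises when the list itself exists but the index is too large. The sketch below shows two possible ways to guard against it: checking the index against len first, or catching the error with try/except. The variable names are illustrative.

```python
names = ['Cathy', 'Doug', 'Monica', 'Jake', 'Peter']
index = 99

# Option 1: check the index against the length before using it.
if index < len(names):
    print(names[index])
else:
    print('index', index, 'is too large for a list of length', len(names))

# Option 2: try it, and handle the error if it happens.
try:
    print(names[index])
except IndexError as error:
    message = str(error)
    print('IndexError:', message)   # IndexError: list index out of range
```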

Dictionaries

A dictionary stores many values as key-value pairs.

  • Dictionaries store data in key-value pairs.
    • Contained within curly brackets {...}.
    • Key and value separated by a colon: "key": "value".
    • Pairs separated by commas ,.
  • Values can themselves be lists or dictionaries, so nested structures allow for more complicated data relationships.
  • Very common if you're working with web data.
    • Dictionaries closely match the structure of JSON objects.


In [39]:
students = {"firstName":"John","lastName":"Smith"}
print(type(students))
print(students)


<class 'dict'>
{'firstName': 'John', 'lastName': 'Smith'}
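
To illustrate the link with JSON, the sketch below uses the standard json module to turn a dictionary into JSON text and back. The variable names are chosen for this example only.

```python
import json

student = {"firstName": "John", "lastName": "Smith"}

# dumps ("dump string") converts a dictionary into JSON-formatted text.
as_text = json.dumps(student)
print(type(as_text))   # <class 'str'>
print(as_text)         # {"firstName": "John", "lastName": "Smith"}

# loads ("load string") converts JSON text back into a dictionary.
round_trip = json.loads(as_text)
print(round_trip == student)   # True
```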

Adding content to and updating a dictionary

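  • You can add content to an existing dictionary.
    • Put the desired key name in square brackets after the dictionary name: students["key"]
    • Set that equal to the desired value for that key: students["key"] = "value"
    • Assigning to a key that already exists replaces its value.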

In [40]:
students["age"] = 19
print(students)
students["major"] = ["Art History","French"]
print(students)


{'firstName': 'John', 'lastName': 'Smith', 'age': 19}
{'firstName': 'John', 'lastName': 'Smith', 'age': 19, 'major': ['Art History', 'French']}
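
The same square-bracket assignment both adds and updates, depending on whether the key already exists. A minimal sketch with a fresh dictionary (student is a name used only here):

```python
student = {"firstName": "John", "age": 19}

# The key "age" already exists, so this replaces the old value.
student["age"] = 20

# The key "major" is new, so this adds a key-value pair.
student["major"] = ["Art History"]

# A list stored in a dictionary can be changed in place like any other list.
student["major"].append("French")

print(student)
print(len(student))   # 3
```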

Access values by using the keys

  • We use the key names inside square brackets [' '] to access the value.
    • Nested data must be navigated from the top level key.
  • Lists inside of dictionaries are treated the same as a stand-alone list.

In [41]:
print(students["firstName"])
print(students['age'])
print(students['major'][1])


John
19
French
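
Asking for a key that is not in the dictionary raises a KeyError. The sketch below shows the error along with two ways to avoid it: the in operator and the get method of dictionaries. The "minor" key and the fallback text are made up for illustration.

```python
student = {"firstName": "John", "lastName": "Smith", "age": 19}

# `in` checks whether a key exists without raising an error.
has_minor = "minor" in student
print(has_minor)   # False

# get() returns a fallback value when the key is absent.
minor = student.get("minor", "undeclared")
print(minor)       # undeclared

# Square brackets raise KeyError for a missing key.
try:
    student["minor"]
except KeyError as error:
    missing_key = error.args[0]
    print("no such key:", missing_key)   # no such key: minor
```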

Dictionaries can contain many nested elements

  • Dictionaries can have multiple structured data elements.
  • It's useful to know the structure of your dictionary to access the values of nested elements.

In [42]:
courses = {"courses": [
    {
        "Title": "Intro to Economics",
        "Instructor": {
            "firstName": "Robert",
            "lastName": "Schiller",
        },
        "Number": "ECON 101",
        "Size": 65,
        "isFull": True,
    },
    {
        "Title": "Intro to French",
        "Instructor": {
            "firstName": "Marie",
            "lastName": "Gribouille",
        },
        "Number": "FREN 101",
        "Size": 15,
        "isFull": False,
    }
]}

print(courses)


{'courses': [{'Title': 'Intro to Economics', 'Instructor': {'firstName': 'Robert', 'lastName': 'Schiller'}, 'Number': 'ECON 101', 'Size': 65, 'isFull': True}, {'Title': 'Intro to French', 'Instructor': {'firstName': 'Marie', 'lastName': 'Gribouille'}, 'Number': 'FREN 101', 'Size': 15, 'isFull': False}]}
  • Navigation starts from the top-level key.
    • In this example, the top-level key is "courses".
    • The value stored under "courses" is a list with two elements, each of which is a dictionary.

In [43]:
print(courses.keys())
print(courses.values())


dict_keys(['courses'])
dict_values([[{'Title': 'Intro to Economics', 'Instructor': {'firstName': 'Robert', 'lastName': 'Schiller'}, 'Number': 'ECON 101', 'Size': 65, 'isFull': True}, {'Title': 'Intro to French', 'Instructor': {'firstName': 'Marie', 'lastName': 'Gribouille'}, 'Number': 'FREN 101', 'Size': 15, 'isFull': False}]])

Going deeper

  • To access the values nested inside a dictionary, we must navigate to their level.
  • We must step through each level of our nested data.
    • 'courses' is the top-level or root element.
    • courses['courses'] is a list; its items are accessed using their index.
  • .keys() and .values() list the dictionary keys and values, respectively, at the current dictionary level.

In [44]:
print(courses['courses'][0].keys())
print(courses['courses'][0].values())


dict_keys(['Title', 'Instructor', 'Number', 'Size', 'isFull'])
dict_values(['Intro to Economics', {'firstName': 'Robert', 'lastName': 'Schiller'}, 'ECON 101', 65, True])

In [45]:
print(courses['courses'][1]['Instructor']['lastName'])


Gribouille
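
Rather than writing out an index for every course, the list under 'courses' can be walked with a loop. The sketch below assumes a for loop and uses a trimmed copy of the dictionary above so that it runs on its own. The summaries list is a name introduced only for this example.

```python
courses = {"courses": [
    {
        "Title": "Intro to Economics",
        "Instructor": {"firstName": "Robert", "lastName": "Schiller"},
        "isFull": True,
    },
    {
        "Title": "Intro to French",
        "Instructor": {"firstName": "Marie", "lastName": "Gribouille"},
        "isFull": False,
    },
]}

summaries = []

# Step 1: courses["courses"] is the list; the loop hands us one dictionary at a time.
for course in courses["courses"]:
    # Step 2: "Instructor" is itself a dictionary nested inside each course.
    instructor = course["Instructor"]
    full_name = instructor["firstName"] + " " + instructor["lastName"]
    summaries.append(course["Title"] + ": " + full_name)

print(summaries)
# ['Intro to Economics: Robert Schiller', 'Intro to French: Marie Gribouille']
```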