Introduction to Python Basics

What is Python?

High-level, interpreted programming language (like R or Matlab)

  • High-level -- code almost looks like natural language
  • Interpreted -- you type code, and then you run it -- you don’t have to ‘compile’ it first into machine-readable code

    
    
    In [7]:
    print("Hello world")
    
    
    
    
    Hello world
    

    Advantages: Python code is readable, intuitive to work with, and quick to write and iterate on
    Disadvantages: it can be slow, but there are ways around this (e.g., multi-core processing)

    So easy, a Russell can do it

    I took a ~16 hour summer class, and was immediately using it for my own projects

    So I'm also a novice! I may have to defer to Henry on some things!

    Python vs. R and Matlab

    1. Free, unlike Matlab!
    2. MANY more people use Python than R or Matlab
      1. Better online support (tutorials, faq’s, etc.)
      2. More pre-existing code that you can use. Many of the problems I google have already been solved.
    3. Python is simply more versatile than R or Matlab. With the right package, you can do anything, not just stats or visualization or modeling.

    Digging in

    Doing simple calculations from the command line

    
    
    In [8]:
    5 + 5
    
    
    
    
    Out[8]:
    10
    
    
    In [13]:
    3 + 2 ** -9
    
    
    
    
    Out[13]:
    3.001953125
    
    
    In [14]:
    'this' + '&' + 'that'
    
    
    
    
    Out[14]:
    'this&that'
    
    
    In [1]:
    print(5 + 5) # This is a comment
    
    
    
    
    10
    

    Data Structures

    Basic data structure types: numerics, strings, lists, tuples, dictionaries… (we'll talk about each)

    Fall into two classes: mutable and immutable

    Mutable: Can be changed in place, i.e., the value can change without changing where it is stored in the computer's memory. Defining a structure a and then saying b = a points a and b to the same location in memory, so changing one in place also changes the other.

    Immutable: Cannot be changed in place.

    Practical demonstration of this difference in a moment...
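
    A minimal sketch of the aliasing behaviour described above, using a list (which is mutable). The names a, b, and c are just for illustration, and list(a) is only one common way to make an independent copy.

```python
a = [1, 2, 3]
b = a              # b is a second name for the same list, not a copy
c = list(a)        # c is a new list with the same contents

b.append(4)        # change the shared list in place through b

print(a)           # [1, 2, 3, 4] -- a sees the change made through b
print(c)           # [1, 2, 3]    -- the copy is unaffected
print(a is b)      # True: same object in memory
print(a is c)      # False: different objects
```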

    Numerics

    Floats (numbers with decimals, roughly), integers. These are immutable.

    
    
    In [2]:
    x = 3 # x is an integer
    
    
    
    In [ ]:
    type(x)
    
    
    
    In [23]:
    y = 4.9
    
    
    
    In [ ]:
    type(y)
    
    
    
    In [ ]:
    y = x
    
    
    
    In [ ]:
    x += 1.2 #now x becomes a float
    
    
    
    In [ ]:
    y # what will the output be? Remember that numerics are immutable
    
    
    
    In [ ]:
    y = float(y) # this changes y's type to float
    
    
    
    In [ ]:
    type(y)
    

    Strings

    Sequences of characters (letters, numbers, punctuation, etc.). Also immutable.

    
    
    In [32]:
    'Hello world'
    
    
    
    
    Out[32]:
    'Hello world'
    
    
    In [34]:
    mystring = 'Hello World'
    
    
    
    In [ ]:
    mystring += ', how are you today?'
    
    
    
    In [ ]:
    mystring
    
    
    
    In [ ]:
    mystring.lower() # object.method()
    
    
    
    In [ ]:
    mystring.split() # returns a list...more on those in a bit
    
    
    
    In [35]:
    mystring.isalpha()
    
    
    
    
    Out[35]:
    False
    
    
    In [ ]:
    mystring = 'HelloWorld'
    
    
    
    In [ ]:
    mystring.isalpha()
    
    
    
    In [32]:
    "The sum of 1 + 2 is {} and not {}".format(1+2,99)
    
    
    
    
    Out[32]:
    'The sum of 1 + 2 is 3 and not 99'

    True and False are their own type: boolean (bool). They behave like 1 and 0 in arithmetic (which can be extremely handy).
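
    One place where the 1-and-0 behaviour is handy is counting: summing a list of booleans tells you how many are True. A small sketch with made-up data (the answers list is hypothetical):

```python
answers = [True, False, True, True]

print(sum(answers))             # 3 -- each True counts as 1, each False as 0
print(True + True)              # 2
print(True == 1, False == 0)    # True True
print(type(True))               # <class 'bool'>
```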

    Tuples & Lists

    Tuples & lists: ordered containers of any combination of data structures (strings, integers, variables, other tuples or lists). Tuples are immutable, lists mutable.

    
    
    In [ ]:
    z = ('our', 'first', 'tuple', 9, x) # put paren around items for tuple
    
    
    
    In [ ]:
    z = ['our', 'first', 'list', 3.4, [3,'hi']] # brackets for list
    

    Some list methods

    
    
    In [ ]:
    z.append('eats') # append adds the object itself
    
    
    
    In [ ]:
    z
    
    
    
    In [ ]:
    z.extend('eats') # extend adds the pieces of the object, or iterable
    
    
    
    In [ ]:
    z
    
    
    
    In [ ]:
    z.index('eats')
    
    
    
    In [ ]:
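    z.sort(key=str) # in Python 3 a plain z.sort() raises TypeError here, because strings, floats, and lists can't be compared with each other; key=str sorts by each item's string form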
    
    
    
    In [ ]:
    z
    
    
    
    In [ ]:
    z.pop(2)
    
    
    
    In [ ]:
    z
    

    Tuples vs lists -- slicing, indexing, assignment-via-index

    
    
    In [34]:
    z = 'Russell'
    
    
    
    In [ ]:
    z[0] # indexes first element in iterable
    

    0 is the first index in Python (unlike R and Matlab, which start at 1)!!!

    How to think of it...

    Why?

    See:

    http://en.wikipedia.org/wiki/Zero-based_numbering

    http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF <-- charming hand-written note by Edsger W. Dijkstra, a giant in computer science

    https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi <-- written by Guido van Rossum, inventor of Python

    
    
    In [51]:
    z[:3], z[3:] # slicing...3 is, in a manner of speaking, indexing the space between the two 's'
    
    
    
    
    Out[51]:
    ('Rus', 'sell')
    
    
    In [9]:
    y = ['my', 'list']
    
    
    
    In [10]:
    y[0]
    
    
    
    
    Out[10]:
    'my'
    
    
    In [11]:
    y[:1]
    
    
    
    
    Out[11]:
    ['my']
    
    
    In [ ]:
    x = ('my', 'tuple')
    
    
    
    In [ ]:
    x[0]
    
    
    
    In [ ]:
    x[:1]
    

    Can also index, slice, assign using negative numbers, which index from the back of the sequence

    
    
    In [45]:
    z[-1] # NOTE -- negative indexing does NOT start at 0!
    
    
    
    
    Out[45]:
    'l'
    
    
    In [41]:
    z[-3:]
    
    
    
    
    Out[41]:
    'ell'
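
    A short sketch of negative indexing, slicing, and assignment together. The variable names are illustrative; assignment is done on a list copy because the string itself is immutable.

```python
name = 'Russell'
print(name[-1])        # l    (same element as name[len(name) - 1])
print(name[-3:])       # ell  (the last three characters)
print(name[:-3])       # Russ (everything except the last three)

letters = list(name)   # a list, so assignment by index is allowed
letters[-1] = 'L'      # replace the last element
print(letters)         # ['R', 'u', 's', 's', 'e', 'l', 'L']
```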

    Assigning new value via indexing works for lists (which are mutable), but not strings or tuples (which are immutable)

    
    
    In [ ]:
    x[0] = 'your'
    
    
    
    In [ ]:
    z[0] = 'Z'
    
    
    
    In [ ]:
    y[0] = 'your'
    

    So why use a tuple instead of a list? Tuples are faster to create, they use less memory, and, like other immutables, they can be used as keys in a dictionary (as long as everything inside them is immutable too).
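
    A quick sketch of the dictionary-key point, ahead of the dictionary section below. The grid data is made up for illustration.

```python
# Hypothetical example data: (row, column) positions on a grid
grid = {(0, 0): 'start', (2, 3): 'treasure'}
print(grid[(2, 3)])            # treasure

# A list is mutable, so Python refuses to use it as a key
try:
    grid[[1, 1]] = 'trap'
except TypeError as error:
    print('TypeError:', error)
```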

    Set

    Unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Sets support basic set operations like union, intersection, difference, etc. Sets are mutable.

    Disclaimer: I haven't used these that much yet (though I probably should)...

    
    
    In [42]:
    engineers = {'John', 'Jane', 'Jack', 'Janice'}
    
    
    
    In [44]:
    programmers = {'Jack', 'Sam', 'Susan', 'Janice'}
    

    In addition to set-literals using braces, you can make a set from any iterable. More on iterables next week, but for now, e.g. a list is an iterable.

    
    
    In [45]:
    managers = set(['Jane', 'Jack', 'Susan', 'Zack'])
    
    
    
    In [46]:
    employees = engineers | programmers | managers           # union
    
    
    
    In [47]:
    engineering_management = engineers & managers            # intersection
    
    
    
    In [48]:
    fulltime_management = managers - engineers - programmers # difference
    
    
    
    In [49]:
    engineers.add('Marvin')                                  # add element
    
    
    
    In [50]:
    print(engineers)
    
    
    
    
    {'Janice', 'Jane', 'Jack', 'Marvin', 'John'}
    
    
    
    In [51]:
    employees.issuperset(engineers)     # superset test
    
    
    
    
    Out[51]:
    False
    
    
    In [52]:
    employees.update(engineers)
    
    
    
    In [53]:
    employees.issuperset(engineers)
    
    
    
    
    Out[53]:
    True
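
    The two basic uses mentioned at the top of this section, eliminating duplicates and membership testing, might look like this in practice. The visitors list is hypothetical.

```python
visitors = ['Jane', 'Jack', 'Jane', 'Susan', 'Jack']

unique_visitors = set(visitors)               # duplicates are dropped
print(len(visitors), len(unique_visitors))    # 5 3

print('Susan' in unique_visitors)             # True
print('Zack' in unique_visitors)              # False
```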

    Dictionaries

    Key-value pairs. Keys can be any immutable object (e.g., tuples, strings, numerics, but not lists). Members of a dictionary are looked up by key, unlike lists, which are indexed by position. Like sets, checking for membership is fast: O(1) in the average case, versus O(n) for lists. Dictionaries are mutable.

    
    
    In [3]:
    mydict = {'the':10, 'of': 6}
    
    
    
    In [ ]:
    mydict['the'] #lookup with key
    
    
    
    In [ ]:
    mydict['the'] = 20
    
    
    
    In [ ]:
    mydict
    

    Some useful methods...

    
    
    In [4]:
    mydict.get('cromulent',0) # (key, value to be returned if key not in dict)
    
    
    
    
    Out[4]:
    0
    
    
    In [55]:
    mydict.items()
    
    
    
    
    Out[55]:
    dict_items([('of', 6), ('the', 10)])
    
    
    In [ ]:
    mydict.values()
    
    
    
    In [ ]:
    mydict.keys()
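
    A common use of get() is counting things, since it lets you treat a missing key as 0. A minimal sketch with a made-up sentence:

```python
text = 'the cat and the dog and the bird'

counts = {}
for word in text.split():
    # look up the current count (0 if we haven't seen the word yet) and add 1
    counts[word] = counts.get(word, 0) + 1

print(counts['the'])               # 3
print(counts['and'])               # 2
print(counts.get('cromulent', 0))  # 0 -- key not in the dictionary
```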
    

    Knowing which data structure to use...

    Do you just need an ordered sequence of items?
      - Use a tuple
    Do you need an ordered sequence of items and need to be able to *change that sequence in place*?
      - Use a list
    Do you just need to know whether or not you've already *got* a particular value, but without ordering, and you don't need to store duplicates?
      - Use a set
    Do you need to associate values with keys, so you can look them up efficiently (by key) later on?
      - Use a dictionary

    Another way to decide between tuples and lists (from Henry):

    "By convention lists tend to be homogeneous. You don't know how many you'll end up with necessarily, but you have a bunch of the same thing. Tuples are not necessarily homogeneous, and the slots usually have some kind of predefined semantics, like ('Henry', 'Harrison', 27)."

    A couple basic functions

    
    
    In [56]:
    x, y, z = 2, 'hi', ['my', 'list']
    
    
    
    In [57]:
    print(x,y,z)
    
    
    
    
    2 hi ['my', 'list']
    
    
    
    In [ ]:
    len(z)
    
    
    
    In [ ]:
    range(10)
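
    In Python 3, range() gives back a range object rather than a list, so you only see the numbers if you convert it or loop over it. A short sketch:

```python
numbers = range(10)
print(numbers)                  # range(0, 10)
print(list(numbers))            # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(numbers))             # 10

# range(start, stop, step): stop is never included
print(list(range(2, 10, 3)))    # [2, 5, 8]
```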
    

    Control Flow

    Conditionals (if/thens), loops, list comprehension

    Conditionals

    
    
    In [ ]:
    x = 3
    if x == 3: # notice == for computing truth value
        print('yes')
    
    
    
    In [ ]:
    y = 'tested'
    if len(y) > 6:
        print('yes')
    elif len(y) == 6:
        print('maybe')
    else:
        print('no')
    

    For loops

    For every item in some list, tuple, string, or other iterable, do something with that item

    
    
    In [ ]:
    x = list('Russell')
    for letter in x:
        print(letter)
    
    
    
    In [ ]:
    for index, letter in enumerate(x): # enumerate returns an iterator of (index, item) pairs, not a list; more on iterators and generators next time
        print(letter, index)
    
    
    
    In [ ]:
    for index, letter in enumerate(x):
        if index == 0 or index == 1:
            print(index, letter)
    

    While Loops

    While something is True, do something else.

    
    
    In [141]:
    list1 = ['the','rain','in','spain','falls','mainly','on','the','plain']
    list2 = []
    while len(list1): # len(list1) counts as True if list1 is not empty, i.e., if len(list1) is > 0
         list2.append(list1.pop(0))
    list2
    
    
    
    
    Out[141]:
    ['the', 'rain', 'in', 'spain', 'falls', 'mainly', 'on', 'the', 'plain']

    List comprehension

    A compressed, elegant way to construct lists with loops

    
    
    In [ ]:
    x = [y**2 for y in range(10)]
    
    
    
    In [65]:
    my_name = 'Russell'
    consonants = set('bcdfghjklmnpqrstvwxz')
    my_consonants = [x for x in my_name if x.lower() in consonants]
    
    
    
    In [67]:
    ''.join(my_consonants) #a string method...take an iterable containing strings, and join them by the string object ('') in the first part of the line
    
    
    
    
    Out[67]:
    'Rssll'
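
    To see what a comprehension is shorthand for, here is the same list built twice: once with an ordinary for loop and once with a comprehension that includes a condition. The variable names are illustrative.

```python
# The long way: start empty, loop, test, append
squares_loop = []
for y in range(10):
    if y % 2 == 0:
        squares_loop.append(y ** 2)

# The same thing as a list comprehension
squares_comp = [y ** 2 for y in range(10) if y % 2 == 0]

print(squares_comp)                    # [0, 4, 16, 36, 64]
print(squares_loop == squares_comp)    # True
```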

    File input/output

    Let's first make a little text file in textedit, textwrangler, etc.

    
    
    In [103]:
    myfile = open('test_file.txt','r+') # r+ enables reading and writing
    
    
    
    In [104]:
    myfile.read()
    
    
    
    
    Out[104]:
    'This is a test file with some meaningless text in it.\nWe have a few lines of text in it.\nLike this one.\nAnd this one.\nThis is another line'

    Note: if we try to do this again, we get nothing, because the read method changes our position in the file object. We can read the file in again, or change the position with myfile.seek(0).

    
    
    In [105]:
    myfile.read()
    
    
    
    
    Out[105]:
    ''
    
    
    In [112]:
    myfile.seek(0)
    
    
    
    
    Out[112]:
    0
    
    
    In [107]:
    for line in myfile:
        print(line)
    
    
    
    
    This is a test file with some meaningless text in it.
    
    We have a few lines of text in it.
    
    Like this one.
    
    And this one.
    
    This is another line
    
    
    
    In [108]:
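    firstline = myfile.readline() # returns '' here: the loop above left us at the end of the file, so call myfile.seek(0) first to get the real first line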
    
    
    
    In [109]:
    print(firstline)
    
    
    
    
    
    
    
    
    In [113]:
    all_lines = myfile.readlines()
    
    
    
    In [114]:
    print(all_lines)
    
    
    
    
    ['This is a test file with some meaningless text in it.\n', 'We have a few lines of text in it.\n', 'Like this one.\n', 'And this one.\n', 'This is another line']
    
    
    
    In [115]:
    myfile.write('\nThis is another line\n')
    
    
    
    
    Out[115]:
    21
    
    
    In [116]:
    x = 'This is a line to be saved to the file'.split()
    
    
    
    In [117]:
    myfile.write(str(x))
    
    
    
    
    Out[117]:
    69
    
    
    In [118]:
    myfile.close() # to save some system resources...important if you have big files
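
    A common alternative to calling close() yourself is the with statement, which closes the file as soon as the indented block ends. This sketch writes to a throwaway file in a temporary folder (the path and file name are made up) so that it doesn't touch test_file.txt.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'scratch.txt')

with open(path, 'w') as outfile:       # 'w' creates or overwrites the file
    outfile.write('first line\nsecond line\n')

with open(path) as infile:             # default mode is 'r' (read only)
    lines = [line.rstrip('\n') for line in infile]

print(lines)            # ['first line', 'second line']
print(infile.closed)    # True -- closed automatically at the end of the block
```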
    

    All the above works fine if your data are simple types like numerics or strings. But if you want to save more complicated structures like dictionaries, and especially if you want to do it on the cheap, there are Python-specific tools for doing this (e.g., pickles or pandas).

    
    
    In [121]:
    import pickle
    my_dict = dict([('jake', 4139), ('jack', 4127), ('john', 4098)])
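    with open("save.p", "wb") as outfile: # "wb" = write in binary mode; the file is closed automatically when the block ends
        pickle.dump(my_dict, outfile)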
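
    Getting the data back is the mirror image: open the file in binary read mode and call pickle.load(). A sketch of the full round trip, written to a temporary folder (a made-up location) rather than the working directory:

```python
import os
import pickle
import tempfile

my_dict = {'jake': 4139, 'jack': 4127, 'john': 4098}
path = os.path.join(tempfile.mkdtemp(), 'save.p')   # hypothetical location

with open(path, 'wb') as outfile:      # 'wb' = write bytes
    pickle.dump(my_dict, outfile)

with open(path, 'rb') as infile:       # 'rb' = read bytes
    restored = pickle.load(infile)

print(restored == my_dict)   # True -- same contents
print(restored is my_dict)   # False -- but a brand new dictionary
```

    One caution: only load pickle files that you trust, since unpickling can run arbitrary code.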
    

    Defining your own functions (and some exercises)

    Many premade functions -- print, range, len, string.upper(), etc.

    You can also make your own. Let's say we wanted a function that took a string and capitalized every other letter in it. So 'russell' would become 'RuSsElL'

    
    
    In [130]:
    def make_so_dope(string):
        listed_string = list(string)
        for index, letter in enumerate(listed_string):
            if index % 2 == 0: # x % y returns the remainder when you divide x by y. if x is even, then x % 2 returns 0
                listed_string[index] = letter.upper()
        completed_string = ''.join(listed_string)
        return completed_string
    
    
    
    In [131]:
    make_so_dope('Russell')
    
    
    
    
    Out[131]:
    'RuSsElL'

    Let's make a few functions to solve some simple problems (taken from codingbat.com and Google's online Python class). In doing so, think about what data structures we will need, and what kinds of operations or procedures.

    Ex. 1 We want to make a package of goal kilos of chocolate. We have small bars (1 kilo each) and big bars (5 kilos each). Return the number of small bars to use, assuming we always use big bars before small bars. Return -1 if it can't be done.

    Ex. 2 Given a list of numbers, return a list where all adjacent == elements have been reduced to a single element, so [1, 2, 2, 3] returns [1, 2, 3]. You may create a new list or modify the passed in list.

    Ex. 3 Given two lists sorted in increasing order, create and return a merged list of all the elements in sorted order. You may modify the passed in lists. Ideally, the solution should work in "linear" time, making a single pass of both lists.

    
    
    In [ ]:
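    def make_chocolate(small,big,goal):
        big_used = min(big, goal // 5) # use as many big bars as fit without overshooting the goal
        num_small = goal - 5 * big_used # kilos left over for the small bars to cover
        if num_small > small:
            return -1
        else:
            return num_small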
    
    
    
    In [142]:
    def remove_adjacent(nums):
      result = []
      for num in nums:
        if len(result) == 0 or num != result[-1]:
          result.append(num)
      return result
    
    
    
    In [ ]:
    def linear_merge(list1, list2):
      result = []
      while len(list1) and len(list2):
        if list1[0] < list2[0]:
          result.append(list1.pop(0))
        else:
          result.append(list2.pop(0))
      result.extend(list1)
      result.extend(list2)
      return result
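
    A note on the "linear time" goal: pop(0) has to shift every remaining element of the list, so each call takes time proportional to the list's length and the version above is not strictly linear. One way around this, sketched here under the hypothetical name linear_merge_indexed, is to walk through both lists with index counters instead of popping.

```python
def linear_merge_indexed(list1, list2):
    """Merge two sorted lists in one pass without modifying either input."""
    result = []
    i, j = 0, 0                       # current position in each list
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    result.extend(list1[i:])          # at most one of these two slices
    result.extend(list2[j:])          # still has elements left
    return result

print(linear_merge_indexed([1, 4, 9], [2, 3, 10, 12]))   # [1, 2, 3, 4, 9, 10, 12]
```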
    

    If we have extra time...

    Some other handy built-in functions and methods

    any() returns True if any element of iterable is True. all() returns True if all elements of an iterable are True

    
    
    In [52]:
    any([1,0,0,0,0])
    
    
    
    
    Out[52]:
    True
    
    
    In [29]:
    all([1,0,0,0,0])
    
    
    
    
    Out[29]:
    False
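
    any() and all() are often combined with a condition, so you can ask "does any item pass this test?" in one line. A small sketch with made-up scores:

```python
scores = [72, 85, 91, 64]

print(any(score > 90 for score in scores))   # True  -- 91 is above 90
print(all(score > 70 for score in scores))   # False -- 64 is not above 70

# Edge case: with nothing to check, any() is False and all() is True
print(any([]), all([]))                      # False True
```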

    Some other string and list methods...

    
    
    In [31]:
    ourstring = "it was a dark and stormy night 3 days ago, when i went to boston"
    ourstring.capitalize()
    
    
    
    
    Out[31]:
    'It was a dark and stormy night 3 days ago, when i went to boston'
    
    
    In [33]:
    ourstring.replace('and','but')
    
    
    
    
    Out[33]:
    'it was a dark but stormy night 3 days ago, when i went to boston'
    
    
    In [47]:
    ourlist = ourstring.split()
    print(ourlist)
    
    
    
    
    ['it', 'was', 'a', 'dark', 'and', 'stormy', 'night', '3', 'days', 'ago,', 'when', 'i', 'went', 'to', 'boston']
    
    
    
    In [48]:
    ourlist.insert(3,'very')
    print(ourlist)
    
    
    
    
    ['it', 'was', 'a', 'very', 'dark', 'and', 'stormy', 'night', '3', 'days', 'ago,', 'when', 'i', 'went', 'to', 'boston']
    
    
    
    In [49]:
    ourlist.remove('very') # remove first matching value
    print(ourlist)
    
    
    
    
    ['it', 'was', 'a', 'dark', 'and', 'stormy', 'night', '3', 'days', 'ago,', 'when', 'i', 'went', 'to', 'boston']
    
    
    
    In [50]:
    del ourlist[7:10] # remove by index
    print(ourlist)
    
    
    
    
    ['it', 'was', 'a', 'dark', 'and', 'stormy', 'night', 'when', 'i', 'went', 'to', 'boston']
    
    
    
    In [ ]:
    ourlist.index('dark')
    

    zip() aggregates elements from each of the iterables it is passed.

    
    
    In [55]:
    words = ['the','dog','ate','my','homework']
    tags = ['det','noun','verb','poss','noun']
    freq = [1000,100,50,500,10]
    words_and_tags_and_freqs = zip(words,tags,freq)
    
    
    
    In [56]:
    words_and_tags_and_freqs # in Python 3 (unlike Python 2.x), zip returns an iterator (more on those next time), so evaluating it doesn't show us the list
    
    
    
    
    Out[56]:
    <builtins.zip at 0x104f9d368>
    
    
    In [58]:
    list(words_and_tags_and_freqs) # to get the list of zipped tuples, have to either use list(), next(), or loop through, as in for x in words_and_tags_and_freqs
    
    
    
    
    Out[58]:
    [('the', 'det', 1000),
     ('dog', 'noun', 100),
     ('ate', 'verb', 50),
     ('my', 'poss', 500),
     ('homework', 'noun', 10)]
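
    Two things worth knowing about zip objects, sketched with a shortened version of the lists above: they can only be looped through once, and they pair nicely with dict() to build a lookup table. The name tag_lookup is made up for this example.

```python
words = ['the', 'dog', 'ate']
tags = ['det', 'noun', 'verb']

pairs = zip(words, tags)
print(list(pairs))     # [('the', 'det'), ('dog', 'noun'), ('ate', 'verb')]
print(list(pairs))     # [] -- the iterator is used up after one pass

tag_lookup = dict(zip(words, tags))    # build a fresh zip to make a dictionary
print(tag_lookup['dog'])               # noun
```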