Introduction to Python Basics

What is Python?

High-level, interpreted programming language (like R or Matlab)

High-level -- code almost looks like natural language

Interpreted -- you type code, and then you run it -- you don’t have to ‘compile’ it first into machine-readable code




In [7]:

    
print("Hello world")









    



Hello world

Advantages: Python code is readable, intuitive to work with, and fast to write and iterate on
Disadvantages: it can be slow to run, but there are ways around this (e.g., multi-core processing)

So easy, a Russell can do it

I took a ~16 hour summer class, and was immediately using it for my own projects

So I'm also a novice! I may have to defer to Henry on some things!

Python vs. R and Matlab

Free, unlike Matlab!
MANY more people use Python than R or Matlab
1. Better online support (tutorials, FAQs, etc.)
2. More pre-existing code that you can use. Many of the problems I google have already been solved.
Python is simply more versatile than R or Matlab. With the right package, you can do anything, not just stats or visualization or modeling.

Digging in

Doing simple calculations from the command line



In [8]:

    
5 + 5









    Out[8]:





10



In [13]:

    
3 + 2 ** -9









    Out[13]:





3.001953125



In [14]:

    
'this' + '&' + 'that'









    Out[14]:





'this&that'



In [1]:

    
print(5 + 5) # This is a comment

Data Structures

Basic data structure types: numerics, strings, lists, tuples, dictionaries… (we'll talk about each)

Fall into two classes: mutable and immutable

Mutable: Can be changed in place, i.e., you can change the object without changing where it is stored in the computer's memory. Defining a structure a and then saying b = a points a and b to the same location in memory, so changing one in place also changes the other.

Immutable: Cannot be changed in place.

Practical demonstration of this difference in a moment...
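Here is a minimal sketch of that difference, using a list (mutable) and an integer (immutable). The variable names are just for illustration.

```python
# Mutable: a list can be changed in place, so both names see the change.
a = [1, 2, 3]
b = a            # b points to the same list object as a
a.append(4)      # in-place change to that one object
print(b)         # [1, 2, 3, 4]
print(a is b)    # True: one object, two names

# Immutable: an integer cannot be changed in place.
x = 3
y = x            # y points to the same integer object as x
x += 1           # builds a new integer and rebinds x; y is untouched
print(x, y)      # 4 3
```

If you want an independent copy of a list rather than a second name for it, list(a) or a.copy() should do it.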

Numerics

Floats (numbers with decimals, roughly), integers. These are immutable.



In [2]:

    
x = 3 # x is an integer



In [ ]:

    
type(x)



In [23]:

    
y = 4.9



In [ ]:

    
type(y)



In [ ]:

    
y = x



In [ ]:

    
x += 1.2 #now x becomes a float



In [ ]:

    
y # what will the output be? Remember that numerics are immutable



In [ ]:

    
y = float(y) # this changes y's type to float



In [ ]:

    
type(y)

Strings

Sequences of characters (letters, numbers, punctuation, etc.). Also immutable.



In [32]:

    
'Hello world'









    Out[32]:





'Hello world'



In [34]:

    
mystring = 'Hello World'



In [ ]:

    
mystring += ', how are you today?'



In [ ]:

    
mystring



In [ ]:

    
mystring.lower() # object.method()



In [ ]:

    
mystring.split() # returns a list...more on those in a bit



In [35]:

    
mystring.isalpha()









    Out[35]:





False



In [ ]:

    
mystring = 'HelloWorld'



In [ ]:

    
mystring.isalpha()



In [32]:

    
"The sum of 1 + 2 is {} and not {}".format(1+2,99)









    Out[32]:





'The sum of 1 + 2 is 3 and not 99'

True and False are their own type: boolean. Equivalent to 1 and 0 (which can be extremely handy.)
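A small sketch of why the 1/0 equivalence is handy: you can add booleans up to count how many things are True. The answers list is made-up example data.

```python
print(True == 1, False == 0)   # True True
print(True + True)             # 2

# Summing a list of booleans counts the True values.
answers = [True, False, True, True]
print(sum(answers))            # 3
```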

Tuples & Lists

Tuples & lists: ordered containers of any combination of data structures (strings, integers, variables, other tuples or lists). Tuples are immutable, lists mutable.



In [ ]:

    
z = ('our', 'first', 'tuple', 9, x) # put paren around items for tuple



In [ ]:

    
z = ['our', 'first', 'list', 3.4, [3,'hi']] # brackets for list

Some list methods



In [ ]:

    
z.append('eats') # append adds the object itself



In [ ]:

    
z



In [ ]:

    
z.extend('eats') # extend adds the pieces of the object, or iterable



In [ ]:

    
z



In [ ]:

    
z.index('eats')



In [ ]:

    
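z.sort() # in Python 3 this raises a TypeError here, because z mixes strings, a float, and a list; sort() needs items that can be compared with each other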



In [ ]:

    
z



In [ ]:

    
z.pop(2)



In [ ]:

    
z

Tuples vs lists -- slicing, indexing, assignment-via-index



In [34]:

    
z = 'Russell'



In [ ]:

    
z[0] # indexes first element in iterable

0 is the first index in Python (cf. R and Matlab)!!!

How to think of it...

Why?

See:

http://en.wikipedia.org/wiki/Zero-based_numbering

http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF <-- charming hand-written note by Edsger W. Dijkstra, a giant in computer science

https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi <-- written by Guido van Rossum, inventor of Python



In [51]:

    
z[:3], z[3:] # slicing...3 is, in a manner of speaking, indexing the space between the two 's'









    Out[51]:





('Rus', 'sell')



In [9]:

    
y = ['my', 'list']



In [10]:

    
y[0]









    Out[10]:





'my'



In [11]:

    
y[:1]









    Out[11]:





['my']



In [ ]:

    
x = ('my', 'tuple')



In [ ]:

    
x[0]



In [ ]:

    
x[:1]

Can also index, slice, assign using negative numbers, which index from the back of the sequence



In [45]:

    
z[-1] # NOTE -- negative indexing does NOT start at 0!









    Out[45]:





'l'



In [41]:

    
z[-3:]









    Out[41]:





'ell'
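One way to think about negative indexes, sketched below: z[-k] is shorthand for z[len(z) - k]. This is just an illustration using the same string as above.

```python
z = 'Russell'

# A negative index counts back from the end of the sequence.
print(z[-1] == z[len(z) - 1])     # True
print(z[-3:] == z[len(z) - 3:])   # True

# Everything except the last character.
print(z[:-1])                     # Russel
```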

Assigning new value via indexing works for lists (which are mutable), but not strings or tuples (which are immutable)



In [ ]:

    
x[0] = 'your'



In [ ]:

    
z[0] = 'Z'



In [ ]:

    
y[0] = 'your'

So why use a tuple instead of a list? Tuples are faster to create, use less memory, and, like other immutables, can be used as keys in a dictionary.
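A quick sketch of the dictionary-key point (dictionaries are covered below). The grid dictionary and its contents are hypothetical, made up for this example.

```python
# Tuples work as dictionary keys...
grid = {(0, 0): 'start', (2, 3): 'goal'}
print(grid[(2, 3)])            # goal

# ...but lists do not, because they are mutable.
list_key_failed = False
try:
    bad = {[0, 0]: 'start'}
except TypeError as err:
    list_key_failed = True
    print('TypeError:', err)
```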

Set

Unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Sets support basic set operations like union, intersection, difference, etc. Sets are mutable.

Disclaimer: I haven't used these that much yet (though I probably should)...



In [42]:

    
engineers = {'John', 'Jane', 'Jack', 'Janice'}



In [44]:

    
programmers = {'Jack', 'Sam', 'Susan', 'Janice'}

In addition to set-literals using braces, you can make a set from any iterable. More on iterables next week, but for now, e.g. a list is an iterable.



In [45]:

    
managers = set(['Jane', 'Jack', 'Susan', 'Zack'])



In [46]:

    
employees = engineers | programmers | managers           # union



In [47]:

    
engineering_management = engineers & managers            # intersection



In [48]:

    
fulltime_management = managers - engineers - programmers # difference



In [49]:

    
engineers.add('Marvin')                                  # add element



In [50]:

    
print(engineers)









    



{'Janice', 'Jane', 'Jack', 'Marvin', 'John'}



In [51]:

    
employees.issuperset(engineers)     # superset test









    Out[51]:





False



In [52]:

    
employees.update(engineers)



In [53]:

    
employees.issuperset(engineers)









    Out[53]:





True
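The two basic uses mentioned at the top of this section, membership testing and eliminating duplicates, might look something like this. The guests list is made-up example data.

```python
guests = ['Ann', 'Bob', 'Ann', 'Cy', 'Bob']

# Building a set from a list drops the duplicates.
unique_guests = set(guests)
print(len(guests), len(unique_guests))   # 5 3

# Membership testing with the in operator.
print('Cy' in unique_guests)             # True
print('Dee' in unique_guests)            # False

# Sets are unordered, so sort if you need a predictable order.
print(sorted(unique_guests))             # ['Ann', 'Bob', 'Cy']
```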

Dictionaries

Key-value pairs. Keys can be any immutable (e.g., tuples, strings, numerics, but not lists). Members of dictionary are indexed with key, cf. lists which are indexed with a range of numbers. Like sets, checking for membership is fast, O(1) in average case. Cf. lists, for which membership checking is O(n). Dictionaries are mutable.



In [3]:

    
mydict = {'the':10, 'of': 6}



In [ ]:

    
mydict['the'] #lookup with key



In [ ]:

    
mydict['the'] = 20



In [ ]:

    
mydict

Some useful methods...



In [4]:

    
mydict.get('cromulent',0) # (key, value to be returned if key not in dict)









    Out[4]:





0



In [55]:

    
mydict.items()









    Out[55]:





dict_items([('of', 6), ('the', 10)])



In [ ]:

    
mydict.values()



In [ ]:

    
mydict.keys()
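A common use of get() is counting things. Below is a minimal sketch that counts words in a made-up sentence; it uses a for loop, which is covered in the Control Flow section.

```python
words = 'the cat and the hat and the bat'.split()

counts = {}
for word in words:
    # get() returns 0 the first time we see a word, so no KeyError
    counts[word] = counts.get(word, 0) + 1

print(counts['the'])      # 3
print(counts['and'])      # 2
print('cat' in counts)    # True: membership is checked against the keys

# items() gives (key, value) pairs we can loop over
for word, n in counts.items():
    print(word, n)
```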

Knowing which data structure to use...

Do you just need an ordered sequence of items?

- Use a tuple

Do you need an ordered sequence of items and need to be able to *change that sequence in place*?

- Use a list

Do you just need to know whether or not you've already *got* a particular value, but without ordering, and you don't need to store duplicates?

- Use a set

Do you need to associate values with keys, so you can look them up efficiently (by key) later on?

- Use a dictionary

Another way to decide between tuples and lists (from Henry):

"By convention lists tend to be homogeneous. You don't know how many you'll end up with necessarily, but you have a bunch of the same thing. Tuples are not necessarily homogeneous, and the slots usually have some kind of predefined semantics, like ('Henry', 'Harrison', 27)."

A couple basic functions



In [56]:

    
x, y, z = 2, 'hi', ['my', 'list']



In [57]:

    
print(x,y,z)









    



2 hi ['my', 'list']



In [ ]:

    
len(z)



In [ ]:

    
range(10)
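In Python 3, range() gives back a range object rather than a list, so you may need list() to see the numbers. A short sketch:

```python
r = range(10)
print(r)                        # range(0, 10)
print(list(r))                  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(r))                   # 10

# range(start, stop, step): stop is not included
print(list(range(2, 10, 3)))    # [2, 5, 8]
```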

Control Flow

Conditionals (if/thens), loops, list comprehension

Conditionals



In [ ]:

    
x = 3
if x == 3: # notice == for computing truth value
    print('yes')



In [ ]:

    
y = 'tested'
if len(y) > 6:
    print('yes')
elif len(y) == 6: # == tests equality; a single = is assignment and is a syntax error here
    print('maybe')
else:
    print('no')

For loops

For every item in some list, tuple, string, or other iterable, do something with that item



In [ ]:

    
x = list('Russell')
for letter in x:
    print(letter)



In [ ]:

    
for index, letter in enumerate(x): # enumerate returns an iterator of (index, item) pairs; more on iterators and generators next time
    print(letter, index)



In [ ]:

    
for index, letter in enumerate(x):
    if index == 0 or index == 1:
        print(index, letter)

While Loops

While something is True, do something else.



In [141]:

    
list1 = ['the','rain','in','spain','falls','mainly','on','the','plain']
list2 = []
while len(list1): # len(list1) is truthy whenever list1 is not empty, i.e., whenever len(list1) > 0
    list2.append(list1.pop(0))
list2









    Out[141]:





['the', 'rain', 'in', 'spain', 'falls', 'mainly', 'on', 'the', 'plain']

List comprehension

A compressed, elegant way to construct lists with loops



In [ ]:

    
x = [y**2 for y in range(10)]



In [65]:

    
my_name = 'Russell'
consonants = set('bcdfghjklmnpqrstvwxz')
my_consonants = [x for x in my_name if x.lower() in consonants]



In [67]:

    
''.join(my_consonants) #a string method...take an iterable containing strings, and join them by the string object ('') in the first part of the line









    Out[67]:





'Rssll'
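To see what a comprehension is compressing, here is a sketch of the same list built with an ordinary for loop, next to the one-line version. The variable names are made up for the comparison.

```python
# The long way: start with an empty list and append in a loop.
squares_loop = []
for y in range(10):
    squares_loop.append(y ** 2)

# The comprehension: same result in one line.
squares_comp = [y ** 2 for y in range(10)]
print(squares_loop == squares_comp)   # True

# An optional if at the end filters the items.
even_squares = [y ** 2 for y in range(10) if y % 2 == 0]
print(even_squares)                   # [0, 4, 16, 36, 64]
```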

File input/output

Let's first make a little text file in textedit, textwrangler, etc.



In [103]:

    
myfile = open('test_file.txt','r+') # r+ enables reading and writing



In [104]:

    
myfile.read()









    Out[104]:





'This is a test file with some meaningless text in it.\nWe have a few lines of text in it.\nLike this one.\nAnd this one.\nThis is another line'

Note: if we try to do this again, we get nothing, because the read method changes our position in the file object. We can read the file in again, or change the position with myfile.seek(0).



In [105]:

    
myfile.read()









    Out[105]:





''



In [112]:

    
myfile.seek(0)









    Out[112]:





0



In [107]:

    
for line in myfile:
    print(line)









    



This is a test file with some meaningless text in it.

We have a few lines of text in it.

Like this one.

And this one.

This is another line



In [108]:

    
firstline = myfile.readline()



In [109]:

    
print(firstline)



In [113]:

    
all_lines = myfile.readlines()



In [114]:

    
print(all_lines)









    



['This is a test file with some meaningless text in it.\n', 'We have a few lines of text in it.\n', 'Like this one.\n', 'And this one.\n', 'This is another line']



In [115]:

    
myfile.write('\nThis is another line\n')









    Out[115]:





21



In [116]:

    
x = 'This is a line to be saved to the file'.split()



In [117]:

    
myfile.write(str(x))









    Out[117]:





69



In [118]:

    
myfile.close() # to save some system resources...important if you have big files
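If you'd rather not remember to call close(), a with block closes the file for you when the block ends. A minimal sketch; the file name with_demo.txt and the temporary directory are assumptions for this example, not part of the session above.

```python
import os
import tempfile

# Write to a throwaway file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(path, 'w') as f:       # 'w' creates the file (or overwrites it)
    f.write('first line\n')
    f.write('second line\n')

with open(path) as f:            # default mode is 'r' (read)
    lines = [line.rstrip('\n') for line in f]

print(lines)       # ['first line', 'second line']
print(f.closed)    # True: the with block closed the file
```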

All of the above works fine if your data are simple types like numerics or strings. But if you want to save more complicated structures like dictionaries, and especially if you want to do it on the cheap, there are Python-specific formats and tools for doing this (e.g., pickle files or pandas).



In [121]:

    
import pickle
my_dict = dict([('jake', 4139), ('jack', 4127), ('john', 4098)])
with open("save.p", "wb") as f: # "wb" = write in binary mode; the with block closes the file for us
    pickle.dump(my_dict, f)
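Getting the dictionary back out is the mirror image, using pickle.load(). The sketch below does a full round trip; it writes to a temporary directory (an assumption for this example) so it doesn't touch your working folder.

```python
import os
import pickle
import tempfile

my_dict = {'jake': 4139, 'jack': 4127, 'john': 4098}
path = os.path.join(tempfile.mkdtemp(), 'save.p')

# Save: pickle files must be opened in binary mode ('wb').
with open(path, 'wb') as f:
    pickle.dump(my_dict, f)

# Load: open in binary read mode ('rb').
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == my_dict)   # True: same contents
print(loaded is my_dict)   # False: it is a new object
```

One caution: only unpickle files you trust, since loading a pickle can run arbitrary code.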

Defining your own functions (and some exercises)

Many premade functions -- print, range, len, string.upper(), etc.

You can also make your own. Let's say we wanted a function that took a string and capitalized every other letter in it. So 'russell' would become 'RuSsElL'



In [130]:

    
def make_so_dope(string):
    listed_string = list(string)
    for index, letter in enumerate(listed_string):
        if index % 2 == 0: # x % y returns the remainder when you divide x by y. if x is even, then x % 2 should return 0
            listed_string[index] = letter.upper()
    completed_string = ''.join(listed_string)
    return completed_string



In [131]:

    
make_so_dope('Russell')









    Out[131]:





'RuSsElL'

Let's make a few functions to solve some simple problems (taken from codingbat.com and Google's online Python class). In doing so, think about what data structures we will need, and what kinds of operations or procedures.

Ex. 1 We want to make a package of goal kilos of chocolate. We have small bars (1 kilo each) and big bars (5 kilos each). Return the number of small bars to use, assuming we always use big bars before small bars. Return -1 if it can't be done.

Ex. 2 Given a list of numbers, return a list where all adjacent == elements have been reduced to a single element, so [1, 2, 2, 3] returns [1, 2, 3]. You may create a new list or modify the passed in list.

Ex. 3 Given two lists sorted in increasing order, create and return a merged list of all the elements in sorted order. You may modify the passed in lists. Ideally, the solution should work in "linear" time, making a single pass of both lists.



In [ ]:

    
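def make_chocolate(small, big, goal):
    big_used = min(big, goal // 5)      # use as many big bars as fit in the goal
    remaining = goal - 5 * big_used     # kilos still to be covered by small bars
    if remaining > small:               # not enough small bars to finish
        return -1
    return remaining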



In [142]:

    
def remove_adjacent(nums):
  result = []
  for num in nums:
    if len(result) == 0 or num != result[-1]:
      result.append(num)
  return result



In [ ]:

    
def linear_merge(list1, list2):
  result = []
  while len(list1) and len(list2):
    if list1[0] < list2[0]:
      result.append(list1.pop(0))
    else:
      result.append(list2.pop(0))
  result.extend(list1)
  result.extend(list2)
  return result
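A note on the "linear" part: list.pop(0) has to shift every remaining element, so the version above may not be strictly linear on long lists. Here is one possible alternative that walks both lists with index counters instead. The name linear_merge_indexed is made up for this sketch.

```python
def linear_merge_indexed(list1, list2):
    result = []
    i, j = 0, 0                       # current position in each list
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    # One list is used up; whatever is left of the other is already sorted.
    result.extend(list1[i:])
    result.extend(list2[j:])
    return result

print(linear_merge_indexed([1, 4, 9], [2, 3, 10, 11]))
# [1, 2, 3, 4, 9, 10, 11]
```

Unlike the pop(0) version, this one leaves the input lists unchanged.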

If we have extra time...

Some other handy built-in functions and methods

any() returns True if any element of iterable is True. all() returns True if all elements of an iterable are True



In [52]:

    
any([1,0,0,0,0])









    Out[52]:





True



In [29]:

    
all([1,0,0,0,0])









    Out[29]:





False
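any() and all() get more useful when you hand them a test applied to each item. A small sketch with made-up numbers:

```python
nums = [2, 4, 6, 7]

# Is at least one number odd?
print(any(n % 2 == 1 for n in nums))   # True (7 is odd)

# Are all of the numbers even?
print(all(n % 2 == 0 for n in nums))   # False

# Edge case: on an empty iterable, any() is False and all() is True.
print(any([]), all([]))                # False True
```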

Some other string and list methods...



In [31]:

    
ourstring = "it was a dark and stormy night 3 days ago, when i went to boston"
ourstring.capitalize()









    Out[31]:





'It was a dark and stormy night 3 days ago, when i went to boston'



In [33]:

    
ourstring.replace('and','but')









    Out[33]:





'it was a dark but stormy night 3 days ago, when i went to boston'



In [47]:

    
ourlist = ourstring.split()
print(ourlist)









    



['it', 'was', 'a', 'dark', 'and', 'stormy', 'night', '3', 'days', 'ago,', 'when', 'i', 'went', 'to', 'boston']



In [48]:

    
ourlist.insert(3,'very')
print(ourlist)









    



['it', 'was', 'a', 'very', 'dark', 'and', 'stormy', 'night', '3', 'days', 'ago,', 'when', 'i', 'went', 'to', 'boston']



In [49]:

    
ourlist.remove('very') # remove first matching value
print(ourlist)









    



['it', 'was', 'a', 'dark', 'and', 'stormy', 'night', '3', 'days', 'ago,', 'when', 'i', 'went', 'to', 'boston']



In [50]:

    
del ourlist[7:10] # remove by index
print(ourlist)









    



['it', 'was', 'a', 'dark', 'and', 'stormy', 'night', 'when', 'i', 'went', 'to', 'boston']



In [ ]:

    
ourlist.index('dark')

zip() aggregates elements from each of the iterables it is passed.



In [55]:

    
words = ['the','dog','ate','my','homework']
tags = ['det','noun','verb','poss','noun']
freq = [1000,100,50,500,10]
words_and_tags_and_freqs = zip(words,tags,freq)



In [56]:

    
words_and_tags_and_freqs # apparently in Python 3 (cf. Python 2.x), zip is now an iterator (more on those next time), so calling it doesn't give us the list









    Out[56]:





<builtins.zip at 0x104f9d368>



In [58]:

    
list(words_and_tags_and_freqs) # to get the list of zipped tuples, have to either use list(), next(), or loop through, as in for x in words_and_tags_and_freqs









    Out[58]:





[('the', 'det', 1000),
 ('dog', 'noun', 100),
 ('ate', 'verb', 50),
 ('my', 'poss', 500),
 ('homework', 'noun', 10)]
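Two things worth knowing about zip, sketched below with the same words and tags: you can hand it straight to dict() to build a lookup table, and, because it is an iterator, it is used up after one pass. The name tag_of is made up for this example.

```python
words = ['the', 'dog', 'ate', 'my', 'homework']
tags = ['det', 'noun', 'verb', 'poss', 'noun']

# dict() accepts an iterable of (key, value) pairs, which is what zip yields.
tag_of = dict(zip(words, tags))
print(tag_of['dog'])         # noun

# A zip object can only be consumed once.
pairs = zip(words, tags)
first_pass = list(pairs)
second_pass = list(pairs)
print(len(first_pass))       # 5
print(second_pass)           # []
```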