Python doesn't use ';' to end statements, and it delimits blocks with indentation rather than braces. It is a dynamically typed language: variables are not declared with a type.
Common data types: numeric (int, float, complex), sequence (list, tuple, range), binary (bytes, bytearray), Boolean (bool, with the values True/False), text (str).
In [1]:
x = 3
x = 4.5
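Python accepts the code above because of dynamic typing: a name can be rebound to a value of a different type. In a statically typed language such as C, 'x' is declared with one type and keeps it: assigning 4.5 to an int variable would truncate the value to 4, and redeclaring 'x' as a float would be a compile error.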
In [2]:
x = 3
The previous line causes an integer object to be created (CPython 2 called it a PyIntObject; in CPython 3 it is a PyLongObject). It holds the value of the object, along with other details for Python to work with under the hood (the type of the object, the number of references to the object, etc.). For those more versed in programming: roughly speaking, the name 'x' is a reference held by the running code, and the integer object with the value 3 is created on the heap. The stack holds your local variables and is managed by your program, whereas the heap holds dynamically created data; in CPython, the objects on the heap are managed by the interpreter's own memory manager, which requests memory from the OS.
In [3]:
x = 4.5
Now a PyFloatObject with value of 4.5 will be created, then 'x' will point to that object.
In [4]:
x = 3
y = 3.0
x is y
Out[4]:
In [5]:
x == y
Out[5]:
The 'is' operator returns True if both references point to the same object. We expect False, and it is: 'x' points to an integer object and 'y' points to a PyFloatObject. The '==' operator tests equality of value. We expect True, and it is: both have the same numeric value.
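A small sketch of the difference between identity and equality. The names `same_object` and `same_value` are illustrative and not part of the original notebook.

```python
x = 3
y = 3.0

same_object = x is y   # identity: do both names refer to one object?
same_value = x == y    # equality: do the two objects compare equal?

# The two objects have different types, so they cannot be the same object.
print(type(x).__name__, type(y).__name__)
print(same_object, same_value)
```

As a rule of thumb, use '==' to compare values and reserve 'is' for checks such as `x is None`.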
In [6]:
x = 'Hello'
This object is a string (a PyStringObject in CPython 2, a PyUnicodeObject in CPython 3). There are many string methods: capitalize(), lower(), etc. To call a method in Python, write a period after the variable name, then the method name, then any arguments in parentheses.
In [7]:
x = 'Hello'
x.lower()
Out[7]:
In [8]:
x
Out[8]:
The method above doesn't change the string: strings are immutable, so lower() returns a new string. To keep the result, we need to assign it back to the variable:
In [9]:
x = 'Hello'
x = x.lower()
x
Out[9]:
In [10]:
for i in range(0, 10):
    print(i)
Python doesn't use braces to mark a block; it uses indentation. The range function takes the following arguments: range(start, stop [, step]), where stop is excluded. We can write a similar loop with 'while':
In [11]:
i = 2
while i < 12:
    print(i)
    i += 3
In [12]:
for i in range(0, 10, 2):
    print(i)
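The different ways of calling range can be compared side by side by turning each one into a list. This is only a sketch; the variable names are illustrative.

```python
# range(stop): start defaults to 0 and step to 1
first_five = list(range(5))

# range(start, stop): the stop value is excluded
two_to_five = list(range(2, 6))

# range(start, stop, step): the same values as a while loop
# that starts at 2, runs while i < 12 and adds 3 each time
by_threes = list(range(2, 12, 3))

# a negative step counts down
countdown = list(range(10, 0, -5))

print(first_five, two_to_five, by_threes, countdown)
```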
The same output can be produced with a conditional instead of a step:
In [13]:
for i in range(0, 10):
    if i % 2 == 0:
        print(i)
The '%' operator is the modulo, i.e. the remainder of a division. Now imagine we want the following output: 0, 11, 2, 13 and 4. We need an 'else' statement:
In [14]:
for i in range(0, 5):
    if i % 2 == 0:
        print(i)
    else:
        print(i + 10)
If you need several 'else if' branches, you can use 'elif' in Python:
In [15]:
for i in range(0, 5):
    if i % 2 == 0:
        print(i)
    elif i % 3 == 1:
        print(i + 10)
    else:
        print(i - 10)
In [16]:
def my_abs(val):
    if val < 0:
        return 0 - val
    return val
In [17]:
print(my_abs(-7))
In [18]:
print(my_abs('Hi'))
In a compiled, statically typed language, the compiler would have caught the type mismatch, and we wouldn't have been allowed to pass a string to a function expecting a numeric argument. In Python, the problem doesn't come up until run time: running this raises a TypeError, because a string can't be compared with 0 using '<'.
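Because the mismatch only shows up at run time, one option is to catch the error where the function is called. A minimal sketch, assuming we simply want to report the problem; the variable `message` is illustrative.

```python
def my_abs(val):
    if val < 0:
        return 0 - val
    return val

try:
    my_abs('Hi')
except TypeError as err:
    # The comparison 'Hi' < 0 is what fails, not the function call itself.
    message = str(err)
    print('TypeError:', message)
```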
In [19]:
def print_abs(val):
    if val < 0:
        print(0 - val)
    else:
        print(val)
x = print_abs(-2.7)
print(x)
We might think that this would throw an error, because the 'print_abs' function doesn't return anything. In fact, a function without a return statement returns 'None', so 'x' is None.
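The implicit return value can be checked directly. In this sketch the name `result` is illustrative.

```python
def print_abs(val):
    """Print the absolute value; there is no return statement."""
    if val < 0:
        print(0 - val)
    else:
        print(val)

result = print_abs(-2.7)   # prints the absolute value as a side effect
print(result is None)      # the call itself evaluates to None
```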
In [20]:
def inc_val(val):
    val = val + 1  # rebinds the local name 'val' only
x = 7
inc_val(x)
print(x)  # the caller's 'x' is unchanged
In [21]:
# Function 1:
def my_abs(val):
    if val < 0:
        return 0 - val
    return val
In [22]:
# Function 2:
def my_abs(val):
    if val < 0:
        print(0 - val)
    else:
        print(val)
Print is not the same as return. Function 2 doesn't return the absolute value; it only prints it (and returns None).
In [23]:
def swap(val1, val2):
    tmp = val1
    val1 = val2
    val2 = tmp
x = 6
y = 3
swap(x, y)
print(x,", ",y)
The swap function does swap the values inside the function, but it doesn't change what 'x' and 'y' point to outside it: 'val1' and 'val2' are local names.
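One way to get a working swap is to return the values in the new order and rebind the caller's names. This is a sketch, not the notebook's own solution; the function name `swapped` is hypothetical.

```python
def swapped(val1, val2):
    """Return the two arguments in reverse order as a tuple."""
    return val2, val1

x = 6
y = 3
x, y = swapped(x, y)   # the caller rebinds its own names
print(x, ',', y)

# For a plain swap no function is needed: tuple unpacking is enough.
a, b = 'left', 'right'
a, b = b, a
```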
In [24]:
def my_abs(val):
    if val < 0:
        return 0 - val
    return val
print(val)
The variable 'val' exists only in the scope of the function 'my_abs'. It doesn't live outside it, so this print raises a NameError!
In [25]:
val = 0
def my_abs(val):
    if val < 0:
        return 0 - val
    return val
print(val)
Now we have a variable that lives outside the function. By defining 'val' at the top of the file, we've made it a global variable. Be cautious: it is best never to give a global variable the same name as a function parameter, because the parameter hides the global inside the function. There are many different guidelines; for example, Google's Python style guide discourages global variables.
(00:08 2017-07-15)
In [26]:
# All characters to lower case:
'Hello World!'.lower()
Out[26]:
In [27]:
# All characters to upper case:
'Hello World!'.upper()
Out[27]:
Concatenation:
In [28]:
'1' + '2'
Out[28]:
In [29]:
'Hello ' + 'World' + '!'
Out[29]:
Replication:
In [30]:
'Spam' * 5
Out[30]:
In [31]:
'Spam' * 3 + 'Eggs' * 2
Out[31]:
strip([chars]): returns a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace is removed.
In [32]:
' Extras \n'.strip()
Out[32]:
In [33]:
'****10*****'.strip('*')
Out[33]:
split([sep[, maxsplit]]): returns a list of the words in the string, using sep as the delimiter.
In [34]:
'Let\'s split the words'.split(' ')
Out[34]:
In [35]:
'Jane,Doe,Cars,5'.split(',')
Out[35]:
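Slicing: indexing a string with a range of positions to get a substring. Positions can be counted from the start (0, 1, ...) or from the end (-1, -2, ...).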
H | E | L | L | O |
---|---|---|---|---|
0 | 1 | 2 | 3 | 4 |
-5 | -4 | -3 | -2 | -1 |
In [36]:
word = 'Hello'
word[1:3] # 1 inclusive to 3 exclusive
Out[36]:
In [37]:
word[4:7]
Out[37]:
In [38]:
word[-4:-1]
Out[38]:
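The three slices above, plus two common shorthand forms, collected in one sketch. The variable names are illustrative.

```python
word = 'Hello'

middle = word[1:3]       # index 1 inclusive to 3 exclusive
past_end = word[4:7]     # slice bounds past the end are clipped, no error
negative = word[-4:-1]   # negative indices count from the end
first_two = word[:2]     # an omitted start means "from the beginning"
from_two = word[2:]      # an omitted stop means "to the end"

print(middle, past_end, negative, first_two, from_two)
```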
Substring testing:
In [39]:
word = 'Hello'
'HE' in word
Out[39]:
In [40]:
'He' in word
Out[40]:
find(sub[, start[, end]]): returns the lowest index in the string where the substring sub is found, or -1 on failure. By default, start and end cover the entire string.
In [41]:
word.find('el')
Out[41]:
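A short sketch of what find returns when the substring is present, when it is missing, and when a start position is given. For comparison, the related method index raises a ValueError instead of returning -1. The variable names are illustrative.

```python
word = 'Hello'

found = word.find('el')        # lowest index where 'el' starts
not_found = word.find('xyz')   # -1 signals failure
second_l = word.find('l', 3)   # start searching at index 3

try:
    word.index('xyz')          # like find, but raises on failure
    raised = False
except ValueError:
    raised = True

print(found, not_found, second_l, raised)
```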
Convert to number:
In [42]:
word = '1234'
int(word)
Out[42]:
In [43]:
float(word)
Out[43]:
In [44]:
word = 'Hello'
int(word)
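The last conversion fails: int('Hello') raises a ValueError because the text is not a number. A minimal sketch of a defensive conversion follows; the helper `to_int_or_none` is hypothetical and not part of the notebook or of Python itself.

```python
def to_int_or_none(text):
    """Return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(to_int_or_none('1234'))
print(to_int_or_none('Hello'))
```

Returning None is only one possible design; letting the ValueError propagate is often the better choice when bad input should stop the program.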
String formatting:
In [45]:
statement = 'We love {} {}.' # {} are placeholders
statement.format('data', 'analysis')
Out[45]:
In [46]:
statement = 'We love {0} {1}.' # you can number the {}
statement.format('data', 'analysis')
Out[46]:
In [47]:
statement = 'We love {1} {0}.'
statement.format('analysis', 'data')
Out[47]:
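Besides empty and numbered placeholders, format also accepts named placeholders and format specifications, and Python 3.6+ offers f-strings for the same job. A sketch with illustrative names (`topic`, `activity`, `rating`):

```python
# Named placeholders are filled from keyword arguments.
statement = 'We love {topic} {activity}.'
named = statement.format(topic='data', activity='analysis')

# An f-string evaluates the expressions inside {} in place (Python 3.6+).
topic, activity = 'data', 'analysis'
f_string = f'We love {topic} {activity}.'

# A format specification after ':' controls the presentation.
rating = 7.8
formatted = 'Rating: {:.2f}'.format(rating)

print(named, f_string, formatted)
```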
A list is resizable and is implemented as a dynamic array under the hood.
In [48]:
list1 = [11, 22, 33]
list1
Out[48]:
11 | 22 | 33 |
---|---|---|
0 | 1 | 2 |
In [49]:
list1[1] # indexing the list
Out[49]:
In [50]:
list1[3] # there is no element at index 3, we get an IndexError
In [51]:
# iterate over a list using Python-like syntax:
for i in list1:
    print(i)
In [52]:
# iterate over a list using 'C'-like syntax:
for i in range(0, len(list1)):
    print(list1[i])
In [53]:
# Lists are MUTABLE:
list1 = [11, 22, 33]
list1[1] = 95
list1
Out[53]:
In [54]:
# Appending to a list:
list1 = [11, 22, 33]
list1.append(44)
list1
Out[54]:
In [55]:
# Deleting from a list:
list1 = [11, 22, 33, 44]
list1.pop(2) # by the index
list1
Out[55]:
In [56]:
list1 = [11, 22 , 33, 44]
list1.remove(33) # by the value
list1
Out[56]:
In [57]:
# Adding a List to a List: extend
list1 = [1, 2, 3]
list2 = [4, 5, 6]
list1.extend(list2)
list1
Out[57]:
In [58]:
# Extend vs Append:
list1 = [1, 2, 3]
list1.append(list2)
list1
Out[58]:
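The difference between the two methods is easiest to see on fresh lists. In this sketch the names `extended` and `appended` are illustrative.

```python
list2 = [4, 5, 6]

extended = [1, 2, 3]
extended.extend(list2)    # adds each element of list2

appended = [1, 2, 3]
appended.append(list2)    # adds list2 itself as one nested element

print(extended, len(extended))
print(appended, len(appended))
```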
In [59]:
# Zipping Lists:
list1 = [1, 2, 3]
list2 = [4, 5, 6]
for x, y in zip(list1, list2):
    print(x, ',', y)
In [60]:
# Quiz:
x = [10,20,30]
y = x # y refers to the same list object as x
x[1] = 42 # the change is visible through y as well
print(y)
In [61]:
# If we want a new copy of the list x:
x = [10, 20, 30]
y = list(x) # y now 'points' to a new object
x[1] = 42
print(y)
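The two quiz cells can be combined into one sketch that uses 'is' to show which names share an object. The names `alias` and `copy` are illustrative. Note that list(x) makes a shallow copy: nested objects would still be shared.

```python
x = [10, 20, 30]
alias = x          # a second name for the same list object
copy = list(x)     # a new list with the same elements (shallow copy)

x[1] = 42

print(alias is x, alias)   # the alias sees the change
print(copy is x, copy)     # the copy does not
```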
In [62]:
tuple1 = ('Honda', 'Civic', 4, 2017)
tuple1
Out[62]:
In [63]:
# We can index the tuple (the same as a list):
tuple1[1]
Out[63]:
In [64]:
# Length of the tuple:
len(tuple1)
Out[64]:
In [65]:
# Iterating over a tuple:
for i in tuple1:
    print(i)
In [66]:
tuple1[3] = 2018 # tuples are IMMUTABLE: this raises a TypeError
Immutability is important for 2 reasons:
In [67]:
# Create a dictionary:
dict1 = {('Ghostbusters', 2016): 5.4,
('Ghostbusters', 1984): 7.8}
dict1
Out[67]:
In [68]:
# Dictionary lookup by key:
dict1[('Ghostbusters', 2016)]
Out[68]:
In [69]:
# Length of the dictionary
len(dict1)
Out[69]:
In [70]:
# Add a new key to the dictionary and its value:
dict1[('Cars', 2006)] = 7.1
dict1
Out[70]:
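Before Python 3.7, dictionaries were unordered! Since Python 3.7 they preserve insertion order, but items are still accessed by key, not by position.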
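A sketch of what ordering means in practice, assuming Python 3.7 or later, where a dictionary keeps its keys in insertion order. If a different order is needed, sort the keys explicitly. The names `keys_in_order` and `by_year` are illustrative.

```python
dict1 = {('Ghostbusters', 2016): 5.4,
         ('Ghostbusters', 1984): 7.8}
dict1[('Cars', 2006)] = 7.1

# Python 3.7+: iteration follows insertion order.
keys_in_order = list(dict1)

# Explicit ordering, here by release year (the second tuple element).
by_year = sorted(dict1, key=lambda key: key[1])

print(keys_in_order)
print(by_year)
```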
In [71]:
# Get a value back from a selected key:
dict1 = {('Ghostbusters', 2016): 5.4,
('Ghostbusters', 1984): 7.8,
('Cars', 2006): 7.1}
x = dict1[('Cars', 2006)]
x
Out[71]:
In [72]:
# Ask for a key not in the dictionary (raises a KeyError):
y = dict1[('Toy Story', 1995)]
y
In [73]:
# Safer way to get a value from a dictionary:
dict1 = {('Ghostbusters', 2016): 5.4,
('Ghostbusters', 1984): 7.8,
('Cars', 2006): 7.1}
x = dict1.get(('Cars', 2006))
x
Out[73]:
In [74]:
# Safer way with non-existing key:
x = dict1.get(('Toy Story', 1995))
x is None
Out[74]:
In [75]:
('Toy Story', 1995) in dict1
Out[75]:
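The get method also accepts a second argument that is returned instead of None when the key is missing. A sketch, where the default value 0.0 and the name `missing` are illustrative choices:

```python
dict1 = {('Ghostbusters', 2016): 5.4,
         ('Ghostbusters', 1984): 7.8,
         ('Cars', 2006): 7.1}
missing = ('Toy Story', 1995)

rating = dict1.get(missing, 0.0)          # default instead of None
known = dict1.get(('Cars', 2006), 0.0)    # existing key: default is ignored
still_missing = missing not in dict1      # get does not insert the key

print(rating, known, still_missing)
```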
In [76]:
# Deleting from a dictionary:
dict1 = {('Ghostbusters', 2016): 5.4,
('Ghostbusters', 1984): 7.8,
('Cars', 2006): 7.1}
dict1.pop(('Ghostbusters', 2016)) # we get the value deleted!
Out[76]:
In [77]:
dict1 # the deleted key no longer exists
Out[77]:
In [78]:
dict1 = {('Ghostbusters', 2016): 5.4,
('Ghostbusters', 1984): 7.8,
('Cars', 2006): 7.1}
del dict1[('Cars', 2006)] # we don't get the value back
In [79]:
dict1 # the deleted key no longer exists
Out[79]:
In [80]:
# Iterating over a dictionary:
dict1 = {('Ghostbusters', 2016): 5.4,
('Ghostbusters', 1984): 7.8,
('Cars', 2006): 7.1}
for i in dict1:
    # print the keys
    print(i)
In [81]:
for key, value in dict1.items():
    # print keys and values
    print(key, ':', value)
In [82]:
# Be CAREFUL while iterating:
for i in dict1:
    # trying to delete items from the dictionary while iterating over it
    # raises a RuntimeError (dictionary changed size during iteration)
    dict1.pop(i)
In [83]:
# Selective removal:
dict1 = {('Ghostbusters', 2016): 5.4,
('Ghostbusters', 1984): 7.8,
('Cars', 2006): 7.1}
to_remove = [] # created an empty list
for i in dict1:
    # iterate over dict, append to to_remove if the key meets the criteria
    if i[1] < 2000:
        to_remove.append(i)
for i in to_remove:
    # iterate over to_remove, pop from the dict
    dict1.pop(i)
dict1
Out[83]:
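Two alternative sketches of the same selective removal: building a new dictionary with a comprehension, and iterating over a snapshot of the keys so the dictionary can be changed safely. The name `recent` is illustrative.

```python
dict1 = {('Ghostbusters', 2016): 5.4,
         ('Ghostbusters', 1984): 7.8,
         ('Cars', 2006): 7.1}

# Option 1: keep the wanted items in a new dictionary.
recent = {key: value for key, value in dict1.items() if key[1] >= 2000}

# Option 2: list(dict1) copies the keys first, so popping from dict1
# during the loop does not disturb the iteration.
for key in list(dict1):
    if key[1] < 2000:
        dict1.pop(key)

print(recent)
print(dict1)
```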
In [84]:
list1 = [i ** 2 for i in range(1, 11)]
list1
Out[84]:
Let's now make a list with these values:
[0, 1, 2, 3, 4, 5]
In [85]:
list1 = [i for i in range(0, 6)]
list1
Out[85]:
In [86]:
# All even values from 0 up to (not including) 20:
list1 = [i for i in range(0, 20, 2)]
list1
Out[86]:
In [87]:
# List with alternating values 0 and 1:
list1 = [i % 2 for i in range(0, 10)]
list1
Out[87]:
In [88]:
# List with 10 random integers between 0 and 5:
import random
list1 = [random.randint(0, 5) for i in range(0, 10)]
list1
Out[88]:
In [89]:
# Dictionary comprehension:
dict1 = {i: i ** 2 for i in range(1, 11)}
dict1
Out[89]:
In [90]:
# Dictionary with values from A to Z and numeric keys (the stop value 91 is excluded):
dict1 = {i : chr(i) for i in range(65, 91)}
dict1
Out[90]:
In [91]:
# Create a set:
leos_colors = set(['blue', 'green', 'red'])
leos_colors
Out[91]:
In [92]:
# Add a new item:
leos_colors.add('yellow')
leos_colors
Out[92]:
In [93]:
# Add an existing value to a set:
leos_colors.add('blue')
leos_colors
Out[93]:
In [94]:
# Remove items: Discard
# if you try to discard an item which doesn't exist, it does nothing
leos_colors = set(['blue', 'green', 'red'])
leos_colors.discard('green')
leos_colors.discard('orange')
leos_colors
Out[94]:
In [95]:
# Remove items: Remove
# if you try to remove an item which doesn't exist, it throws an error!
leos_colors = set(['blue', 'green', 'red'])
leos_colors.remove('orange')
In [96]:
# Set operations: Union
leos_colors = set(['blue', 'green', 'red'])
ilkays_colors= set(['blue', 'yellow'])
either = ilkays_colors.union(leos_colors)
either
Out[96]:
In [97]:
# Set operations: Intersection
leos_colors = set(['blue', 'green', 'red'])
ilkays_colors = set(['blue', 'yellow'])
both = ilkays_colors.intersection(leos_colors)
both
Out[97]:
In [98]:
# Set quick operators:
leos_colors & ilkays_colors
Out[98]:
In [99]:
leos_colors | ilkays_colors
Out[99]:
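The operators '&' and '|' have two relatives that follow the same pattern: '-' for difference and '^' for symmetric difference. A sketch with the same two sets, written with set literals; sorted() is used only to print in a stable order.

```python
leos_colors = {'blue', 'green', 'red'}
ilkays_colors = {'blue', 'yellow'}

both = leos_colors & ilkays_colors          # intersection
either = leos_colors | ilkays_colors        # union
only_leo = leos_colors - ilkays_colors      # in the first set only
exactly_one = leos_colors ^ ilkays_colors   # in one set but not both

print(sorted(both), sorted(either))
print(sorted(only_leo), sorted(exactly_one))
```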
In [101]:
# Be sure you have followed the instructions to download the 98-0.txt,
# the text of A Tale of Two Cities, by Charles Dickens
import collections
# if you want to use stopwords, here's an example of how to do this
with open('/home/jayme/Courses/Python 4 DS/word_cloud/stopwords') as stop_file:
    stopwords = set(line.strip() for line in stop_file)
# create your data structure here.
wordcount = {}
# Instantiate a dictionary, and for every word in the file, add to
# the dictionary if it doesn't exist. If it does, increase the count.
# Hint: To eliminate duplicates, remember to split by punctuation,
# and use case delimiters. The functions lower() and split() will be useful!
with open('/home/jayme/Courses/Python 4 DS/word_cloud/98-0.txt') as file:
    for word in file.read().lower().split():
        # remove periods, commas and both kinds of double quotes
        word = word.replace(".", "")
        word = word.replace(",", "")
        word = word.replace("\"", "")
        word = word.replace("“", "")
        if word not in stopwords:
            if word not in wordcount:
                wordcount[word] = 1
            else:
                wordcount[word] += 1
# after building your wordcount, you can then sort it and return the first
# n words. If you want, collections.Counter may be useful.
d = collections.Counter(wordcount)
#print(d.most_common(10))
for word, count in d.most_common(10):
    print(word, ": ", count)
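The same counting idea can be tried without any files. The sketch below is not the exercise's reference solution: it swaps in a regular expression for the chain of replace calls and lets collections.Counter do the counting. The sample sentence and the tiny stopword set are made up for illustration.

```python
import collections
import re

# Hypothetical stand-ins for the book text and the stopwords file.
sample_text = 'It was the best of times, it was the worst of times.'
sample_stopwords = {'the', 'of', 'it'}

# Lowercase first, then keep runs of letters and apostrophes as words;
# punctuation is dropped because it never matches the pattern.
words = re.findall(r"[a-z']+", sample_text.lower())

# Counter counts the words that survive the stopword filter.
counts = collections.Counter(w for w in words if w not in sample_stopwords)

for word, count in counts.most_common(2):
    print(word, ': ', count)
```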
Why UNIX? Many widely used operating systems are based on Unix or are Unix-like (Linux distros, macOS, Android, iOS, etc.); it is well adopted in industry, it is a powerful development environment, and it is an expected competence for most jobs. Three main parts of the Unix OS: the kernel, the shell and the programs. The shell is the interface between the user and the kernel.
ls (list files in directory), mkdir (make a dir), cd (change dir, '.' current, '..' parent, '~' home dir), * (wildcard), pwd (print working directory), cat (see the contents of a file).
On Unix, each process started from the command line has 3 file descriptors (FD) associated with it:
stdin (standard input, FD 0): usually from the keyboard
stdout (standard output, FD 1): usually to the terminal
stderr (standard error, FD 2): usually to the terminal
Redirect standard input to read from a file:
$ command < somefile
Redirect standard output to write to a file:
$ command > afile1
Redirect standard output explicitly as FD 1:
$ command 1> afile2
Redirect standard error to write to a file:
$ command 2> afile3
Redirect standard error and standard output to different files:
$ command 2> afile4 > afile5
Redirect everything:
$ command > onefile 2> anotherfile < yetanotherfile
These redirections can appear in any order on the command line. Doubling the 'greater than' sign appends to a file instead of overwriting it, for example:
$ command >> afile1
To redirect stderr and stdout to the same file (here the order matters: '2>&1' must come after '> afile'):
$ command > afile 2>&1
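Seen from inside a program, stdout and stderr are two separate streams, which is why the shell can send them to different places. The sketch below illustrates this in Python by capturing each stream in memory with contextlib. It redirects Python's stream objects only, not the OS-level file descriptors the shell works with, and the function `report` is hypothetical.

```python
import contextlib
import io
import sys

def report():
    print('result: 42')                             # written to stdout
    print('warning: check input', file=sys.stderr)  # written to stderr

out_buffer = io.StringIO()
err_buffer = io.StringIO()

# Comparable in spirit to: command > out_buffer 2> err_buffer
with contextlib.redirect_stdout(out_buffer), contextlib.redirect_stderr(err_buffer):
    report()

captured_out = out_buffer.getvalue()
captured_err = err_buffer.getvalue()
print(repr(captured_out), repr(captured_err))
```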
man (manual pages of Unix), man more (see the manual on more).
'more shakespeare.txt' (page through a file): press ENTER for a new line, Space for a new page, or q to quit.
sort (sort the lines of a file), for example:
sort fruit.txt > fruits_sorted.txt (output the sorted file to a new file)
uniq (collapse adjacent duplicate lines of a file, so it is usually run on sorted input), wc (count the lines, words and bytes of a file), wc -l (print the number of lines of a file), who (who is logged into the Unix system). You can type two commands, one after another, separated by a semicolon, like 'pwd; ls -l'. We can redirect the output of both commands by grouping them in parentheses, as in '(pwd; ls -l) > out.txt'.
A Unix pipe is a way to send the output of one command to the input of another. For example, in 'cat foo.txt | wc' we pipe the output of 'cat foo.txt' into the second command, which counts its lines, words and bytes. A filter is a program that takes input and transforms it in some way. Filter commands:
Examples of filtering:
ls -la | more (page through all the files in a directory)
cat filename | wc (count the lines, words and bytes of the file)
man cat | grep file (look for occurrences of 'file' in the manual page of cat)
ls -l | grep txt | wc (how many txt files in a folder)
who | sort > current_users (write the alphabetically sorted output of who to a file)
Text commands for Unix: cat, grep, wc, sort, uniq; head, tail, cut, sed (stream editor, performs basic text transformations on an input stream, like a file or input from a pipeline), find.
What are the top 15 words used in Shakespeare's works?
sed -e 's/ /\'$'\n/g' < shakespeare.txt | sort | sed '/^$/d' | uniq -c | sort -nr | head -15 > count_vs_words
Which users run the most processes on my Unix system?
ps -aef | cut -c3-5 | sort | uniq -c | sort -nr | head -3
Transform fruits.txt into all caps for further processing.
tr '[a-z]' '[A-Z]' < fruits.txt > fruits_AllCaps.txt
Gnuplot is a Unix plotting tool!
In [102]:
tup1 = ('physics', 'chemistry', 1997, 2000, 2001, 1999)
tup1[1:4]
Out[102]: