Week 2 (Optional): Background in Python and Unix

2.1. Week Introduction

2.1.1. Background on Python

Feel free to skip this section!

2.2. Python: Basics

2.2.1. Python Overview

We'll use Jupyter, whose name combines (Ju)lia, (Pyt)hon and R. Python has been around since 1991. It is powerful, fast, plays well with others, runs everywhere, is friendly and easy to learn, and is open.

2.2.2. Python: Variables

Python doesn't need ';' to end statements and uses indentation rather than braces to delimit blocks. It is a dynamically typed language (variables are not declared with a type).

Common data types: numeric (int, float, complex), sequence (list, tuple, range), binary (bytes, bytearray), True/False (bool), text (str).


In [1]:
x = 3
x = 4.5

Python accepts the previous cell because of dynamic typing: 'x' is simply rebound from an integer to a float (in C, a variable's type is fixed when it is declared).
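A minimal sketch of what dynamic typing looks like in practice: the same name is rebound to objects of different types, and type() reports the type of whatever object the name currently points to. The names type_before and type_after are only illustrative.

```python
# The name 'x' is rebound to objects of different types.
x = 3
type_before = type(x)   # the int class

x = 4.5
type_after = type(x)    # the float class

print(type_before.__name__, type_after.__name__)
```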

2.2.3. Python: Objects Part 1

Objects can hold data and can have actions associated with them.


In [2]:
x = 3

The previous line causes an integer object to be created (a PyIntObject in CPython 2; CPython 3 calls it PyLongObject). It holds the value of the object, along with other details Python needs under the hood (the type of the object, the number of references to the object, etc.). For those more versed in programming: the name 'x' lives with your local variables, while the integer object with the value three is created on the heap. The stack holds your local variables and is managed by your program, whereas the heap holds dynamically created data (in CPython, the interpreter's memory manager handles it on top of memory obtained from the OS).


In [3]:
x = 4.5

Now a PyFloatObject with a value of 4.5 is created, and 'x' then points to that object.


In [4]:
x = 3
y = 3.0
x is y


Out[4]:
False

In [5]:
x == y


Out[5]:
True

The 'is' operator returns True if both references point to the same object. We expect False here, and that is what we get: 'x' points to an integer object and 'y' points to a PyFloatObject. The '==' operator tests numeric equality. We expect True, and that is what we get: both objects have the same value.
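The difference between identity and equality is easiest to see with two separate objects that hold equal contents. The sketch below uses lists (the names a, b and c are illustrative) rather than numbers, since an interpreter may reuse objects for small numbers.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with equal contents
c = a           # a second name for the same list object

same_value = (a == b)        # compares contents
same_object_ab = (a is b)    # compares identity: two distinct lists
same_object_ac = (a is c)    # compares identity: one list, two names

print(same_value, same_object_ab, same_object_ac)
```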

2.2.4. Python: Objects Part 2


In [6]:
x = 'Hello'

This object is a string (a PyStringObject in CPython 2; CPython 3 calls it PyUnicodeObject). There are many string methods: capitalize(), lower(), etc. To call a method in Python, we write the variable name, a period, the method name, and then any parameters to send to the method in parentheses.


In [7]:
x = 'Hello'
x.lower()


Out[7]:
'hello'

In [8]:
x


Out[8]:
'Hello'

The method above doesn't change the string itself; it returns a new string. To keep the change, we need to assign the result back to the variable:


In [9]:
x = 'Hello'
x = x.lower()
x


Out[9]:
'hello'

2.2.5. Python: Variables Quiz Explanation

What would this code produce:

x = 7
y = x
x = 3
print(x, ', ', y)

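The output is '3 ,  7' (print puts a space between its arguments): 'y' still points to the object 7, because 'x = 3' only rebinds the name 'x'.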

2.2.6. Python: Loops


In [10]:
for i in range(0, 10):
    print(i)


0
1
2
3
4
5
6
7
8
9

Python doesn't use braces to delimit the loop body; it uses indentation. The range function takes the following arguments: range(start, stop[, step]), where 'stop' itself is not included. We can do the same with a while loop:


In [11]:
i = 2
while i < 12:
    print(i)
    i += 3


2
5
8
11
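As a small sketch of the range(start, stop[, step]) signature, the ranges below are turned into lists so their contents can be inspected. The variable names are illustrative; the second range reproduces the values printed by the while loop above.

```python
up_by_one = list(range(0, 5))         # start at 0, stop before 5
up_by_three = list(range(2, 12, 3))   # same values as the while loop
countdown = list(range(5, 0, -1))     # a negative step counts down

print(up_by_one)
print(up_by_three)
print(countdown)
```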

2.2.7. Python: Loop Quiz Explanation

How many times is the loop condition in the following code evaluated?

i = 0
while i < 10:
    print(i)
    i += 1

The loop body is executed 10 times; however, the condition is evaluated 11 times (the last evaluation, with i equal to 10, is False and ends the loop).
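One way to check this count is to move the condition into a helper that records each evaluation. This is only a sketch: the helper check and the counts dictionary are hypothetical names introduced for the illustration.

```python
counts = {'condition': 0, 'body': 0}

def check(i):
    """Evaluate the loop condition and record that it was evaluated."""
    counts['condition'] += 1
    return i < 10

i = 0
while check(i):
    counts['body'] += 1
    i += 1

print(counts)
```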

2.2.8. Python: Conditions


In [12]:
for i in range(0, 10, 2):
    print(i)


0
2
4
6
8

We can get the same output as before by using a conditional:


In [13]:
for i in range(0, 10):
    if i % 2 == 0:
        print(i)


0
2
4
6
8

The '%' operator is the modulo, i.e. the remainder of a division. Now imagine we want the following output: 0, 11, 2, 13 and 4. We need an 'else' statement:


In [14]:
for i in range(0, 5):
    if i % 2 == 0:
        print(i)
    else:
        print(i + 10)


0
11
2
13
4

If you want several 'else if' branches, you can use 'elif' in Python:


In [15]:
for i in range(0, 5):
    if i % 2 == 0:
        print(i)
    elif i % 3 == 1:
        print(i + 10)
    else:
        print(i - 10)


0
11
2
-7
4

2.2.9. Python: Functions

To define a function in Python, first you need to type 'def' followed by the function name. For example:


In [16]:
def my_abs(val):
    if val < 0:
        return 0 - val
    return val

In [17]:
print(my_abs(-7))


7

In [18]:
print(my_abs('Hi'))


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-3cf16f619f4e> in <module>()
----> 1 print(my_abs('Hi'))

<ipython-input-16-ddf0f7d97d29> in my_abs(val)
      1 def my_abs(val):
----> 2     if val < 0:
      3         return 0 - val
      4     return val

TypeError: '<' not supported between instances of 'str' and 'int'

In a statically typed, compiled language, the compiler would have caught the type mismatch and we wouldn't have been allowed to pass a string to a function expecting a numeric argument. In Python, the problem doesn't come up until run time: we only get the error when we run the code.


In [19]:
def print_abs(val):
    if val < 0:
        print(0 - val)
    else:
        print(val)
        
x = print_abs(-2.7)
print(x)


2.7
None

We might think that this would throw an error, because the 'print_abs' function doesn't return anything, but in fact it returns 'None'.


In [20]:
def inc_val(val):
    val = val + 1
    
x = 7
inc_val(x)
print(x)


7
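The call above leaves 'x' at 7 because the assignment inside inc_val only rebinds the local name 'val'. A common alternative, sketched here with a hypothetical function named inc_val_returning, is to return the new value and reassign it in the caller.

```python
def inc_val_returning(val):
    """Return val + 1 instead of trying to change the caller's variable."""
    return val + 1

x = 7
x = inc_val_returning(x)   # rebind x to the returned value
print(x)
```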

2.2.10. Python: Function Quiz 1 Explanation


In [21]:
# Function 1:
def my_abs(val):
    if val < 0:
        return 0 - val
    return val

In [22]:
# Function 2:
def my_abs(val):
    if val < 0:
        print(0 - val)
    else:
        print(val)

Print is not the same as return. Function 2 isn't returning the absolute value, it is just printing it.

2.2.11. Python: Function Quiz 2 Explanation


In [23]:
def swap(val1, val2):
    tmp = val1
    val1 = val2
    val2 = tmp

x = 6
y = 3
swap(x, y)
print(x,", ",y)


6 ,  3

The swap function does swap the values within the function, but it doesn't change what 'x' and 'y' point to outside of it.
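If the goal is to actually exchange two variables, one option is to return both values and unpack them in the caller. The function name swapped below is hypothetical; the plain tuple assignment in the last line does the same without any function.

```python
def swapped(val1, val2):
    """Return the two values in reverse order as a tuple."""
    return val2, val1

x = 6
y = 3
x, y = swapped(x, y)   # unpack the returned tuple into x and y
print(x, y)

a, b = 1, 2
a, b = b, a            # tuple assignment swaps without a helper
```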

2.2.12. Python: Scope


In [24]:
def my_abs(val):
    if val < 0:
        return 0 - val
    return val

print(val)


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-24-e5abd832c461> in <module>()
      4     return val
      5 
----> 6 print(val)

NameError: name 'val' is not defined

The variable 'val' was declared in the scope of the function 'my_abs'. It doesn't live outside that!


In [25]:
val = 0
def my_abs(val):
    if val < 0:
        return 0 - val
    return val

print(val)


0

Now we have a variable that lives outside the function. By declaring 'val' at the top of the file, we've made it a global variable. Be cautious: we never want a global variable to have the same name as a function parameter, because inside the function the parameter hides the global. There are many different guidelines; for example, Google recommends not using global variables at all.

(00:08 2017-07-15)

2.3. Python: Key Data Structures

(11:55 2017-07-15)

2.3.1. Data Structures and Basic Libraries in Python

2.3.2. String Functions

There are many more string methods besides those shown here!

Change case:


In [26]:
# All characters to lower case:
'Hello World!'.lower()


Out[26]:
'hello world!'

In [27]:
# All characters to upper case:
'Hello World!'.upper()


Out[27]:
'HELLO WORLD!'

Concatenation:


In [28]:
'1' + '2'


Out[28]:
'12'

In [29]:
'Hello ' + 'World' + '!'


Out[29]:
'Hello World!'

Replication:


In [30]:
'Spam' * 5


Out[30]:
'SpamSpamSpamSpamSpam'

In [31]:
'Spam' * 3 + 'Eggs' * 2


Out[31]:
'SpamSpamSpamEggsEggs'

strip([chars]): returns a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace is removed.


In [32]:
'    Extras \n'.strip()


Out[32]:
'Extras'

In [33]:
'****10*****'.strip('*')


Out[33]:
'10'

split([sep[, maxsplit]]): returns a list of the words in the string, using sep as the delimiter


In [34]:
'Let\'s split the words'.split(' ')


Out[34]:
["Let's", 'split', 'the', 'words']

In [35]:
'Jane,Doe,Cars,5'.split(',')


Out[35]:
['Jane', 'Doe', 'Cars', '5']

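Slicing: when we index strings to get substrings. Each character has a position counted from the start (first row of numbers) and from the end (second row):

 H   E   L   L   O
 0   1   2   3   4
-5  -4  -3  -2  -1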

In [36]:
word = 'Hello'
word[1:3] # 1 inclusive to 3 exclusive


Out[36]:
'el'

In [37]:
word[4:7]


Out[37]:
'o'

In [38]:
word[-4:-1]


Out[38]:
'ell'
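A few more slice forms are worth trying on the same word. This sketch assumes nothing beyond the built-in slice syntax word[start:stop:step]; the variable names are illustrative.

```python
word = 'Hello'

first_two = word[:2]         # omitted start means "from the beginning"
from_two = word[2:]          # omitted stop means "to the end"
last_two = word[-2:]         # negative indices count from the end
every_other = word[::2]      # a step of 2 skips every second character
reversed_word = word[::-1]   # a step of -1 walks backwards

print(first_two, from_two, last_two, every_other, reversed_word)
```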

Substring testing:


In [39]:
word = 'Hello'
'HE' in word


Out[39]:
False

In [40]:
'He' in word


Out[40]:
True

find(sub[, start[, end]]): returns the lowest index in the string where the substring sub is found, or -1 on failure. By default, start and end cover the entire string.


In [41]:
word.find('el')


Out[41]:
1

Convert to number:


In [42]:
word = '1234'
int(word)


Out[42]:
1234

In [43]:
float(word)


Out[43]:
1234.0

In [44]:
word = 'Hello'
int(word)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-c6f2507fedf0> in <module>()
      1 word = 'Hello'
----> 2 int(word)

ValueError: invalid literal for int() with base 10: 'Hello'

String formatting:


In [45]:
statement = 'We love {} {}.' # {} are placeholders
statement.format('data', 'analysis')


Out[45]:
'We love data analysis.'

In [46]:
statement = 'We love {0} {1}.' # you can number the {}
statement.format('data', 'analysis')


Out[46]:
'We love data analysis.'

In [47]:
statement = 'We love {1} {0}.'
statement.format('analysis', 'data')


Out[47]:
'We love data analysis.'

2.3.3. Lists in Python

A list is resizable and has an array implementation underneath the hood.


In [48]:
list1 = [11, 22, 33]
list1


Out[48]:
[11, 22, 33]

Value: 11  22  33
Index:  0   1   2

In [49]:
list1[1] # indexing the list (indices start at 0)


Out[49]:
22

In [50]:
list1[3] # there is no element at index 3, we get an error


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-50-0c7887d9918b> in <module>()
----> 1 list1[3] # there is no element at index 3, we get an error

IndexError: list index out of range

In [51]:
# iterate over a list using Python-like syntax:

for i in list1:
    print(i)


11
22
33

In [52]:
# iterate over a list using 'C'-like syntax:

for i in range(0, len(list1)):
    print(list1[i])


11
22
33

In [53]:
# Lists are MUTABLE:

list1 = [11, 22, 33]
list1[1] = 95
list1


Out[53]:
[11, 95, 33]

In [54]:
# Appending to a list:

list1 = [11, 22, 33]
list1.append(44)
list1


Out[54]:
[11, 22, 33, 44]

In [55]:
# Deleting from a list:

list1 = [11, 22, 33, 44]
list1.pop(2) # by the index
list1


Out[55]:
[11, 22, 44]

In [56]:
list1 = [11, 22 , 33, 44]
list1.remove(33) # by the value
list1


Out[56]:
[11, 22, 44]

In [57]:
# Adding a List to a List: extend

list1 = [1, 2, 3]
list2 = [4, 5, 6]
list1.extend(list2)
list1


Out[57]:
[1, 2, 3, 4, 5, 6]

In [58]:
# Extend vs Append:

list1 = [1, 2, 3]
list1.append(list2)
list1


Out[58]:
[1, 2, 3, [4, 5, 6]]

In [59]:
# Zipping Lists:

list1 = [1, 2, 3]
list2 = [4, 5, 6]

for x, y in zip(list1, list2):
    print(x, ',', y)


1 , 4
2 , 5
3 , 6

2.3.4. Quiz

x = [10,20,30]  
y = x  
x[1] = 42  
print(y)

What should be the output?

2.3.5. Reference Quiz Explanation


In [60]:
# Quiz:

x = [10,20,30]
y = x # y is always pointing to the x list!
x[1] = 42 # y is still 'pointing' to the x list
print(y)


[10, 42, 30]

In [61]:
# If we want a new copy of the list x:

x = [10, 20, 30]
y = list(x) # y now 'points' to a new object
x[1] = 42
print(y)


[10, 20, 30]
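The two quiz cells can be combined into one sketch that compares an alias with two ways of copying. The slice copy x[:] is an additional idiom not used above; like list(x), it makes a shallow copy, so nested lists inside would still be shared.

```python
x = [10, 20, 30]

alias = x               # another name for the same list object
copy_by_list = list(x)  # a new list with the same items
copy_by_slice = x[:]    # slicing the whole list also builds a new list

x[1] = 42               # change the original list in place

print(alias)            # follows the change, because it is the same object
print(copy_by_list)     # unaffected
print(copy_by_slice)    # unaffected
```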

2.3.6. Tuples in Python

Lists are mutable, tuples are immutable!


In [62]:
tuple1 = ('Honda', 'Civic', 4, 2017)
tuple1


Out[62]:
('Honda', 'Civic', 4, 2017)

In [63]:
# We can index the tuple (the same as a list):

tuple1[1]


Out[63]:
'Civic'

In [64]:
# Length of the tuple:

len(tuple1)


Out[64]:
4

In [65]:
# Iterating over a tuple:

for i in tuple1:
    print(i)


Honda
Civic
4
2017

In [66]:
tuple1[3] = 2018


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-a9bb2cc569fc> in <module>()
----> 1 tuple1[3] = 2018

TypeError: 'tuple' object does not support item assignment

Immutability is important for 2 reasons:

  • It makes parallel computing easier, because we are sure everyone is working with the same data.
  • Tuples are often used as keys in dictionaries: because a tuple doesn't change, the dictionary can organize entries based on the key's initial value without worrying about the key being changed by somebody.
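The second point can be checked directly: a tuple works as a dictionary key, while a list (which is mutable) is rejected. This is a small sketch with a hypothetical ratings dictionary.

```python
ratings = {}
ratings[('Cars', 2006)] = 7.1        # a tuple key is accepted

try:
    ratings[['Cars', 2006]] = 7.1    # a list key is rejected
    list_key_error = None
except TypeError as err:
    list_key_error = str(err)        # keep the message for inspection

print(list_key_error)
```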

2.3.7. Dictionaries in Python

Dictionaries are just Python's term for a map. The values in a dictionary can be strings, lists, tuples, etc.; the keys must be immutable (hashable) objects such as strings, numbers or tuples.

Key      Value   Grades
'CS101'  'Beth'  ['A', 'C']
'CS102'  'Marc'  ['B', 'B']

Key                     Value
('Ghostbusters', 2016)  5.4
('Ghostbusters', 1984)  7.8
('Cars', 2006)          7.1

In [67]:
# Create a dictionary:

dict1 = {('Ghostbusters', 2016): 5.4,
       ('Ghostbusters', 1984): 7.8}
dict1


Out[67]:
{('Ghostbusters', 1984): 7.8, ('Ghostbusters', 2016): 5.4}

In [68]:
# Dictionary lookup by key:

dict1[('Ghostbusters', 2016)]


Out[68]:
5.4

In [69]:
# Length of the dictionary

len(dict1)


Out[69]:
2

In [70]:
# Add a new key to the dictionary and its value:

dict1[('Cars', 2006)] = 7.1
dict1


Out[70]:
{('Cars', 2006): 7.1, ('Ghostbusters', 1984): 7.8, ('Ghostbusters', 2016): 5.4}

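Don't rely on the order of a dictionary! Before Python 3.7, dictionaries were unordered; since Python 3.7 they preserve insertion order.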


In [71]:
# Get a value back from a selected key:

dict1 = {('Ghostbusters', 2016): 5.4,
       ('Ghostbusters', 1984): 7.8,
       ('Cars', 2006): 7.1}
x = dict1[('Cars', 2006)]
x


Out[71]:
7.1

In [72]:
# Ask for a key not in the dictionary:

y = dict1[('Toy Story', 1995)]
y


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-72-d86d4e7ba192> in <module>()
      1 # Ask for a key not in the dictionary:
      2 
----> 3 y = dict1[('Toy Story', 1995)]
      4 y

KeyError: ('Toy Story', 1995)

In [73]:
# Safer way to get a value from a dictionary:

dict1 = {('Ghostbusters', 2016): 5.4,
       ('Ghostbusters', 1984): 7.8,
       ('Cars', 2006): 7.1}

x = dict1.get(('Cars', 2006))
x


Out[73]:
7.1

In [74]:
# Safer way with non-existing key:

x = dict1.get(('Toy Story', 1995))
x is None


Out[74]:
True

In [75]:
('Toy Story', 1995) in dict1


Out[75]:
False
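The get method also accepts a second argument that is returned when the key is missing, which avoids the separate check against None. The default of 0.0 below is an arbitrary choice for the illustration.

```python
dict1 = {('Ghostbusters', 2016): 5.4,
         ('Ghostbusters', 1984): 7.8,
         ('Cars', 2006): 7.1}

found = dict1.get(('Cars', 2006), 0.0)          # key exists: its value
missing = dict1.get(('Toy Story', 1995), 0.0)   # key missing: the default

print(found, missing)
```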

In [76]:
# Deleting from a dictionary:

dict1 = {('Ghostbusters', 2016): 5.4,
       ('Ghostbusters', 1984): 7.8,
       ('Cars', 2006): 7.1}

dict1.pop(('Ghostbusters', 2016)) # we get the value deleted!


Out[76]:
5.4

In [77]:
dict1 # ('Ghostbusters', 2016) no longer exists


Out[77]:
{('Cars', 2006): 7.1, ('Ghostbusters', 1984): 7.8}

In [78]:
dict1 = {('Ghostbusters', 2016): 5.4,
       ('Ghostbusters', 1984): 7.8,
       ('Cars', 2006): 7.1}

del dict1[('Cars', 2006)] # we don't get the value back

In [79]:
dict1 # ('Cars', 2006) no longer exists


Out[79]:
{('Ghostbusters', 1984): 7.8, ('Ghostbusters', 2016): 5.4}

In [80]:
# Iterating over a dictionary:

dict1 = {('Ghostbusters', 2016): 5.4,
       ('Ghostbusters', 1984): 7.8,
       ('Cars', 2006): 7.1}

for i in dict1:
    # print the keys
    print(i)


('Ghostbusters', 2016)
('Ghostbusters', 1984)
('Cars', 2006)

In [81]:
for key, value in dict1.items():
    # print keys and values
    print(key, ':', value)


('Ghostbusters', 2016) : 5.4
('Ghostbusters', 1984) : 7.8
('Cars', 2006) : 7.1

In [82]:
# Be CAREFUL while iterating:

for i in dict1:
    # trying to delete items from dictionary
    dict1.pop(i)


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-82-1c661b4245cf> in <module>()
      1 # Be CAREFUL while iterating:
      2 
----> 3 for i in dict1:
      4     # trying to delete items from dictionary
      5     dict1.pop(i)

RuntimeError: dictionary changed size during iteration

In [83]:
# Selective removal:

dict1 = {('Ghostbusters', 2016): 5.4,
       ('Ghostbusters', 1984): 7.8,
       ('Cars', 2006): 7.1}

to_remove = [] # created an empty list

for i in dict1:
    # iterate over dict, append to the to_remove if met the criteria
    if (i[1] < 2000):
        to_remove.append(i)
        
for i in to_remove:
    # iterate over to_remove, pop from the dict
    dict1.pop(i)
    
dict1


Out[83]:
{('Cars', 2006): 7.1, ('Ghostbusters', 2016): 5.4}
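Two shorter alternatives to the to_remove list, sketched under the same criterion (drop movies released before 2000): iterate over a snapshot of the keys, or build a new dictionary that keeps only the wanted items. The names movies, in_place and rebuilt are illustrative.

```python
movies = {('Ghostbusters', 2016): 5.4,
          ('Ghostbusters', 1984): 7.8,
          ('Cars', 2006): 7.1}

# Option 1: list(...) takes a snapshot of the keys, so popping is safe.
in_place = dict(movies)   # work on a copy so 'movies' stays intact
for key in list(in_place):
    if key[1] < 2000:
        in_place.pop(key)

# Option 2: build a new dictionary with only the items we want to keep.
rebuilt = {key: value for key, value in movies.items() if key[1] >= 2000}

print(in_place)
print(rebuilt)
```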

2.3.8. List and Dictionary Comprehension

Suppose I want a list of the squares of the numbers from 1 to 10. You're probably tempted to build a loop for this, but in Python there is an easier way!


In [84]:
list1 = [i ** 2 for i in range(1, 11)]
list1


Out[84]:
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Let's now make a list with these values:
[0, 1, 2, 3, 4, 5]


In [85]:
list1 = [i for i in range(0, 6)]
list1


Out[85]:
[0, 1, 2, 3, 4, 5]

In [86]:
# All even values from 0 up to (but not including) 20:

list1 = [i for i in range(0, 20, 2)]
list1


Out[86]:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [87]:
# List with alternate value 0 and 1:

list1 = [i % 2 for i in range(0, 10)]
list1


Out[87]:
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

In [88]:
# List with 10 random integers between 0 and 5:

import random

list1 = [random.randint(0, 5) for i in range(0, 10)]
list1


Out[88]:
[4, 4, 2, 1, 5, 4, 1, 4, 5, 1]

In [89]:
# Dictionary comprehension:

dict1 = {i: i ** 2 for i in range(1, 11)}
dict1


Out[89]:
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}

In [90]:
# Dictionary with numeric keys and letter values from A to Y
# (range(65, 90) stops before 90; use range(65, 91) to include Z):

dict1 = {i : chr(i) for i in range(65, 90)}
dict1


Out[90]:
{65: 'A',
 66: 'B',
 67: 'C',
 68: 'D',
 69: 'E',
 70: 'F',
 71: 'G',
 72: 'H',
 73: 'I',
 74: 'J',
 75: 'K',
 76: 'L',
 77: 'M',
 78: 'N',
 79: 'O',
 80: 'P',
 81: 'Q',
 82: 'R',
 83: 'S',
 84: 'T',
 85: 'U',
 86: 'V',
 87: 'W',
 88: 'X',
 89: 'Y'}
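Comprehensions can also filter with a trailing 'if', and ord() avoids hard-coding character codes. The sketch below builds the even numbers below 10 and a full A to Z dictionary; the variable names are illustrative.

```python
# Keep only the values that satisfy the condition after 'if'.
evens = [i for i in range(0, 10) if i % 2 == 0]

# ord('Z') + 1 makes the range include the code for 'Z'.
letters = {i: chr(i) for i in range(ord('A'), ord('Z') + 1)}

print(evens)
print(len(letters), letters[65], letters[90])
```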

2.3.9. Sets in Python

Sets support a number of useful math operations and only allow unique elements. Sets have three characteristics:

  • Unordered
  • Unique
  • Support set operations (e.g. union, intersection)

In [91]:
# Create a set:

leos_colors = set(['blue', 'green', 'red'])
leos_colors


Out[91]:
{'blue', 'green', 'red'}

In [92]:
# Add a new item:

leos_colors.add('yellow')
leos_colors


Out[92]:
{'blue', 'green', 'red', 'yellow'}

In [93]:
# Add an existing value to a set:

leos_colors.add('blue')
leos_colors


Out[93]:
{'blue', 'green', 'red', 'yellow'}

In [94]:
# Remove items: Discard
# if you try to discard an item which doesn't exist, it does nothing

leos_colors = set(['blue', 'green', 'red'])
leos_colors.discard('green')
leos_colors.discard('orange')
leos_colors


Out[94]:
{'blue', 'red'}

In [95]:
# Remove items: Remove
# if you try to remove an item which doesn't exist, it throws an error!

leos_colors = set(['blue', 'green', 'red'])
leos_colors.remove('orange')


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-95-b634340178b0> in <module>()
      3 
      4 leos_colors = set(['blue', 'green', 'red'])
----> 5 leos_colors.remove('orange')

KeyError: 'orange'

In [96]:
# Set operations: Union

leos_colors = set(['blue', 'green', 'red'])
ilkays_colors = set(['blue', 'yellow'])

either = ilkays_colors.union(leos_colors)
either


Out[96]:
{'blue', 'green', 'red', 'yellow'}

In [97]:
# Set operations: Intersection

leos_colors = set(['blue', 'green', 'red'])
ilkays_colors = set(['blue', 'yellow'])

both = ilkays_colors.intersection(leos_colors)
both


Out[97]:
{'blue'}

In [98]:
# Set quick operators:

leos_colors & ilkays_colors


Out[98]:
{'blue'}

In [99]:
leos_colors | ilkays_colors


Out[99]:
{'blue', 'green', 'red', 'yellow'}
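Beyond union and intersection, sets also offer difference and symmetric difference, each with a method and an operator form. This sketch reuses the two color sets from above; the result names are illustrative.

```python
leos_colors = {'blue', 'green', 'red'}   # set literal, same as set([...])
ilkays_colors = {'blue', 'yellow'}

only_leo = leos_colors - ilkays_colors      # in the first set only
only_leo_method = leos_colors.difference(ilkays_colors)
exactly_one = leos_colors ^ ilkays_colors   # in one set but not in both

print(only_leo)
print(exactly_one)
```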

2.3.10. Python Word Count


In [101]:
# Be sure you have followed the instructions to download 98-0.txt,
# the text of A Tale of Two Cities, by Charles Dickens

import collections

# if you want to use stopwords, here's an example of how to do this
# ('with' closes the file automatically when the block ends)
with open('/home/jayme/Courses/Python 4 DS/word_cloud/stopwords') as stop_file:
    stopwords = set(line.strip() for line in stop_file)

# create your data structure here
wordcount = {}

# Instantiate a dictionary, and for every word in the file, add it to
# the dictionary if it doesn't exist. If it does, increase the count.

# Hint: To eliminate duplicates, remember to strip punctuation
# and normalize case. The functions lower() and split() will be useful!

with open('/home/jayme/Courses/Python 4 DS/word_cloud/98-0.txt') as file:
    for word in file.read().lower().split():
        # remove punctuation that would make 'word.' differ from 'word'
        word = word.replace(".", "")
        word = word.replace(",", "")
        word = word.replace("\"", "")
        word = word.replace("“", "")
        if word not in stopwords:
            if word not in wordcount:
                wordcount[word] = 1
            else:
                wordcount[word] += 1

# after building your wordcount, you can then sort it and return the first
# n words.  If you want, collections.Counter may be useful.

d = collections.Counter(wordcount)

# print(d.most_common(10))
for word, count in d.most_common(10):
    print(word, ": ", count)


said :  642
mr :  616
one :  420
lorry :  313
will :  290
upon :  289
little :  264
man :  259
defarge :  259
time :  236
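As a variation that runs without the downloaded files, the sketch below counts words in a short in-memory sample and lets collections.Counter do the counting directly. The sample sentence, the tiny stopword set and the regular expression are assumptions made for the illustration, not part of the course material.

```python
import collections
import re

sample_text = 'It was the best of times, it was the worst of times.'
sample_stopwords = {'it', 'was', 'the', 'of'}

# Keep runs of letters and apostrophes; this drops punctuation in one step.
words = re.findall(r"[a-z']+", sample_text.lower())

# Counter counts each item of the iterable it is given.
counts = collections.Counter(w for w in words if w not in sample_stopwords)

for word, count in counts.most_common(2):
    print(word, ": ", count)
```

Using a regular expression replaces the chain of replace() calls, at the cost of having to decide which characters count as part of a word.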

2.4. Unix

2.4.1. Unix is Entirely OPTIONAL

2.4.2. Introduction to UNIX

Why UNIX? Many operating systems are Unix-based or Unix-like (Linux distros, Mac OS, Android, iOS, etc.); it is widely adopted in industry, offers a powerful development environment, and is an expected competence for most jobs. The Unix OS has three main parts: the kernel, the shell and the programs. The shell is the interface between the user and the kernel.

2.4.5. Live Code: Intro to UNIX

In the Terminal... ls (list files), cd (change directory), .. (parent directory), clear (clear Terminal), ~ (home directory)

2.4.6. Basic Unix Commands

ls (list files in directory), mkdir (make a dir), cd (change dir, '.' current, '..' parent, '~' home dir), * (wildcard), pwd (print working directory), cat (see the contents of a file).

stdin (standard input): usually from the keyboard
stdout (standard output): usually to the terminal
stderr (standard error): usually to the terminal

2.4.7. Live Code: Basic UNIX Commands

cp (copy), mv (move)

2.4.8. Redirecting Standard IO

On Unix, each process started from the command line has 3 file descriptors associated with it:

  • FD 0: standard input, "stdin"
  • FD 1: standard output, "stdout"
  • FD 2: standard error, "stderr"

Redirect standard input to read from a file:

$ command < somefile

Redirect standard output to write to a file:

$ command > afile1

Redirect standard output explicitly as FD 1:

$ command 1> afile2

Redirect standard error to write to a file:

$ command 2> afile3

Redirect standard error and standard output to different files:

$ command 2> afile4 > afile5

Redirect everything:

$ command > onefile 2> anotherfile < yetanotherfile

The redirections can appear in any order. Doubling the 'greater than' sign appends to a file instead of overwriting it, for example:

$ command >> afile1

To redirect both stderr and stdout to the same file (here '2>&1' must come after '> afile'):

$ command > afile 2>&1
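The same redirection can be reproduced from Python with the standard subprocess module, which may help connect the two halves of this week. In this sketch the "command" is a stand-in one-line Python program that writes to both streams, and the file is created in a temporary directory; both are assumptions made so the example is self-contained.

```python
import os
import subprocess
import sys
import tempfile

# A stand-in command that writes one line to stdout and one to stderr.
command = [sys.executable, '-c',
           'import sys; print("to stdout"); print("to stderr", file=sys.stderr)']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'afile')
    with open(path, 'w') as afile:
        # stdout=afile plays the role of '> afile';
        # stderr=subprocess.STDOUT plays the role of '2>&1'.
        subprocess.run(command, stdout=afile, stderr=subprocess.STDOUT, check=True)
    with open(path) as afile:
        captured = afile.read()

print(captured)
```

The order of the two lines inside the file is not guaranteed, because the streams are buffered differently.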

2.4.9. Live Code: Redirecting Standard IO

man (manual pages of Unix), man more (see the manual on more).

In 'more shakespeare.txt', you can press ENTER for a new line, Space for a new page, or q to quit.

sort (sort the contents of a file), we can:

sort fruit.txt > fruits_sorted.txt (output the sorted file to a new file)

uniq (get the unique lines from a file; it only collapses adjacent duplicates, so sort first), wc (print the line, word and byte counts of a file), wc -l (print the number of lines of a file), who (who is logged into the Unix system). You can type two commands one after another using a semicolon, like 'pwd; ls -l', and we can redirect the output of both commands using parentheses, as in '(pwd; ls -l) > out.txt'.

2.4.10. Pipes and Filters

A Unix pipe is a way to send the output of one command to the input of another. For example, in 'cat foo.txt | wc' we pipe the output of 'cat foo.txt' into the second command, which counts its lines, words and bytes. A filter is a program that takes input and transforms it in some way. Filter commands:

  • grep: search for lines with a given string or look for a pattern in a given input stream
  • more: shows as much as fits in your shell window
  • less: like more, but also lets you scroll backward
  • sort: sort alphabetically or numerically
  • uniq: give unique lines

Examples of filtering:

ls -la | more (see all the files in a directory)
cat filename | wc (count the lines, words and bytes of the file)
man cat | grep file (look for occurrences of 'file' in the manual page of cat)
ls -l | grep txt | wc -l (how many txt files in a folder)
who | sort > current_users (sort alphabetically the who's to a file)
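To make concrete what wc reports at the end of a pipe such as 'cat filename | wc', here is a rough Python sketch that computes the same three numbers for a small in-memory text. The sample text is made up, and the byte count assumes UTF-8 encoding.

```python
text = 'apple\nbanana cherry\n'

line_count = text.count('\n')           # wc counts newline characters
word_count = len(text.split())          # whitespace-separated words
byte_count = len(text.encode('utf-8'))  # size in bytes, not characters

print(line_count, word_count, byte_count)
```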

2.4.11. Live Code: Pipes and Filters

ps (process status, or what is running), ps -aef (all processes).

2.4.12. Useful Unix Commands for Data Science

Text commands for Unix: cat, grep, wc, sort, uniq; head, tail, cut, sed (stream editor, performs basic text transformations on an input stream, like a file or input from a pipeline), find.

What are the top 15 words used in Shakespeare's works?

sed -e 's/ /\'$'\n/g' < shakespeare.txt | sort | sed '/^$/d' | uniq -c | sort -nr | head -15 > count_vs_words

Which users run the most processes on my Unix system?

ps -aef | cut -c3-5 | sort | uniq -c | sort -nr | head -3

Transform fruits.txt into all caps for further processing.

tr '[a-z]' '[A-Z]' < fruits.txt > fruits_AllCaps.txt

Gnuplot is a Unix plotting tool!

2.4.13. Live Code: Useful Unix Commands for Data Science

paste (merge the corresponding lines of several files side by side into a single output)

2.5. (Optional) Week 2: Assessment


In [102]:
tup1 = ('physics', 'chemistry', 1997, 2000, 2001, 1999)
tup1[1:4]


Out[102]:
('chemistry', 1997, 2000)
