Python for Developers

First Edition

Chapter 5: Types


Variables in the Python interpreter are created by assignment and destroyed by the garbage collector, when there are no more references to them.

Variable names must start with a letter or underscore (_) and be followed by letters, digits or underscores (_).  Uppercase and lowercase letters are considered different.

There are several pre-defined simple types of data in Python, such as:

  • Numbers (integer, real, complex, ... )
  • Text

Furthermore, there are types that function as collections. The main ones are:

  • List
  • Tuple
  • Dictionary

Python types can be:

  • Mutable: allow the contents of the variables to be changed.
  • Immutable: do not allow the contents of variables to be changed.

In Python, variable names are references that can be changed at execution time.
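The difference between mutable and immutable types, and the fact that names are only references, can be seen in a short sketch. The names used here are illustrative, and the code is written to run unchanged on Python 2.6+ and Python 3 (hence print() called with a single argument).

```python
# Two names bound to the same mutable object (a list)
a = [1, 2, 3]
b = a
b.append(4)                 # the change is visible through both names
same_list = a is b

# Immutable objects cannot change in place: "changing" one
# builds a new object and rebinds the name
x = 'abc'
y = x
x = x + 'd'                 # y still refers to the original string

# A name can be rebound to an object of another type at run time
n = 10
n = 'ten'

print(a)
print(same_list)
print(x)
print(y)
```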

The most common types and routines are implemented in the form of builtins, i.e. they are always available at runtime, without the need to import any library.

Numbers

Python provides some numeric types as builtins:

  • Integer (int): i = 1
  • Floating Point real (float): f = 3.14
  • Complex (complex): c = 3 + 4j

In addition to the conventional integers, there are also long integers, whose dimensions are arbitrary and limited by the available memory. Conversions between integer and long are performed automatically. The builtin function int() can be used to convert other types to integer, including base changes.

Example:


In [ ]:
# Converting real to integer
print 'int(3.14) =', int(3.14)

# Converting integer to real
print 'float(5) =', float(5)

# Calculation between integer and real results in real
print '5.0 / 2 + 3 = ', 5.0 / 2 + 3

# Integers in other base
print "int('20', 8) =", int('20', 8) # base 8
print "int('20', 16) =", int('20', 16) # base 16

# Operations with complex numbers
c = 3 + 4j
print 'c =', c
print 'Real Part:', c.real
print 'Imaginary Part:', c.imag
print 'Conjugate:', c.conjugate()


int(3.14) = 3
float(5) = 5.0
5.0 / 2 + 3 =  5.5
int('20', 8) = 16
int('20', 16) = 32
c = (3+4j)
Real Part: 3.0
Imaginary Part: 4.0
Conjugate: (3-4j)

The real numbers can also be represented in scientific notation, for example: 1.2e22.

Python has a number of defined operators for handling numbers through arithmetic calculations, logic operations (that test whether a condition is true or false) or bitwise processing (where the numbers are processed in binary form).

Arithmetic Operations:

  • Sum (+)
  • Difference (-)
  • Multiplication (*)
  • Division (/): between two integers the result is equal to the integer division. In other cases, the result is real.
  • Integer Division (//): the result is rounded down to the next lower integer, even when applied to real numbers, but in this case the result type is real too.
  • Modulo (%): returns the remainder of the division.
  • Power (**): can be used to calculate the root, through fractional exponents (e.g. 100 ** 0.5).
  • Positive (+)
  • Negative (-)
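A minimal sketch of the arithmetic operators listed above. Floats are used wherever the division of two integers would otherwise be involved, so the results are the same on Python 2.6+ and Python 3. The variable names are illustrative.

```python
quotient = 7 / 2.0        # division with a real operand: 3.5
floor_int = 7 // 2        # integer division: 3
floor_neg = -7 // 2       # rounds down, not toward zero: -4
floor_real = 7.5 // 2     # still rounded down, but the type is float: 3.0
remainder = 7 % 3         # 1
root = 100 ** 0.5         # square root through a fractional exponent: 10.0
negated = -root           # unary minus: -10.0

print(quotient)
print(floor_int)
print(floor_neg)
print(floor_real)
print(remainder)
print(root)
print(negated)
```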

Logical Operations:

  • Less than (<)
  • Greater than (>)
  • Less than or equal to (<=)
  • Greater than or equal to (>=)
  • Equal to (==)
  • Not equal to (!=)

Bitwise Operations:

  • Left Shift (<<)
  • Right Shift (>>)
  • And (&)
  • Or (|)
  • Exclusive Or (^)
  • Inversion (~)
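The bitwise operators can be tried on two small integers, as in the sketch below. The values and names are chosen only for illustration. The builtin bin() shows the binary form.

```python
a = 5    # binary 101
b = 3    # binary 011

shifted_left = a << 2    # 10100 -> 20
shifted_right = a >> 1   # 10 -> 2
both = a & b             # 001 -> 1
either = a | b           # 111 -> 7
exclusive = a ^ b        # 110 -> 6
inverted = ~a            # same as -(a + 1) -> -6

print(bin(a))
print(shifted_left)
print(shifted_right)
print(both)
print(either)
print(exclusive)
print(inverted)
```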

During the operations, numbers are converted appropriately (e.g. (1.5+4j) + 3 gives 4.5+4j).

Besides operators, there are also some builtin functions to handle numeric types: abs(), which returns the absolute value of the number, oct(), which converts to octal, hex(), which converts to hexadecimal, pow(), which raises one number to the power of another and round(), which returns a real number with the specified rounding.
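A short sketch of these builtin functions follows, with illustrative names. Note that the text returned by oct() differs between Python 2 and Python 3, so only its digits are relied on here.

```python
magnitude = abs(-3.5)          # 3.5
modulus = abs(3 + 4j)          # for complex numbers, the magnitude: 5.0
hexadecimal = hex(255)         # '0xff'
octal = oct(8)                 # '010' on Python 2, '0o10' on Python 3
power = pow(2, 10)             # same as 2 ** 10: 1024
power_mod = pow(2, 10, 1000)   # optional third argument: (2 ** 10) % 1000
rounded = round(3.14159, 2)    # 3.14

print(magnitude)
print(modulus)
print(hexadecimal)
print(octal)
print(power)
print(power_mod)
print(rounded)
```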

Text

Strings are Python builtins for handling text. As they are immutable, you cannot add, remove or change any character in a string. To perform these operations, Python needs to create a new string.

Types:

  • Standard String: s = 'Led Zeppelin'
  • Unicode String: u = u'Björk'

The standard string can be converted to unicode by using the function unicode().

String initializations can be made:

  • With single or double quotes.
  • On several consecutive lines, provided that it's between three single or double quotes.
  • Without expansion of escape characters, using raw strings (example: s = r'\n', where s will contain the characters \ and n).
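The three forms of initialization can be compared in a small sketch. The names are illustrative, and the code runs on both Python 2.6+ and Python 3.

```python
# Single and double quotes produce the same string
single = 'Led Zeppelin'
double = "Led Zeppelin"

# Triple quotes allow the text to span several lines
lyrics = '''first line
second line'''

# A raw string keeps the backslash instead of expanding the escape
raw = r'\n'
expanded = '\n'

print(single == double)
print(len(lyrics.splitlines()))
print(len(raw))
print(len(expanded))
```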

String Operations:


In [ ]:
s = 'Camel'

# Concatenation
print 'The ' + s + ' ran away!'

# Interpolation
print 'Size of %s => %d' % (s, len(s))

# String processed as a sequence
for ch in s: print ch

# Strings are objects
if s.startswith('C'): print s.upper()

# what will happen? 
print 3 * s
# 3 * s is consistent with s + s + s


The Camel ran away!
Size of Camel => 5
C
a
m
e
l
CAMEL
CamelCamelCamel

The operator % is used for string interpolation. Interpolation is more efficient in its use of memory than conventional concatenation, since it avoids creating intermediate strings.

Symbols used in the interpolation:

  • %s: string.
  • %d: integer.
  • %o: octal.
  • %x: hexadecimal.
  • %f: real.
  • %e: real exponential.
  • %%: percent sign.

Symbols can be used to display numbers in various formats.

Example:


In [ ]:
# Zero padding on the left
print 'Now is %02d:%02d.' % (16, 30)

# Real (the number after the decimal point specifies how many decimal digits)
print 'Percent: %.1f%%, Exponential:%.2e' % (5.333, 0.00314)

# Octal and hexadecimal
print 'Decimal: %d, Octal: %o, Hexadecimal: %x' % (10, 10, 10)


Now is 16:30.
Percent: 5.3%, Exponential:3.14e-03
Decimal: 10, Octal: 12, Hexadecimal: a

Since version 2.6, in addition to the interpolation operator %, the string method format() and the builtin function format() are available. Examples:


In [ ]:
musicians = [('Page', 'guitarist', 'Led Zeppelin'),
('Fripp', 'guitarist', 'King Crimson')]

# Parameters are identified by order
msg = '{0} is {1} of {2}'

for name, function, band in musicians:
    print(msg.format(name, function, band))

# Parameters are identified by name
msg = '{greeting}, it is {hour:02d}:{minute:02d}'

print msg.format(greeting='Good Morning', hour=7, minute=30)

# Builtin function format()
print 'Pi =', format(3.14159, '.3e')


Page is guitarist of Led Zeppelin
Fripp is guitarist of King Crimson
Good Morning, it is 07:30
Pi = 3.142e+00

The function format() can be used only to format one piece of data each time.

Slices of strings can be obtained by adding indexes between brackets after a string.

Python indexes:

  • Start with zero.
  • Count from the end if they are negative.
  • Can be defined as sections (slices), in the form [start: end + 1: step]. If start is not set, it is taken to be zero. If end + 1 is not set, it is taken to be the size of the object. The step (between characters), if not set, is 1.
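The indexing rules above can be checked with a few slices of one string. This is a minimal sketch with illustrative names.

```python
s = 'Python'

first = s[0]          # indexes start with zero: 'P'
last = s[-1]          # negative indexes count from the end: 'n'
middle = s[1:4]       # from index 1 up to, but not including, index 4: 'yth'
head = s[:2]          # start omitted, so it is zero: 'Py'
tail = s[2:]          # end omitted, so it is the size of the string: 'thon'
every_other = s[::2]  # step of 2: 'Pto'

print(first)
print(last)
print(middle)
print(head)
print(tail)
print(every_other)
```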

It is possible to invert strings by using a negative step:


In [6]:
print 'Python'[::-1]
# shows: nohtyP


nohtyP

Various functions for dealing with text are implemented in the module string.


In [ ]:
import string

# the alphabet
a = string.ascii_letters

# Shifting left the alphabet
b = a[1:] + a[0]

# The function maketrans() creates a translation table
# from the characters of both strings it received as parameters.
# The characters not present in the table will be 
# copied to the output.
tab = string.maketrans(a, b)

# The message...
msg = '''This text will be translated..
It will become very strange.
'''
# The function translate() uses the translation table
# created by maketrans() to translate the string
print string.translate(msg, tab)


Uijt ufyu xjmm cf usbotmbufe..
Ju xjmm cfdpnf wfsz tusbohf.

The module also implements a type called Template, which is a model string that can be filled through a dictionary. Identifiers are introduced by a dollar sign ($) and may be surrounded by curly braces, to avoid ambiguity.

Example:


In [ ]:
import string

# Creates a template string
st = string.Template('$warning occurred in $when')

# Fills the model with a dictionary
s = st.substitute({'warning': 'Lack of electricity',
    'when': 'April 3, 2002'})

# Shows:
# Lack of electricity occurred in April 3, 2002
print s


Lack of electricity occurred in April 3, 2002
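The curly braces matter when an identifier is followed directly by other letters. In the sketch below, which uses an illustrative template, $fruits would be read as an identifier called "fruits", so the braces mark where the name ends. The sketch also uses safe_substitute(), a Template method that leaves unknown identifiers in place instead of raising an error.

```python
import string

# Without the braces, "$fruits" would be taken as one identifier
st = string.Template('${fruit}s are $color')

# All identifiers supplied
full = st.substitute({'fruit': 'apple', 'color': 'red'})

# One identifier missing: safe_substitute() keeps it as written
partial = st.safe_substitute({'fruit': 'apple'})

print(full)
print(partial)
```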

It is possible to use mutable strings in Python through the UserString module, which defines the MutableString type:


In [9]:
import UserString

s = UserString.MutableString('Python')
s[0] = 'p'

print s # shows "python"


python

Mutable strings are less efficient than immutable strings, as they are more complex (in terms of structure), which is reflected in increased consumption of resources (CPU and memory).

Unicode strings can be converted to conventional strings through the encode() method, and the reverse path can be done by the decode() method. Example:


In [10]:
# Unicode String 
u = u'Hüsker Dü'
# Convert to str
s = u.encode('latin1')
print s, '=>', type(s)

# String str
s = 'Hüsker Dü'
u = s.decode('latin1')

print repr(u), '=>', type(u)


H�sker D� => <type 'str'>
u'H\xc3\xbcsker D\xc3\xbc' => <type 'unicode'>

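Both methods take the name of the encoding as an argument, and it must match the encoding actually used by the data. The most used are "latin1" and "utf8". In the output above the environment uses UTF-8, so the latin1 bytes of the first string cannot be displayed, and decoding the second string as latin1 turns each ü into two characters.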
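The effect of the chosen encoding can be seen by counting bytes. The sketch below is written under the assumption that it should run on both Python 2.6+ and Python 3.3+, so the ü is written as the escape \xfc and the result does not depend on the encoding of the source file. The names are illustrative.

```python
# \xfc is the code point of "u with diaeresis"
text = u'H\xfcsker D\xfc'            # 9 characters

as_latin1 = text.encode('latin1')    # one byte per character
as_utf8 = text.encode('utf8')        # the accented letters take two bytes each

round_trip = as_utf8.decode('utf8')  # same encoding both ways
mismatch = as_utf8.decode('latin1')  # wrong encoding: no error, but wrong text

print(len(text))
print(len(as_latin1))
print(len(as_utf8))
print(round_trip == text)
print(mismatch == text)
```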

Lists

Lists are collections of heterogeneous objects, which can be of any type, including other lists.

Lists in Python are mutable and can be changed at any time. Lists can be sliced in the same way as strings, but as lists are mutable, it is possible to make assignments to the list items.

Syntax:

my_list = [a, b, ..., z]
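Because lists are mutable, both single items and whole slices can be assigned to. A minimal sketch, with an illustrative list:

```python
bands = ['Yes', 'Genesis', 'Pink Floyd', 'ELP']

# Assignment to a single item
bands[0] = 'Rush'

# Assignment to a slice: two items are replaced by one
bands[1:3] = ['Camel']

# Deleting an item by index
del bands[-1]

print(bands)
```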

Common operations with lists:


In [11]:
# a new list: 70s Brit Progs
progs = ['Yes', 'Genesis', 'Pink Floyd', 'ELP']

# processing the entire list
for prog in progs:
    print prog

# Changing the last element
progs[-1] = 'King Crimson'

# Including
progs.append('Camel')

# Removing
progs.remove('Pink Floyd')

# Ordering 
progs.sort()

# Inverting
progs.reverse()

# prints with number order
for i, prog in enumerate(progs):
    print i + 1, '=>', prog

# prints from the second item
print progs[1:]


Yes
Genesis
Pink Floyd
ELP
1 => Yes
2 => King Crimson
3 => Genesis
4 => Camel
['King Crimson', 'Genesis', 'Camel']

The function enumerate() returns a tuple of two elements in each iteration: a sequence number and an item from the corresponding sequence.

The list has a pop() method that helps the implementation of queues and stacks:


In [ ]:
my_list = ['A', 'B', 'C']
print 'list:', my_list

# The empty list is evaluated as false
while my_list:
    # In queues, the first item is the first to go out
    # pop(0) removes and returns the first item 
    print 'Left', my_list.pop(0), ', remain', len(my_list)

# More items on the list
my_list += ['D', 'E', 'F']
print 'list:', my_list

while my_list:
    # On stacks, the first item is the last to go out
    # pop() removes and returns the last item
    print 'Left', my_list.pop(), ', remain', len(my_list)


list: ['A', 'B', 'C']
Left A , remain 2
Left B , remain 1
Left C , remain 0
list: ['D', 'E', 'F']
Left F , remain 2
Left E , remain 1
Left D , remain 0

The sort (sort) and reversal (reverse) operations are performed in place on the list and do not create new lists.
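When the original order must be kept, the builtin functions sorted() and reversed() can be used instead, since they leave the list untouched. The sketch below contrasts them with the in-place methods. The names are illustrative.

```python
progs = ['Yes', 'ELP', 'Genesis']

# sorted() returns a new list
ordered = sorted(progs)

# reversed() returns an iterator, turned into a new list here
backwards = list(reversed(progs))

# Neither call changed the original list
unchanged = list(progs)

# sort() changes the list itself and returns None
result = progs.sort()

print(ordered)
print(backwards)
print(unchanged)
print(result)
print(progs)
```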

Tuples

Similar to lists, but immutable: it's not possible to append, delete or make assignments to the items.

Syntax:

my_tuple = (a, b, ..., z)

The parentheses are optional.

Note: a tuple with only one element is written with a trailing comma:

t1 = (1,)
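A short sketch of what optional parentheses and the trailing comma mean in practice. The names are illustrative.

```python
point = 3, 7           # a tuple created without parentheses
x, y = point           # unpacking into two names
x, y = y, x            # swapping values through a tuple

not_a_tuple = (1)      # just the integer 1 in parentheses
one_item = (1,)        # the comma is what makes the tuple

print(point)
print(x)
print(y)
print(type(not_a_tuple))
print(type(one_item))
```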

The tuple elements can be referenced the same way as the elements of a list:

first_element = my_tuple[0]

Lists can be converted into tuples:

my_tuple = tuple(my_list)

And tuples can be converted into lists:

my_list = list(my_tuple)

While a tuple can contain mutable elements, its items cannot be reassigned, as this would change the reference to the object.

Example (using the interactive mode):

>>> t = ([1, 2], 4)
>>> t[0].append(3)
>>> t
([1, 2, 3], 4)
>>> t[0] = [1, 2, 3]
Traceback (most recent call last):
  File "<input>", line 1, in ?
TypeError: object does not support item assignment
>>>

Tuples are more efficient than conventional lists, as they consume fewer computing resources (memory): they are simpler structures, in the same way that immutable strings are simpler than mutable strings.

Other types of collections

Also in the builtins, Python provides:

  • set: mutable, unordered collection of unique elements (without repetitions).
  • frozenset: immutable, unordered collection of unique elements.

Both types implement set operations, such as: union, intersection and difference.

Example:


In [ ]:
# Data sets
s1 = set(range(3))
s2 = set(range(10, 7, -1))
s3 = set(range(2, 10, 2))

# Shows the data
print 's1:', s1, '\ns2:', s2, '\ns3:', s3

# Union
s1s2 = s1.union(s2)
print 'Union of s1 and s2:', s1s2

# Difference
print 'Difference with s3:', s1s2.difference(s3)

# Intersection
print 'Intersection with s3:', s1s2.intersection(s3)

# Tests if a set includes the other
if s1.issuperset([1, 2]):
    print 's1 includes 1 and 2'

# Tests if there are no common elements
if s1.isdisjoint(s2):
    print 's1 and s2 have no common elements'


s1: set([0, 1, 2]) 
s2: set([8, 9, 10]) 
s3: set([8, 2, 4, 6])
Union of s1 and s2: set([0, 1, 2, 8, 9, 10])
Difference with s3: set([0, 1, 10, 9])
Intersection with s3: set([8, 2])
s1 includes 1 and 2
s1 and s2 have no common elements

When one list is converted to a set, the repetitions are discarded.
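A minimal sketch of removing repetitions through a set, with illustrative data. It also shows that a frozenset lacks the methods that would change it.

```python
votes = ['Yes', 'Genesis', 'Yes', 'ELP', 'Genesis', 'Yes']

# Converting to a set discards the repetitions
unique = set(votes)

# Sets are unordered, so sorted() is used to get a predictable list
names = sorted(unique)

# A frozenset offers the same queries but cannot be changed
fixed = frozenset(votes)
can_add = hasattr(fixed, 'add')

print(len(votes))
print(len(unique))
print(names)
print(can_add)
```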

Since version 2.6, a builtin type for mutable sequences of bytes, called bytearray, is also available.

Dictionaries

A dictionary is a collection of associations, each composed of a unique key and a corresponding value. Dictionaries are mutable, like lists.

The key must be of an immutable type, usually a string, but it can also be a tuple or a numeric type. The values stored in a dictionary, on the other hand, can be either mutable or immutable. The Python dictionary provides no guarantee that the keys are ordered.

Syntax:

dictionary = {'a': a, 'b': b, ..., 'z': z}

Example of a dictionary:

dic = {'name': 'Shirley Manson', 'band': 'Garbage'}

Accessing elements:

print dic['name']

Adding elements:

dic['album'] = 'Version 2.0'

Removing one element from a dictionary:

del dic['album']

Getting the items, keys and values:

items = dic.items()
keys = dic.keys()
values = dic.values()
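The requirement that keys be immutable can be checked directly: tuples and numbers are accepted as keys, while a list raises TypeError. A minimal sketch with illustrative data:

```python
albums = {}

albums[('Yes', 1972)] = 'Close To The Edge'   # tuple key: accepted
albums[1973] = 'Brain Salad Surgery'          # numeric key: accepted

try:
    albums[['Yes', 1971]] = 'Fragile'         # list key: rejected
    list_key_accepted = True
except TypeError:
    list_key_accepted = False

# The operator in tests whether a key is present
has_tuple_key = ('Yes', 1972) in albums
has_missing_key = 'Genesis' in albums

print(len(albums))
print(list_key_accepted)
print(has_tuple_key)
print(has_missing_key)
```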

Examples with dictionaries:


In [15]:
# Progs and their albums
progs = {'Yes': ['Close To The Edge', 'Fragile'],
    'Genesis': ['Foxtrot', 'The Nursery Crime'],
    'ELP': ['Brain Salad Surgery']}

# More progs
progs['King Crimson'] = ['Red', 'Discipline']

# items() returns a list of 
# tuples with key and value 
for prog, albums in progs.items():
    print prog, '=>', albums

# If 'ELP' is present, remove it
if 'ELP' in progs:
    del progs['ELP']


Yes => ['Close To The Edge', 'Fragile']
ELP => ['Brain Salad Surgery']
Genesis => ['Foxtrot', 'The Nursery Crime']
King Crimson => ['Red', 'Discipline']

Sparse matrix example:


In [17]:
# Sparse Matrix implemented
# with dictionary

# Sparse Matrix is a structure
# that only stores values that are
# present in the matrix

dim = 6, 12
mat = {}

# Tuples are immutable
# Each tuple represents
# a position in the matrix
mat[3, 7] = 3
mat[4, 6] = 5
mat[6, 3] = 7  # row 6 lies outside the 6 x 12 matrix, so it is never printed
mat[5, 4] = 6
mat[2, 9] = 4
mat[1, 0] = 9

for lin in range(dim[0]):
    for col in range(dim[1]):
        # Method get(key, default)
        # returns the value stored
        # under the key or,
        # if the key doesn't exist,
        # returns the second argument
        print mat.get((lin, col), 0),
    print


0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 4 0 0
0 0 0 0 0 0 0 3 0 0 0 0
0 0 0 0 0 0 5 0 0 0 0 0
0 0 0 0 6 0 0 0 0 0 0 0

Generating the sparse matrix:


In [ ]:
# Matrix in form of string
matrix = '''0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 4 0 0
0 0 0 0 0 0 0 3 0 0 0 0
0 0 0 0 0 0 5 0 0 0 0 0
0 0 0 0 6 0 0 0 0 0 0 0'''

mat = {}

# split the matrix in lines
for row, line in enumerate(matrix.splitlines()):

    # Splits the line into columns
    for col, column in enumerate(line.split()):

        column = int(column)
        # Places the column in the result,
        # if it is different from zero
        if column:
            mat[row, col] = column

print mat
# The counting starts with zero
print 'Complete matrix size:', (row + 1) * (col + 1)
print 'Sparse matrix size:', len(mat)


{(5, 4): 6, (3, 7): 3, (1, 0): 9, (4, 6): 5, (2, 9): 4}
Complete matrix size: 72
Sparse matrix size: 5

The sparse matrix is a good solution for processing structures in which most of the items remain empty, like spreadsheets for example.

True, False and Null

In Python, the boolean type (bool) is a specialization of the integer type (int). The True value is equal to 1, while the False value is equal to zero.

The following values are considered false:

  • False.
  • None (null).
  • 0 (zero).
  • '' (empty string).
  • [] (empty list).
  • () (empty tuple).
  • {} (empty dictionary).
  • Other structures with size equal to zero.

All other objects are considered true.

The object None, which is of type NoneType, represents the null value in Python and is evaluated as false by the interpreter.
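These rules can be verified with the builtin bool(), which returns the truth value of any object. A minimal sketch with illustrative names:

```python
# bool is a specialization of int
is_int = isinstance(True, int)
total = True + True

# Every value in this list is considered false
empties = [False, None, 0, '', [], (), {}]
truth_of_empties = [bool(item) for item in empties]

# A list with one item is true, even if that item is false
non_empty = bool([0])

print(is_int)
print(total)
print(truth_of_empties)
print(non_empty)
```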

Boolean Operators

With logical operators it is possible to build more complex conditions to control conditional jumps and loops.

Boolean operators in Python are: and, or, not, is, in.

  • and: returns a true value if and only if it receives two expressions that are true.
  • or: returns a false value if and only if it receives two expressions that are false.
  • not: returns false if it receives a true expression and vice versa.
  • is: returns true if it receives two references to the same object, false otherwise.
  • in: returns true if it receives an item and a sequence and the item occurs one or more times in the sequence, false otherwise.

The result of the operator and is computed as follows: if the first expression is true, the result will be the second expression, otherwise it will be the first.

As for the operator or, if the first expression is false, the result will be the second expression, otherwise it will be the first. For the other operators, the return will be of type bool (True or False).

Examples:


In [20]:
print 0 and 3 # Shows 0
print 2 and 3 # Shows 3

print 0 or 3 # Shows 3
print 2 or 3 # Shows 2

print not 0 # Shows True
print not 2 # Shows False
print 2 in (2, 3) # Shows True
print 2 is 3 # Shows False


0
3
3
2
True
False
True
False
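One consequence of these rules is that the second expression is only evaluated when it is needed. The sketch below uses a hypothetical helper, check(), that records each call in order to make this visible. It also shows the common use of or to supply a default value.

```python
calls = []

def check(value):
    # Hypothetical helper: records the call and returns the value unchanged
    calls.append(value)
    return value

first = check(0) and check(3)    # 0 is false, so check(3) is never called
second = check(2) or check(5)    # 2 is true, so check(5) is never called

# An empty string is false, so the result is the second expression
name = '' or 'anonymous'

print(first)
print(second)
print(calls)
print(name)
```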

Besides boolean operators, there are the functions all(), which returns true when all of the items in the sequence passed as a parameter are true, and any(), which returns true if any item is true.
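A minimal sketch of both functions, with illustrative data. The last two lines show the results for an empty sequence.

```python
grades = [7, 9, 0, 8]

all_nonzero = all(grades)                  # False: 0 is considered false
any_nonzero = any(grades)                  # True: at least one item is true
all_valid = all(g >= 0 for g in grades)    # True: every comparison holds

empty_all = all([])                        # True: there is no false item
empty_any = any([])                        # False: there is no true item

print(all_nonzero)
print(any_nonzero)
print(all_valid)
print(empty_all)
print(empty_any)
```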

