We will try to explain the first principles of functional programming while avoiding big FP words as much as possible.
We will use Python for this exercise because it's easy to read, and it provides all the functional features needed to understand the concepts described in this document.
In [6]:
# count the words in a file, sequentially
def count_w(filename):
    count = 0
    with open(filename) as file:
        for line in file:
            for w in line.split():
                count += 1
    return count
# But This Doesn't Scale!
When things don't scale, we split the work up and run it in parallel.
Let's try the same example with threads.
In [2]:
count = 0

def count_w_chunk(file_chunk):
    # counts the words in one chunk by mutating the shared global `count`
    global count
    for line in file_chunk:
        for w in line.split():
            count += 1

# actual driver code
def count_w(file):
    threads = []  # thread handles would be collected (and joined) here
    chunks = file.make_chunks()
    # `parallelize` is a magic placeholder that runs the call in its own thread
    for chunk in chunks:
        parallelize(count_w_chunk, chunk)
    print(count)
In [3]:
count = 0

def increment(delta):
    global count
    temp = count
    temp += delta
    count = temp
    return count

# what will happen if increment(1) is run 1000 times in parallel?
The result will be less than 1000 most of the time: two threads can read the same old value of count, and the later write then overwrites the earlier one, so updates get lost. We need some sort of locking/synchronization mechanism so that the read-modify-write on the variable happens atomically.
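To make the race visible, here is a small sketch (not part of the original example) that runs the unsynchronized increment from many threads. The time.sleep(0) between the read and the write is only there to encourage a thread switch mid-update, which makes lost updates far more likely:

import threading
import time

count = 0

def increment(delta):
    global count
    temp = count
    time.sleep(0)          # yield to another thread in the middle of the read-modify-write
    count = temp + delta

threads = [threading.Thread(target=increment, args=(1,)) for _ in range(1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(count)  # often well below 1000, because concurrent updates overwrite each other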
In [4]:
# this will work, but sucks
import threading

count = 0
lock = threading.Lock()  # standing in for the pseudo `synchronized` block

def increment(delta):
    global count
    with lock:            # serialize the read-modify-write
        count += delta
    return count
The culprit in the above code is the assignment; re-assignment, to be precise. Once initialized, a variable can have different values at different times, depending on what we assign to it.
If we had code that did not change the state of the count variable, we wouldn't need to synchronize the assignments.
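As a rough sketch of that idea (hypothetical, not from the original notebook): instead of mutating a shared global, a pure increment takes the current count and returns a new one, so there is no shared state to lock. Each worker could compute its own partial count and the results could be combined afterwards.

# a pure increment: no global, no mutation, nothing to synchronize
def increment(count, delta):
    return count + delta

total = 0
for _ in range(1000):
    total = increment(total, 1)
print(total)  # always 1000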
When a function changes its state while it's being executed, and that state change persists between executions, we say the function has state (side effects). Put simply, but inaccurately: if a function can return different values when invoked with the same parameters, it has state.
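For example, this small illustrative function depends on state that persists between calls, so it returns a different value every time it is called with the same argument:

calls = 0

def stateful_double(x):
    global calls
    calls += 1          # state persists between executions
    return x * 2 + calls

stateful_double(10)  # 21
stateful_double(10)  # 22 -- same argument, different result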
Functions that change state outside their scope also cause side effects. For example,
def foo():
    with open('filename', 'a') as f:
        f.write('side effect')
    return 0
The above function returns 0 all the time, but it still changes the state of the file system.
Contrast this with mathematical functions, for example:
f(x) = x^2 + 2x + 3
They do no state manipulation. f(25) is 678 no matter how many times you call it.
These kinds of functions are called pure functions.
Read more about pure functions here. While you are at it, read about referential transparency. It's okay if you don't get it.
In [5]:
# Modeling a 'pure' function
def f(x):
    return x*x + 2*x + 3
In [25]:
f(9)
Out[25]:
102
In [7]:
# frange(0, 1, 0.1) == [0, .1, .2, .3, .4 ... .9]
def frange(a, b, delta):
    i = a
    while i < b:
        yield i
        i += delta

# yes, there are assignments in the above code; they are avoidable using recursion
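For instance, a recursive version could look roughly like this (a sketch; keep in mind CPython has no tail-call optimization, so a very long range would hit the recursion limit):

def frange_rec(a, b, delta):
    # no reassignment: each step is a new call with new arguments
    if a >= b:
        return []
    return [a] + frange_rec(a + delta, b, delta)

# frange_rec(0, 1, 0.25) == [0, 0.25, 0.5, 0.75]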
In [8]:
# integration: approximation of the area under the curve.
# for a more accurate value, reduce the value of err
def integ(f, a, b):
    err = 0.01
    return sum([f(x) * err for x in frange(a, b, err)])
In [9]:
def square(x):
    return x*x

integ(square, 0, 2)
Out[9]:
In [10]:
# functions returning functions
def add_value(value):
    def adder(x):
        return x + value
    return adder
In [11]:
add5 = add_value(5)
In [12]:
add5(22)
Out[12]:
27
In [13]:
def square(x):
    return x*x

# is the same as
lambda x: x*x
Out[13]:
In [14]:
# functions taking functions as arguments
def fogger(f, g):
    def fog(x):
        return f(g(x))
    return fog
In [15]:
def add6(x):
    return x + 6

new_function = fogger(add6, add5)
new_function(20)
Out[15]:
31
In [16]:
# map(func, list) --> new list
# applies func to each element of the list and returns a new list
def square(x):
    return x*x

map(square, [1,2,3,4,5,6])
Out[16]:
In [17]:
# lambdas are useful in one-liners like these
map(lambda x: x*x, [1,2,3,4,5,6])
Out[17]:
In [18]:
# reduce(func, list) --> new value
# reduces a list to a single value by combining elements (left to right) using func
reduce(lambda x, y: x + y, [1,2,3,4,5])
Out[18]:
15
In [19]:
# reduce can also take an initial value to start with
reduce(lambda x, y: x + y, [1,2,3,4,5], 25)
Out[19]:
40
In [20]:
reduce(lambda x, y: x / y, [4,2,1])  # ((4 / 2) / 1)
Out[20]:
In [21]:
help(reduce)
In [22]:
# filter (predicate, list) --> new list
filter(lambda x: len(x) <= 3, ["str", "string", "srj", "sreeraj", "fp", "functional programming"])
Out[22]:
Revisit the question, if you forgot: how can we solve it in a parallel way, while avoiding mutations?
file --> [chunk ..] --> [words] --> value
In [23]:
def count_chunk(chunk):
    # a chunk is made of lines; lines are made of words
    words = [word for line in chunk for word in line.split()]
    return len(words)

def count(file):
    chunks = file.make_chunks()
    return reduce(lambda a, b: a + b, map(count_chunk, chunks))
The code above is vanilla Python and is not parallelized. But because there is no state change happening in it, the vanilla map and reduce can be replaced with something like map_p and reduce_p, parallel implementations of map and reduce, without affecting the result.
This is the core principle behind the Hadoop MapReduce framework, where you think in terms of map and reduce to execute complex data transformations on huge amounts of data.
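As a rough sketch of that idea (not part of the original notebook), Python's multiprocessing.Pool.map can play the role of map_p here, because count_chunk is pure and therefore safe to run in parallel:

from functools import reduce
from multiprocessing import Pool

def count_chunk(chunk):
    # chunk is a list of lines; count the words in it
    return len([word for line in chunk for word in line.split()])

def count(chunks):
    with Pool() as pool:                       # a "map_p": parallel map over chunks
        counts = pool.map(count_chunk, chunks)
    return reduce(lambda a, b: a + b, counts)  # combine the partial counts

if __name__ == '__main__':
    chunks = [["hello world", "foo bar baz"], ["one two", "three"]]
    print(count(chunks))  # 8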