Threading

2018-01-07

Basics

  • A thread is the smallest unit of execution that an operating system can schedule.
  • One process can contain multiple threads.
    • These threads share the memory and state of the process, i.e. global variables and code are shared.
    • On a single-processor machine, threads take turns running through scheduling (time slicing).
  • Why threading?
    • On a computer with multiple CPUs, threads can run at the same time, so a program can finish sooner. (In standard CPython the global interpreter lock lets only one thread execute Python bytecode at a time, so the gain is mostly for I/O-bound or blocking work.)
    • Within a process, threads share global variables; a change made by one thread is visible to all the others. Each thread can also have its own local variables.
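
The difference between shared and per-thread data can be sketched as follows. This is a minimal illustration, not part of the original notes: the names `shared`, `local_data`, and `worker` are made up for the example, and `threading.local()` is used as one way to give each thread its own values.

```python
import threading

shared = []                      # module-level: visible to every thread
local_data = threading.local()   # each thread sees its own attributes here

def worker(n):
    local_data.value = n                    # private to the current thread
    shared.append(local_data.value * 10)    # visible to all threads

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait until every worker has finished

print(sorted(shared))                 # [0, 10, 20]
print(hasattr(local_data, 'value'))   # False: the main thread never set it
```

The list is sorted before printing because the order in which the workers append is not guaranteed.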

Notes


In [1]:
import threading
from _thread import start_new_thread, allocate_lock
import logging
import time
import numpy as np

Example 1: run multiple threads


In [2]:
def worker1():
    # current_thread().name replaces the deprecated currentThread().getName()
    print(threading.current_thread().name, '--begin')
    time.sleep(3)
    print(threading.current_thread().name, '--end')

def worker2():
    print(threading.current_thread().name, '--BEGIN')
    time.sleep(2)
    print(threading.current_thread().name, '--END')
    
w1 = threading.Thread(name='firstWorker', target=worker1)
w2 = threading.Thread(name='secondWorker', target=worker2)
w3 = threading.Thread(name='thirdWorker', target=worker1)
w4 = threading.Thread(target=worker2)

w1.start()
w2.start()
w3.start()
w4.start()

# Threads run concurrently, so their print output gets interleaved and looks messy.
# The logging module makes it easier to track what each thread is doing.


firstWorker secondWorkerthirdWorkerThread-4--begin   
--BEGIN--begin--BEGIN


secondWorkerThread-4  --END--END

firstWorker --end
thirdWorker --end

Example 2: use logging for tracking threads


In [3]:
logging.basicConfig(level=logging.DEBUG,
                   format='[%(levelname)s] (%(threadName)-10s) %(message)s')

def worker1():
    logging.debug('--begin') # <-- use logging instead of print function!
    time.sleep(3)
    logging.debug('--end')
    
def worker2():
    logging.debug('--BEGIN')
    time.sleep(2)
    logging.debug('--END')
    
w1 = threading.Thread(name='firstWorker', target=worker1)
w2 = threading.Thread(name='secondWorker', target=worker2)
w3 = threading.Thread(name='thirdWorker', target=worker1)
w4 = threading.Thread(target=worker2)

w1.start()
w2.start()
w3.start()
w4.start()


[DEBUG] (firstWorker) --begin
[DEBUG] (secondWorker) --BEGIN
[DEBUG] (thirdWorker) --begin
[DEBUG] (Thread-5  ) --BEGIN
[DEBUG] (secondWorker) --END
[DEBUG] (Thread-5  ) --END
[DEBUG] (firstWorker) --end
[DEBUG] (thirdWorker) --end

Example 3: threads without locking


In [4]:
num_threads = 0
def elephant(a):
    global num_threads
    num_threads += 1
    time.sleep(0.1)
    num_threads -= 1
    print(num_threads)

start_new_thread(elephant,(99,))
start_new_thread(elephant,(999,))
start_new_thread(elephant,(1733,))
start_new_thread(elephant,(17334,))

if num_threads == 0:
    print('num_threads is 0')
else:
    print('num_threads is not 0')

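# The output looks strange and changes from run to run.
# That's because all four threads read and modify the global variable
# num_threads at the same time, without any synchronization.
# Note also that the main thread runs its check immediately, before the workers finish.
# Using a lock fixes the first problem.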


num_threads is 0
2310
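
Why unsynchronized updates go wrong can be made reproducible with a small sketch. `num_threads += 1` is really a read followed by a write, and another thread can slip in between the two steps. The code below is an illustration under assumptions, not the notebook's own method: it splits the increment into its two steps and uses a `threading.Barrier` purely to force the worst-case interleaving every time. The names `counter` and `unsafe_increment` are hypothetical.

```python
import threading

counter = 0
barrier = threading.Barrier(4)   # forces all four threads to read before any writes

def unsafe_increment():
    global counter
    seen = counter               # step 1: read the shared value
    barrier.wait()               # every thread has now read the same value
    counter = seen + 1           # step 2: write back, overwriting the others

threads = [threading.Thread(target=unsafe_increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)                   # 1, not 4: three updates were lost
```

In real code the interleaving happens by chance rather than by a barrier, which is why the output differs between runs.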



Example 4: threads with locking


In [5]:
num_threads = 0
lock = allocate_lock() # create a lock object
def elephant(a):
    global num_threads
    lock.acquire() # take the lock; other threads block here until it is released
    num_threads += 1 
    time.sleep(0.1)
    num_threads -= 1
    lock.release() # release the lock
    print(num_threads)

start_new_thread(elephant,(99,))
start_new_thread(elephant,(999,))
start_new_thread(elephant,(1733,))
start_new_thread(elephant,(17334,))

if num_threads == 0:
    print('num_threads is 0')
else:
    print('num_threads is not 0')


num_threads is 0
0
0
0
0
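
The same idea can also be written with the higher-level `threading.Lock` and a `with` block, which releases the lock even if the body raises. This is a sketch rather than a replacement for the cell above: the `seen` list is an assumption added for illustration, and `join()` is added so that the final check runs only after every thread has finished.

```python
import threading
import time

num_threads = 0
lock = threading.Lock()
seen = []                        # values observed inside the critical section

def elephant(a):
    global num_threads
    with lock:                   # acquired here, released on leaving the block
        num_threads += 1
        time.sleep(0.01)
        seen.append(num_threads) # always 1: only one thread is inside at a time
        num_threads -= 1

threads = [threading.Thread(target=elephant, args=(n,))
           for n in (99, 999, 1733, 17334)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # the check below runs only after all threads finish

print(seen, num_threads)         # [1, 1, 1, 1] 0
```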

Example 5: daemon vs. non-daemon threads


In [6]:
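# Daemon threads are killed when the main program exits.
# Non-daemon threads keep the program alive until their work is done.
# NB. Threads are non-daemon by default.

# basicConfig() does nothing once the root logger is already configured,
# so in this notebook the format from Example 2 (with [%(levelname)s]) stays in effect.
logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s')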

def daemon():
    logging.debug('Start')
    time.sleep(3)
    logging.debug('Done')
    
d = threading.Thread(name='Daemon', target=daemon)
d.daemon = True  # setDaemon() is deprecated; set the attribute before start()

def non_daemon():
    logging.debug('Start')
    logging.debug('Done')

t = threading.Thread(name='Non-daemon', target=non_daemon)

d.start()
t.start()

# The last line, "[DEBUG] (Daemon    ) Done", is not supposed to appear.
# It shows up here because the Jupyter kernel keeps running after the cell
# finishes, so the daemon thread is never killed.
# If you run the same code from the command line as
#   $ python test.py
# it logs only:
#    [DEBUG] (Daemon    ) Start
#    [DEBUG] (Non-daemon) Start
#    [DEBUG] (Non-daemon) Done
#
# The program exits before the daemon thread finishes.
#
# Adding the lines below makes the program wait until both threads finish.
#
# d.join()
# t.join()


[DEBUG] (Daemon    ) Start
[DEBUG] (Non-daemon) Start
[DEBUG] (Non-daemon) Done
[DEBUG] (Daemon    ) Done
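
How `join()` interacts with a slow daemon thread can be sketched with a timeout. This is a minimal example under assumptions: the function `slow` and the specific sleep and timeout values are made up for illustration. `join(timeout=...)` returns after the timeout whether or not the thread has finished, so `is_alive()` is needed to tell which case occurred.

```python
import threading
import time

def slow():
    time.sleep(0.5)

d = threading.Thread(name='Daemon', target=slow, daemon=True)
d.start()

d.join(timeout=0.05)             # give up waiting after 0.05 s
still_running = d.is_alive()     # the thread needs about 0.5 s, so it is still alive
d.join()                         # no timeout: block until it finishes
finished = not d.is_alive()

print(still_running, finished)
```

With these timings both values should come out as True; a script that only used the short timeout would exit while the daemon was still running.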

Example 6: subclassing thread (simple)


In [7]:
logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s')

class SimpleThread(threading.Thread):
    def run(self):
        logging.debug('Running now...')
        return
    
for i in range(4):
    t = SimpleThread()
    t.start()


[DEBUG] (Thread-6  ) Running now...
[DEBUG] (Thread-7  ) Running now...
[DEBUG] (Thread-8  ) Running now...
[DEBUG] (Thread-9  ) Running now...

Example 7: subclassing thread (complex)


In [11]:
# To customize a thread, store the extra values as instance attributes
# so that run() can read them.

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s')

class AwesomeThreads(threading.Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, daemon=None):
        # Overriding __init__ lets you configure the thread and keep extra values
        super().__init__(group=group, target=target,
                         name=name, daemon=daemon)
        self.args = args
        self.kwargs = kwargs
        
    def run(self):
        logging.debug('My args={}, My kwargs={}'.format(self.args, self.kwargs))
        return

for i in range(6):
    t = AwesomeThreads(args=(i,), kwargs={'one':1, 'two':2})
    t.start()


[DEBUG] (Thread-10 ) My args=(0,), My kwargs={'one': 1, 'two': 2}
[DEBUG] (Thread-11 ) My args=(1,), My kwargs={'one': 1, 'two': 2}
[DEBUG] (Thread-12 ) My args=(2,), My kwargs={'one': 1, 'two': 2}
[DEBUG] (Thread-13 ) My args=(3,), My kwargs={'one': 1, 'two': 2}
[DEBUG] (Thread-14 ) My args=(4,), My kwargs={'one': 1, 'two': 2}
[DEBUG] (Thread-15 ) My args=(5,), My kwargs={'one': 1, 'two': 2}
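
Instance attributes are also a convenient way to get a value back out of a thread, since `run()` cannot return one to the caller. The sketch below is an illustration only: the class name `SquareThread` and its `value` and `result` attributes are hypothetical, not something from the notes above.

```python
import threading

class SquareThread(threading.Thread):   # hypothetical name, for illustration
    def __init__(self, value):
        super().__init__()       # let Thread set up its own state first
        self.value = value
        self.result = None       # filled in by run()

    def run(self):
        self.result = self.value ** 2

threads = [SquareThread(i) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # results are only safe to read after join()

print([t.result for t in threads])   # [0, 1, 4, 9]
```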

The next step is to understand threading in TensorFlow.