Some definitions

A task in computer programming can be defined as a piece of work that the computer does during a period of time. Tasks consume computer resources (memory, CPUs, I/O channels, etc.), which are limited. From a task's point of view, these resources are used at some rate and following a pattern that depends on the task.

Tasks can be classified into two types, CPU-bound and I/O-bound:

  1. I/O-bound tasks usually must wait for I/O communications. Waiting means that the task cannot progress until some piece of data has arrived. During this time, it is a good idea to dedicate the unused resources to a different task that is not waiting. The switching of resources between tasks can be done by the OS or by the programmer. The advantage of implementing the switching in the program itself is that the communication between tasks is easier and there is more control over the resources.

  2. CPU-bound tasks, on the other hand, are those that do not need to wait for I/O operations and keep the CPU working all the time (in fact, the execution time of the task is determined by the available CPU resources). In the ideal case, it is expected that by using N identical CPUs the running time is divided by N. Notice that in this case there is no CPU switching. A minimal sketch contrasting both types of task follows this list.
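The sketch below is a made-up illustration of the two kinds of task: io_bound() just waits, as a real task would wait for a socket or a disk, while cpu_bound() keeps the CPU busy with arithmetic. The function names and workload sizes are arbitrary choices for this example.

In [ ]:

import time

def io_bound(seconds=1.0):
    # Simulates an I/O-bound task: the CPU is idle while we "wait for data".
    time.sleep(seconds)

def cpu_bound(n=10_000_000):
    # Simulates a CPU-bound task: the CPU is busy during the whole call.
    total = 0
    for i in range(n):
        total += i
    return total

start = time.perf_counter()
io_bound()
print(f"I/O-bound task: {time.perf_counter() - start:.2f} s (CPU mostly idle)")

start = time.perf_counter()
cpu_bound()
print(f"CPU-bound task: {time.perf_counter() - start:.2f} s (CPU busy all the time)")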

Concurrency is the action of running several tasks in an interval of time, either by interleaving them or by running them in parallel:

  1. If a task A is I/O-bound, the resources are re-assigned to a different task B while A waits for the I/O. What the user sees is that several tasks are running concurrently, although what is really happening is that only one of the tasks is active at a given moment of time. In this case we are interleaving the tasks (see the interleaving sketch below).

  2. If a task is CPU-bound, the only way to speed it up is to dedicate several CPUs to it at the same time, in parallel.

Notice that it is also possible to use parallelism for I/O-bound tasks (see for example channel bonding).
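The following sketch illustrates interleaving with plain Python generators: a toy round-robin scheduler resumes each task in turn, so only one task is active at any moment, yet both make progress. The scheduler and the task bodies are invented for this illustration; they are not part of any library.

In [ ]:

def task(name, steps):
    # Each 'yield' marks a point where the task gives up the CPU
    # (for example, because it would have to wait for I/O).
    for i in range(steps):
        print(f"{name}: step {i}")
        yield

def round_robin(tasks):
    # Toy scheduler: resume each task in turn until all have finished.
    tasks = list(tasks)
    while tasks:
        current = tasks.pop(0)
        try:
            next(current)          # run the task until its next yield
            tasks.append(current)  # not finished yet: re-queue it
        except StopIteration:
            pass                   # the task has finished

round_robin([task("A", 3), task("B", 3)])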

Concurrency in Python

Python provides three alternatives for running more than one task concurrently (see https://realpython.com/python-concurrency/):

  1. threading. A thread is a sequence of instructions that is executed, typically concurrently (and in parallel if the computing resources allow it), with other threads of the same process space. However, CPU-bound tasks are not a good fit for CPython threads, because the Global Interpreter Lock (GIL) used by CPython allows only one thread to run Python bytecode at a time. Concurrency based on threads is also called pre-emptive multitasking. Therefore, in CPython, threads are useful when the problem to solve has a lot of blocking instructions (basically, when it uses I/O operations that wait for a data transfer to complete, such as the arrival of a packet from a socket). A minimal threading sketch follows this list.

  2. multiprocessing. This is the standard library module that provides multiprocessing, that is, the possibility of using more than one core when necessary through several cooperating tasks. Creating a process is slower than creating a thread. The communication between processes (which must be created by the Python module that uses multiprocessing) is also slower and harder to implement than in the case of threads, where all of them share the same process space. Moreover, the process switching overhead is higher than the thread switching overhead. If the number of blocking operations is high, threads can be better than processes. A minimal multiprocessing sketch follows this list.

  3. Coroutines. A coroutine is a function that voluntarily gives the CPU control (yields) to a different coroutine, by explicitly indicating where to stop and, if necessary, which task to wake up. Coroutines remember their running context when they resume. Coroutines are also named cooperative tasks, and therefore concurrency based on coroutines is also called cooperative multitasking. Coroutines should be used when the user explicitly wants to specify the points in the code where the execution must be transferred between tasks. This produces a negligible switching overhead compared to threads, which the OS switches pre-emptively at fine-grained, unpredictable points. Notice also that in this case the OS does not need to implement the concurrency. Regarding the process space, like threads, all coroutines run in the same one. Coroutines are usually implemented using the asyncio library, which provides ..., although there also exists the possibility of implementing them with this facility. A minimal asyncio sketch follows this list. See https://www.educative.io/blog/python-concurrency-making-sense-of-asyncio, https://medium.com/analytics-vidhya/python-generators-coroutines-async-io-with-examples-28771b586578, https://realpython.com/async-io-python/ and https://stackabuse.com/coroutines-in-python/
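Below is a minimal threading sketch for the I/O-bound case described in item 1. It runs several tasks that just sleep (standing in for real I/O waits) and shows that the total time is close to the longest single wait, not the sum. The task function and its parameters are made up for this illustration.

In [ ]:

import threading
import time

def io_task(name, seconds):
    # Stands in for an I/O-bound operation (e.g. waiting for a socket).
    time.sleep(seconds)
    print(f"{name} finished after {seconds} s")

start = time.perf_counter()
threads = [threading.Thread(target=io_task, args=(f"task-{i}", 1)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"4 I/O-bound tasks with threads: {time.perf_counter() - start:.2f} s (about 1 s, not 4 s)")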

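Next, a minimal multiprocessing sketch for the CPU-bound case of item 2: a Pool distributes a pure computation over several processes, so more than one core can be used despite the GIL. The workload and the pool size are arbitrary choices for this example.

In [ ]:

import multiprocessing
import time

def cpu_task(n):
    # A pure computation that keeps one core busy.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Note: if your platform uses the 'spawn' start method (e.g. Windows/macOS),
    # run this as a script rather than inside a notebook cell.
    work = [5_000_000] * 4
    start = time.perf_counter()
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(cpu_task, work)   # each item may run on a different core
    print(f"4 CPU-bound tasks with processes: {time.perf_counter() - start:.2f} s")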

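Finally, a minimal asyncio sketch for item 3: each coroutine gives up the CPU explicitly at its await expression, and asyncio.gather interleaves them inside a single thread. The coroutine name and the delays are chosen only for the example.

In [ ]:

import asyncio
import time

async def io_task(name, seconds):
    # 'await' is the explicit point where this coroutine gives up the CPU.
    await asyncio.sleep(seconds)
    print(f"{name} finished after {seconds} s")

async def main():
    start = time.perf_counter()
    await asyncio.gather(*(io_task(f"task-{i}", 1) for i in range(4)))
    print(f"4 I/O-bound tasks with coroutines: {time.perf_counter() - start:.2f} s (about 1 s)")

# Inside a notebook (where an event loop is already running), use 'await main()' instead.
asyncio.run(main())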
In [ ]: