Coroutines in Python
The first time I learned the word "Coroutines" in Python docs is about one year ago, I escaped from it because it is not the easy thing to understand in a quick time and didn't have enough time to understand it. However, I still studied it in spare time. Now, time to write down my understanding about "coroutines" of Python. This article is about the concept of "coroutines", not a practical article, but I will give you a resource to learn to how to use them.
Generators, coroutines and yield statement. Python supports generators as other programming languages do, Racket, Emacs Lisp, JavaScript, etc. We will start from generators and to get to coroutines.
If you had coded with some programming languages that support generators before or understood it already, feel free to skip this section and go to next section.
In Python, a generator is a function which returns a generator iterator, we also call it a generator function. So, what is the generator iterator? I am assuming you have learned about iterators in Python. The generator iterator is a iterator as well, you can iterate over it, like how you iterate over a list, string, etc., however, there is a thing different from them. I will show you the code and explain it to you later, now let's back to the generator.
Sometime a generator refers to a generator function, sometime it refers to a generator iterator in some context.
Actually, there are two ways to create generator iterators in Python, one is generator function and another one is generator expression.
#! /usr/bin/env python3 # gen_iter.py # generator expression gen_iter_by_expr = (i for i in range(10)) # generator function def gen_iter(): yield 8 gen_iter_by_fun = gen_iter() print("Iterate over 'gen_iter_by_expr'") for i in gen_iter_by_expr: print(i) print("Iterate over 'gen_iter_by_fun'") for i in gen_iter_by_fun: print(i)
Here is the output you will see if running this code.
Iterate over 'gen_iter_by_expr' 0 1 2 3 4 5 6 7 8 9 Iterate over 'gen_iter_by_fun' 8
You might feel that generator expression has no different between list comprehension in Python. Not exactly, although the outputs of them will be same after replacing "gen_iter_by_expr = (i for i in range(10))" by "gen_iter_by_expr = [i for i in range(10)]". List comprehension creates a list, not a generator iterator. If you just want a object for iterating over, then use the one you like. Now I am going to show the differents between them.
A generator iterator can be iterated over only one time. What happens if we do it again? We append one line to "iter.py" above, then run it. And we get nothing be printed out.
for i in gen_iter_by_expr: print(i)
Do you still remember what happens when a list used in a for-loop?
>>> #! /usr/bin/env python3 >>> items = [1, 2, 3] >>> for i in items: >>> i >>> # equals to >>> # get the iterator >>> it = iter(items) >>> # start to iterator over >>> # same as it.__next__() >>> next(it) 1 >>> next(it) 2 >>> next(it) 3 >>> next(it) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration >>>
This happens to a generator iterator as well. The generator iterator will do that easier.
gen_iter_by_expr = (i for i in range(3)) # same as next(gen_iter_by_expr) or __next__(gen_iter_by_expr) gen_iter_by_expr.send(None) 0 gen_iter_by_expr.send(None) 1 gen_iter_by_expr.send(None) 2 gen_iter_by_expr.send(None) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
There is a thing I have to tell you, you can not pass any argument other than None to the first call of send() method. The fact is that a generator iterator needs to start up with a call of send() method with None. Of course, you can use the next() or __next__() function to do that instead. And these are all the stuffs about generator iterator.
Now you might get another problem, what are the arguments passed to send() method used for? Wait, do not forget I still need to explain the generator function left above to you.
You will see a yield statement in the definition of gen_iter, it turns a function into a generator function. It returns a generator iterator when calling it. What happens when the generator iterator created by it calls the send() method? After all it does not look similar to a generator iterator created by the generator expression. Actually it is easy to understand it.
That the generator iterator created by generator function calls send() method will execute in the way like how the function does, but something different from it. The key point is the yield statement, it is the exit or entry, or you can think it of a simple function delimiter. Let's rewrite the definition of gen_iter to have it works like in the way gen_iter_by_expr does.
#! /usr/bin/env python3 # gen_iter.py # generator function def gen_iter(): yield 8 yield 9 yield 10 gen_iter_by_fun = gen_iter() print("Iterate over 'gen_iter_by_fun'") for i in gen_iter_by_fun: print(i)
How to figure out that a yield statement is a delimiter? In this example, there are 3 yield statements. The first yield statement split the execution into 2 part, the first part is
#! /usr/bin/env python3 def gen_iter(): return 8
and the second part is the rest of the definition.
The workflow of gen_iter:
- When the generator iterator calls send() method at the first time, it returns 8 as the result;
- at the second time, it returns 9 as the result;
- at the third time, it returns 10 as the result.
Now no more execution part are left. However, the yield statement is not such a simple thing as I told you, they are just about "exits". I am going to tell you about "entries". Let's rewrite gen_iter again.
#! /usr/bin/env python3 # gen_iter.py # generator function def gen_iter(): passed = yield 8 passed = yield passed passed = yield passed gen_iter_by_fun = gen_iter() print(gen_iter_by_fun.send(None)) print(gen_iter_by_fun.send(9)) print(gen_iter_by_fun.send(10))
This time let's check the output first.
8 9 10
Remember the problem we ask before? What are the arguments used for? It is clear now. Let me show you the workflow first.
- At the first time gen_iter_by_fun calls send() method with None, it returns 8 as result;
- at the second time, it calls send() method with 9, the variable "passed" in the first statement is assigned 9, and it returns "passed" as the result (the execution point moves to the right-hand-side of the second yield statement, and pauses there), 9, here;
- at the third time, it calls send() method with 10, the variable "passed" in the second statement is assigned 10, and it returns "passed" as the result, 10, here.
Now no more execution part are left again. In short, a yield statement is one delimiter, the right hand side of it is the exit, the value on right hand side is for "outside" (the caller); the left hand side of it is entry or, more precisely, resuming point; the value passed from "outside" to "inside", is the argument to be passed to send() method.
def fundef(): entry_point = yield exit_point
That is why the yield statements can only be used in the definition of function, because of that only procedure has entry and exit.
One more housekeeping, you can rewrite gen_iter with a loop, depending on your purpose.
#! /usr/bin/env python3 # gen_iter.py # generator function def gen_iter(): passed = yield 8 times_left = 2 while times_left: passed = yield passed times_left -= 1 gen_iter_by_fun = gen_iter() print(gen_iter_by_fun.send(None)) print(gen_iter_by_fun.send(9)) print(gen_iter_by_fun.send(10))
Oh, I almost forget one thing before heading to corounties. The yield statement has two forms:
#! /usr/bin/env python3 yield <expr> yield from <expr>
We have already known how to use the first form, the second form provides a transparent two-way channel from the caller to the sub-generator. Now I define a new generator function using "yield from" statement, named `gen_iter_outer'.
#! /usr/bin/env python3 def gen_iter_outer(g): yield from g g = gen_iter_outer(gen_iter()) print(g.send(None)) print(g.send(6)) print(g.send(4)) # If call one more time, then it will raise a StopIteration
The output here. It has same behavior as the generator iterator created by gen_iter().
8 6 4
`gen_iter_outer' function accepts a generator iterator as the input and returns another generator iterator as output, the input is the so-called sub-generator, the output is the caller. It is interesting that when the caller calls send(6), the 6 is passed to the sub-generator. This is the meaning of "a transparent two-way channel from the caller to the sub-generator". And one more thing, a caller cloud have more than one sub-generator.
#! /usr/bin/env python3 def count_to_ten(): print("Start from 1") yield from (i for i in range(1, 6)) print() print("Five numbers left") yield from (i for i in range(6, 11)) for i in count_to_ten(): print(i)
Here is the output.
Start from 1 1 2 3 4 5 Five numbers left 6 7 8 9 10
And I am not going to explain why we need yield from here. Here is why if you want more. Oh, I almost forget that, you might see code like this
# !/usr/bin/env python3 def example_func(*args, **kwargs): data = yield from generator_from_outside(*args, **kwargs) # more actions
If you played around with the `count_to_ten' above by assigning the result of "yield from" statement to a variable, you will see that can not get any value of variable but None. So, where is the value of variable from? Let me show you a working example below.
# !/usr/bin/env python3 def accumarray(num): res = 0 for i in range(1, num+1): res += i yield res return res def yield_from_accumarray(num): res = yield from accumarray(num) print('result is', res) # 1+2+...+10 for res in yield_from_accumarray(10): print(res)
And here is the output, you will see the variable 'res' is 55. Yep, the value of "yield from" statement is the return value of the sub-generator.
1 3 6 10 15 21 28 36 45 55 result is 55
After this, you might think what is the generator used for? Now one more example is coming, it will shows you how to simulate threading. It is about how a scheduler schedules two tasks that are generator iterators and has them executed like how threading governed by operating systems does.
#! /usr/bin/env python3 import time def task_generator(name, num): while num > 0: print("{name} {num}s left".format(name=name, num=num)) num -= 1 passed = yield num task1 = task_generator("task1", 5) task2 = task_generator("task2", 5) def task_scheduler(interval=1, *tasks): queue = list(tasks) while queue: task = queue.pop(0) try: task.send(None) queue.append(task) time.sleep(interval) except StopIteration: pass task_scheduler(1, task1, task2)
If you really understood the stuffs I told you before this example, it does not need me to explain
this example to you anymore. It is a simple example with a simple scheduler, which pops the first task
, calls the task and push it to the tail of the queue until all the tasks are completed. The while loop
looks similar to the event loop which are used by asynchronous programming (but the event loop can do more).
That is all, very simple, right? In reality, we won't implement a scheduler ourselves, Python provide a
module for us to do these works. The module is asyncio, it provides some advanced schedulers that works
with, etc, coroutine, future or task objects. The next section is for coroutines.
Coroutines. Time to explain what are coroutines in Python. Actually, I don't know either, or I am not sure I am right. "Coroutines can be entered, exited, and resumed at many different points.", the official docs documented. If it is that case, then we have already known what are coroutines. Coroutines extend from generators. You might think, coroutines are generator iterators. Not exactly. In the reality, generator and coroutine are two different types. According to the docs, "Generator -based coroutines use the yield from syntax introduced in PEP 380, instead of the original yield syntax."
There are two types of coroutines, generator-based coroutines and native coroutine function. The coroutine function returns a coroutine object, may be defined with the async def statement, and may contain await, async for, and async with keywords (there are too many for me to explain them all). I am not going to explain them to you. I use Python since Python 3.4, there are too many thing had changed, the keywords are some of them. They are the shorthand to asyncio module which introduces the coroutines. That is why I prefer Racket, Python is a good language though.
Coroutines take advantage of generator capabilities, changing context, like how threads and processes do that. We can use them to do (simulate) some asynchronous works, efficiently. Once you understand the concept I told you, you can read the docs of asyncio module or this to finish your works with coroutines. I am not going to copy docs here, so go and read the sources I provided to you.
Continuation from Coroutines - About asyncio protocol in Python 3.5+
Written on 2018.8.22
If you want to use async statement in Python, then you need to understand the protocol of asyncio module.
There are async def, async for, async with, and await, 4 statements introduced since Python 3.5. The sources
of the below are from PEP 492.
async def
New syntax for defining coroutines, the coroutines of this type are native coroutines. The functions defined by "async for" are called native coroutine function.
await
It is used to obtain a result of coroutine execution and only accepts an awaitable. The "await" keyword is not allowed to be used outside "async def" function, as well as the statements, "async for" and "async with", you will be seeing in below.
The awaitable (object) can be one of:
- a native coroutine object returned from a native coroutine function
- a generator-based coroutine object returned from a function decorated with types.coroutine() or which is a function decorated with asyncio.coroutine()
- an object with an __await__() method returning an iterator, the objects of this type are called Future-like objects.
- an object defined with CPython C API with a tp_as_async.am_await function, returning an iterator (similar to __await__() method).
- types.coroutine(). It allows interoperability between existing generator-based coroutines in asyncio and native coroutines introduced by this PEP. That is transforms a generator function into a coroutine function which returns a generator-base coroutine.
async for
It makes it possible to perform asynchronous calls in iterators. An asynchronous iterable (object) is able to call asynchronous code in its implementation and asynchronous iterator can call asynchronous code in its next method.
The asynchronous iterable are the objects meet this conditions:
- to implement __aiter__() method, returnning an asynchronous iterator object.
- to implement __anext__() method, returnning an awaitable
- to stop iteration __anext__() must raise a StopAsyncIteration exception
async with
It peforms asynchronous calls when entering and exiting an runtime context. An asynchronous context manager is a context manager that is able to suspend execution in its enter and exit methods, they are __aenter__() and __aexit__(). Both must return an awaitable.
So, how do generator-based and native coroutine look like?
The asyncio module provides two functions to help us to determine if it is a "coroutine":
- asyncio.iscoroutine to check whether a object is a coroutine object.
- asyncio.iscoroutinefunction to check whether a function is a coroutine function.
- If both return True, then it is a coroutine.
Working Example about Coroutines
In this section, I will show you the difference between generator-based coroutines and native coroutines, as well as difference "old" code and "new" code.
For generator-based coroutines, there are two ways to define a generator-based coroutine. In this example, they are "asyncio_genbase_coroutine_func" and "types_genbase_coroutine_func".
# -*- coding: utf-8 -*- # !/usr/bin/env python3 # genbase-coroutine.py import asyncio import types @asyncio.coroutine def asyncio_genbase_coroutine_func(num): # according to official documentation, any "yield" statement # is not allowed here. # as for "yield from" statements, not all of them are allowed, # only "yield from <awaitable> are allowed. return [i for i in range(num)] @types.coroutine def types_genbase_coroutine_func(num): data = yield from asyncio_genbase_coroutine_func(num) # or just simple as this # return asyncio_genbase_coroutine_func(num) return data task1 = types_genbase_coroutine_func(5) loop = asyncio.get_event_loop() loop.run_until_complete(task1) loop.close()
asyncio.get_event_loop() gets the default event loop (like I show you earlier), once we have this we could run the task1 synchronously with loop.run_until_complete().
For native coroutines, only "async def" statement can define them.
# -*- coding: utf-8 -*- # !/usr/bin/env python3 # native-coroutine.py import asyncio async def native_coroutine(num): return [i for i in range(num)] async def native_coroutine_outer(num): data = await native_coroutine(num) return data # or just # return await native_coroutine(num) task1 = native_coroutine_outer(5) loop = asyncio.get_event_loop() loop.run_until_complete(task1) loop.close()
If you want to execute the tasks asynchronously, then like this
task1 = native_coroutine_outer(5) task2 = native_coroutine_outer(10) loop.ensure_future(task1) loop.ensure_future(task2) loop.run_forever()
loop.run_forever() will run the loop forever to process everything.
The generator-based function is the "old" style to define coroutines, "async def" and "await" statement is the "new" style. As you see, they look so similar, we can rewrite code from "old" to "new" by removing the decorator and replacing the "def" with "async def" and "yield from" with "await".
About "async for" and "async with", someday I maybe introduce them to you just like these new sections today.
2019/3/30
async for and async with are just the "async" version for "for" and "with" statements, the difference is that they only manipulate coroutine,
like async with coroutine as res:, you could read more example from documentation of aiohttp library.