Knowpapa.com - a developer's blog

Asyncio: A tutorial for beginners

The word Asyncio is made of two terms: Async + I/O.

“Async” is about concurrency. Concurrency is about making progress on more than one task over the same period of time, which is not necessarily the same as running them in parallel.

The “I/O” part means that we use asyncio to handle I/O-bound tasks and not CPU-bound tasks. What a task is “bound” by is the resource it spends most of its time waiting on.

If you are doing, for instance, a lot of math processing, you are waiting on the processor, which makes the task CPU bound. If, on the other hand, you are waiting on, say, a result from the network, a result from the database or an input from the user, the task is I/O bound.

Combining the two, asyncio is a tool that provides concurrency specifically for I/O-bound tasks. Concurrency ensures that the rest of your program does not sit idle while waiting for I/O results. Asyncio does not help when the CPU itself is the bottleneck (CPU bound).

Let’s for a moment consider a simple request-response cycle. The code below fetches the content of a given URL and returns its text.

import requests

def get_content(url):
    response = requests.get(url)
    return response.text

Every time this code is called, it tries to fetch the response, and until the response arrives it blocks the calling thread; in a single-threaded program that means nothing else runs. This is a bad thing. You don’t want such a wait time on, say, a high-traffic server.

Before asyncio came into the scene, you would handle the above issue using one of:
1) multi-threading
2) multi-processing
3) greenlets
4) callbacks

Let’s look at these 4 methods in brief.

1) Threading: In multi-threading, you would spawn a new thread for every request, which is fine as long as only a few threads run simultaneously. Once the number of threads exceeds a certain threshold, context switching becomes a bottleneck. In CPython there is a further limitation. Since CPython’s memory management is not thread-safe, it uses a global interpreter lock (GIL), a mutex that prevents multiple threads from executing Python bytecode at the same time. The GIL is released while a thread waits on I/O, so threads can still overlap I/O waits, but they cannot run Python code in parallel.
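As a rough illustration of the threading approach, here is a minimal sketch using the standard library's ThreadPoolExecutor. The fake_fetch function and the example URLs are assumptions made up for this sketch; fake_fetch sleeps to stand in for a blocking network call such as requests.get().

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    # Hypothetical stand-in for a blocking call such as requests.get(url):
    # sleeping simulates network latency.
    time.sleep(0.2)
    return "content of " + url

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    # Each call runs in its own thread, so the three waits overlap.
    pages = list(pool.map(fake_fetch, urls))
elapsed = time.perf_counter() - start

print(pages)
print("elapsed: {:.2f}s".format(elapsed))
```

Run one after another, the three calls would take about 0.6 seconds; with one thread each, the total stays close to a single 0.2 second wait.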

2) Multi-processing: This is similar to multi-threading, but spawning a new process is more expensive than spawning a new thread, so it scales even less well to a large number of concurrent requests.

3) Greenlets: Here’s a piece of greenlet code:

from gevent import monkey
monkey.patch_all()

Simply adding these two lines patches the blocking parts of the standard library (sockets, for instance) so that I/O-bound code in your program cooperatively yields to other greenlets instead of blocking. This voodoo magic may be good for a few lines of code, but in a longer program you want the code to behave the way you wrote it. At the very least, this violates the principle ‘explicit is better than implicit’.

4) Callbacks: Take a look at this code in jQuery.

$.get(url, function(response){
    console.log(response);
}); 

This code handles the same request-response cycle, but it does not block. The response is handled in the second argument of the ‘get’ method, which is a callback function.

Callback functions are great but they are not without their disadvantages. For one, the logic of a single operation gets split across several functions, and once callbacks nest inside other callbacks the control flow becomes hard to follow and errors become hard to propagate. This is what is commonly referred to as ‘callback hell’. Callbacks are dispatched by an underlying event loop, and running such a loop was also the approach that Python frameworks like Tornado initially took to provide some amount of asynchronicity.

Later when Python introduced support for coroutines, Tornado introduced coroutine based code which is much closer to the asyncio code that we discuss next. Here’s a simple Async Handler in Tornado which uses coroutines to provide asynchronicity.

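from tornado import gen
from tornado.httpclient import AsyncHTTPClient
from tornado.web import RequestHandler

class GenAsyncHandler(RequestHandler):
    @gen.coroutine
    def get(self):
        http_client = AsyncHTTPClient()
        # Execution pauses here until the fetch completes
        response = yield http_client.fetch("http://example.com")
        # do_something_with_response is a placeholder for your own code
        do_something_with_response(response)
        self.render("template.html")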

When the execution comes to the point of fetching the response, the execution is paused using the yield statement. We resume executing the function only when the response becomes available.

The above code uses a generator function to create a coroutine for every request, which makes the handler non-blocking. Tornado manages these coroutines in an I/O loop. Twisted, another framework similar to Tornado, calls this style inline callbacks (its inlineCallbacks decorator).

Before we get to the core of asyncio, it’s important that we understand generator functions.

Here is a simple generator function that outputs numbers sequentially.

def gen_1():
    yield 1
    yield 2
    yield 3

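Calling this function does not run its body; it returns a generator object. The first call to next() on that object runs the body up to the first yield and produces 1, the next call resumes from there and produces 2, and so on.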
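To see this pause-and-resume behaviour step by step, here is a small sketch that drives gen_1() by hand with next(). The variable names are only for illustration.

```python
def gen_1():
    yield 1
    yield 2
    yield 3

g = gen_1()          # calling the function only creates a generator object
first = next(g)      # runs the body up to the first yield
second = next(g)     # resumes right after the first yield
third = next(g)
print(first, second, third)

exhausted = False
try:
    next(g)          # the body has finished, so nothing is left to yield
except StopIteration:
    exhausted = True
print(exhausted)
```

A for loop does the same thing behind the scenes: it keeps calling next() and stops quietly when StopIteration is raised.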

We can call this generator function from another generator.

def gen_2():
    for i in gen_1():
        yield i
    yield 5
    yield 6

Here’s the new syntax for writing the above code in Python 3.3 and above.

def gen_2():
    yield from gen_1()
    yield 5
    yield 6
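As a quick check that the two ways of delegating to gen_1() behave the same, here is a sketch that defines both variants side by side. The names gen_2_loop and gen_2_delegating are made up for this comparison.

```python
def gen_1():
    yield 1
    yield 2
    yield 3

def gen_2_loop():
    # Explicit loop: re-yield every value produced by gen_1()
    for i in gen_1():
        yield i
    yield 5
    yield 6

def gen_2_delegating():
    # 'yield from' delegates to gen_1() until it is exhausted
    yield from gen_1()
    yield 5
    yield 6

loop_values = list(gen_2_loop())
delegating_values = list(gen_2_delegating())
print(loop_values)
print(delegating_values)
```

Both lists come out as [1, 2, 3, 5, 6].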

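Now that we understand generators, here’s how a caller drives one. Each pass through the loop resumes gen_2() until its next yield, and this pause-and-resume mechanism is exactly what generator-based coroutines build on.

for value in gen_2():
    print(value)  # prints 1, 2, 3, 5, 6, one per line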
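To make the link between generators and coroutines more concrete, here is a sketch of a toy round-robin scheduler. It is not how asyncio is implemented; task, run_all and the log list are hypothetical names used only to show how a loop can switch between generators each time one of them yields.

```python
from collections import deque

def task(name, steps, log):
    for i in range(steps):
        log.append("{} step {}".format(name, i))
        yield  # hand control back to the scheduler

def run_all(tasks):
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)      # resume the task until its next yield
        except StopIteration:
            continue           # the task has finished; drop it
        queue.append(current)  # not finished yet; schedule it again

log = []
run_all([task("A", 2, log), task("B", 3, log)])
print(log)
```

The log shows the two tasks taking turns: A step 0, B step 0, A step 1, B step 1, B step 2. An event loop does something similar, except that it resumes a task when the I/O it was waiting on is ready rather than simply taking turns.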

With this brief discussion on generators, we are now ready to tackle asyncio.

So how does asyncio provide asynchronicity?

asyncio uses a single-threaded approach built around an event loop, which older code obtains with asyncio.get_event_loop() (newer Python versions prefer asyncio.run() or asyncio.new_event_loop()). The loop switches between tasks only at well-defined points, namely where a task hands back control while it waits. Most often this switching occurs when the program would otherwise block on I/O, but asyncio can also be used to handle event-driven code or to schedule code to run at a specific future time. This is what makes it extremely useful for handling real-time updates.
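As an illustration of scheduling code for a future time, here is a minimal sketch that uses the loop's call_soon() and call_later() methods. It assumes Python 3.7 or later for asyncio.run() and asyncio.get_running_loop(); the delays and the events list are arbitrary choices for this example.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    events = []
    # Schedule plain callbacks; they run in order of their due time,
    # not in the order they were registered.
    loop.call_later(0.1, events.append, "second")
    loop.call_later(0.05, events.append, "first")
    loop.call_soon(events.append, "soon")
    await asyncio.sleep(0.2)   # keep the loop running long enough
    return events

events = asyncio.run(main())
print(events)
```

The callback registered with call_soon() runs on the next pass of the loop, followed by the two delayed callbacks as their timers expire.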

Let’s look at a small piece of asyncio code.

import asyncio
import aiohttp

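# Pre-Python 3.5 style: a generator-based coroutine.
# (asyncio.coroutine was removed in Python 3.11, and the module-level
# aiohttp.get() was dropped from later aiohttp releases.)
@asyncio.coroutine
def get_content(url):
    response = yield from aiohttp.get(url)
    return (yield from response.text())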

Here aiohttp is an asynchronous HTTP client and server library with useful features such as WebSockets, middleware and signals.

We are yielding the task of getting the page to aiohttp. This is very similar to the Tornado code that we discussed earlier. Python 3.5 has further refined this syntax as follows.

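import aiohttp

async def get_content(url):
    # Current aiohttp releases issue requests through a ClientSession
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()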

So now, instead of using the coroutine decorator, you simply use the keyword ‘async’. Similarly, instead of saying ‘yield from’, you say ‘await’.
This is much more concise and removes the confusion between ‘yield’ and ‘yield from’. Anywhere you see ‘async def’, you know the function is a coroutine that the event loop can pause and resume. The ‘pausing’ happens at the ‘await’ keyword.
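To show the pausing at ‘await’ without needing a network connection, here is a small sketch in which asyncio.sleep() stands in for an I/O wait. The worker coroutine and its delays are made up for this example, and asyncio.run() and asyncio.create_task() assume Python 3.7 or later.

```python
import asyncio

async def worker(name, delay, log):
    log.append(name + " started")
    await asyncio.sleep(delay)   # pauses this coroutine; the loop runs others
    log.append(name + " finished")

async def main():
    log = []
    # create_task() schedules both coroutines on the running event loop
    slow = asyncio.create_task(worker("slow", 0.2, log))
    fast = asyncio.create_task(worker("fast", 0.1, log))
    await slow
    await fast
    return log

log = asyncio.run(main())
print(log)
```

Both workers start before either one finishes, and the one with the shorter wait finishes first, even though everything runs in a single thread.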

You then use the above code with asyncio by running a loop as follows:

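import asyncio

# new_event_loop() creates a loop explicitly; on Python 3.7+,
# asyncio.run(get_content(...)) wraps these steps in a single call.
loop = asyncio.new_event_loop()
try:
    content = loop.run_until_complete(
        get_content('http://knowpapa.com')
    )
finally:
    loop.close()
print(content)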

This, in turn, is the core concept behind asyncio.

One more thing. What do you do if you need to get the result of an async function in your normal synchronous code?

You use asyncio.Future(). A Future represents the result of a function that is yet to complete. The asyncio event loop can watch for a Future object’s state, thus allowing your normal synchronous code to wait for the blocking part to finish some work.

Here’s an example that shows this.

import asyncio

def mark_complete(future, result):
    future.set_result(result)

# new_event_loop() works whether or not a loop already exists
event_loop = asyncio.new_event_loop()
try:
    # create_future() returns an asyncio.Future bound to this loop
    task_to_complete = event_loop.create_future()
    event_loop.call_soon(mark_complete, task_to_complete, 'my result')
    result = event_loop.run_until_complete(task_to_complete)
finally:
    event_loop.close()

The state of the Future changes to ‘done’ when future.set_result() is called. Even after it is done, the instance of Future retains the result. This can be retrieved by calling the result() method on the Future instance as follows.

task_to_complete.result()
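The change of state can be observed directly with the Future's done() method. The following is a minimal sketch, assuming Python 3.7 or later for asyncio.run(); the variable names are only illustrative.

```python
import asyncio

async def main():
    future = asyncio.get_running_loop().create_future()
    done_before = future.done()    # no result has been set yet
    future.set_result("my result")
    done_after = future.done()     # the state is now 'done'
    # result() can be read any number of times once the Future is done
    return done_before, done_after, future.result(), future.result()

states = asyncio.run(main())
print(states)
```

Calling result() before the Future is done raises an exception instead of waiting, which is why code normally awaits the Future or checks done() first.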

If you need results from several coroutines, asyncio.gather() runs them concurrently and returns all their results as a list, in the order the coroutines were passed in.

    results = await asyncio.gather(
        phase1(),
        phase2(),
    )
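The fragment above needs to live inside a coroutine. Here is a self-contained sketch in which phase1() and phase2() are hypothetical coroutines that sleep briefly to simulate I/O; asyncio.run() assumes Python 3.7 or later.

```python
import asyncio

async def phase1():
    await asyncio.sleep(0.1)    # simulated slow I/O
    return "phase1 result"

async def phase2():
    await asyncio.sleep(0.05)   # finishes earlier than phase1
    return "phase2 result"

async def main():
    # Both coroutines run concurrently; gather() waits for all of them
    results = await asyncio.gather(
        phase1(),
        phase2(),
    )
    return results

results = asyncio.run(main())
print(results)
```

Even though phase2() finishes first, the results list follows the order of the arguments: phase1's result, then phase2's.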

Finally, a Future can also invoke callbacks when it completes. Callbacks are invoked in the order they are registered.

import asyncio
import functools


def callback(future, n):
    print('{}: future done: {}'.format(n, future.result()))


async def register_callbacks(all_done):
    print('registering callbacks on future')
    all_done.add_done_callback(functools.partial(callback, n=1))
    all_done.add_done_callback(functools.partial(callback, n=2))


async def main(all_done):
    await register_callbacks(all_done)
    print('setting result of future')
    all_done.set_result('the result')


event_loop = asyncio.new_event_loop()
try:
    all_done = event_loop.create_future()
    event_loop.run_until_complete(main(all_done))
finally:
    event_loop.close()

To summarise, asyncio provides a way to run an event loop. Within the loop you can run multiple tasks concurrently and specify when code should run, for example when an event occurs or after a given period of time.