Checking HTTP headers with asyncio and aiohttp

Submitted by: @import:stackexchange-codereview

Problem

This is one of my first attempts to do something practical with asyncio. The task is simple:


Given a list of URLs, determine if the content type is HTML for every URL.

I've used aiohttp, initializing a single "session", ignoring SSL errors and issuing HEAD requests to avoid downloading the whole endpoint body. Then, I simply check if text/html is inside the Content-Type header string:

import asyncio

import aiohttp

@asyncio.coroutine
def is_html(session, url):
    response = yield from session.head(url, compress=True)
    print(url, "text/html" in response.headers["Content-Type"])

if __name__ == '__main__':
    links = ["https://httpbin.org/html",
             "https://httpbin.org/image/png",
             "https://httpbin.org/image/svg",
             "https://httpbin.org/image"]
    loop = asyncio.get_event_loop()

    conn = aiohttp.TCPConnector(verify_ssl=False)
    with aiohttp.ClientSession(connector=conn, loop=loop) as session:
        f = asyncio.wait([is_html(session, link) for link in links])
        loop.run_until_complete(f)


The code works and prints the following (the output order is inconsistent, of course):

https://httpbin.org/image/svg False
https://httpbin.org/image False
https://httpbin.org/image/png False
https://httpbin.org/html True


However, I'm not sure whether I'm using the asyncio loop, wait, and coroutines, or aiohttp's connection and session objects, appropriately. What would you recommend improving?

Solution

IMO your code should look more like this:

import asyncio
import aiohttp
URLS = [...]

if __name__ == "__main__":
    print(
        asyncio.get_event_loop().run_until_complete(
            asyncio.gather(*(foo(url) for url in URLS))))


Each individual URL is then processed by something like:

async def foo(url):
    async with aiohttp.ClientSession() as s:
        async with s.head(...) as r:
            return url, r.headers[...]


Note the separate session for each URL.
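A side effect of switching from asyncio.wait to asyncio.gather is that results come back in the order the coroutines were passed in, regardless of which request finishes first, so the inconsistent output order from the question goes away. Here is a minimal stdlib-only sketch of that behaviour. asyncio.sleep stands in for the HEAD request, and the URLs, delays, and the name fake_check are made up for illustration:

```python
import asyncio


async def fake_check(url, delay, finished):
    # asyncio.sleep is a stand-in for the real HEAD request.
    await asyncio.sleep(delay)
    finished.append(url)  # record the actual completion order
    return url, delay


async def main():
    # Hypothetical jobs; the first URL is deliberately the slowest.
    jobs = [("https://example.com/slow", 0.15),
            ("https://example.com/medium", 0.10),
            ("https://example.com/fast", 0.05)]
    finished = []
    results = await asyncio.gather(
        *(fake_check(url, delay, finished) for url, delay in jobs))
    return results, finished


if __name__ == "__main__":
    results, finished = asyncio.run(main())  # asyncio.run needs Python 3.7+
    print("gather order:    ", [url for url, _ in results])
    print("completion order:", finished)
```

The gather order matches the input list, while the completion order is reversed because the slowest job was listed first.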

Exception handling may also be needed, in which case it should be encapsulated inside foo.
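One way the advice about keeping exception handling inside the per-URL coroutine could look is sketched below. This is not the answer author's code. It assumes aiohttp 3.x, and the names check_html and looks_like_html, the 10-second timeout, and the choice to follow redirects are all illustrative. The header check also compares the media type exactly rather than using the substring test from the question, and it uses headers.get so that a missing Content-Type header does not raise KeyError:

```python
import asyncio


def looks_like_html(content_type):
    """Return True if a Content-Type header value names text/html."""
    if not content_type:
        return False  # header missing or empty
    # Drop parameters such as "; charset=utf-8" and compare the media type only.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


async def check_html(url):
    """Return (url, is_html, error) instead of raising on request failures."""
    import aiohttp  # third-party; imported here so the helper above is stdlib-only

    timeout = aiohttp.ClientTimeout(total=10)  # illustrative value
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            # HEAD does not follow redirects by default in aiohttp.
            async with session.head(url, allow_redirects=True) as response:
                content_type = response.headers.get("Content-Type")
                return url, looks_like_html(content_type), None
    except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
        return url, None, exc


async def main(urls):
    return await asyncio.gather(*(check_html(url) for url in urls))


if __name__ == "__main__":
    # URLs taken from the question.
    links = ["https://httpbin.org/html", "https://httpbin.org/image/png"]
    for url, is_html, error in asyncio.run(main(links)):
        print(url, is_html if error is None else "failed: %r" % error)
```

Because each coroutine returns its error instead of raising, one failing URL does not make gather raise and discard the results for the other URLs.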

Context

StackExchange Code Review Q#159677, answer score: 2
