Checking HTTP headers with asyncio and aiohttp
Problem
This is one of my first attempts to do something practical with asyncio. The task is simple: given a list of URLs, determine if the content type is HTML for every URL.

I've used aiohttp, initializing a single "session", ignoring SSL errors and issuing HEAD requests to avoid downloading the whole endpoint body. Then, I simply check if text/html is inside the Content-Type header string:

import asyncio
import aiohttp


@asyncio.coroutine
def is_html(session, url):
    response = yield from session.head(url, compress=True)
    print(url, "text/html" in response.headers["Content-Type"])


if __name__ == '__main__':
    links = ["https://httpbin.org/html",
             "https://httpbin.org/image/png",
             "https://httpbin.org/image/svg",
             "https://httpbin.org/image"]
    loop = asyncio.get_event_loop()
    conn = aiohttp.TCPConnector(verify_ssl=False)

    with aiohttp.ClientSession(connector=conn, loop=loop) as session:
        f = asyncio.wait([is_html(session, link) for link in links])
        loop.run_until_complete(f)

The code works, it prints (the output order is inconsistent, of course):
https://httpbin.org/image/svg False
https://httpbin.org/image False
https://httpbin.org/image/png False
https://httpbin.org/html True

But I'm not sure if I'm using the asyncio loop, wait and coroutines, and aiohttp's connection and session objects appropriately. What would you recommend to improve?

Solution
IMO your code should look more like this:
import asyncio
import aiohttp

URLS = [...]

if __name__ == "__main__":
    print(
        asyncio.get_event_loop().run_until_complete(
            asyncio.gather(*(foo(url) for url in URLS))))

Where each individual URL is processed something like:
async def foo(url):
    async with aiohttp.ClientSession() as s:
        async with s.head(...) as r:
            return url, r.headers[...]

Note the separate session for each URL.
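One consequence of returning values from foo and collecting them with asyncio.gather is that the results come back in the order of the input URLs, regardless of which request finishes first. Here is a minimal sketch of that behaviour, assuming a hypothetical fake_head coroutine that stands in for the aiohttp HEAD request. The URLs, delays and header values are made up so the example runs without network access:

```python
import asyncio

# Hypothetical stand-in data: URL -> (simulated latency in seconds, Content-Type).
FAKE_RESPONSES = {
    "https://example.invalid/slow.html": (0.03, "text/html; charset=utf-8"),
    "https://example.invalid/fast.png": (0.01, "image/png"),
}


async def fake_head(url):
    """Pretend to issue a HEAD request and return the Content-Type value."""
    delay, content_type = FAKE_RESPONSES[url]
    await asyncio.sleep(delay)
    return content_type


async def foo(url):
    # Same shape as foo above: return a (url, value) pair instead of printing.
    content_type = await fake_head(url)
    return url, "text/html" in content_type


async def check_all(urls):
    # gather() keeps the input order even though the fast URL completes first.
    return await asyncio.gather(*(foo(url) for url in urls))


results = asyncio.run(check_all(list(FAKE_RESPONSES)))
print(results)
```

Because the pairs are returned rather than printed inside the coroutine, the caller decides how and in what order to report them.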
Additionally, exception handling may be needed, in which case it should be encapsulated inside foo.
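A minimal sketch of what keeping the error handling inside foo could look like. The fake_head coroutine is a hypothetical stand-in for the aiohttp request so the example runs offline, and returning None for a failed URL is an assumption, not something the answer prescribes:

```python
import asyncio


async def fake_head(url):
    """Hypothetical stand-in for the HEAD request; fails for one URL."""
    await asyncio.sleep(0)
    if "broken" in url:
        raise ConnectionError(f"cannot reach {url}")
    return {"Content-Type": "text/html"}


async def foo(url):
    # Handle failures here so one bad URL cannot abort the whole gather().
    try:
        headers = await fake_head(url)
    except (ConnectionError, asyncio.TimeoutError):
        return url, None  # assumed convention: None marks a failed request
    return url, headers.get("Content-Type")


async def main(urls):
    return await asyncio.gather(*(foo(url) for url in urls))


outcome = asyncio.run(
    main(["https://example.invalid/ok", "https://example.invalid/broken"])
)
print(outcome)
```

With real aiohttp calls the except clause would more likely name aiohttp.ClientError alongside asyncio.TimeoutError. An alternative is to pass return_exceptions=True to asyncio.gather, which hands the exception objects back to the caller instead of handling them per URL.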
Context
StackExchange Code Review Q#159677, answer score: 2