Joining url path components intelligently
Problem
I'm a little frustrated with the state of URL parsing in Python, although I sympathize with the challenges. Today I just needed a tool to join path parts and normalize slashes without accidentally losing other parts of the URL, so I wrote this:
from urlparse import urlsplit, urlunsplit

def url_path_join(*parts):
    """Join and normalize url path parts with a slash."""
    schemes, netlocs, paths, queries, fragments = zip(*(urlsplit(part) for part in parts))
    # Use the first value for everything but path. Join the path on '/'
    scheme = next((x for x in schemes if x), '')
    netloc = next((x for x in netlocs if x), '')
    path = '/'.join(x.strip('/') for x in paths if x)
    query = next((x for x in queries if x), '')
    fragment = next((x for x in fragments if x), '')
    return urlunsplit((scheme, netloc, path, query, fragment))

As you can see, it's not very DRY, but it does do what I need, which is this:
>>> url_path_join('https://example.org/fizz', 'buzz')
'https://example.org/fizz/buzz'

Another example:
>>> parts=['https://', 'http://www.example.org', '?fuzz=buzz']
>>> '/'.join([x.strip('/') for x in parts]) # Not sufficient
'https:/http://www.example.org/?fuzz=buzz'
>>> url_path_join(*parts)
'https://www.example.org?fuzz=buzz'

Can you recommend an approach that is readable without being even more repetitive and verbose?
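To make the zip(*...) line in url_path_join less opaque, here is a small sketch of what it builds for the second example. It assumes Python 3, where urlsplit lives in urllib.parse instead of the Python 2 urlparse module imported above. Each part splits into a five-field (scheme, netloc, path, query, fragment) tuple, and zip(*...) transposes those tuples into one tuple per component.

```python
from urllib.parse import urlsplit

parts = ['https://', 'http://www.example.org', '?fuzz=buzz']

# Each part splits into (scheme, netloc, path, query, fragment).
split_parts = [tuple(urlsplit(part)) for part in parts]

# zip(*...) transposes: one tuple per URL component, one entry per input part.
schemes, netlocs, paths, queries, fragments = zip(*split_parts)

print(schemes)   # ('https', 'http', '')
print(netlocs)   # ('', 'www.example.org', '')
print(queries)   # ('', '', 'fuzz=buzz')
```

This also explains why the joined URL keeps https rather than http: the next(...) calls pick the first non-empty value in each transposed tuple.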
Solution
I'd suggest the following improvements (in descending order of importance):
- Extract your redundant generator expression into a function (here called `first`) so it only occurs once. To preserve flexibility, introduce `default` as an optional parameter.
- This makes the comment redundant, because `first` is a self-documenting name (you could call it `first_or_default` if you want to be more explicit), so you can remove it.
- Rephrase your docstring to make it more readable: "normalize" and "with a slash" don't make sense together.
- PEP 8 recommends against aligning variable assignments, and so does Clean Code by Robert C. Martin. However, it's more important to be consistent within your project.
def url_path_join(*parts):
    """Normalize url parts and join them with a slash."""
    schemes, netlocs, paths, queries, fragments = zip(*(urlsplit(part) for part in parts))
    scheme = first(schemes)
    netloc = first(netlocs)
    path = '/'.join(x.strip('/') for x in paths if x)
    query = first(queries)
    fragment = first(fragments)
    return urlunsplit((scheme, netloc, path, query, fragment))

def first(sequence, default=''):
    return next((x for x in sequence if x), default)

If you're looking for something a bit more radical, why not let `first` handle several sequences at once? (Unfortunately, Python 2.7 does not let you declare a keyword parameter such as `default` after `*sequences`; keyword-only arguments were added in Python 3.)

def url_path_join(*parts):
    """Normalize url parts and join them with a slash."""
    schemes, netlocs, paths, queries, fragments = zip(*(urlsplit(part) for part in parts))
    scheme, netloc, query, fragment = first_of_each(schemes, netlocs, queries, fragments)
    path = '/'.join(x.strip('/') for x in paths if x)
    return urlunsplit((scheme, netloc, path, query, fragment))

def first_of_each(*sequences):
    return (next((x for x in sequence if x), '') for sequence in sequences)
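On Python 3 the Python 2.7 limitation noted in the answer goes away. The following is a minimal sketch, not code from the original answer, that gives first_of_each a keyword-only default parameter and imports from urllib.parse (the Python 3 location of urlsplit and urlunsplit). It re-checks the question's two examples.

```python
from urllib.parse import urlsplit, urlunsplit


def first_of_each(*sequences, default=''):
    """Yield the first truthy item of each sequence, or default if there is none.

    `default` is keyword-only, which Python 3 allows after *sequences
    (the same signature is a SyntaxError in Python 2.7).
    """
    return (next((x for x in sequence if x), default) for sequence in sequences)


def url_path_join(*parts):
    """Normalize url parts and join them with a slash."""
    schemes, netlocs, paths, queries, fragments = zip(*(urlsplit(part) for part in parts))
    # Keep the first non-empty scheme, netloc, query and fragment.
    scheme, netloc, query, fragment = first_of_each(schemes, netlocs, queries, fragments)
    # Strip slashes from every non-empty path and rejoin them with single slashes.
    path = '/'.join(x.strip('/') for x in paths if x)
    return urlunsplit((scheme, netloc, path, query, fragment))


print(url_path_join('https://example.org/fizz', 'buzz'))
# https://example.org/fizz/buzz
print(url_path_join('https://', 'http://www.example.org', '?fuzz=buzz'))
# https://www.example.org?fuzz=buzz
print(list(first_of_each([], ['', 'x'], default=None)))
# [None, 'x']
```

Because `default` must be passed by keyword, it can never be swallowed by `*sequences` by accident.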
Context
StackExchange Code Review Q#13027, answer score: 6