Joining url path components intelligently

Submitted by: @import:stackexchange-codereview

Problem

I'm a little frustrated with the state of URL parsing in Python, although I sympathize with the challenges. Today I just needed a tool to join path parts and normalize slashes without accidentally losing other parts of the URL, so I wrote this:

try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit  # Python 2

def url_path_join(*parts):
    """Join and normalize url path parts with a slash."""
    schemes, netlocs, paths, queries, fragments = zip(*(urlsplit(part) for part in parts))
    # Use the first value for everything but path. Join the path on '/'
    scheme   = next((x for x in schemes if x), '')
    netloc   = next((x for x in netlocs if x), '')
    path     = '/'.join(x.strip('/') for x in paths if x)
    query    = next((x for x in queries if x), '')
    fragment = next((x for x in fragments if x), '')
    return urlunsplit((scheme, netloc, path, query, fragment))


As you can see, it's not very DRY, but it does do what I need, which is this:

>>> url_path_join('https://example.org/fizz', 'buzz')
'https://example.org/fizz/buzz'
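
For comparison, here is a small sketch of how the standard library's urljoin treats similar inputs, which may be part of why a dedicated helper is useful. It assumes Python 3, where the function lives in urllib.parse (in Python 2 it is in the urlparse module); the variable names are only illustrative.

```python
from urllib.parse import urljoin

# urljoin resolves its second argument as a relative reference, so the last
# segment of a base without a trailing slash is replaced, not extended.
no_trailing_slash = urljoin('https://example.org/fizz', 'buzz')

# With a trailing slash on the base, the new segment is appended.
trailing_slash = urljoin('https://example.org/fizz/', 'buzz')

# A leading slash on the second argument discards the base path entirely.
leading_slash = urljoin('https://example.org/fizz/', '/buzz')

print(no_trailing_slash)  # https://example.org/buzz
print(trailing_slash)     # https://example.org/fizz/buzz
print(leading_slash)      # https://example.org/buzz
```

url_path_join, by contrast, keeps every non-empty path segment regardless of where the slashes are.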


Another example:

>>> parts=['https://', 'http://www.example.org', '?fuzz=buzz']
>>> '/'.join([x.strip('/') for x in parts]) # Not sufficient
'https:/http://www.example.org/?fuzz=buzz'
>>> url_path_join(*parts)
'https://www.example.org?fuzz=buzz'


Can you recommend an approach that is readable without being even more repetitive and verbose?

Solution

I'd suggest the following improvements (in descending order of importance):

  • Extract the repeated generator expression into a function so that it occurs only once. To preserve flexibility, make default an optional parameter.

  • That makes the comment redundant, because first is a self-documenting name (you could call it first_or_default to be more explicit), so you can remove it.

  • Rephrase the docstring to make it more readable: "normalize" and "with a slash" don't make sense together.

  • PEP 8 advises against aligning variable assignments, and so does Clean Code by Robert C. Martin. However, it's more important to be consistent within your project.

def url_path_join(*parts):
    """Normalize url parts and join them with a slash."""
    schemes, netlocs, paths, queries, fragments = zip(*(urlsplit(part) for part in parts))
    scheme = first(schemes)
    netloc = first(netlocs)
    path = '/'.join(x.strip('/') for x in paths if x)
    query = first(queries)
    fragment = first(fragments)
    return urlunsplit((scheme, netloc, path, query, fragment))

def first(sequence, default=''):
    return next((x for x in sequence if x), default)
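
To make the behaviour of the helper concrete, here is a minimal sketch that exercises first on its own. The sample lists are made up for illustration; the function body is the one shown above, with a docstring added.

```python
def first(sequence, default=''):
    """Return the first truthy item of sequence, or default if there is none."""
    return next((x for x in sequence if x), default)

# Empty strings are falsy, so they are skipped.
print(first(['', 'https', 'http']))      # https

# Nothing truthy: fall back to the default, which is '' unless overridden.
print(repr(first(['', ''])))             # ''
print(first(['', ''], default='http'))   # http
print(first([], default=None))           # None
```

Because the check is for truthiness, any falsy item is skipped, not just empty strings; that is fine for the string components urlsplit returns.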


If you're looking for something a bit more radical, why not let first handle several sequences at once? (Note that, unfortunately, Python 2.7 does not allow a parameter with a default value after *sequences, so the default is hard-coded below; Python 3 lifts this restriction with keyword-only arguments.)

def url_path_join(*parts):
    """Normalize url parts and join them with a slash."""
    schemes, netlocs, paths, queries, fragments = zip(*(urlsplit(part) for part in parts))
    scheme, netloc, query, fragment = first_of_each(schemes, netlocs, queries, fragments)
    path = '/'.join(x.strip('/') for x in paths if x)
    return urlunsplit((scheme, netloc, path, query, fragment))

def first_of_each(*sequences):
    return (next((x for x in sequence if x), '') for sequence in sequences)
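
For Python 3, the restriction mentioned above no longer applies, so the default can become a keyword-only parameter. The following is a sketch under that assumption, not part of the original answer: it imports from urllib.parse (the Python 3 home of urlsplit and urlunsplit), and the default parameter on first_of_each is an addition for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def first_of_each(*sequences, default=''):
    """Yield the first truthy item of each sequence, or default if none."""
    # default is keyword-only because it follows *sequences (Python 3 only).
    return (next((x for x in sequence if x), default) for sequence in sequences)

def url_path_join(*parts):
    """Normalize url parts and join them with a slash."""
    schemes, netlocs, paths, queries, fragments = zip(*(urlsplit(part) for part in parts))
    scheme, netloc, query, fragment = first_of_each(schemes, netlocs, queries, fragments)
    path = '/'.join(x.strip('/') for x in paths if x)
    return urlunsplit((scheme, netloc, path, query, fragment))

print(url_path_join('https://example.org/fizz', 'buzz'))
# https://example.org/fizz/buzz
print(url_path_join('https://', 'http://www.example.org', '?fuzz=buzz'))
# https://www.example.org?fuzz=buzz
print(list(first_of_each(['', 'a'], [], default=None)))
# ['a', None]
```

The two url_path_join calls reuse the inputs from the question, so the output can be compared directly with the results shown there.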

Context

StackExchange Code Review Q#13027, answer score: 6
