HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Is this escape unescape function pair correct?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thisfunctionunescapecorrectpairescape

Problem

Is this example of an escape - unescape function correct?

ESCAPE = '/'

def escape(s):
    rv = s
    rv = rv.replace(ESCAPE, ESCAPE + ESCAPE)
    rv = rv.replace('+', ESCAPE + 'PLUS')
    rv = rv.replace('-', ESCAPE + 'MINUS')
    return rv

def unescape(s):
    rv = s
    rv = rv.replace(ESCAPE + 'MINUS', '-')
    rv = rv.replace(ESCAPE + 'PLUS', '+')
    rv = rv.replace(ESCAPE + ESCAPE, ESCAPE)
    return rv

Solution

Performing string substitutions using multiple passes is almost always a bad idea, not only for performance, but for correctness. In this case, there is indeed a bug: unescape('//PLUS/PLUS') should produce '/PLUS+', but instead produces '/++'.

Therefore, you have to parse the input string in one pass:

import re

def unescape(s):
    def replacement(match):
        return {
            '/': '/',
            'PLUS': '+',
            'MINUS': '-',
        }[match.group(1)]
    return re.sub('/(/|PLUS|MINUS)', replacement, s)


In your original code and in the implementation above, there is a lot of repetition. If there are more symbols being escaped than just + and -, then it makes sense to specify the mapping just once.

ESCAPE_CHAR = '/'
ESCAPE = {
    ESCAPE_CHAR: ESCAPE_CHAR,
    '+': 'PLUS',
    '-': 'MINUS',
}
ESCAPE_RE = re.compile('|'.join(map(re.escape, ESCAPE.keys())))

UNESCAPE = dict((v, k) for (k, v) in ESCAPE.items())
UNESCAPE_RE = re.compile(
    re.escape(ESCAPE_CHAR) + '(' +
        '|'.join(map(re.escape, UNESCAPE.keys())) +
    ')')

def escape(s):
    return ESCAPE_RE.sub(lambda match: ESCAPE_CHAR + ESCAPE[match.group(0)], s)

def unescape(s):
    return UNESCAPE_RE.sub(lambda match: UNESCAPE[match.group(1)], s)

Code Snippets

import re

def unescape(s):
    def replacement(match):
        return {
            '/': '/',
            'PLUS': '+',
            'MINUS': '-',
        }[match.group(1)]
    return re.sub('/(/|PLUS|MINUS)', replacement, s)
ESCAPE_CHAR = '/'
ESCAPE = {
    ESCAPE_CHAR: ESCAPE_CHAR,
    '+': 'PLUS',
    '-': 'MINUS',
}
ESCAPE_RE = re.compile('|'.join(map(re.escape, ESCAPE.keys())))

UNESCAPE = dict((v, k) for (k, v) in ESCAPE.items())
UNESCAPE_RE = re.compile(
    re.escape(ESCAPE_CHAR) + '(' +
        '|'.join(map(re.escape, UNESCAPE.keys())) +
    ')')

def escape(s):
    return ESCAPE_RE.sub(lambda match: ESCAPE_CHAR + ESCAPE[match.group(0)], s)

def unescape(s):
    return UNESCAPE_RE.sub(lambda match: UNESCAPE[match.group(1)], s)

Context

StackExchange Code Review Q#57121, answer score: 8

Revisions (0)

No revisions yet.