HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

URL percent encoding function

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
percentfunctionurlencoding

Problem

I wrote a custom URL encoding function and I'd like to run it past a few other experienced C developers. I have tested it on a few strings, and it has worked on all of them.

This is to be run on iOS devices, so memory and processor use are potentially a small concern.

Do you see any potential problems - UB, wrong return, excessive memory or processor usage, difficulty reading?

Any suggestions for improvement?

NSString *urlEncode(NSString *orig)
{
    static const BOOL safe[0x100] = {
        ['a'...'z'] = YES,
        ['A'...'Z'] = YES,
        ['0'...'9'] = YES,
        ['-'] = YES,
        ['_'] = YES,
        ['.'] = YES,
        ['~'] = YES
    };
    const char *digits = "0123456789ABCDEF";
    const char *cstr = orig.UTF8String;
    int clen = orig.length;
    int nlen = 3*clen;
    char newstr[nlen + 1];

    const char *o = cstr   + clen;
    char *p = newstr + nlen;
    *p = 0;
    while(o > cstr) {
        unsigned char c = *--o;
        if(safe[c]) {
            *--p = c;
        } else {
            *--p = digits[c&0xf];
            *--p = digits[c>>4];
            *--p = '%';
        }
    }
    NSString *str = [NSString stringWithUTF8String:p];
    return str;
}


Yes, newStr could get long, but given that a URL > 2kB is unreliable, I think it's unlikely to be a problem in reality.

Solution

First of all, URL encoding is more nuanced than you might think. Be cautious when applying encoding.

As for your code, the only blatant bug I see is that it can segfault when encoding a non-ASCII string. The -length method returns the number of Unicode characters in the string. Since you are working one char at a time, you need the number of bytes instead.

NSUInteger clen = [orig lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
NSUInteger nlen = 3 * clen;


It would be better to change the method signature to:

NSString *urlEncode(const NSString *orig)


I think that using a for loop would improve readability. Currently, you declare/initialize *o, test it, and decrement it on three separate lines. A for loop would provide an instantly recognizable structure to make it obvious how those three lines of code are related:

for (const char *o = cstr + clen - 1; o >= cstr; o--) {
    unsigned char c = *o;
    if (safe[c]) {
        *--p = c;
    } else {
        *--p = digits[c & 0x0f];
        *--p = digits[c >> 4];
        *--p = '%';
    }
}


The null char would be better written as '\0'.

Personally, I would choose to iterate forward rather than backward through the string, as it simplifies the loop by a few instructions, and is easier to understand. It also gives you the option of calling -maximumLengthOfBytesUsingEncoding (which runs in O(1) time) instead of -lengthOfBytesUsingEncoding.

unsigned char c;
char *p = newstr;
for (const char *o = cstr; (c = *o); o++) {
    ...
}

Code Snippets

NSUInteger clen = [orig lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
NSUInteger nlen = 3 * clen;
NSString *urlEncode(const NSString *orig)
for (const char *o = cstr + clen - 1; o >= cstr; o--) {
    unsigned char c = *o;
    if (safe[c]) {
        *--p = c;
    } else {
        *--p = digits[c & 0x0f];
        *--p = digits[c >> 4];
        *--p = '%';
    }
}
unsigned char c;
char *p = newstr;
for (const char *o = cstr; (c = *o); o++) {
    ...
}

Context

StackExchange Code Review Q#29895, answer score: 3

Revisions (0)

No revisions yet.