HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

URL decode function optimization

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
decodeoptimizationfunctionurl

Problem

I have the following mockup that I need some feedback on how I might improve the url_decode function to make it faster and use less memory. There are a couple of requirements. I have to use the StringArg and StringReturn structs. I also need to completely decode the string. I am aware of double encoded attacks, but this is for data mining and I need to get the string fully decoded, hence the while loop.

Please have a look at function StringReturn url_decode(char line) and suggest anything that I can do to improve the speed or reduce the memory used.

#include 
#include 
#include 
#include 
#include 

struct StringArg
{
    int length;
    char* data;
    StringArg(const char* inData)
    {
        length = strlen(inData);
        data = new char[length+1];
        strcpy(data, inData);
    }
};

struct StringReturn
{
    int size;
    char* data;
    StringReturn(int inSize)
    {
        size = inSize;
        data = new char[size+1];
    }
};

StringArg* stringArg(char* line)
{
    return new StringArg(line);
}

StringReturn* stringReturnInfo(int size)
{
    return new StringReturn(size);
}

StringReturn* url_decode(char* line)
{
        StringArg *str;
        str = stringArg(line);

        int lengths = str->length;
        char* urlData = new char[lengths+1];
        char* fmt = new char[4];
        memcpy(urlData, str->data, str->length);
        urlData[lengths] = 0;
        bool bFoundChar = false;

        int j = 0;

        do
        {
                j = 0;
                bFoundChar = false;
                for (int i = 0; i size = j;

        memcpy(ret->data, urlData, j);
        delete urlData;
        delete fmt;

        return ret;

}

void test_decode(char* line)
{
    StringReturn *str = url_decode(line);
    printf(str->data);
}

int main(int argc, const char * argv[])
{

    test_decode("bob.com?url=http%3A%2F%2Fsteve.com%3Furl%3Dhttp%253A%252F%252Fjohn.com");

    return 0;
}

Solution

This is doomed to failure:

struct StringArg
{
    int length;
    char* data;
    StringArg(const char* inData)
    {
        length = strlen(inData);
        data = new char[length+1];
        strcpy(data, inData);
    }
};


This can only lead to disaster. It is unlikely it is more efficient than std::string and is much more likely to lead to memory leaks.

Use std::string it will work is correct and has been optimized very heavily you can not beat it using your own hand written one. This is a premature optimization.

Same Again:

struct StringReturn
{
    int size;
    char* data;
    StringReturn(int inSize)
    {
        size = inSize;
        data = new char[size+1];
    }
};


Don't pass pointers around:

StringArg* stringArg(char* line)
{
    return new StringArg(line);
}


Use std::unique_ptr to encapsulate the pointer object so that it does correct memory management. If you are doing things correctly the resulting code optimizes to a no-op and if you don't do things correctly it saves your ass. In fact I would just throw away your string classes and use std::string. The copy construction from a return is nearly always elided by NRVO and thus is much more efficient than you seem to think.

Same Again

StringReturn* stringReturnInfo(int size)
{
    return new StringReturn(size);
}


Manually copying a C-string is very painful. Use std::string

int lengths = str->length;
    char* urlData = new char[lengths+1];
    char* fmt = new char[4];
    memcpy(urlData, str->data, str->length);
    urlData[lengths] = 0;

    // Or you can do:

    std::string urlData = line;


Also your code is not exception safe.

Please use RIAA to make sure that there are no leaks (or use std::string that does this already).

The definition of a URL can be found here: https://www.rfc-editor.org/rfc/rfc3986

Pretending the '+' is a space is only valid in the path section of a URI and not valid in the query or fragment parts of the URI thus scanning the whole string for '+' is not correct.

I also need to completely decode the string. I am aware of double encoded attacks

The specification does not allow for double (or nested) encoding. If it decodes to another '%' then that is the value. You don't try and re-interpret that value.

Code Snippets

struct StringArg
{
    int length;
    char* data;
    StringArg(const char* inData)
    {
        length = strlen(inData);
        data = new char[length+1];
        strcpy(data, inData);
    }
};
struct StringReturn
{
    int size;
    char* data;
    StringReturn(int inSize)
    {
        size = inSize;
        data = new char[size+1];
    }
};
StringArg* stringArg(char* line)
{
    return new StringArg(line);
}
StringReturn* stringReturnInfo(int size)
{
    return new StringReturn(size);
}
int lengths = str->length;
    char* urlData = new char[lengths+1];
    char* fmt = new char[4];
    memcpy(urlData, str->data, str->length);
    urlData[lengths] = 0;

    // Or you can do:

    std::string urlData = line;

Context

StackExchange Code Review Q#26103, answer score: 2

Revisions (0)

No revisions yet.