HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Fastest way of removing a substring from a string

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
removingsubstringwayfastestfromstring

Problem

Taking a break from my C++ series, I recently reviewed the code in this question. I ended up writing my own separate derivation of the function so that I could remove all of the substrings from a certain string.

#include 
#include 
#include 

#define NUM_CYCLES 10000000

void rmSubstr(char *str, const char *toRemove)
{
    size_t length = strlen(toRemove);
    while((str = strstr(str, toRemove)))
    {
        memmove(str, str + length, 1 + strlen(str + length));
    }
}

clock_t runTest(void (*fn)(char*, const char*), char *str, char *substr)
{
    clock_t start = clock();
    for (int i = 0; i < NUM_CYCLES; ++i)
    {
        fn(str, substr);
    }
    return clock() - start;
}

int main(void)
{
    char* str1 = strdup("test this string out");
    char* str2 = strdup("test this ...........slightly............ longer thing which contains a sub string out");
    char* str3 = strdup("string");
    char* str4 = strdup("this string is not a string test");

    printf("Function ran in %lu clock cycles\nProduced output: %s\n\n", runTest(rmSubstr, str1, " string"), str1);
    printf("Function ran in %lu clock cycles\nProduced output: %s\n\n", runTest(rmSubstr, str2, " string"), str2);
    printf("Function ran in %lu clock cycles\nProduced output: %s\n\n", runTest(rmSubstr, str3, " string"), str3);
    printf("Function ran in %lu clock cycles\nProduced output: %s\n\n", runTest(rmSubstr, str4, " string"), str4);
}


Generates on my MacBook Pro (Intel Core i5 2.4 GHz with 2 cores, 4 GB RAM, running Mac OS X 10.10 beta 4):

Function ran in 319220 clock cycles
Produced output: test this out

Function ran in 1293377 clock cycles
Produced output: test this ...........slightly............ longer thing which contains a sub out

Function ran in 198460 clock cycles
Produced output: string

Function ran in 448832 clock cycles
Produced output: this is not a test

Program ended with exit code: 0


The only thing I would like to be reviewed in this question is the speed of th

Solution

It should be possible to optimize a global search-and-replace by doing all the moves in one pass.

void rmSubstr(char *str, const char *toRemove)
{
    size_t length = strlen(toRemove);
    char *found,
         *next = strstr(str, toRemove);

    for (size_t bytesRemoved = 0; (found = next); bytesRemoved += length)
    {
        char *rest = found + length;
        next = strstr(rest, toRemove);
        memmove(found - bytesRemoved,
                rest,
                next ? next - rest: strlen(rest) + 1);
    }
}


Output using your test code:

Function ran in 351851 clock cycles
Produced output: test this out

Function ran in 1349681 clock cycles
Produced output: test this ...........slightly............ longer thing which contains a sub out

Function ran in 197570 clock cycles
Produced output: string

Function ran in 416682 clock cycles
Produced output: this is not a test

Program ended with exit code: 0

Code Snippets

void rmSubstr(char *str, const char *toRemove)
{
    size_t length = strlen(toRemove);
    char *found,
         *next = strstr(str, toRemove);

    for (size_t bytesRemoved = 0; (found = next); bytesRemoved += length)
    {
        char *rest = found + length;
        next = strstr(rest, toRemove);
        memmove(found - bytesRemoved,
                rest,
                next ? next - rest: strlen(rest) + 1);
    }
}

Context

StackExchange Code Review Q#67055, answer score: 6

Revisions (0)

No revisions yet.