HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppModerateCanonical

strstr() implementation

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
implementationstrstrstackoverflow

Problem

In this strstr implementation, I am basically skipping the already matched and checked characters with an if else condition. Is this a right way of implementing it?

char *strstrImp(char *input, char *find) {
    if(*find=='\0')
        return input;
    if(*input=='\0')
        return NULL;

    char *start=input;
    char *p;
    while(*start!='\0')
    {
        p=start;
        char *q=find;
        while(*p!='\0' && *q!='\0' && *p==*q)
        {
            p++;
            q++;
        }

        if(*q=='\0')
        {
            return start;
        }
        //update the starting position based on whether any character matched
        if(p == start) //no match
            start++;
        else
            start = p; //skip characters and start from the previously left place
    }
    return NULL;
}

int main()
{
    char * res;
    res = strstrImp("helloworld","low");
    printf("%s",res);
}

Solution

Const-correctness

Since you tagged this question as c++, you must take care of const-correctness:

cr54045.cpp:36:18: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
 res = strstrImp("helloworld","low");
                 ^
cr54045.cpp:36:31: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
 res = strstrImp("helloworld","low");
                              ^


There should be a

const char *strstrImp(const char *input, const char *find)
{
    return strstrImp(const_cast(input), find);
}


Incorrect optimization

Try this test case, which compares your implementation against the standard strstr().

int main()
{
    const char *input = "alfalfalfa romeo", *find = "alfalfa ";
    printf("%s\n", strstrImp(input, find));  // (null)
    printf("%s\n", strstr(input, find));     // alfalfa romeo
}


Logic simplification

Since the start = p optimization is bogus, we will only end the loop with start++. The function body would then look like:

if (*find == '\0')
    return input;
if (*input == '\0')
    return NULL;

char *start = input;
while (*start != '\0')
{
    …
    start++;
}
return NULL;


Next, notice that the if (*input == '\0') return NULL special case is redundant. Also notice that you never refer to input again, so you might as well think of start and input as the same thing. The outline simplifies to:

if (*find == '\0')
    return input;
while (*input != '\0')
{
    …
    input++;
}
return NULL;


Next, let's turn our attention inside the outer loop.

Since p is only ever used inside the loop, you should pull its declaration inside the loop. p and q should both be const char *.

while (*input != '\0')
{
    const char *p = input, *q = find;
    while (*p != 0 && *q != '\0' && *p == *q)
    {
        p++;
        q++;
    }
    if (*q=='\0')
    {
        return input;
    }
    input++;
}


It turns out that the inner-loop condition can be simplified, because the *p == '\0' test is redundant:

$$\begin{array}{ccc|c|c}
\text{p == q} & \text{q != '\0'} & \text{p != '\0'} & {\begin{array}{l}\text{p!='\0' &&}\\\text{q!='\0' &&}\\\text{p==q}\end{array}} & \begin{array}{l}\text{q != '\0' &&}\\\text{p == *q}\end{array} \\
\hline \\
\begin{array}{c}T\\T\end{array} & \begin{array}{c}T\\T\end{array} & \begin{array}{c}T\\F\end{array} & \begin{array}{c}T\\F\ (impossible)\end{array} & T \\
\hline \\
\begin{array}{c}T\\T\end{array} & \begin{array}{c}F\\F\end{array} & \begin{array}{c}T\\F\end{array} & \begin{array}{c}F\ (impossible)\\F\end{array} & F \\
\hline \\
\begin{array}{c}F\\F\end{array} & \begin{array}{c}T\\T\end{array} & \begin{array}{c}T\\F\end{array} & \begin{array}{c}F\\F\end{array} & F \\
\hline \\
\begin{array}{c}F\\F\end{array} & \begin{array}{c}F\\F\end{array} & \begin{array}{c}T\\F\end{array} & \begin{array}{c}F\\F\end{array} & F \\
\end{array}$$

The function can now look like:

if (*find=='\0')
    return input;

while (*input != '\0')
{
    const char *p, *q;
    for (p = input, q = find; *q != '\0' && *p == *q; p++, q++) {}
    if (*q == '\0')
    {
        return input;
    }
    input++;
}
return NULL;


A well designed function shouldn't have surprising behaviour, and therefore shouldn't have special cases. Can we get rid of if (*find == '\0') return input;?

The behaviour of strstr() is, if find is "", then the function returns input. In order to reach the return input; statement when find == "" and input == "", we have to get rid of both the if (find == '\0') special case and the while (input != '\0') guard condition.

char *strstrImp(char *input, const char *find)
{
    do
    {
        const char *p, *q;
        for (p = input, q = find; *q != '\0' && *p == *q; p++, q++) {}
        if (*q == '\0')
        {
            return input;
        }
    } while (*(input++) != '\0');
    return NULL;
}

const char *strstrImp(const char *input, const char *find)
{
    return strstrImp(const_cast(input), find);
}


Suppose we make that change — then what about the case where input == "" and find is non-empty? It turns out, that works just fine: it will flow through all of the conditions and return NULL! There you go — a compact implementation of strstr() with reasonably simple logic and without special cases.

Code Snippets

cr54045.cpp:36:18: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
 res = strstrImp("helloworld","low");
                 ^
cr54045.cpp:36:31: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
 res = strstrImp("helloworld","low");
                              ^
const char *strstrImp(const char *input, const char *find)
{
    return strstrImp(const_cast<char *>(input), find);
}
int main()
{
    const char *input = "alfalfalfa romeo", *find = "alfalfa ";
    printf("%s\n", strstrImp(input, find));  // (null)
    printf("%s\n", strstr(input, find));     // alfalfa romeo
}
if (*find == '\0')
    return input;
if (*input == '\0')
    return NULL;

char *start = input;
while (*start != '\0')
{
    …
    start++;
}
return NULL;
if (*find == '\0')
    return input;
while (*input != '\0')
{
    …
    input++;
}
return NULL;

Context

StackExchange Code Review Q#54045, answer score: 19

Revisions (0)

No revisions yet.