Find the first occurrence of a substring in a string

Problem

This is a simple version of the function strstr. It returns the address of the first occurrence of the substring s2 in s1, or NULL if there is none. I want to know about possible problems in the code and how to improve it in general.

#include <stdio.h>

/* Version of strstr, which returns the address of
   the first occurrence of s2 in s1, else, NULL
*/
char *strs(char *s1, char* s2)
{
    char *ptr = NULL;
    int i, n;

    for (i = 0, n = 0; s1[i] != '\0'; i++)
    {
        if (s1[i] == s2[n])
        {
            if (n == 0)
                ptr = &(s1[i]);
            if (s2[n + 1] == '\0')
                return ptr;
            ++n;
        }
        else
            n = 0;
    }
    return NULL;
}

int main(void)
{
    char s1[] = "I'm waiting in my cold cell\nWhen the bell begins to chime\n";
    char s2[] = "waiting";
    char *ptr = strs(s1, s2);

    if (ptr == NULL)
    {
        printf("There is no occurrence of \"%s\" in the first string!\n",
        s2);
    }
    else
    {
        printf("%s\n", ptr);
    }
}

Solution

Comments about your code:

I assume you would have named it strstr if that name were not already taken by the standard library.

You should accept pointers to const char, since you don't modify the strings they point to.

You should declare i and n inside the for statement (allowed since C99). This narrows their scope and makes the code more maintainable, since they no longer clutter the scope of the whole function. Their lifetime also ends when the loop ends.

Using int as an index type is a bad idea. Its biggest issues are that it is signed and not guaranteed to be large enough. Use size_t instead: it is guaranteed to be able to represent the size of any object, so it can index an array of the largest possible size.
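Putting the const, scoping and size_t suggestions together, a minimal sketch of the index-based version might look like the following. The name strs_idx is hypothetical. The sketch also changes one thing not covered by the points above: the original loop resets n to 0 on a mismatch without re-examining the characters it already passed, so for s1 = "aab" and s2 = "ab" it returns NULL. This version restarts the comparison at every position of s1 instead.

```c
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical name strs_idx: index-based search with const parameters,
   size_t indices declared inside the loops (C99), and a comparison that
   restarts at every starting position of s1. */
const char *strs_idx(const char *s1, const char *s2)
{
    /* An empty s2 matches at the start of s1, as strstr does. */
    if (s2[0] == '\0')
        return s1;

    for (size_t i = 0; s1[i] != '\0'; i++)
    {
        size_t n = 0;

        /* Extend the match while characters agree; stops at either terminator. */
        while (s2[n] != '\0' && s1[i + n] == s2[n])
            n++;

        /* Reaching the end of s2 means all of it matched at position i. */
        if (s2[n] == '\0')
            return &s1[i];
    }
    return NULL;
}
```

With this version, strs_idx("aab", "ab") points at the second character of "aab".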

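You may be adding overhead by using the subscript ([]) operator instead of advancing a pointer, although a[i] is defined as *(a + i) and optimizing compilers usually generate equivalent code for both.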
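For comparison, here is a sketch of the same naive search written with pointers instead of subscripts. The name strs_ptr is hypothetical, and whether it is actually faster than the index-based form depends on the compiler; the difference is mostly one of style.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical name strs_ptr: naive search that walks pointers
   instead of using the [] operator. */
const char *strs_ptr(const char *text, const char *pattern)
{
    /* An empty pattern matches at the start of text, as strstr does. */
    if (*pattern == '\0')
        return text;

    for (; *text != '\0'; ++text)
    {
        const char *t = text;     /* cursor into the candidate match */
        const char *p = pattern;  /* cursor into the pattern */

        /* Advance both cursors while the characters agree. */
        while (*p != '\0' && *t == *p)
        {
            ++t;
            ++p;
        }

        /* The whole pattern was consumed: a match starts at text. */
        if (*p == '\0')
            return text;
    }
    return NULL;
}
```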

The loop could be rewritten in a more readable and straightforward form; one such form is shown in the suggested implementation. I know that you would like high performance, but I don't think it matters much in this situation. With more context about how the function will be used, I could suggest a particular algorithm (there are quite a few). The best, in my opinion, is the Boyer–Moore string-search algorithm.
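As an illustration of that family of algorithms, here is a minimal sketch of the Boyer–Moore–Horspool variant. It is not full Boyer–Moore: it keeps only the bad-character shift table and drops the good-suffix rule. The name strs_horspool is hypothetical, and the table is indexed by unsigned char, so it has UCHAR_MAX + 1 entries.

```c
#include <limits.h>  /* UCHAR_MAX */
#include <string.h>  /* strlen, memcmp, size_t, NULL */

/* Hypothetical name strs_horspool: Boyer–Moore–Horspool search.
   Returns a pointer to the first occurrence of pattern in text, or NULL. */
const char *strs_horspool(const char *text, const char *pattern)
{
    size_t m = strlen(pattern);
    size_t n = strlen(text);

    if (m == 0)
        return text;   /* empty pattern matches at the start */
    if (m > n)
        return NULL;   /* pattern longer than text: no match possible */

    /* Bad-character table: how far the window may slide when the text
       character aligned with the pattern's last position is c. */
    size_t shift[UCHAR_MAX + 1];
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = m;
    for (size_t k = 0; k + 1 < m; k++)
        shift[(unsigned char)pattern[k]] = m - 1 - k;

    /* Slide a window of length m over text from left to right. */
    for (size_t pos = 0; pos + m <= n; pos += shift[(unsigned char)text[pos + m - 1]])
    {
        if (memcmp(text + pos, pattern, m) == 0)
            return text + pos;
    }
    return NULL;
}
```

The shift table costs a pass over the pattern up front, which pays off mainly for longer patterns and texts; for short strings the naive loop is usually just as good.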

Suggested implementation:

I think strncmp should be used. I know that including the string.h header defeats the purpose of the exercise, but you could write your own strncmp. That also increases code reuse.

To make it work, we just need to precompute the length of the substring.

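#include <string.h>  /* strlen, strncmp, size_t, NULL */

/* Returns a pointer to the first occurrence of substr in text, or NULL.
   Unlike strstr, an empty text always yields NULL. */
const char *strs(const char *text, const char *substr)
{
    size_t substrlen = strlen(substr);  /* computed once, outside the loop */
    while (*text != 0)
    {
        /* Does substr start at the current position? */
        if (strncmp(text, substr, substrlen) == 0)
        {
            return text;
        }
        ++text;
    }

    return NULL;
}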
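If string.h really is off-limits, the hand-written strncmp suggested above is short. The following is a sketch under that assumption; my_strncmp is a hypothetical name. Like the standard function, it compares characters as unsigned char and stops at the first difference, at a terminating null character, or after n characters.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical name my_strncmp: compares at most n characters of a and b.
   Returns a negative value, zero, or a positive value, like strncmp. */
int my_strncmp(const char *a, const char *b, size_t n)
{
    for (; n > 0; ++a, ++b, --n)
    {
        /* Compare as unsigned char, as the standard strncmp does. */
        unsigned char ca = (unsigned char)*a;
        unsigned char cb = (unsigned char)*b;

        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            return 0;  /* both strings ended at the same place */
    }
    return 0;  /* the first n characters are equal */
}
```

Replacing strncmp with my_strncmp (and strlen with a similar counting loop) in the suggested implementation would remove its dependency on string.h.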


Any performance gain from the data being const depends on physical constness (whether the object itself is defined const, rather than merely accessed through a pointer to const) and on the compiler.



Context

StackExchange Code Review Q#138074, answer score: 2
