Find the first occurrence of a substring in a string
Problem
This is a simple version of the function strstr. It returns the address of the first occurrence of the substring s2 in s1. I would like to know about possible problems in the code and how to improve it in general.
#include <stdio.h>
/* Version of strstr, which returns the address of
   the first occurrence of s2 in s1, else NULL
*/
char *strs(char *s1, char* s2)
{
    char *ptr = NULL;
    int i, n;
    for (i = 0, n = 0; s1[i] != '\0'; i++)
    {
        if (s1[i] == s2[n])
        {
            if (n == 0)
                ptr = &(s1[i]);
            if (s2[n + 1] == '\0')
                return ptr;
            ++n;
        }
        else
            n = 0;
    }
    return NULL;
}
int main(void)
{
    char s1[] = "I'm waiting in my cold cell\nWhen the bell begins to chime\n";
    char s2[] = "waiting";
    char *ptr = strs(s1, s2);
    if (ptr == NULL)
    {
        printf("There is no occurrence of \"%s\" in the first string!\n",
               s2);
    }
    else
    {
        printf("%s\n", ptr);
    }
}
Solution
Comments about your code:
I believe that you would rename it to something like strstr if you could.
You should accept pointers to const char, since you don't modify the strings.
You should declare i and n right inside the loop (possible since C99). This narrows their scope and makes the code more maintainable, since they no longer pollute the scope of the function. On top of that, their lifetime ends when the loop does.
Using int for an index is a bad idea. The biggest issues are that it is signed and not guaranteed to be large enough. You should use size_t instead, because size_t is guaranteed to have enough capacity to index an array of the maximum possible size.
You may be adding overhead by using the subscript ([]) operator, although an optimizing compiler will usually generate the same code as for plain pointer arithmetic.
The loop could be rewritten in another, more readable and straightforward form, one of which is shown in the suggested implementation. I know that you would like to have high performance, but I don't think it is that important in this situation. Maybe with more context on how the algorithm is used I could suggest a particular algorithm (there are quite a few of them). The best, in my opinion, is the Boyer–Moore string search algorithm.
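One more possible problem, beyond the style points above, seems worth illustrating: when a partial match fails, the question's loop resets n to 0 but moves on without re-checking the current character against the start of s2. The sketch below reproduces that loop with a const-correct signature and size_t indices. The name strs_original is a hypothetical one, used only for this demonstration.

```c
#include <stddef.h>

/* The question's loop, reproduced with const pointers and size_t indices.
   strs_original is a hypothetical name used only for this demonstration. */
const char *strs_original(const char *s1, const char *s2)
{
    const char *ptr = NULL;
    size_t n = 0;
    for (size_t i = 0; s1[i] != '\0'; i++)
    {
        if (s1[i] == s2[n])
        {
            if (n == 0)
                ptr = &s1[i];      /* remember where this match attempt began */
            if (s2[n + 1] == '\0')
                return ptr;        /* the whole of s2 has been matched */
            ++n;
        }
        else
            n = 0;                 /* resets the match, but s1[i] is never
                                      re-compared against s2[0] */
    }
    return NULL;
}
```

With these definitions, strs_original("aab", "ab") returns NULL even though "ab" starts at index 1: the second 'a' ends the first attempt and is then skipped. Restarting the comparison at every position of the text avoids this.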
Suggested implementation:
I think that strncmp should be used. I know that including the string.h header defeats the purpose, but you could write your own. It increases code reuse in C!
To make it work we just need to precalculate the length of the substring.
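#include <string.h>

/* Returns a pointer to the first occurrence of substr in text, or NULL. */
const char *strs(const char *text, const char *substr)
{
    size_t substrlen = strlen(substr);
    while (*text != 0)
    {
        /* Compare substr against the text starting at the current position. */
        if (strncmp(text, substr, substrlen) == 0)
        {
            return text;
        }
        ++text;
    }
    return NULL;
}
Any performance gain from the data being const depends on whether the underlying objects are physically const and on the compiler.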
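Since the answer notes that pulling in string.h defeats the purpose and that you could write your own helpers, here is a minimal sketch of what that might look like. The names my_strlen, my_strncmp and strs_selfcontained are hypothetical and not part of the original answer. The search loop is the same as in the suggested implementation.

```c
#include <stddef.h>

/* Hypothetical helper: length of s, not counting the terminating '\0'. */
static size_t my_strlen(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        ++len;
    return len;
}

/* Hypothetical helper: compares at most n characters, in the spirit of strncmp. */
static int my_strncmp(const char *a, const char *b, size_t n)
{
    for (; n > 0; ++a, ++b, --n)
    {
        if (*a != *b)
            return (unsigned char)*a - (unsigned char)*b;
        if (*a == '\0')
            return 0;   /* both strings ended before n characters */
    }
    return 0;
}

/* Same search loop as the suggested implementation, without string.h. */
const char *strs_selfcontained(const char *text, const char *substr)
{
    size_t substrlen = my_strlen(substr);
    for (; *text != '\0'; ++text)
    {
        if (my_strncmp(text, substr, substrlen) == 0)
            return text;
    }
    return NULL;
}
```

One difference from the standard strstr is that this loop never runs for an empty text, so searching for an empty substr in an empty text returns NULL, whereas strstr would return text.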
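The comments above name Boyer–Moore as a candidate when performance matters. A full Boyer–Moore implementation is fairly long, so as a rough illustration of the idea, here is a sketch of the simpler Boyer–Moore–Horspool variant, which keeps only the bad-character shift table. It is a simplification, not the full algorithm. The name strs_horspool is hypothetical, and the sketch uses strlen and memcmp from string.h for brevity.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Sketch of Boyer-Moore-Horspool; strs_horspool is a hypothetical name. */
const char *strs_horspool(const char *text, const char *pattern)
{
    size_t shift[UCHAR_MAX + 1];
    size_t m = strlen(pattern);
    size_t n = strlen(text);

    if (m == 0)
        return text;        /* empty pattern matches at the start */
    if (m > n)
        return NULL;        /* pattern cannot fit in the text */

    /* Default shift: a character absent from the pattern lets us skip m. */
    for (size_t c = 0; c <= UCHAR_MAX; ++c)
        shift[c] = m;
    /* For every pattern character except the last, record its distance
       from the end of the pattern. */
    for (size_t k = 0; k + 1 < m; ++k)
        shift[(unsigned char)pattern[k]] = m - 1 - k;

    size_t pos = 0;
    while (pos <= n - m)
    {
        if (memcmp(text + pos, pattern, m) == 0)
            return text + pos;
        /* Shift by the table entry for the text character aligned with
           the last pattern position. */
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return NULL;
}
```

For short patterns like the one in the question, the table setup may well cost more than it saves, which fits the remark that performance is probably not that important here.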
Context
StackExchange Code Review Q#138074, answer score: 2