patterncMinor
Weekday+day validation
Viewed 0 times
dayweekdayvalidation
Problem
The string I'd like to check is something like "abcSun24def". If any valid "xxxyy" (xxx = weekday and yy = day) is found, return the position inside the string. If "xxxyy" is not found, return -1.
The code works as desired, but I think it can be optimized.
The code works as desired, but I think it can be optimized.
/* -------------------------------------------------------------
FUNC : findxy (find pattern xxxyy)
xxx = weekday (e.g. "Mon01")
yy = day
roster specific formatting
PARAMS : c (char *), pointer to string
RETURNS : (int), if pattern found, pointer to found pattern in string c
-1 if pattern not found
REMARKS :
---------------------------------------------------------------- */
int findxy(char *c) {
const char *days[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
int i, j;
char bufw[4];
char bufd[3];
/* check if c is at least 5 chars long */
if (strlen(c) = 1 && atoi(bufd) <= 31) {
return i;
}
}
}
}
}
return -1;
}Solution
I see some things that may help you improve your code.
Use the required
The code uses
Use
In your
Check for a null pointer
Things do not go well if the routine is passed a
Use better naming
The
Avoid copying if practical
It's not strictly necessary to make copies of portions of the passed string. With a bit of careful planning, it can be done in place. Here's one way to do it, although it's not very efficient:
Use a finite state machine
We can create a much more efficient routine by creating a finite state machine. For a particular candidate string, we note that the first character must be one of {'F', 'M', 'S', 'T', 'W'}. If it is not one of those, then the candidate string can be immediately rejected. Now let's say the first character is 'S'. In that case the second character must be one of {'a', 'u'}. We can proceed like this, one character at a time to create a finite state machine. Here's a visualization of such a state machine:
This is how compiler tools like
A very pedantic note
Strictly speaking, this line:
is guaranteed to be portable. The C standard requires that encoding of digits is contiguous, so this will work with any character encoding, including EBCDIC, Unicode and ASCII.
Use the required
#includesThe code uses
strlen and memcpy which means that it should #include . It was not difficult to infer, but it helps reviewers if the code is complete. It's also an important part of the interface. I believe these are the required includes:#include
#include
#include Use
const where practicalIn your
findxy routine, the passed string is never altered, which is just as it should be. You should indicate that fact by declaring it like this:int findxy(const char *c)Check for a null pointer
Things do not go well if the routine is passed a
NULL pointer. On my machine, I get a segmentation fault and a crash. You can eliminate this hole by adding these lines near the top of the routine:if (c == NULL) {
return -1;
}Use better naming
The
days array is well named because it's easy to guess from the name what it contains. Likewise i and j are commonly used as index variables as you have done in this code. However, findxy is a rather cryptic name for what this does and c is a poor name for the passed string. I'd recommend something like this instead:int findWeekdayDate(const char *str)Avoid copying if practical
It's not strictly necessary to make copies of portions of the passed string. With a bit of careful planning, it can be done in place. Here's one way to do it, although it's not very efficient:
int isValidWeekdayDate(const char *str) {
static const char *days[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
if (str == NULL || strlen(str) 0 && val <= 31) {
return 1;
}
}
}
}
return 0;
}
int findWeekdayDate(const char *str) {
for (const char *curr = str ; *curr; ++curr) {
if (isValidWeekdayDate(curr)) {
return curr-str;
}
}
return -1;
}Use a finite state machine
We can create a much more efficient routine by creating a finite state machine. For a particular candidate string, we note that the first character must be one of {'F', 'M', 'S', 'T', 'W'}. If it is not one of those, then the candidate string can be immediately rejected. Now let's say the first character is 'S'. In that case the second character must be one of {'a', 'u'}. We can proceed like this, one character at a time to create a finite state machine. Here's a visualization of such a state machine:
This is how compiler tools like
flex and bison and lex and yacc work. The code is very efficient but might not be as easy to understand, so it's a tradeoff you should be aware of. A very pedantic note
Strictly speaking, this line:
int val = (pos[3]-'0') * 10 + (pos[4]-'0');is guaranteed to be portable. The C standard requires that encoding of digits is contiguous, so this will work with any character encoding, including EBCDIC, Unicode and ASCII.
Code Snippets
#include <string.h>
#include <stdlib.h>
#include <ctype.h>int findxy(const char *c)if (c == NULL) {
return -1;
}int findWeekdayDate(const char *str)int isValidWeekdayDate(const char *str) {
static const char *days[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
if (str == NULL || strlen(str) < 5) {
return 0;
}
for (const char **dayname = days; *dayname; ++dayname) {
char *pos = strstr(str, *dayname);
if (pos && pos == str) {
if (isdigit(pos[3]) && isdigit(pos[4])) {
int val = (pos[3]-'0') * 10 + (pos[4]-'0');
if (val > 0 && val <= 31) {
return 1;
}
}
}
}
return 0;
}
int findWeekdayDate(const char *str) {
for (const char *curr = str ; *curr; ++curr) {
if (isValidWeekdayDate(curr)) {
return curr-str;
}
}
return -1;
}Context
StackExchange Code Review Q#150587, answer score: 6
Revisions (0)
No revisions yet.