HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Splitting a command line in C

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
commandsplittingline

Problem

Is this a good C program for splitting a command line without doing any expansion on it? Don't worry too much about main() and the output -- those are for testing.

```
#include
#include
#include
#include
#include
void pullwhitespace(char **input);
int main (int argc, char **argv)
{
if (argc != 2) {
fputs("Must have exactly one command line argument\n", stderr);
return 64;
}
char *input = argv[1];
char *outstring = malloc(strlen(argv[1]) + 1);
char *outptr = outstring;
char a;
pullwhitespace(&input);
NORMAL:
/ We are in an unquoted part of the input /
switch ((a = *input++))
{
case ' ' :
case '\t':
*outptr++ = '\0';
pullwhitespace(&input);
goto NORMAL;
case '\n':
case '\0':
goto DONE;
case '\\':
if (((a = *input++)) == '\n') {
break;
} else if (a) {
*outptr++ = a;
goto NORMAL;
} else {
goto FAIL;
}
case '\'':
goto QUOTE;
case '"':
goto DQUOTE;
default:
*outptr++ = a;
goto NORMAL;
}
assert(0);
QUOTE:
/ We are in a single quoted string /
while (((a = *input++)) != '\'') {
if (a) {
*outptr++ = a;
} else {
goto FAIL;
}
}
goto NORMAL;
DQUOTE:
/ We are in a double quoted string /
switch ((a = *input++))
{
case '\0':
goto FAIL;
case '\\':
{
char b = *input++;
switch (b)
{
case '\0':
goto FAIL;

Solution

I doubt that this code actually works. Especially pullwhitespace doesn't do what you probably intended (hint: it doesn't modify the argument, it makes a copy of the pointer and advances it). Have you tested it before submitting it for review here?

Anyway: You might want to have a look here for alternative ways to implement a state machine. You approach is similar to Remo.D's answer in the linked question but I think his macros make it more readable.

A few things I noticed:

-
When you are in NORMAL and read a white space you do this *outptr++ = '\0'; Given that strings in C are NULL terminated character sequences this seems dodgy as it will terminate the output right there unless you tell someone to read further by supplying a length. If you intend to split on white space then you should store the results in a char ** like argv.

-
Again in NORMAL:

case '\\':
        if (((a = *input++)) == '\n') {
                break;
        } else if (a) {
                *outptr++ = a;
                goto NORMAL;
        } else {
                goto FAIL;
        }


So if you find a \ followed by a newline you break the switch which is followed by an assert(0). Normally assert should be used to state invariants as "this function expects this condition to be true and it is probably a bug in the program if it is not". Specifically assert(0) should be used as "should never execute this code path". It seems weird that external user input can trigger an assert just because the program doesn't like he input. Especially since you have a FAIL state anyway.

Also the else condition evaluation depends on the side effect of the if condition evaluation - not very nice. You should read the next character first in a separate line and then evaluate. This makes it much clearer. So this case could probably have been written as:

case '\\':
        a = *input++;
        if (a && a != '\n') {
               *outptr++ = a;
               goto NORMAL;
        }
        goto FAIL;


-
If you are in DQUOTE and encounter a \ then you read another character from the input. However that character simply gets swallowed because in the default case you do this: *outptr++ = a; which will write the \ to the output and skip the current character (held in b).

Code Snippets

case '\\':
        if (((a = *input++)) == '\n') {
                break;
        } else if (a) {
                *outptr++ = a;
                goto NORMAL;
        } else {
                goto FAIL;
        }
case '\\':
        a = *input++;
        if (a && a != '\n') {
               *outptr++ = a;
               goto NORMAL;
        }
        goto FAIL;

Context

StackExchange Code Review Q#37341, answer score: 2

Revisions (0)

No revisions yet.