HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

String parsing in C

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
parsingstringstackoverflow

Problem

This is supposed to be strict ANSI C89 pedantic code. It should extract word1, word2 and word3 from a string formatted [ word1 ] word2 [ word3 ] and return failure in any other format.

It seems to work, but it seems so ugly. No need to comment about the fact GetTokenBetweenSquareBraces and GetTokenBtweenOpositeSquareBraces are duplicates.

I would love some tips on how to clean this up.

```
#include
#include
#include

char TrimWhiteSpaces(char str) {
char *out = str;
int i;
int len = strlen(str);
for (i=0; i=0 && isspace(str[i]); str[i]=0, i--);/scan backward/
return out;
}
char GetTokenBetweenSquareBraces(char input, char **output, int * output_size) {
char *p = TrimWhiteSpaces(input);
*output_size=0;
if (p[0] == '[')
{
*output = TrimWhiteSpaces(p + 1);
do
{
(*output_size)++;
}while((output)[output_size] != ']' && isalnum((output)[output_size]));
}
else
{
return NULL;
}
return (output) + output_size;
}
char GetTokenBtweenOpositeSquareBraces(char input, char **output, int * output_size) {
char *p = TrimWhiteSpaces(input);
*output_size=0;
if (p[0] == ']')
{
*output = TrimWhiteSpaces(p + 1);
do
{
(*output_size)++;
}while((output)[output_size] != '[' && isalnum((output)[output_size]));
}
else
{
return NULL;
}
return (output) + output_size;
}
int GetWords(char str,char word1,char word2,char word3)
{
char next=NULL,output=NULL;

int outputsize;
printf ("\nSplitting string \"%s\" into tokens:\n",str);

next = GetTokenBetweenSquareBraces (str,&output,&outputsize);
strncpy(word1,output,outputsize);
word1[outputsize] = '\0';
strcpy(word1,TrimWhiteSpaces(word1));
if(!next) return 0;

next = GetTokenBtweenOpositeSquareBraces (next,&output

Solution

#include 
#include 
#include 

char * TrimWhiteSpaces(char *str) {
    char *out = str;
    int i;
    int len = strlen(str);
    for (i=0; i<len && isspace(str[i]); i++, out++); /*scan forward*/


I'd at least have a body with a comment in it here. Its easy to miss that semicolon. I don't think you need the i < len test. The 0 at the end of the string should fail the isspace test and so you don't need to check for the length as well. It also doesn't really make sense to keep track of i. Instead, just use out.

for (i=len-1; i>=0 && isspace(str[i]); str[i]=0, i--);/*scan backward*/


It isn't really necessary to set all those spaces to 0. Overall, you are doing to much work in that one line. You should at least only do the 0 setting inside the loop body because it has nothing to do with the loop control.

return out;


Generally, its best to either modify your parameters or return new ones. Don't do both. Here you return a new string pointer and modify the original string.

}
char * GetTokenBetweenSquareBraces(char * input, char **output, int * output_size) {
    char *p = TrimWhiteSpaces(input);
    *output_size=0;
    if (p[0] == '[')
    {
        *output = TrimWhiteSpaces(p + 1);
        do       
        {                
            (*output_size)++;        
        }while((*output)[*output_size] != ']' && isalnum((*output)[*output_size]));


] isn't a number or a letter. You don't need both of these tests.

}
    else
    {
        return NULL;
    }
    return (*output) + *output_size;
}

char * GetTokenBtweenOpositeSquareBraces(char * input, char **output, int * output_size)        {
    char *p = TrimWhiteSpaces(input);
    *output_size=0;
    if (p[0] == ']')
    {
        *output = TrimWhiteSpaces(p + 1);
        do       
        {                
            (*output_size)++;        
        }while((*output)[*output_size] != '[' && isalnum((*output)[*output_size]));
    }
    else
    {
        return NULL;
    }
    return (*output) + *output_size;
}


Deja Vu! This is almost exactly the same as the previous function. Only the bracket directions have been reversed. It seems that you should be able to share that code.

int GetWords(char * str,char * word1,char * word2,char * word3)
{
    char * next=NULL,*output=NULL;

    int outputsize;
    printf ("\nSplitting string \"%s\" into tokens:\n",str);


Generally I recommend against having your working functions doing any output. Also odd choice of where to put newlines.

next = GetTokenBetweenSquareBraces (str,&output,&outputsize);
    strncpy(word1,output,outputsize);
    word1[outputsize] = '\0';
    strcpy(word1,TrimWhiteSpaces(word1));


Why are you trimming whitespace here? Didn't you already do that. You are doing a lot of work to copy the text. Maybe that's something that GetTokenBetweenSquareBraces should have done?

if(!next) return 0;

    next = GetTokenBtweenOpositeSquareBraces (next,&output,&outputsize);
    strncpy(word2,output,outputsize);
    word2[outputsize] = '\0';
    strcpy(word2,TrimWhiteSpaces(word2));

    if(!next) return 0;


Deja Vu!

next = GetTokenBetweenSquareBraces (next,&output,&outputsize);
    strncpy(word3,output,outputsize);
    word3[outputsize] = '\0';
    strcpy(word3,TrimWhiteSpaces(word3));

    if(!next) return 0;


Deja Vu!

return 1;
}
void TestGetWords(char * str )
{
    char word1[20],word2[20],word3[20];


Your code isn't careful to make sure that you don't overflow these variables. You may want to do something about that

if (GetWords(str,word1,word2,word3))
    {
        printf("|%s|%s|%s|\n",word1,word2,word3);
    }
    else
    {
        printf("3ViLLLL\n");
    }

}
int main (void)
{
    char str[] ="[  hello   ]  gfd   [ hello2 ] ";
    char str2[] ="[ hello   [  gfd   [ hello2 ] ";
    char str3[] ="the wie321vg42g42g!@#";
    char str4[] ="][123[]23][231[";

    TestGetWords(str);
    TestGetWords(str2);
    TestGetWords(str3);
    TestGetWords(str4);


For purposes of automated testing, its actually better if you provide the correct answer and check against that in code. That way the program will tell you when its wrong.

getchar();
    return 1;


0 is used to indicate a successful program run.

}


Overall, your program is ugly because you are using the wrong vocabulary. You've taken the vocabulary as given instead of defining the vocabulary that made the task easy to describe.
Here is my approach to your problem

``
char Whitespace(char str)
/*
This function return the
str pointer incremented past any whitespace.
*/
{
/* when an error occurs, we return NULL.
If an error has already occurred, just pass it on */
if(!str) return str;

while(isspace(*str))
{
str++;
}
return str;
}

char Character(char str, char c)
/*
This function tries to match a specific character.
It returns
str` incremented past the character
or NULL if the characte

Code Snippets

#include <stdio.h>
#include <string.h>
#include <ctype.h>

char * TrimWhiteSpaces(char *str) {
    char *out = str;
    int i;
    int len = strlen(str);
    for (i=0; i<len && isspace(str[i]); i++, out++); /*scan forward*/
for (i=len-1; i>=0 && isspace(str[i]); str[i]=0, i--);/*scan backward*/
return out;
}
char * GetTokenBetweenSquareBraces(char * input, char **output, int * output_size) {
    char *p = TrimWhiteSpaces(input);
    *output_size=0;
    if (p[0] == '[')
    {
        *output = TrimWhiteSpaces(p + 1);
        do       
        {                
            (*output_size)++;        
        }while((*output)[*output_size] != ']' && isalnum((*output)[*output_size]));
}
    else
    {
        return NULL;
    }
    return (*output) + *output_size;
}

char * GetTokenBtweenOpositeSquareBraces(char * input, char **output, int * output_size)        {
    char *p = TrimWhiteSpaces(input);
    *output_size=0;
    if (p[0] == ']')
    {
        *output = TrimWhiteSpaces(p + 1);
        do       
        {                
            (*output_size)++;        
        }while((*output)[*output_size] != '[' && isalnum((*output)[*output_size]));
    }
    else
    {
        return NULL;
    }
    return (*output) + *output_size;
}

Context

StackExchange Code Review Q#10031, answer score: 5

Revisions (0)

No revisions yet.