patterncMinor
String tokenizing with arbitrary token count in C
Viewed 0 times
tokenizingwitharbitrarytokencountstring
Problem
I am building a small program for input handling so I can conveniently parse lines and tokenize strings. The token handling works right now but I have to malloc a lot in the main() to make it work. Sadly I can not malloc everything at once because then the strings inside will have NULL pointers. Maybe someone more experienced with C can help me improve this:
#include
int get_tokens(char *line, char **tokens, size_t number_of_tokens) {
// Make a copy of line to prevent it from being changed
char *copy = malloc(strlen(line) + 1);
if (copy == NULL) {
perror("Could not allocate memory!");
exit(EXIT_FAILURE);
}
strcpy(copy, line);
char *delimiter = " ";
char *token = strtok(copy, delimiter);
if (token) {
strcpy(*tokens, token);
size_t tokens_found = 1;
while ((token = strtok(NULL, delimiter)) && tokens_found < number_of_tokens) {
strcpy(*(tokens + tokens_found), token);
++tokens_found;
}
} else {
printf("No tokens found! The supplied string was empty.");
return EXIT_FAILURE;
}
free(copy);
return EXIT_SUCCESS;
}
int main() {
// this will also be the buffer length for the tokens
size_t line_length = 10;
char line[line_length];
get_line(line, line_length);
// Above function works just as the name implies
size_t tokens_wanted = 2;
char **tokens = malloc(tokens_wanted * sizeof(char**));
for (size_t j = 0; j < tokens_wanted; ++j) {
tokens[j] = malloc(sizeof(char) * line_length);
}
get_tokens(line, tokens, tokens_wanted);
// process the tokens
// free the elements of tokens
free(tokens);
return EXIT_SUCCESS;
}Solution
I see some things that I think could help you improve your code.
Use the required
The code uses
Don't leak memory
This code calls
Use
The comment about making a copy of
Also, the
Consider a different interface
In the current code the return value of
This works nicely and is much simpler. If fewer than the requested number of tokens is found, the remaining
Omit
When a C or C++ program reaches the end of
Note: when I make this suggestion, it's almost invariably followed by one of two kinds of comments: "I didn't know that." or "That's bad advice!" My rationale is that it's safe and useful to rely on compiler behavior explicitly supported by the standard. For C, since C99; see ISO/IEC 9899:1999 section 5.1.2.2.3:
[...] a return from the initial call to the
For C++, since the first standard in 1998; see ISO/IEC 14882:1998 section 3.6.1:
If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;
All versions of both standards since then (C99 and C++98) have maintained the same idea. We rely on automatically generated member functions in C++, and few people write explicit
So I advocate omitting it; others disagree (often vehemently!) In any case, if you encounter code that omits it, you'll know that it's explicitly supported by the standard and you'll know what it means.
Use the required
#includesThe code uses
malloc which means that it should #include . It also uses strtok which is in string.h.Don't leak memory
This code calls
malloc several places but doesn't match each one with a call to free. This means that the routines are leaking memory. Specifically, just before the call to free(tokens); in main, you need to also free the individually allocated memory:for (size_t j = 0; j < tokens_wanted; ++j) {
free(tokens[j]);
}Use
const where practicalThe comment about making a copy of
line to prevent it from being changed suggests that it should be declared const in the parameter list: int get_tokens(const char *line, char **tokens, size_t number_of_tokens) {Also, the
delimiter string should also be const and probably also static.Consider a different interface
In the current code the return value of
get_tokens isn't used. It also prints and exits the program in some circumstances. Instead, I'd advice changing the interface in a couple of ways. First, I'd recommend that the caller make a copy of the string if necessary. This makes the code cleaner and simpler and allows the caller to perhaps do something other than simply abruptly halting the program if an empty string is found. Second, I'd suggest that because strtok operates by modifying the passed string, there really isn't a need to allocate more memory and copy. Instead, I'd recommend that the pointers returned can simply point to the passed string. The much-simplified version might look like this:void get_tokens(char *line, char **tokens, size_t number_of_tokens) {
static const char *delimiter = " ";
for (size_t i = 0; i < number_of_tokens; ++i) {
tokens[i] = strtok(line, delimiter);
line = NULL;
}
}This works nicely and is much simpler. If fewer than the requested number of tokens is found, the remaining
tokens pointers are set to NULL. The calling function might look like this:int main() {
const size_t line_length = 100;
char line[line_length];
get_line(line, line_length);
size_t tokens_wanted = 2;
char **tokens = malloc(tokens_wanted * sizeof(char**));
if (tokens == NULL) {
perror("out of memory!");
return 1;
}
get_tokens(line, tokens, tokens_wanted);
for (size_t i=0; i < tokens_wanted; ++i) {
printf("token[%lu] = \"%s\"\n", i, tokens[i]);
}
free(tokens);
}Omit
return 0When a C or C++ program reaches the end of
main the compiler will automatically generate code to return 0, so there is no need to put return 0; explicitly at the end of main. Note: when I make this suggestion, it's almost invariably followed by one of two kinds of comments: "I didn't know that." or "That's bad advice!" My rationale is that it's safe and useful to rely on compiler behavior explicitly supported by the standard. For C, since C99; see ISO/IEC 9899:1999 section 5.1.2.2.3:
[...] a return from the initial call to the
main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0.For C++, since the first standard in 1998; see ISO/IEC 14882:1998 section 3.6.1:
If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;
All versions of both standards since then (C99 and C++98) have maintained the same idea. We rely on automatically generated member functions in C++, and few people write explicit
return; statements at the end of a void function. Reasons against omitting seem to boil down to "it looks weird". If, like me, you're curious about the rationale for the change to the C standard read this question. Also note that in the early 1990s this was considered "sloppy practice" because it was undefined behavior (although widely supported) at the time. So I advocate omitting it; others disagree (often vehemently!) In any case, if you encounter code that omits it, you'll know that it's explicitly supported by the standard and you'll know what it means.
Code Snippets
for (size_t j = 0; j < tokens_wanted; ++j) {
free(tokens[j]);
}int get_tokens(const char *line, char **tokens, size_t number_of_tokens) {void get_tokens(char *line, char **tokens, size_t number_of_tokens) {
static const char *delimiter = " ";
for (size_t i = 0; i < number_of_tokens; ++i) {
tokens[i] = strtok(line, delimiter);
line = NULL;
}
}int main() {
const size_t line_length = 100;
char line[line_length];
get_line(line, line_length);
size_t tokens_wanted = 2;
char **tokens = malloc(tokens_wanted * sizeof(char**));
if (tokens == NULL) {
perror("out of memory!");
return 1;
}
get_tokens(line, tokens, tokens_wanted);
for (size_t i=0; i < tokens_wanted; ++i) {
printf("token[%lu] = \"%s\"\n", i, tokens[i]);
}
free(tokens);
}Context
StackExchange Code Review Q#146862, answer score: 3
Revisions (0)
No revisions yet.