patterncMinor
Small part of an INI parser
Viewed 0 times
parserpartsmallini
Problem
I know there are probably thousands of INI parser implementations in C.
Nevertheless I want to implement one to learn the art of parsing grammar.
I just wrote a small piece of the INI parser and wanted to ask if I am on the right track:
``
//dbg("Section name is: %s", section_name);
*name = section_name;
// and then free() it somewhere
pos += length_of_section + 1;
assert(pos < length);
found_section = true;
} else if (current == ']') {
dbg("Found unmatched ] at position %zu", pos);
return;
} else if (current == ';') {
//dbg("End of line, bye!");
return;
} else {
dbg("Unexpected character %c", current);
return;
}
Nevertheless I want to implement one to learn the art of parsing grammar.
I just wrote a small piece of the INI parser and wanted to ask if I am on the right track:
``
void parse_line(const char *str, char **name) {
size_t length = strlen(str);
bool found_section = false;
for (size_t pos = 0; pos = length;
//dbg("Processing %c", current);
if (current == '[') {
size_t updated_pos = pos + 1;
ssize_t match_pos = -1;
if (is_last_ch) {
dbg("Reached end of line without closing ] (1)!");
return;
}
if (found_section) {
dbg("Already found a section!");
return;
}
//dbg("Found a [ at position %zu, looking for ] now", pos);
for (size_t x = updated_pos; x = updated_pos);
size_t length_of_section = match_pos - updated_pos;
//dbg("Length of name is %zu", length_of_section);
char *section_name = malloc(length_of_section + 1);
assert(section_name);
for (size_t x = 0; x < length_of_section; ++x) {
size_t i = x + updated_pos;
section_name[x] = str[i];
}
section_name[length_of_section] = 0;
// utilize section_name` in some way...//dbg("Section name is: %s", section_name);
*name = section_name;
// and then free() it somewhere
pos += length_of_section + 1;
assert(pos < length);
found_section = true;
} else if (current == ']') {
dbg("Found unmatched ] at position %zu", pos);
return;
} else if (current == ';') {
//dbg("End of line, bye!");
return;
} else {
dbg("Unexpected character %c", current);
return;
}
Solution
Here are some things that may help you improve your program.
Use a state machine
The approach you currently have uses
For that reason, I'd recommend a more data-driven approach in which you write out a state machine and have the code execute according to the data within it rather than via a code-driven approach. This makes things more compact and easier to read.
An example of such an implementation for a parser is in this answer.
Consider using regular expressions
I've given this advice before. It's a lot easier in C++ because there's library support within the standard, but since it appears you're using POSIX-based libraries, you may already have a regular expression library available to you.
Consider using compiler tools
Another approach to consider is to use standard tools such as
Don't comment out debug statements
Just as with
Don't leak memory
There is, potentially, a
Avoid
Because you're initializing
Use a state machine
The approach you currently have uses
bool variables keeping track things such as whether a section has been found, whether a character is the last one and weaves the interpretation int a long series of nested if statements. That approach may work for small, simple grammars such as this, but it's not going to scale well as you move on to a larger set of things your program can parse. For that reason, I'd recommend a more data-driven approach in which you write out a state machine and have the code execute according to the data within it rather than via a code-driven approach. This makes things more compact and easier to read.
An example of such an implementation for a parser is in this answer.
Consider using regular expressions
I've given this advice before. It's a lot easier in C++ because there's library support within the standard, but since it appears you're using POSIX-based libraries, you may already have a regular expression library available to you.
Consider using compiler tools
Another approach to consider is to use standard tools such as
lex and yacc (or flex and bison) to create a much more readable and maintainable version of the parser. These tools have a steep learning curve, but once you've mastered them, you'll be better able to see and understand how to write unambiguous parsers from scratch as well. IMHO, this is time well spent.Don't comment out debug statements
Just as with
assert, I'd suggest that creating a version of dbg that either expands to nothing (for production code) or to a routine that prints debugging messages (for development) may be very useful. It's much easier than commenting and uncommenting such debugging statements during the debugging process.Don't leak memory
There is, potentially, a
malloc within the code but no corresponding free. I assume that's because it's intended that's because the caller is supposed to free it if it's allocated, but the problem is that the routine doesn't set name to NULL if it doesn't happen to allocate memory, making it incumbent on the caller to also pre-set the pointer to NULL. This would be easier to use if, instead, the pointer were a return value that would either be NULL or a pointer that the caller is expected to free.Avoid
ssize_t if possibleBecause you're initializing
match_pos with -1, I see why you've chosen to make it of type ssize_t (which is signed) versus size_t which might not be signed. However, this limits portability and there's no reason you couldn't use another sentinel value instead, such as SIZE_MAXContext
StackExchange Code Review Q#145881, answer score: 7
Revisions (0)
No revisions yet.