HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Small part of an INI parser

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
parserpartsmallini

Problem

I know there are probably thousands of INI parser implementations in C.

Nevertheless I want to implement one to learn the art of parsing grammar.

I just wrote a small piece of the INI parser and wanted to ask if I am on the right track:

``
void parse_line(const char *str, char **name) {
size_t length = strlen(str);
bool found_section = false;

for (size_t pos = 0; pos = length;

//dbg("Processing %c", current);

if (current == '[') {
size_t updated_pos = pos + 1;
ssize_t match_pos = -1;

if (is_last_ch) {
dbg("Reached end of line without closing ] (1)!");

return;
}

if (found_section) {
dbg("Already found a section!");

return;
}

//dbg("Found a [ at position %zu, looking for ] now", pos);

for (size_t x = updated_pos; x = updated_pos);

size_t length_of_section = match_pos - updated_pos;

//dbg("Length of name is %zu", length_of_section);

char *section_name = malloc(length_of_section + 1);
assert(section_name);

for (size_t x = 0; x < length_of_section; ++x) {
size_t i = x + updated_pos;

section_name[x] = str[i];
}

section_name[length_of_section] = 0;

// utilize
section_name` in some way...
//dbg("Section name is: %s", section_name);
*name = section_name;
// and then free() it somewhere

pos += length_of_section + 1;

assert(pos < length);

found_section = true;
} else if (current == ']') {
dbg("Found unmatched ] at position %zu", pos);

return;
} else if (current == ';') {
//dbg("End of line, bye!");

return;
} else {
dbg("Unexpected character %c", current);

return;
}

Solution

Here are some things that may help you improve your program.

Use a state machine

The approach you currently have uses bool variables keeping track things such as whether a section has been found, whether a character is the last one and weaves the interpretation int a long series of nested if statements. That approach may work for small, simple grammars such as this, but it's not going to scale well as you move on to a larger set of things your program can parse.

For that reason, I'd recommend a more data-driven approach in which you write out a state machine and have the code execute according to the data within it rather than via a code-driven approach. This makes things more compact and easier to read.

An example of such an implementation for a parser is in this answer.

Consider using regular expressions

I've given this advice before. It's a lot easier in C++ because there's library support within the standard, but since it appears you're using POSIX-based libraries, you may already have a regular expression library available to you.

Consider using compiler tools

Another approach to consider is to use standard tools such as lex and yacc (or flex and bison) to create a much more readable and maintainable version of the parser. These tools have a steep learning curve, but once you've mastered them, you'll be better able to see and understand how to write unambiguous parsers from scratch as well. IMHO, this is time well spent.

Don't comment out debug statements

Just as with assert, I'd suggest that creating a version of dbg that either expands to nothing (for production code) or to a routine that prints debugging messages (for development) may be very useful. It's much easier than commenting and uncommenting such debugging statements during the debugging process.

Don't leak memory

There is, potentially, a malloc within the code but no corresponding free. I assume that's because it's intended that's because the caller is supposed to free it if it's allocated, but the problem is that the routine doesn't set name to NULL if it doesn't happen to allocate memory, making it incumbent on the caller to also pre-set the pointer to NULL. This would be easier to use if, instead, the pointer were a return value that would either be NULL or a pointer that the caller is expected to free.

Avoid ssize_t if possible

Because you're initializing match_pos with -1, I see why you've chosen to make it of type ssize_t (which is signed) versus size_t which might not be signed. However, this limits portability and there's no reason you couldn't use another sentinel value instead, such as SIZE_MAX

Context

StackExchange Code Review Q#145881, answer score: 7

Revisions (0)

No revisions yet.