Using state transitions to filter C comments
Problem
This is my second attempt at K&R exercise 1-23:
> Write a program to remove all comments from a C program. Don't forget to handle quoted strings and character constants properly. C comments don't nest.
I was previously packing characters into various I/O buffers, and someone suggested this was not a good choice. I thought I'd try something more along the lines of a state machine:
```
#include <stdio.h>
#define NORMAL 0
#define SINGLE_QUOTE 1
#define DOUBLE_QUOTE 2
#define SLASH 3
#define MULTI_COMMENT 4
#define INLINE_COMMENT 5
#define STAR 6

int state_from_normal(char prev_symbol, char symbol)
{
    int state = NORMAL;
    if (symbol == '\'' && prev_symbol != '\\') {
        state = SINGLE_QUOTE;
    } else if (symbol == '"') {
        state = DOUBLE_QUOTE;
    } else if (symbol == '/') {
        state = SLASH;
    }
    return state;
}

int state_from_single_quote(char prev_symbol, char symbol)
{
    int state = SINGLE_QUOTE;
    if (symbol == '\'' && prev_symbol != '\\') {
        state = NORMAL;
    }
    return state;
}

int state_from_double_quote(char prev_symbol, char symbol)
{
    int state = DOUBLE_QUOTE;
    if (symbol == '"' && prev_symbol != '\\') {
        state = NORMAL;
    }
    return state;
}

int state_from_slash(char symbol)
{
    int state = SLASH;
    if (symbol == '*') {
        state = MULTI_COMMENT;
    } else if (symbol == '/') {
        state = INLINE_COMMENT;
    } else {
        state = NORMAL;
    }
    return state;
}

int state_from_multi_comment(char symbol)
{
    int state = MULTI_COMMENT;
    if (symbol == '*') {
        state = STAR;
    }
    return state;
}

int state_from_star(char symbol)
{
    int state = STAR;
    if (symbol == '/') {
        state = NORMAL;
    } else {
        state = MULTI_COMMENT;
    }
    return state;
}

int state_from_inline_comment(char symbol)
{
    int state = INLINE_COMMENT;
    if (symbol == '\n') {
        state = NORMAL;
    }
    return state;
}

int state_
```
Solution
In all, this looks like pretty solid code. I have just a few suggestions that may help you improve your code.
Use an `enum` for related constants
The states are all related and not just standalone constants. For that reason, I'd recommend encapsulating them all in an `enum`:
```
enum { NORMAL, SINGLE_QUOTE, DOUBLE_QUOTE, SLASH, MULTI_COMMENT, INLINE_COMMENT, STAR } state_e;
```
Restructure or comment to make the code easier to read
The most significant feature in the code is a state machine. In order to understand a state machine, I typically need to know what's being processed (an input C program), what the states are (which are enumerated) and how the code transitions from state to state. The last bit is the part that is a little tough to decipher. It's probably mostly right, but it's hard to decode. For instance, see how long it takes you to answer the questions, "How does the code enter the `SLASH` state?" and "How does it leave the `SLASH` state?"
Don't forget about line continuation
The `\` character is a line continuation character in C. One effect it can have that affects this program is to continue a single-line comment:
```
// BAD is never defined \
#define BAD 1
```
Don't forget about trigraphs
Many people either don't use or don't know about trigraphs but they exist and, for better or worse, are still part of the language. This affects this particular program because `??/` is the trigraph for `\`, which is the line continuation character. Related to the preceding point, `BAD` is never defined in this code fragment:
```
// are you surprised??/
#define BAD 1
```
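To make the suggestions above concrete, here is a minimal sketch of how they might fit together in one place. It is not the original program. The function name `strip_comments` and the `escaped` flag are hypothetical names of my own, and working on an in-memory string rather than `stdin` is an assumption made only to keep the example short and testable. The state names are the ones from the post, wrapped in a `typedef` so that `state_e` names a type. Each `case` carries a comment saying how the state is entered or left, and the `INLINE_COMMENT` case treats a backslash immediately followed by a newline as continuing the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* State names come from the post; the typedef makes state_e a type name. */
typedef enum {
    NORMAL, SINGLE_QUOTE, DOUBLE_QUOTE, SLASH,
    MULTI_COMMENT, INLINE_COMMENT, STAR
} state_e;

/* Hypothetical helper: copy src to dst with comments removed.
 * dst must have room for strlen(src) + 1 bytes. */
void strip_comments(const char *src, char *dst)
{
    state_e state = NORMAL;
    int escaped = 0;   /* assumption: set after an unescaped backslash */
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; ++src) {
        char c = *src;
        switch (state) {
        case NORMAL:            /* a slash enters SLASH, a quote enters a quote state */
            if (c == '/') {
                state = SLASH;  /* hold the slash back until we know what it is */
                break;
            }
            if (c == '\'') state = SINGLE_QUOTE;
            else if (c == '"') state = DOUBLE_QUOTE;
            dst[n++] = c;
            break;
        case SLASH:             /* always left on the very next character */
            if (c == '*') {
                state = MULTI_COMMENT;
            } else if (c == '/') {
                state = INLINE_COMMENT;
            } else {
                dst[n++] = '/'; /* it was an ordinary slash after all */
                dst[n++] = c;
                state = (c == '\'') ? SINGLE_QUOTE
                      : (c == '"')  ? DOUBLE_QUOTE : NORMAL;
            }
            break;
        case SINGLE_QUOTE:
        case DOUBLE_QUOTE:      /* copied verbatim; an unescaped matching quote ends it */
            dst[n++] = c;
            if (escaped) escaped = 0;
            else if (c == '\\') escaped = 1;
            else if (c == (state == SINGLE_QUOTE ? '\'' : '"')) state = NORMAL;
            break;
        case MULTI_COMMENT:     /* a star might be the start of the closing pair */
            if (c == '*') state = STAR;
            break;
        case STAR:              /* stay here on another star so a run of stars still closes */
            if (c == '/') state = NORMAL;
            else if (c != '*') state = MULTI_COMMENT;
            break;
        case INLINE_COMMENT:    /* backslash-newline continues the comment */
            if (escaped) escaped = 0;
            else if (c == '\\') escaped = 1;
            else if (c == '\n') {
                dst[n++] = c;   /* keep the newline that ends the comment */
                state = NORMAL;
            }
            break;
        }
    }
    if (state == SLASH) dst[n++] = '/';   /* input ended on a pending slash */
    dst[n] = '\0';
}
```

Calling `strip_comments("// BAD is never defined \\\n#define BAD 1\nint x;\n", out)` with a large enough `out` buffer leaves only `"\nint x;\n"`, because the `#define` line is part of the continued comment. This sketch deliberately does not handle trigraphs or a line continuation that splits the two characters of a comment delimiter; one option would be a separate pass that resolves those before the state machine runs. It also drops comments outright rather than replacing each with a space, which is a design choice you may want to revisit.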
Context
StackExchange Code Review Q#115149, answer score: 5