patterncMajor
Program that removes comments from C and C++ programs
Viewed 0 times
removesprogramthatprogramsandfromcomments
Problem
I have downloaded the latest version of the Linux kernel and I want to perform thorough analysis on it. I want to start by eliminating comments from all the source and header files that belong to it. I am using a code I wrote sometime ago and I want to use it recursively. In the past I tested it on my code and it worked fine, but since I want to use it on a wide scale, I want to know if the code is consistent and successful in all cases.
Can you help me determine if it has bugs that do not deal with removing comments and create mess in certain cases?
```
#include
#include
#include
int RemoveComments(const char filename, const char Output);
int main(void)
{
RemoveComments("file.cpp", "output.cpp");
return 0;
}
int RemoveComments(const char filename, const char Output)
{
#define NOT_COMMENT (!DOUBLESLASH_Comment && !ASTERISK_SLASH_Comment)
clock_t t1 = clock();
FILE input, output;
if (fopen_s(&input, filename, "r"))
{
perror(filename);
exit(1);
}
if (fopen_s(&output, Output, "w"))
{
perror(Output);
exit(1);
}
int c, d;
bool DOUBLESLASH_Comment = 0, ASTERISK_SLASH_Comment = 0, flag = 0;
int s_QUOTED = 0, d_QUOTED = 0;
while ((c = getc(input)) != EOF)
{
switch (c)
{
case '\\':
{
if (NOT_COMMENT)
{
if (flag == 1)
flag = 0;
else
flag = 1;
}
}break;
case '\'':
{
if (NOT_COMMENT && !d_QUOTED)
{
if (!flag)
{
s_QUOTED++;
}
}
}break;
case '"':
{
if (NOT_COMMENT && !flag)
{
if (!s_QUOTED)
{
d_QUOTED++;
}
}
}break;
case '/':
{
if (NOT_COMMENT && !d
Can you help me determine if it has bugs that do not deal with removing comments and create mess in certain cases?
```
#include
#include
#include
int RemoveComments(const char filename, const char Output);
int main(void)
{
RemoveComments("file.cpp", "output.cpp");
return 0;
}
int RemoveComments(const char filename, const char Output)
{
#define NOT_COMMENT (!DOUBLESLASH_Comment && !ASTERISK_SLASH_Comment)
clock_t t1 = clock();
FILE input, output;
if (fopen_s(&input, filename, "r"))
{
perror(filename);
exit(1);
}
if (fopen_s(&output, Output, "w"))
{
perror(Output);
exit(1);
}
int c, d;
bool DOUBLESLASH_Comment = 0, ASTERISK_SLASH_Comment = 0, flag = 0;
int s_QUOTED = 0, d_QUOTED = 0;
while ((c = getc(input)) != EOF)
{
switch (c)
{
case '\\':
{
if (NOT_COMMENT)
{
if (flag == 1)
flag = 0;
else
flag = 1;
}
}break;
case '\'':
{
if (NOT_COMMENT && !d_QUOTED)
{
if (!flag)
{
s_QUOTED++;
}
}
}break;
case '"':
{
if (NOT_COMMENT && !flag)
{
if (!s_QUOTED)
{
d_QUOTED++;
}
}
}break;
case '/':
{
if (NOT_COMMENT && !d
Solution
I see some things that may help you improve your code.
Use all required
The program uses the
Use required
As noted in http://en.cppreference.com/w/c/io/fopen :
As all bounds-checked functions, fopen_s is only guaranteed to be available if STDC_LIB_EXT1 is defined by the implementation and if the user defines STDC_WANT_LIB_EXT1 to the integer constant 1 before including .
Because it's an optional part of the C11 standard, it's also non-portable in a strict sense. For portability, you may wish instead to use the standard
Think of the user
There are a few user-oriented things I think could be improved. First, it would probably be better to have the filenames passed in via the command line rather than being hardcoded, and second, it would probably be good to allow the user to specify whether the timing is displayed or not.
Fix the bug
There is a problem with the parser at the moment. If we have lines like these:
the code should remove both comments, but at the moment, it only removes the first one.
Use better variable names
Variables named
Consider using named states
It would likely make the code easier to write correctly with named states. One convenient way to do that is to use an
Use one line per declaration
The code currently includes lines like this:
It makes things a little bit harder to read than if you put them on separate lines like this:
Additionally, while it's not strictly incorrect, it helps the reader if the code uses
Simplify your code
The code currently includes this compound
This can be simplified:
Use braces appropriately
In a number of cases, as with the
Return something useful from functions
The
Omit
When a C or C++ program reaches the end of
Note: when I make this suggestion, it's almost invariably followed by one of two kinds of comments: "I didn't know that." or "That's bad advice!" My rationale is that it's safe and useful to rely on compiler behavior explicitly supported by the standard. For C, since C99; see ISO/IEC 9899:1999 section 5.1.2.2.3:
[...] a return from the initial call to the
For C++, since the first standard in 1998; see ISO/IEC 14882:1998 section 3.6.1:
If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;
All versions of both standards since then (C99 and C++98) have maintained the same idea. We rely on automatically generated member functions in C++, and few people write explicit
So I advocate omitting it; others disagree (often vehemently!) In any case, if you encounter code that omits it, you'll know that it's explicitly supported by the standard and you'll know what it means.
Use all required
#includesThe program uses the
bool type and assert() but doesn't have the associated required includes. Fix that by adding this:#include
#include Use required
#definesAs noted in http://en.cppreference.com/w/c/io/fopen :
As all bounds-checked functions, fopen_s is only guaranteed to be available if STDC_LIB_EXT1 is defined by the implementation and if the user defines STDC_WANT_LIB_EXT1 to the integer constant 1 before including .
Because it's an optional part of the C11 standard, it's also non-portable in a strict sense. For portability, you may wish instead to use the standard
fopen and do your own error checking instead.Think of the user
There are a few user-oriented things I think could be improved. First, it would probably be better to have the filenames passed in via the command line rather than being hardcoded, and second, it would probably be good to allow the user to specify whether the timing is displayed or not.
Fix the bug
There is a problem with the parser at the moment. If we have lines like these:
/* comment one */// comment twothe code should remove both comments, but at the moment, it only removes the first one.
Use better variable names
Variables named
s_QUOTED and d_QUOTED are OK because their names suggest their meanings, but c, d and flag are not good names.Consider using named states
It would likely make the code easier to write correctly with named states. One convenient way to do that is to use an
enum. enum { INCODE, INQUOTE, INDBLQUOTE, INCOMMENT, GOTSLASH } state = INCODE;Use one line per declaration
The code currently includes lines like this:
bool DOUBLESLASH_Comment = 0, ASTERISK_SLASH_Comment = 0, flag = 0;It makes things a little bit harder to read than if you put them on separate lines like this:
bool DOUBLESLASH_Comment = 0;
bool ASTERISK_SLASH_Comment = 0;
bool flag = 0;Additionally, while it's not strictly incorrect, it helps the reader if the code uses
false and true rather than numeric values for boolean variables.Simplify your code
The code currently includes this compound
if statement:if (flag == 1)
flag = 0;
else
flag = 1;This can be simplified:
flag = !flag;Use braces appropriately
In a number of cases, as with the
if statement mentioned above, curly braces ({}) were not used, but they are used for each case statement. In fact, they're not needed in either case, but I'd recommend using braces for constructs such as for, while and if and omitting them for case.Return something useful from functions
The
RemoveComments function is currently written to return int but the only value it ever returns is 0. I'd recommend either returning something useful, such as, for instance a number of comments removed, or to change it to a void function.Omit
return 0When a C or C++ program reaches the end of
main the compiler will automatically generate code to return 0, so there is no need to put return 0; explicitly at the end of main. Note: when I make this suggestion, it's almost invariably followed by one of two kinds of comments: "I didn't know that." or "That's bad advice!" My rationale is that it's safe and useful to rely on compiler behavior explicitly supported by the standard. For C, since C99; see ISO/IEC 9899:1999 section 5.1.2.2.3:
[...] a return from the initial call to the
main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0.For C++, since the first standard in 1998; see ISO/IEC 14882:1998 section 3.6.1:
If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;
All versions of both standards since then (C99 and C++98) have maintained the same idea. We rely on automatically generated member functions in C++, and few people write explicit
return; statements at the end of a void function. Reasons against omitting seem to boil down to "it looks weird". If, like me, you're curious about the rationale for the change to the C standard read this question. Also note that in the early 1990s this was considered "sloppy practice" because it was undefined behavior (although widely supported) at the time. So I advocate omitting it; others disagree (often vehemently!) In any case, if you encounter code that omits it, you'll know that it's explicitly supported by the standard and you'll know what it means.
Code Snippets
#include <stdbool.h>
#include <assert.h>/* comment one */// comment twoenum { INCODE, INQUOTE, INDBLQUOTE, INCOMMENT, GOTSLASH } state = INCODE;bool DOUBLESLASH_Comment = 0, ASTERISK_SLASH_Comment = 0, flag = 0;bool DOUBLESLASH_Comment = 0;
bool ASTERISK_SLASH_Comment = 0;
bool flag = 0;Context
StackExchange Code Review Q#139267, answer score: 25
Revisions (0)
No revisions yet.