HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

K&R 1-23 Remove all C comments

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
removecommentsall

Problem

From K&R exercise 1-23,


Write a program to remove all comments from a C program. Don't forget
to handle quoted strings and character constants properly. C comments
don't nest.

My solution,

#include 
#include 
#include 

#define MAXLEN  10000   // max input length

int      get_input(char input[], int maxlen);
char * no_comments(char input[], int len);

_Bool comment_start(char input[], int idx);
_Bool   comment_end(char input[], int idx, char *comments, int cdx);

_Bool quote_start(char input[], int idx);
_Bool   quote_end(char input[], int idx);

int main(void)
{
    char input[MAXLEN];
    int len = get_input(input, MAXLEN);
    char *scrubbed = no_comments(input, len);

    printf("%s", scrubbed);
    free(scrubbed);
}

int get_input(char input[], int maxlen)
{
    int i;
    char c;

    for (i = 0; i  1 && comments[1] == '/');
    _Bool old_style = (cdx == 1 && cur == '*') || (cdx > 1 && comments[1] == '*');

    return (new_style && nxt == '\n') || (old_style && cdx > 2 && prev == '*' && cur == '/');
}

_Bool quote_start(char input[], int idx)
{
    return input[idx] == '"' && input[idx-1] != '\'';
}

_Bool quote_end(char input[], int idx)
{
    return input[idx] == '"' && input[idx-1] != '\\';
}

Solution

I recommend using Valgrind as the first step of the self-code review. Here it only reports one error when compiled with gcc on Linux:

$ gcc -O2 main.c -o main -std=c99 -Wall -Wextra  -g
 $ valgrind  --track-origins=yes ./main
 (...)
 ==24100== Conditional jump or move depends on uninitialised value(s)
==24100==    at 0x4009B4: comment_start (main.c:85)
==24100==    by 0x4009B4: no_comments (main.c:54)
==24100==    by 0x40062E: main (main.c:21)
==24100==  Uninitialised value was created by a stack allocation
==24100==    at 0x400620: main (main.c:20)
==24100==
==24100== Conditional jump or move depends on uninitialised value(s)
==24100==    at 0x4009CA: quote_start (main.c:102)
==24100==    by 0x4009CA: no_comments (main.c:62)
==24100==    by 0x40062E: main (main.c:21)
==24100==  Uninitialised value was created by a stack allocation
==24100==    at 0x400620: main (main.c:20)
==24100==
==24100== Conditional jump or move depends on uninitialised value(s)
==24100==    at 0x518C1B2: vfprintf (in /lib64/libc-2.21.so)
==24100==    by 0x51928E8: printf (in /lib64/libc-2.21.so)
==24100==    by 0x400640: main (main.c:23)
==24100==  Uninitialised value was created by a stack allocation
==24100==    at 0x400620: main (main.c:20)
=


The problem lies inside get_input(), exactly in this line:

input[++i] = '\0';


Add printing value of i after each iteration and after the pre-incrementation. You will see something like this:

i == 0
i == 1
i == 2
i == 3
i == 4
i == 5
i == 6
i == 7
i == 8
i == 9
i == 11


Position 10 is not set. However, later on in no_comments() you have a loop:

for (int idx = 0; idx <= len; idx++) {
    if (strlen(comments) == 0) { // outside comment

        if (!quoted && comment_start(input, idx)) {


Input array and index are passed to comment_start() and read there:

return input[idx] == '/' && (nxt == '*' || nxt == '/');


At some point you will read an element of input array that has not been initialized. It's an undefined behavior in C and it's quite dangerous.

Code Snippets

$ gcc -O2 main.c -o main -std=c99 -Wall -Wextra  -g
 $ valgrind  --track-origins=yes ./main
 (...)
 ==24100== Conditional jump or move depends on uninitialised value(s)
==24100==    at 0x4009B4: comment_start (main.c:85)
==24100==    by 0x4009B4: no_comments (main.c:54)
==24100==    by 0x40062E: main (main.c:21)
==24100==  Uninitialised value was created by a stack allocation
==24100==    at 0x400620: main (main.c:20)
==24100==
==24100== Conditional jump or move depends on uninitialised value(s)
==24100==    at 0x4009CA: quote_start (main.c:102)
==24100==    by 0x4009CA: no_comments (main.c:62)
==24100==    by 0x40062E: main (main.c:21)
==24100==  Uninitialised value was created by a stack allocation
==24100==    at 0x400620: main (main.c:20)
==24100==
==24100== Conditional jump or move depends on uninitialised value(s)
==24100==    at 0x518C1B2: vfprintf (in /lib64/libc-2.21.so)
==24100==    by 0x51928E8: printf (in /lib64/libc-2.21.so)
==24100==    by 0x400640: main (main.c:23)
==24100==  Uninitialised value was created by a stack allocation
==24100==    at 0x400620: main (main.c:20)
=
input[++i] = '\0';
i == 0
i == 1
i == 2
i == 3
i == 4
i == 5
i == 6
i == 7
i == 8
i == 9
i == 11
for (int idx = 0; idx <= len; idx++) {
    if (strlen(comments) == 0) { // outside comment

        if (!quoted && comment_start(input, idx)) {
return input[idx] == '/' && (nxt == '*' || nxt == '/');

Context

StackExchange Code Review Q#115073, answer score: 7

Revisions (0)

No revisions yet.