
No more filthy words

Submitted by: @import:stackexchange-codereview

Problem

Challenge

Given a list of words mixed with extra symbols, write a program that strips the extra numbers and symbols from the words.

Specifications

  • The first argument is a path to a file.
  • Each line includes a test case.
  • Each test case is a list of words.
  • Letters are both lowercase and uppercase, and mixed with extra symbols.
  • Print the words separated by spaces in lowercase letters.

Constraints

  • Each test case, including the extra symbols, is 10 to 100 characters long.
  • The number of test cases is 40.

Input Sample


(--9Hello----World...--)

Can 0$9 ---you~

13What213are;11you-123+138doing7

Output Sample


hello world

can you

what are you doing

Source

Solution:

#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

void to_lowercase(char * input) {
    for(int i = 0; input[i]; i++){
        input[i] = tolower(input[i]);
    }
}

char * sanitize(char * input) {
    char *sanitized = malloc(sizeof (char) * 1024);
    int iterator = 0;
    int character_value;
    bool wordMatched = false;

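    for (int i = 0; i < strlen(input); i++) {
        character_value = input[i];
        if (character_value >= 97 && character_value <= 122 || character_value >= 65 && character_value <= 90) {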
            sanitized[iterator++] = input[i];
            wordMatched = true;
        } else if (wordMatched) {
            wordMatched = false;
            sanitized[iterator++] = ' ';
        }
    }

    sanitized[iterator] = '\0';
    to_lowercase(sanitized);
    return sanitized;
}

int main(int argc, const char * argv[]) {
    FILE *file = fopen(argv[1], "r");
    char line[1024];
    while (fgets(line, 1024, file)) {
        printf("%s\n", sanitize(line));
    }

    return 0;
}

Solution

Look here:

for (int i = 0; i < strlen(input); i++) {


Note that strlen(input) is O(n): it has to scan the whole string to find the terminator. Calling it in the loop condition repeats that scan on every iteration, which makes your algorithm O(n²), slower than it should be. If you need to call strlen(), make sure to call it just once. However, this problem is easily solvable without using strlen() at all.
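As a minimal sketch of a loop that needs no strlen() at all, the condition can simply test for the terminating '\0'. The helper name count_letters is hypothetical and only serves to illustrate the loop shape; it is not part of the reviewed code.

```c
#include <ctype.h>
#include <stddef.h>

/* count_letters is a hypothetical helper, not part of the reviewed code.
 * It walks the string once and stops at the terminating '\0', so no
 * strlen() call is needed and the loop stays O(n). */
size_t count_letters(const char *input) {
    size_t letters = 0;
    for (size_t i = 0; input[i] != '\0'; i++) {
        /* The cast keeps isalpha() well defined when char is signed. */
        if (isalpha((unsigned char)input[i])) {
            letters++;
        }
    }
    return letters;
}
```

For the third sample line, 13What213are;11you-123+138doing7, this counts the 15 letters of "What", "are", "you" and "doing" in a single pass.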

Others have pointed out your memory leak. What should be done about it? The output is never longer than the input, right? Therefore, I would say that your best option is to overwrite the input with the sanitized output. There is no need to allocate any additional memory, and no need to worry about the buffer size. (If the caller wants to keep the original string, then the caller can duplicate it first.)
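To illustrate the "duplicate it first" option, here is a rough sketch. Both function names are hypothetical: keep_letters is a deliberately simplified in-place transform standing in for sanitize(), and keep_letters_copy shows what a caller might do when the original string has to survive.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-place transform standing in for sanitize(): it drops
 * every non-letter, so the result always fits in the original buffer. */
void keep_letters(char *s) {
    char *out = s;
    for (const char *in = s; *in != '\0'; in++) {
        if (isalpha((unsigned char)*in)) {
            *out++ = *in;
        }
    }
    *out = '\0';
}

/* Hypothetical caller-side helper: duplicate first, then transform the
 * copy. Returns NULL if allocation fails; the caller must free() the
 * returned buffer. */
char *keep_letters_copy(const char *original) {
    size_t size = strlen(original) + 1;   /* sized to the input, not a fixed 1024 */
    char *copy = malloc(size);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, original, size);
    keep_letters(copy);
    return copy;
}
```

With this split, the in-place function never allocates, so it cannot leak; ownership of any copy stays visibly with the caller who asked for it.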

There is no need to write a to_lowercase() function that calls tolower() on each character in the string. Just call tolower() as part of the loop.

The character_value comparisons could be simplified to isalpha(character_value). Your naming style is inconsistent between character_value and wordMatched. I would change int character_value to char c.
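A small sketch of the two forms side by side, assuming an ASCII-compatible character set and the default "C" locale, where both should agree for every 7-bit character. The function names are made up for this comparison, and the cast to unsigned char is my own addition to keep isalpha() defined when char is signed.

```c
#include <ctype.h>
#include <stdbool.h>

/* The range test from the reviewed code, written with character literals
 * instead of the magic numbers 97, 122, 65 and 90. */
bool is_letter_by_range(int c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* The same test via <ctype.h>. isalpha() returns nonzero (not
 * necessarily 1) for letters, hence the comparison with 0. */
bool is_letter_by_ctype(char c) {
    return isalpha((unsigned char)c) != 0;
}
```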

With this code…

} else if (wordMatched) {
    wordMatched = false;
    sanitized[iterator++] = ' ';
}


… you output a space when transitioning from a letter to a non-letter. However, that would output a space corresponding to the 7 at the end of 13What213are;11you-123+138doing7. In my opinion, it would be better if it didn't output a space at the end.
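The difference can be sketched with two small functions, both hypothetical and both writing into a caller-supplied buffer. The first mirrors the eager logic quoted above; the second only remembers that a space is owed and writes it when the next word actually starts.

```c
#include <ctype.h>

/* Eager variant (mirrors the snippet above): writes a space as soon as a
 * word ends, so a trailing non-letter leaves a trailing space.
 * dst must have room for strlen(src) + 1 bytes. */
void collapse_eager(const char *src, char *dst) {
    int in_word = 0;
    for (; *src != '\0'; src++) {
        if (isalpha((unsigned char)*src)) {
            *dst++ = *src;
            in_word = 1;
        } else if (in_word) {
            *dst++ = ' ';
            in_word = 0;
        }
    }
    *dst = '\0';
}

/* Deferred variant: records that a space is owed and emits it only when
 * another word follows, so nothing is written after the last word. */
void collapse_deferred(const char *src, char *dst) {
    int wrote_word = 0;   /* at least one letter has been written */
    int space_owed = 0;
    for (; *src != '\0'; src++) {
        if (isalpha((unsigned char)*src)) {
            if (space_owed) {
                *dst++ = ' ';
            }
            space_owed = 0;
            *dst++ = *src;
            wrote_word = 1;
        } else if (wrote_word) {
            space_owed = 1;
        }
    }
    *dst = '\0';
}
```

On 13What213are;11you-123+138doing7 the eager variant produces "What are you doing " with a trailing space, while the deferred variant produces "What are you doing".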

Suggested solution

I've put some miscellaneous remarks in comments.

#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>              // You missed this for printf(3)
#include <stdlib.h>

/**
 * Replaces consecutive non-alphabetic characters in the input
 * string with a single space.  Non-alphabetic characters at the
 * beginning and end are trimmed off as well.  The remaining ASCII
 * letters are replaced with their lowercase counterparts.
 *
 * The input string will be overwritten.
 *
 * Returns the length of the sanitized output.
 */
size_t sanitize(char *s) {
    bool needSpace = false;
    char *out = s;
    for (char *in = s; *in != '\0'; in++) {
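        assert(out <= in);
        if (isalpha(*in)) {
            if (needSpace) *out++ = ' ';
            needSpace = false;
            *out++ = tolower(*in);
        } else if (out > s) {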
            needSpace = true;
        }
    }
    *out = '\0';
    return out - s;
}

int main(int argc, const char *argv[]) {
    char line[1024];
    FILE *file = stdin;        // Read from stdin if no filename given
    if (argc > 1 && !(file = fopen(argv[1], "r"))) {
        perror(argv[1]);       // Some error handling that you didn't have
        return EXIT_FAILURE;
    }
    while (fgets(line, sizeof(line), file)) {
        sanitize(line);
        puts(line);            // Don't need printf() when puts() will do
    }
}
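One way to check the suggested function against the challenge's sample input is sketched below. sanitize() is reproduced from the answer so the block stands alone, with an (unsigned char) cast added as my own assumption; sanitized_equals is a hypothetical test helper. The inputs end in '\n' because fgets() keeps the newline, which sanitize() then trims like any other non-letter.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reproduced from the suggested solution; the unsigned char casts are an
 * addition so that isalpha()/tolower() stay defined for negative chars. */
size_t sanitize(char *s) {
    bool needSpace = false;
    char *out = s;
    for (char *in = s; *in != '\0'; in++) {
        if (isalpha((unsigned char)*in)) {
            if (needSpace) *out++ = ' ';
            needSpace = false;
            *out++ = (char)tolower((unsigned char)*in);
        } else if (out > s) {
            needSpace = true;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}

/* Hypothetical helper: sanitizes a copy of input and compares the result
 * with expected. Inputs that do not fit the buffer count as a mismatch. */
bool sanitized_equals(const char *input, const char *expected) {
    char line[1024];
    size_t size = strlen(input) + 1;
    if (size > sizeof(line)) {
        return false;
    }
    memcpy(line, input, size);
    size_t length = sanitize(line);
    return length == strlen(expected) && strcmp(line, expected) == 0;
}
```

Calling sanitized_equals("Can 0$9 ---you~\n", "can you") should return true, and likewise for the other two sample lines.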

Code Snippets

for (int i = 0; i < strlen(input); i++) {

} else if (wordMatched) {
    wordMatched = false;
    sanitized[iterator++] = ' ';
}

#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>              // You missed this for printf(3)
#include <stdlib.h>

/**
 * Replaces consecutive non-alphabetic characters in the input
 * string with a single space.  Non-alphabetic characters at the
 * beginning and end are trimmed off as well.  The remaining ASCII
 * letters are replaced with their lowercase counterparts.
 *
 * The input string will be overwritten.
 *
 * Returns the length of the sanitized output.
 */
size_t sanitize(char *s) {
    bool needSpace = false;
    char *out = s;
    for (char *in = s; *in != '\0'; in++) {
        assert(out <= in);
        if (isalpha(*in)) {
            if (needSpace) *out++ = ' ';
            needSpace = false;
            *out++ = tolower(*in);
        } else if (out > s) {
            needSpace = true;
        }
    }
    *out = '\0';
    return out - s;
}

int main(int argc, const char *argv[]) {
    char line[1024];
    FILE *file = stdin;        // Read from stdin if no filename given
    if (argc > 1 && !(file = fopen(argv[1], "r"))) {
        perror(argv[1]);       // Some error handling that you didn't have
        return EXIT_FAILURE;
    }
    while (fgets(line, sizeof(line), file)) {
        sanitize(line);
        puts(line);            // Don't need printf() when puts() will do
    }
}

Context

StackExchange Code Review Q#131730, answer score: 6
