
Trim string of words based on character count

Submitted by: @import:stackexchange-codereview

Problem

I needed a simple way to trim a string of words to a maximum character count, and the cut had to fall between words rather than in the middle of one. Using WordPress, here is what I came up with, but I feel like it could be more efficient.

function count_words($content) {
    $partsArray = explode(' ', $content);
    return sizeof($partsArray);
}

function custom_length_excerpt_max_char($content, $max_chars_limit) {
    $count = 0;
    $word_limit;
    while (strlen($content) > $max_chars_limit) {
        $word_limit = count_words($content);
        $word_limit -= 1;
        $content = wp_trim_words($content, $word_limit, '');
        $count++;
        if ($count > 10) {
            break;
        }
    }
    if ($count > 0) {
        $content .= '…';
    }
    return $content;
}


I was originally using str_word_count to count the words, but it was choking on HTML-encoded characters in the string.
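A small illustration of the kind of mismatch described above, assuming the text contains a numeric entity such as &#8217; (a right single quotation mark). str_word_count() and a space-based count like count_words() disagree on the number of words in the same string:

```php
<?php
// Text with an HTML numeric entity instead of a literal apostrophe.
$text = 'Tom&#8217;s house';

// str_word_count() treats "&", "#", digits and ";" as separators,
// so the entity splits "Tom" and "s" into separate words: 3 in total.
$by_str_word_count = str_word_count($text);

// Splitting on spaces (the count_words approach) sees only 2 words.
$by_explode = count(explode(' ', $text));

echo $by_str_word_count, ' vs ', $by_explode, "\n"; // 3 vs 2
```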

Solution

Yes, it's extremely inefficient: each iteration removes only one word, and in every iteration you count the words twice, once in count_words and then again inside wp_trim_words.
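To illustrate counting only once, here is a sketch that splits the text a single time and keeps whole words while the running length fits. The name excerpt_by_words_once is hypothetical. Like count_words, it assumes words are separated by single spaces, and it does not reproduce WordPress-specific processing that wp_trim_words performs, such as stripping tags. On the sample string from the test function further down, it produces the same excerpts as the backward-scan version.

```php
<?php
// Hypothetical single-pass excerpt: split once, then keep whole words while
// the running length (words plus the single spaces between them) fits.
function excerpt_by_words_once(string $content, int $max_chars): string
{
    if (strlen($content) <= $max_chars) {
        return $content;
    }
    $kept = [];
    $length = 0;
    foreach (explode(' ', $content) as $word) {
        // Cost of adding this word: its length, plus one joining space
        // unless it is the first word kept.
        $extra = strlen($word) + ($kept === [] ? 0 : 1);
        if ($length + $extra > $max_chars) {
            break; // the next word would overflow the limit
        }
        $kept[] = $word;
        $length += $extra;
    }
    return implode(' ', $kept);
}

echo excerpt_by_words_once('some sample-text some sample-text', 20), "\n"; // some sample-text
```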

Another problem is that the method won't work for content that is many words longer than the target limit. For example, with 4000 characters of content and a limit of 100, the if ($count > 10) break check inside the loop stops it after 11 iterations, so the method cuts off only the last 11 words and returns content that is still far over the limit. Not sure if that was part of your plan.
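A standalone reproduction of the cap's effect. wp_trim_words is replaced by a simplified, hypothetical stand-in, fake_trim_words, that keeps the first N space-separated words. This only approximates WordPress's behavior, but it is enough to show that the loop gives up after 11 passes:

```php
<?php
// Hypothetical stand-in for wp_trim_words(): keep the first $num_words
// space-separated words. The real WordPress function does more (for
// example it strips tags), which does not matter for this demonstration.
function fake_trim_words(string $text, int $num_words): string
{
    return implode(' ', array_slice(explode(' ', $text), 0, $num_words));
}

function count_words($content) {
    return count(explode(' ', $content));
}

// The question's loop, with fake_trim_words() in place of wp_trim_words().
function question_excerpt($content, $max_chars_limit) {
    $count = 0;
    while (strlen($content) > $max_chars_limit) {
        $word_limit = count_words($content) - 1; // drop one word per pass
        $content = fake_trim_words($content, $word_limit);
        $count++;
        if ($count > 10) {
            break; // reached on the 11th pass, so at most 11 words are removed
        }
    }
    if ($count > 0) {
        $content .= '…';
    }
    return $content;
}

$long = rtrim(str_repeat('word ', 800)); // 800 words, 3999 characters
$excerpt = question_excerpt($long, 100);
echo substr_count($excerpt, 'word'), " words left\n"; // 789 words left
```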

I don't really know PHP, but here's a naive implementation that should be more efficient and also work with long content:

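function is_letter($char) {
    return ctype_alpha($char) || in_array($char, ['-']);
}

function custom_length_excerpt_max_char($content, $max_chars_limit) {
    if (strlen($content) <= $max_chars_limit) { // <= also avoids reading past the end
        return $content;
    }
    // Walk back from the limit to the nearest non-letter. Use [] offsets:
    // the {} syntax was removed in PHP 8.
    $cut_at = $max_chars_limit;
    for (; $cut_at >= 0 && is_letter($content[$cut_at]); --$cut_at);
    // If the first word is longer than the limit, $cut_at ends at -1; clamp it
    // so substr() returns '' instead of everything but the last character.
    return substr($content, 0, max(0, $cut_at));
}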

function test_custom_length_excerpt_max_char() {
    $sample = 'some sample-text some sample-text';
    for ($i = 5; $i < strlen($sample) + 10; ++$i) {
        echo custom_length_excerpt_max_char($sample, $i)."\n";
    }
}


Here's the logic behind the main function:

  • If the content already fits within the limit, return it unchanged.
  • Otherwise, iterate backward from the limit position until you find a character that is NOT a letter (or a hyphen), and cut the content there.
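The function above returns the cut text without the '…' suffix that the question's version appended. Below is a sketch of a hypothetical wrapper, excerpt_with_more, that restores the suffix only when something was actually removed. The answer's function is repeated inside the block so the block runs on its own, with [] string offsets, a <= check, and a clamp for the case where no break point exists. As in the question, the suffix is added on top of the limit.

```php
<?php
function is_letter(string $char): bool
{
    return ctype_alpha($char) || $char === '-';
}

// The answer's function, repeated so this block runs on its own.
function custom_length_excerpt_max_char(string $content, int $max_chars_limit): string
{
    if (strlen($content) <= $max_chars_limit) {
        return $content;
    }
    $cut_at = $max_chars_limit;
    for (; $cut_at >= 0 && is_letter($content[$cut_at]); --$cut_at);
    return substr($content, 0, max(0, $cut_at));
}

// Hypothetical wrapper: append $more only when something was cut off,
// like the "if ($count > 0)" step at the end of the question's function.
function excerpt_with_more(string $content, int $max_chars_limit, string $more = '…'): string
{
    $trimmed = custom_length_excerpt_max_char($content, $max_chars_limit);
    return $trimmed === $content ? $content : $trimmed . $more;
}

echo excerpt_with_more('some sample-text some sample-text', 20), "\n"; // some sample-text…
```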



If you run the test function, the output is something like:

...
some
some
some sample-text
some sample-text
some sample-text
some sample-text
some sample-text
some sample-text some
some sample-text some
some sample-text some
...
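Because the scan works on bytes, it can cut inside an HTML entity or a multi-byte UTF-8 character. For example, custom_length_excerpt_max_char('foo &amp; bar', 8) returns 'foo &amp'. This may be related to the encoded-character trouble mentioned in the question. Below is a sketch of a variant that assumes valid UTF-8 plain text (no HTML tags) and accepts re-encoding the result with htmlspecialchars(). It decodes entities first, counts characters rather than bytes, and treats any Unicode letter (\p{L}) or hyphen as part of a word. The name excerpt_max_chars_utf8 is hypothetical, and the limit here applies to visible characters, not encoded bytes.

```php
<?php
// Hypothetical variant: same backward scan as the answer, but over decoded
// UTF-8 characters instead of raw bytes. Assumes plain text (no tags).
function excerpt_max_chars_utf8(string $content, int $max_chars): string
{
    // "&amp;", "&#8217;", "&hellip;" ... become single real characters,
    // so they are counted once and can never be cut in half.
    $text = html_entity_decode($content, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // One array element per UTF-8 character; with the /u modifier,
    // preg_split returns false on invalid UTF-8 instead of splitting bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    if (count($chars) <= $max_chars) {
        return $content; // already fits: return the original, still encoded
    }

    // Walk back from the limit while the character is a letter or hyphen.
    $cut_at = $max_chars;
    while ($cut_at >= 0 && preg_match('/^[\p{L}-]$/u', $chars[$cut_at])) {
        --$cut_at;
    }

    // Clamp so a first word longer than the limit yields '', then re-encode
    // the result so it is safe to output as HTML again.
    $trimmed = implode('', array_slice($chars, 0, max(0, $cut_at)));
    return htmlspecialchars($trimmed, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo excerpt_max_chars_utf8('foo &amp; bar', 8), "\n"; // foo &amp;
```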

Code Snippets

function is_letter($char) {
    return ctype_alpha($char) || in_array($char, ['-']);
}

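function custom_length_excerpt_max_char($content, $max_chars_limit) {
    if (strlen($content) <= $max_chars_limit) {
        return $content;
    }
    $cut_at = $max_chars_limit;
    // [] offsets: the {} syntax was removed in PHP 8.
    for (; $cut_at >= 0 && is_letter($content[$cut_at]); --$cut_at);
    // Clamp: if no break point exists, return '' rather than substr($content, 0, -1).
    return substr($content, 0, max(0, $cut_at));
}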

function test_custom_length_excerpt_max_char() {
    $sample = 'some sample-text some sample-text';
    for ($i = 5; $i < strlen($sample) + 10; ++$i) {
        echo custom_length_excerpt_max_char($sample, $i)."\n";
    }
}

Context

StackExchange Code Review Q#48645, answer score: 6
