
Titleize words in a sentence but with some conditions

Submitted by: @import:stackexchange-codereview

Problem

Below is the code I have written to capitalize all the words of a sentence, with two conditions:

- Words that belong to the littleWords list are not capitalized.
- The first word of the sentence is always capitalized, even if it is in the littleWords list.

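# String#titleize comes from ActiveSupport (Rails), not core Ruby.
require "active_support/core_ext/string/inflections"

def titleize(sentence)
    littleWords = ["end", "over", "and", "the"]
    # split keeps the captured first word: ["", first_word, rest_of_sentence]
    words = sentence.split(/^(\w+)\b/)
    sentence = if words[2]
        words[2].split(" ").map do |word|
            littleWords.include?(word) ? (" " + word) : (" " + word.titleize)
        end
    end
    words[1].titleize + (sentence||[]).join("")
end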
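For readers coming from Java, the trickiest part of the code above is the split. Because the regex contains a capture group, String#split includes the captured first word in its result, which is why the code reads the first word from words[1] and the rest from words[2]. A small illustration (the sample strings are arbitrary):

```ruby
pattern = /^(\w+)\b/

# The capture group is kept in the result, and because the match starts at
# position 0, split also produces a leading empty string.
p "the bridge over".split(pattern) # prints ["", "the", " bridge over"]

# Trailing empty strings are dropped, so a one-word sentence has no words[2];
# that is what the `if words[2]` guard in the question's code handles.
p "jaws".split(pattern)            # prints ["", "jaws"]
```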


SPEC

describe "titleize" do
  it "capitalizes a word" do
    titleize("jaws").should == "Jaws"
  end

  it "capitalizes every word (aka title case)" do
    titleize("david copperfield").should == "David Copperfield"
  end

  it "doesn't capitalize 'little words' in a title" do
    titleize("war and peace").should == "War and Peace"
  end

  it "does capitalize 'little words' at the start of a title" do
    titleize("the bridge over the river kwai").should == "The Bridge over the River Kwai"
  end
end


I am new to Ruby scripting and am coming from Java. The code above doesn't look as nice and clean as I think it could be in Ruby.

Solution

Rather than split and join the string, it'd be simpler to

  • always capitalize the sentence itself, so it always starts with an uppercase letter, and
  • pass a block to gsub, letting it do the filtering.
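Both building blocks can be seen in isolation using only core String methods; the sample strings here are just illustrations:

```ruby
# String#capitalize upcases the first character and downcases the rest.
sentence = "the bridge over the river kwai".capitalize

# String#gsub with a block replaces every match with whatever the block returns.
shouted = "war and peace".gsub(/\w+/) { |word| word.upcase }

p sentence # prints "The bridge over the river kwai"
p shouted  # prints "WAR AND PEACE"
```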



Also, a minor thing: unlike Java, Ruby favors snake_case rather than camelCase for variable and method names. So conventionally, it'd be little_words, not littleWords.

Here's a simplistic implementation:

def titleize(sentence)
  little_words = %w(end over and the)
  sentence.capitalize.gsub(/(\w+)/) do |word|
    little_words.include?(word) ? word : word.capitalize
  end
end


Of course, capitalize doesn't just make the leading letter uppercase; it also forcibly downcases the rest of the string. So writing, say, "DNA and RNA" will incorrectly give you "Dna and Rna". Before Ruby 2.4 it also only changed the case of ASCII letters, so caveat emptor with Unicode text.
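To see both the upside and the pitfall, here is the simplistic version run against the four cases from the question's spec plus the "DNA and RNA" example. Plain `raise` checks stand in for RSpec so that the sketch runs on its own:

```ruby
def titleize(sentence)
  little_words = %w(end over and the)
  # Capitalize the whole sentence so the first word always starts uppercase,
  # then capitalize every word that is not a little word.
  sentence.capitalize.gsub(/(\w+)/) do |word|
    little_words.include?(word) ? word : word.capitalize
  end
end

# The cases from the question's spec all pass.
{
  "jaws" => "Jaws",
  "david copperfield" => "David Copperfield",
  "war and peace" => "War and Peace",
  "the bridge over the river kwai" => "The Bridge over the River Kwai"
}.each do |input, expected|
  actual = titleize(input)
  raise "#{input}: got #{actual}" unless actual == expected
end

# But capitalize also downcases, so acronyms get mangled.
p "DNA".capitalize        # prints "Dna"
p titleize("DNA and RNA") # prints "Dna and Rna"
```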

Basically, "titleizing" a string is a bit of a hornet's nest. There are multiple schools of thought on how it should be done, some depending on context. And just when you think you have it, someone writes a URL or a name like "iPhone" in a sentence, and it comes out wrong anyway.

A slightly more clever (but still brittle!) solution might be:

def titleize(sentence)
  little_words = %w(end over and the)
  sentence.gsub(/\b(\p{Ll}+)\b/) do |word|
    # The following breaks codereview's syntax highlighting, but it's valid Ruby code.
    # I used a "full" if-else rather than a ternary just to keep the lines shorter.
    if $`.empty? || !little_words.include?(word)
      word.capitalize
    else
      word
    end
  end
end


The regex only matches words that are entirely lowercase, meaning "DNA" and "iPhone" will pass through undisturbed. But we can't capitalize the string in its entirety first, because capitalize would downcase everything after the first letter, turning every word except the first into an all-lowercase word (so "DNA" would become "dna" and then be matched). So instead, we use the $` "magic" variable, which contains the part of the string before the current match. If it's empty, we're at the start, and should capitalize the word even if it's on the little_words list.
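A runnable copy of the regex version, with comments on what $` does, checked against a few sample sentences (the sentences are only illustrations):

```ruby
def titleize(sentence)
  little_words = %w(end over and the)
  # Only all-lowercase words match, so "DNA" and "iPhone" are left alone.
  sentence.gsub(/\b(\p{Ll}+)\b/) do |word|
    # Inside a gsub block, $` holds the text before the current match;
    # an empty $` means this word is at the very start of the string.
    if $`.empty? || !little_words.include?(word)
      word.capitalize
    else
      word
    end
  end
end

p titleize("the bridge over the river kwai") # prints "The Bridge over the River Kwai"
p titleize("DNA and RNA")                    # prints "DNA and RNA"
p titleize("the iPhone and the iPad")        # prints "The iPhone and the iPad"
```

It stays brittle, though: if the sentence opens with punctuation, such as a quotation mark, $` is not empty at the first word, so a leading little word like "the" is left lowercase.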

But again, this isn't a great solution at all. It's just here to illustrate some regex voodoo.

And in any event this isn't a new problem. Here's a gem that sounds like it's a port of this Perl script, which, if nothing else, comes with an explanation of its working.

Edit: As tokland points out in the comments, using a Set instead of an array for little_words would make for faster lookups, since a Set checks membership with a hash lookup instead of searching through the array. It'd also be nicer to define the list of little words as a constant, rather than re-creating it as a local variable every time the method runs:

LITTLE_WORDS = %w{ end over and the }
#=> ["end", "over", "and", "the"]
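Combining both suggestions, a minimal sketch of the simplistic version with a frozen Set constant might look like this (freezing the constant is an extra precaution, not something the comment thread asked for):

```ruby
require "set"

# Built once when the file loads; Set#include? is a hash lookup
# rather than a linear scan of an array.
LITTLE_WORDS = Set.new(%w[end over and the]).freeze

def titleize(sentence)
  sentence.capitalize.gsub(/\w+/) do |word|
    LITTLE_WORDS.include?(word) ? word : word.capitalize
  end
end

p titleize("the bridge over the river kwai") # prints "The Bridge over the River Kwai"
```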

Code Snippets

def titleize(sentence)
  little_words = %w(end over and the)
  sentence.capitalize.gsub(/(\w+)/) do |word|
    little_words.include?(word) ? word : word.capitalize
  end
end

def titleize(sentence)
  little_words = %w(end over and the)
  sentence.gsub(/\b(\p{Ll}+)\b/) do |word|
    # The following breaks codereview's syntax highlighting, but it's valid Ruby code.
    # I used a "full" if-else rather than a ternary just to keep the lines shorter.
    if $`.empty? || !little_words.include?(word)
      word.capitalize
    else
      word
    end
  end
end

Context

StackExchange Code Review Q#79905, answer score: 5
