patternrubyMinor
Titleize words in a sentence but with some conditions
Problem
Below is the code I have written to capitalize all the words of a sentence, except that:
- Words in the littleWords list are not capitalized.
- A word is capitalized if it is the first word of the sentence, even if it is in the littleWords list.
def titleize(sentence)
  littleWords = ["end", "over", "and", "the"]
  words = sentence.split(/^(\w+)\b/)
  sentence = if words[2]
    words[2].split(" ").map do |word|
      littleWords.include?(word) ? (" " + word) : (" " + word.titleize)
    end
  end
  words[1].titleize + (sentence||[]).join("")
end
SPEC
describe "titleize" do
  it "capitalizes a word" do
    titleize("jaws").should == "Jaws"
  end
  it "capitalizes every word (aka title case)" do
    titleize("david copperfield").should == "David Copperfield"
  end
  it "doesn't capitalize 'little words' in a title" do
    titleize("war and peace").should == "War and Peace"
  end
  it "does capitalize 'little words' at the start of a title" do
    titleize("the bridge over the river kwai").should == "The Bridge over the River Kwai"
  end
end
I am new to Ruby/scripting and am coming from Java. The code above doesn't look as nice and clean as I think it could be in Ruby.
Solution
Rather than split and join the string, it'd be simpler to
- Always capitalize the sentence itself, so it always starts with an uppercase letter
- Pass a block to gsub, letting it do the filtering.
Also: a minor thing, but unlike Java, Ruby favors snake_case rather than camelCase for names. So conventionally, it'd be little_words, not littleWords.
Here's a simplistic implementation:
def titleize(sentence)
  little_words = %w(end over and the)
  sentence.capitalize.gsub(/(\w+)/) do |word|
    little_words.include?(word) ? word : word.capitalize
  end
end
Of course, capitalize doesn't just make the leading letter uppercase, it also forcibly downcases the rest of the string. So writing, say, "DNA and RNA" will incorrectly give you "Dna and Rna". It isn't fond of Unicode characters either (before Ruby 2.4 it only handled ASCII letters), so caveat emptor.
Basically, "titleizing" a string is a bit of a hornet's nest. There are multiple schools of thought on how it should be done, some depending on context. And right when you think you have it, someone writes a URL or a name like "iPhone" in a sentence, and it comes out wrong anyway.
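To make the capitalize caveat concrete, here is a small sketch that runs the simplistic implementation on a few inputs. The method is renamed naive_titleize purely for this illustration (a made-up name, so it doesn't clash with the later version), and the sample sentences are arbitrary.

```ruby
# Same logic as the simplistic implementation; the name naive_titleize
# is hypothetical and only used in this sketch.
def naive_titleize(sentence)
  little_words = %w(end over and the)
  sentence.capitalize.gsub(/\w+/) do |word|
    little_words.include?(word) ? word : word.capitalize
  end
end

puts naive_titleize("war and peace")  # War and Peace
puts naive_titleize("DNA and RNA")    # Dna and Rna
puts naive_titleize("my new iPhone")  # My New Iphone
```

The first call behaves as the spec wants; the other two show acronyms and mixed-case names being flattened.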
A slightly more clever - but still brittle! - solution might be:
def titleize(sentence)
  little_words = %w(end over and the)
  sentence.gsub(/\b(\p{Ll}+)\b/) do |word|
    # The following breaks codereview's syntax highlighting, but it's valid Ruby code.
    # I used a "full" if-else rather than a ternary just to keep the lines shorter.
    if $`.empty? || !little_words.include?(word)
      word.capitalize
    else
      word
    end
  end
end
The regex only matches words that are all-lowercase, meaning "DNA" and "iPhone" will pass through undisturbed. But we can't capitalize the string in its entirety, because capitalize would downcase the rest of the string and turn every word after the first into an all-lowercase word. So instead, we have the $` "magic" variable, which contains the string before the current match. If it's empty, we're at the start, and should capitalize the word even if it's on the little_words list.
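If the pre-match variable is unfamiliar, a quick way to see what it holds is to record it from inside a gsub block. This is only an illustration of $` with an arbitrary sample sentence, not part of the solution; the variable name seen is made up for the sketch.

```ruby
# Record each all-lowercase word together with the text that precedes it ($`).
seen = []
"the DNA of the iPhone".gsub(/\b\p{Ll}+\b/) do |word|
  seen << [word, $`]
  word # return the word unchanged; we only care about what was matched
end

seen.each do |word, before|
  puts "#{word.inspect} preceded by #{before.inspect}"
end
```

Only the first match has an empty pre-match, which is what the start-of-sentence check relies on. "DNA" and "iPhone" never reach the block because the regex doesn't match them.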
But again, this isn't a great solution at all. It's just here to illustrate some regex voodoo.
And in any event this isn't a new problem. Here's a gem that sounds like it's a port of this Perl script, which, if nothing else, comes with an explanation of its working.
Edit: As tokland points out in the comments, using a Set instead of an array for little_words would make for faster lookups; no searching necessary. It'd also be nicer to define the list of little words as a constant, rather than declare it as a local variable every time the method is run:
LITTLE_WORDS = %w{ end over and the }
#=> ["end", "over", "and", "the"]
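Putting both suggestions together might look something like the sketch below. It assumes the regex-based version from earlier; freezing the constant is an extra touch that wasn't part of the comment, and the whole thing is just as brittle as before.

```ruby
require 'set'

# Words that stay lowercase unless they open the sentence.
# Frozen so the shared constant can't be modified by accident.
LITTLE_WORDS = Set.new(%w[end over and the]).freeze

def titleize(sentence)
  sentence.gsub(/\b\p{Ll}+\b/) do |word|
    # $` is the text before the current match; empty means first word.
    if $`.empty? || !LITTLE_WORDS.include?(word)
      word.capitalize
    else
      word
    end
  end
end

puts titleize("the bridge over the river kwai")  # The Bridge over the River Kwai
puts titleize("DNA and RNA")                     # DNA and RNA
```

For a four-word list the speed difference is negligible; the Set mostly pays off if the list grows.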
Code Snippets
def titleize(sentence)
  little_words = %w(end over and the)
  sentence.capitalize.gsub(/(\w+)/) do |word|
    little_words.include?(word) ? word : word.capitalize
  end
end

def titleize(sentence)
  little_words = %w(end over and the)
  sentence.gsub(/\b(\p{Ll}+)\b/) do |word|
    # The following breaks codereview's syntax highlighting, but it's valid Ruby code.
    # I used a "full" if-else rather than a ternary just to keep the lines shorter.
    if $`.empty? || !little_words.include?(word)
      word.capitalize
    else
      word
    end
  end
end
Context
StackExchange Code Review Q#79905, answer score: 5