patternrubyModerate
Less-repetitive code for document-analyzer
Viewed 0 times
repetitivelessfordocumentcodeanalyzer
Problem
I've refactored my code such that every function has only one responsibility. Now I'd like like to work on making it DRY.
I'm noticing a lot of duplication in the
Should my one-statement functions be collapsed into one line?
I'm noticing a lot of duplication in the
DocumentAnalyzer class. In nearly every function, I make a reference to @document.Should my one-statement functions be collapsed into one line?
require 'set'
class Document
attr_reader :words
def initialize(words)
@words = words
end
end
class DocumentAnalyzer
def initialize(document)
@document = document
end
def unique_words
@document.words.to_set
end
def unique_words_count
unique_words.count
end
def word_count
@document.words.size
end
def total_characters
total = 0.0
@document.words.each { |word| total += word.size }
total
end
def average_word_length
total_characters / word_count
end
end
d = Document.new %w(here are some words)
da = DocumentAnalyzer.new(d)Solution
To my eyes, there should really only be one class:
As you noticed, the methods in
But if there's a reason for it to be separate class (I can't think of any), then there's nothing inherently wrong in referencing the same variable in several places.
DRY does not mean "Don't ever repeat anything, ever!". It means "Don't duplicate logic". Accessing a variable isn't in itself logic; logic is what you do with it.
So if you want to always upcase all the words (or something), you should make a method to do that for you, instead of repeating that bit of logic everywhere you need it. But even so, you'd still have to call your
And no, don't collapse your methods. A single line is still a method body. Collapsing code is practically only used when you make a completely empty method or class on purpose, and want to really highlight that it's intended to be empty. E.g.
You'll also see it in code that defines custom exception classes. The classes just inherit from an existing exception class, but they neither add or override anything; the point is just to define them as distinct classes. Hence, they're usually defined with
Document.As you noticed, the methods in
DocumentAnalyzer all use @document. The analyzer class is so tied up in Document (and its methods are so simple) that it probably shouldn't be a wholly separate class.But if there's a reason for it to be separate class (I can't think of any), then there's nothing inherently wrong in referencing the same variable in several places.
DRY does not mean "Don't ever repeat anything, ever!". It means "Don't duplicate logic". Accessing a variable isn't in itself logic; logic is what you do with it.
So if you want to always upcase all the words (or something), you should make a method to do that for you, instead of repeating that bit of logic everywhere you need it. But even so, you'd still have to call your
upcased_words method in all those places, so it's not like you can avoid any and all repetition.And no, don't collapse your methods. A single line is still a method body. Collapsing code is practically only used when you make a completely empty method or class on purpose, and want to really highlight that it's intended to be empty. E.g.
def no_op; end.You'll also see it in code that defines custom exception classes. The classes just inherit from an existing exception class, but they neither add or override anything; the point is just to define them as distinct classes. Hence, they're usually defined with
class CustomError
-
Your Document class could - in its current form - be replaced by a plain array, really. It doesn't really do anything as it is right now. I'd expect a class named Document to (for instance) have methods to load/read a file or something. Instead, you give it an array, and it stores that array. But if you already have that array, why bother?
-
You don't need Set. You can just do words.uniq
-
Your total_characters method is odd. Why does total start out as a float? It'll always be an integer (unless you've discovered fractional characters)
-
Speaking of, the summing up can be done with inject (aka reduce)
def total_characters
@document.words.inject(0) { |sum, word| sum += word.length }
end
-
Also, the method should probably be called character_count to stick to the convention established by word_count and unique_word_count. Neither of those is called total_...
-
You should probably memoize your methods. That is, store the return value after calculating it the first time, so it doesn't have to be calculated again later:
def total_characters
@total_characters ||= @document.words.inject(0) { |sum, word| sum += word.length }
end
This code will return the instance variable immediately unless it's nil or false. Otherwise, it'll actually evaluate the right-hand expression (i.e. sum up the character count), assign the result to the instance variable, and then return the value.
-
If you want to memoize return values, you should also freeze the words array. Otherwise you can do something like some_document.words.fill("foobar") after which your previously computed values stop making sense.
-
I might just skip the *_count methods, since they're not even shortcuts. Anyone can call some_document.words.count or some_document.unique_words.count. Heck, you could even skip unique_words since it's the same as words.uniq (which is even shorter!).
Basically, if the class itself doesn't rely on a method, and the logic is trivial, there's no reason to add that method (i.e. unique_words isn't used by the class, and is very trivial).
Edit As jwg points out in the comments, memoization might be a good reason to add a method "ahead of time". Still, if the class itself doesn't have an obvious use for the method, think twice before adding it.
Here's what I might write
class Document
attr_reader :words
def initialize(words)
@words = words.dup.freeze
end
def characters
@characters ||= words.flat_map { |word| word.chars.to_a }.freeze
end
def average_word_length
return 0 if words.empty? # avoid division by zero
characters.count / words.count.to_f
end
end
Again, I don't think the *_count methods add anything, since you can just write .count anyway:
doc = Document.new(%w(here are some words))
doc.words # => ["here", "are", "some", "words"]
doc.characters # => ["h", "e", "r", "e", "a", "r", "e", "s", "o", "m", "e", "w", "o", "r", "d", "s"]
doc.average_word_length # => 4.0
doc.words.count # => 4
doc.characters.count # => 16
doc.words.uniq # => ["here", "are", "some", "words"]
doc.words.uniq.count # => 4
As you can see, I've also left out the unique_words method, because it's so simple. However, if the words` array is very long, it might be more efficient to provide a memoized method for people to use. I.e.def unique_words
@unique_words ||= words.uniq
endCode Snippets
def total_characters
@document.words.inject(0) { |sum, word| sum += word.length }
enddef total_characters
@total_characters ||= @document.words.inject(0) { |sum, word| sum += word.length }
endclass Document
attr_reader :words
def initialize(words)
@words = words.dup.freeze
end
def characters
@characters ||= words.flat_map { |word| word.chars.to_a }.freeze
end
def average_word_length
return 0 if words.empty? # avoid division by zero
characters.count / words.count.to_f
end
enddoc = Document.new(%w(here are some words))
doc.words # => ["here", "are", "some", "words"]
doc.characters # => ["h", "e", "r", "e", "a", "r", "e", "s", "o", "m", "e", "w", "o", "r", "d", "s"]
doc.average_word_length # => 4.0
doc.words.count # => 4
doc.characters.count # => 16
doc.words.uniq # => ["here", "are", "some", "words"]
doc.words.uniq.count # => 4def unique_words
@unique_words ||= words.uniq
endContext
StackExchange Code Review Q#55567, answer score: 10
Revisions (0)
No revisions yet.