patternrubyMinor
Simple fuzzy text search algorithm
Viewed 0 times
simplesearchtextalgorithmfuzzy
Problem
Basically, this code takes a keyword and an array of strings, then sorts them based on two things: the number of characters shared with the key, and the distance between them.(As a side note, would this properly be called a sort, not a search?)
Specifically I would like to know if any part of it could be made more efficient and/or elegant, but general critique is of course welcome.
def search(key, list)
# Format for everything pushed to out:
# Should this be a class?
# {
# val: the string in question,
# shared_chars: [{char: shared char of key list elem,
# index: index of shared char,
# kindex: index of shared char in key
# distance: how close the shared chars are}]
# }
out = []
key = key.downcase
list.each { |l|
l.downcase!
shared_chars = []
keyi = 0
key.each_char { |c|
index = /#{c}/ =~ l
if index != nil
shared_chars.push char: c, index: index, kindex: keyi
end
# Inelegant?
keyi += 1
}
# Calulate total distance between shared_chars
distance = 0
odistance = 0
shared_chars.each_index { |i|
unless i == shared_chars.length-1
distance += (shared_chars[i+1][:index] - shared_chars[i][:index]).abs
odistance += (shared_chars[i+1][:kindex] - shared_chars[i][:kindex]).abs
end
}
distance -= odistance
distance = distance.abs
out.push val: l, shared_chars: shared_chars, distance: distance
}
out.sort_by! { |e|
# 100 is arbitrary
100 + e[:distance] - e[:shared_chars].length*3
}
out
endSpecifically I would like to know if any part of it could be made more efficient and/or elegant, but general critique is of course welcome.
Solution
- Blocks longer than one line are conventionally written using
do…endrather than braces.
-
Code of the form
out = []
list.each { |l| ... out.push(something) }… would be better expressed as
out = list.collect { |l| something }Besides being slightly more compact, there is a subtle difference in thinking: the former feels like "do this, then this, then this to build a result"; the latter says "transform each element like this", and it's easier to see that there is a one-to-one correspondence between input elements and output elements.
-
Within that block,
out = list.collect do |l|
# Calling l.downcase! would have the side-effect of modifying
# strings in the original list, which is impolite.
l = l.downcase
shared_chars = []
key.split.each_with_index do |c, keyi|
index = l.index(c)
# Actually, char:c is never used and could be eliminated
shared_chars.push(char: c, index: index, kindex: keyi) if index
end
distance, kdistance = 0, 0
shared_chars.each_cons(2) do |a, b|
distance += (b[:index] - a[:index]).abs
kdistance += (b[:kindex] - a[:kindex]).abs
end
{ val: l, shared_chars: shared_chars, distance: (distance - odistance).abs }
end-
Furthermore,
out.sort_by! { |e| ... }
outcould just be
out.sort_by { |e| ... }(
.sort_by is defined because an Array is an Enumerable.)-
I think that the whole function would be better as one chained expression:
def search(key, list)
key = key.downcase
list.collect do |l|
...
end.sort_by do |e|
# There's no point in adding 100 to each element
e[:distance] - 3 * e[:shared_chars].length
end
endCode Snippets
out = []
list.each { |l| ... out.push(something) }out = list.collect { |l| something }out = list.collect do |l|
# Calling l.downcase! would have the side-effect of modifying
# strings in the original list, which is impolite.
l = l.downcase
shared_chars = []
key.split.each_with_index do |c, keyi|
index = l.index(c)
# Actually, char:c is never used and could be eliminated
shared_chars.push(char: c, index: index, kindex: keyi) if index
end
distance, kdistance = 0, 0
shared_chars.each_cons(2) do |a, b|
distance += (b[:index] - a[:index]).abs
kdistance += (b[:kindex] - a[:kindex]).abs
end
{ val: l, shared_chars: shared_chars, distance: (distance - odistance).abs }
endout.sort_by! { |e| ... }
outout.sort_by { |e| ... }Context
StackExchange Code Review Q#37971, answer score: 6
Revisions (0)
No revisions yet.