HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Simple fuzzy text search algorithm

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
simplesearchtextalgorithmfuzzy

Problem

Basically, this code takes a keyword and an array of strings, then sorts them based on two things: the number of characters shared with the key, and the distance between them.(As a side note, would this properly be called a sort, not a search?)

def search(key, list)
  # Format for everything pushed to out:
  # Should this be a class?    
  # { 
  #   val: the string in question, 
  #   shared_chars: [{char: shared char of key list elem, 
  #                   index: index of shared char,
  #                   kindex: index of shared char in key
  #                   distance: how close the shared chars are}]
  # }
  out = []
  key = key.downcase

  list.each { |l|
    l.downcase!
    shared_chars = []

    keyi = 0
    key.each_char { |c|
      index = /#{c}/ =~ l

      if index != nil
        shared_chars.push char: c, index: index, kindex: keyi
      end

      # Inelegant?
      keyi += 1
    }

    # Calulate total distance between shared_chars
    distance = 0
    odistance = 0

    shared_chars.each_index { |i|
      unless i == shared_chars.length-1
        distance += (shared_chars[i+1][:index] - shared_chars[i][:index]).abs
        odistance += (shared_chars[i+1][:kindex] - shared_chars[i][:kindex]).abs 
      end
    }

    distance -= odistance
    distance = distance.abs

    out.push val: l, shared_chars: shared_chars, distance: distance
  }

  out.sort_by! { |e|
    # 100 is arbitrary
    100 + e[:distance] - e[:shared_chars].length*3
  }

  out
end


Specifically I would like to know if any part of it could be made more efficient and/or elegant, but general critique is of course welcome.

Solution


  • Blocks longer than one line are conventionally written using doend rather than braces.



-
Code of the form

out = []
list.each { |l| ... out.push(something) }


… would be better expressed as

out = list.collect { |l| something }


Besides being slightly more compact, there is a subtle difference in thinking: the former feels like "do this, then this, then this to build a result"; the latter says "transform each element like this", and it's easier to see that there is a one-to-one correspondence between input elements and output elements.

-
Within that block,

out = list.collect do |l|
  # Calling l.downcase! would have the side-effect of modifying
  # strings in the original list, which is impolite.
  l = l.downcase

  shared_chars = []
  key.split.each_with_index do |c, keyi|
    index = l.index(c)
    # Actually, char:c is never used and could be eliminated
    shared_chars.push(char: c, index: index, kindex: keyi) if index
  end

  distance, kdistance = 0, 0
  shared_chars.each_cons(2) do |a, b|
    distance += (b[:index] - a[:index]).abs
    kdistance += (b[:kindex] - a[:kindex]).abs
  end

  { val: l, shared_chars: shared_chars, distance: (distance - odistance).abs }
end


-
Furthermore,

out.sort_by! { |e| ... }
out


could just be

out.sort_by { |e| ... }


(.sort_by is defined because an Array is an Enumerable.)

-
I think that the whole function would be better as one chained expression:

def search(key, list)
  key = key.downcase

  list.collect do |l|
    ...
  end.sort_by do |e|
    # There's no point in adding 100 to each element
    e[:distance] - 3 * e[:shared_chars].length
  end
end

Code Snippets

out = []
list.each { |l| ... out.push(something) }
out = list.collect { |l| something }
out = list.collect do |l|
  # Calling l.downcase! would have the side-effect of modifying
  # strings in the original list, which is impolite.
  l = l.downcase

  shared_chars = []
  key.split.each_with_index do |c, keyi|
    index = l.index(c)
    # Actually, char:c is never used and could be eliminated
    shared_chars.push(char: c, index: index, kindex: keyi) if index
  end

  distance, kdistance = 0, 0
  shared_chars.each_cons(2) do |a, b|
    distance += (b[:index] - a[:index]).abs
    kdistance += (b[:kindex] - a[:kindex]).abs
  end

  { val: l, shared_chars: shared_chars, distance: (distance - odistance).abs }
end
out.sort_by! { |e| ... }
out
out.sort_by { |e| ... }

Context

StackExchange Code Review Q#37971, answer score: 6

Revisions (0)

No revisions yet.