Removing list of words from a text file in Ruby

Submitted by: @import:stackexchange-codereview

Problem

I have two files.

  • File 1. Has a list of all the dictionary words
  • File 2. Has a list of all prepositions.

I want to remove all the prepositions from the dictionary.
I want to reduce the number of lines in my code and also make it more elegant, idiomatic and readable.

#!/usr/bin/env ruby

    path = "/Users/../Desktop/";

    file_original_wordlist = File.open("#{path}"  + "dictionary.txt",  "r")
    file_remove_wordlist = File.open("#{path}"  + "prepositions.txt", "r")

    # Need to initialize the variables else I get errors
    delete_word = false
    word_orig =  ''
    word_rem  =  ''
    count = 0
    file_original_wordlist.each_line do |line1|
      file_remove_wordlist.each_line do |line2|
        word_orig =  line1
        word_rem  =  line2 
        if word_orig.eql?(word_rem)
          puts "Deleting the word " + word_rem
          delete_word = true 
          count += 1
        end   
      end
      if delete_word == false
        File.open(path + "scrubbed_list.txt", "a") {|f| f.write(word_orig) }
      end
      # Need to reopen the file after each iteration so the next pass starts from the beginning
      file_remove_wordlist = File.open("#{path}"  + "prepositions.txt", "r")
      delete_word = false
    end

    puts "Deleted #{count} words in total"

Solution

Some notes:

- I guess you come from imperative languages. Try to write in a more functional style (more expressions, fewer statements).

- Use libraries (File) to manipulate paths.

- This double each_line is bad news for performance: O(n*m). Avoid it by building a data structure that has O(1) checks for inclusion. I'd create a set of the prepositions (it's the smaller set). The overall performance is now O(n).
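
To make the last two notes concrete, here is a minimal sketch on made-up sample data. The directory name and the word lists are illustrative assumptions, not taken from the question. `File.join` assembles the path without manual string concatenation, and `Set#include?` answers membership in constant time on average, so each dictionary line is looked up once instead of being compared against every preposition.

```ruby
require 'set'

# Hypothetical directory, used only to show how File.join builds a path.
base_dir = "/tmp/wordlists"
prepositions_path = File.join(base_dir, "prepositions.txt")

# Sample data standing in for the two files (lines keep their trailing newline).
prepositions = Set.new(["in\n", "on\n", "under\n"])
dictionary = ["apple\n", "in\n", "banana\n", "on\n"]

# One pass over the dictionary, one constant-time lookup per line.
kept = dictionary.reject { |line| prepositions.include?(line) }
```

Here `kept` holds only the lines that are not in the set, in their original order.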

I'd write:

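# File.readlines returns an array of lines and closes the file afterwards.
prepositions = File.readlines(File.join(path, "prepositions.txt"))
words = File.readlines(File.join(path, "dictionary.txt"))
# Array#- drops every element of words that also appears in prepositions.
filtered_words = words - prepositions
File.write("dictionary_without_prepositions.txt", filtered_words.join)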
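
One thing to watch when comparing raw lines: each line keeps its trailing newline, so a final line without one would not match its counterpart in the other file. A possible way around this, sketched under the assumption that you are on Ruby 2.4 or later (where `File.readlines` accepts `chomp: true`), is to compare chomped lines. The helper name `remove_words` and the sample file contents are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of the original answer): returns the lines of
# words_path that do not appear in remove_path, ignoring trailing newlines.
def remove_words(words_path, remove_path)
  to_remove = File.readlines(remove_path, chomp: true)
  File.readlines(words_path, chomp: true) - to_remove
end

# Build two small sample files in a temporary directory and filter them.
result = Dir.mktmpdir do |dir|
  dictionary = File.join(dir, "dictionary.txt")
  prepositions = File.join(dir, "prepositions.txt")
  File.write(dictionary, "apple\nin\nbanana\non") # note: no newline after "on"
  File.write(prepositions, "in\non\n")
  remove_words(dictionary, prepositions)
end
```

Since the lines are chomped, you would write them back with something like `result.join("\n")` plus a final newline rather than a bare `join`.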


If the input file dictionary.txt is very large, a lazier approach processes it line by line instead of loading it all into memory:

require 'set'
# A Set gives O(1) membership checks.
prepositions = File.foreach(File.join(path, "prepositions.txt")).to_set

File.open("dictionary_without_prepositions.txt", "w") do |output|
  # File.foreach yields one line at a time and closes the file when done.
  File.foreach(File.join(path, "dictionary.txt")) do |line|
    output.write(line) unless prepositions.include?(line)
  end
end
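
The original script also reported how many words it deleted. If you want to keep that in the streaming version, one option is sketched below. The helper `copy_without` is a hypothetical name, and `StringIO` objects stand in for the real files so the example runs on its own; with actual files you would pass the handles from `File.open` instead.

```ruby
require 'set'
require 'stringio'

# Hypothetical helper: copies each line of input to output unless it is in
# words_to_remove, and returns the number of lines that were skipped.
def copy_without(input, output, words_to_remove)
  removed = 0
  input.each_line do |line|
    if words_to_remove.include?(line)
      removed += 1
    else
      output.write(line)
    end
  end
  removed
end

# In-memory stand-ins for prepositions.txt, dictionary.txt and the output file.
prepositions = Set.new(["in\n", "on\n"])
input = StringIO.new("apple\nin\nbanana\non\n")
output = StringIO.new

removed = copy_without(input, output, prepositions)
puts "Deleted #{removed} words in total"
```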


Context

StackExchange Code Review Q#42539, answer score: 4
