HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

CSV File Parser in Ruby - Attempt No. 2

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
fileparsercsvattemptruby

Problem

The following is my second attempt at a basic CSV parser in Ruby. I've addressed suggestions that I've got for my previous version here.

I have omitted all test cases - the code listing would be too long otherwise. You can look at the full code-base at GitHub if you want to.

As opposed to the last version, the current solution supports:

  • multiple different value delimiters,



  • values surrounded by quotes,



  • delimiters as a part of quoted values,



  • quotes as a part of quoted values.



Please note that my intentions are not to implement a full-featured CSV parser, but to improve my Ruby programming skills. I'll appreciate any suggestions.

csv_parser.rb

require_relative 'line_parser'

# An exception to report that the given delimiter is not supported.
#
class UnsupportedDelimiterException  The character to be used as delimiter. Defaults to
  # DEFAULT_DELIMITER. Must be one of SUPPORTED_DELIMITERS, otherwise
  # an UnsupportedDelimiterException is raised.
  #
  def initialize delimiter = DEFAULT_DELIMITER
    if not SUPPORTED_DELIMITERS.include? delimiter
      raise UnsupportedDelimiterException.new delimiter
    end
    @line_parser = LineParser.new(Lexer.new delimiter)
  end

  # Parses the given CSV file and returns the result as an array of rows.
  #
  def parse file
    rows = []
    headers = @line_parser.parse file.gets.chomp
    file.each do |line|
      values = {}
      headers.zip(@line_parser.parse line.chomp).each do |key, value|
        values[key] = value
      end
      rows  a hash containing the column -> value mapping
  #
  def initialize values
    @values = values
  end

  # Returns the value in the column given as method name, or null if
  # no such value exists.
  #
  def method_missing name, *args  
    @values[name.to_s]
  end
end


line_parser.rb

```
require_relative 'lexer'

class ParseError < RuntimeError
end

# CSV line parser.
#
class LineParser

# Initializes the parser with the given lexer instance.
#
def initialize

Solution

if not

It is very 'unruby' to use if not. Ruby has a special unless just for these cases:

unless last_seen_identifier
    values << ""
  end
  break


You can even shorten this further by using the one-liner shorthand:

values << "" unless last_seen_identifier
break


One more general guideline - although you can have an unless .. else .. end construct, it is not recommended. Better to reverse the conditional:

# not good
if not last_seen_identifier
  values << ""
  next
else
  last_seen_identifier = false
end

# not better
unless last_seen_identifier
  values << ""
  next
else
  last_seen_identifier = false
end

# better
if last_seen_identifier
  last_seen_identifier = false
else
  values << ""
  next
end

# best
values << "" unless last_seen_identifier
last_seen_identifier = false


Too Many Classes

Contrary to languages like Java or C#, Ruby is not statically typed, but rather duck typed, which means that you can pass any type anywhere, and the objects don't have to inherit from the same base class.

For example, the class Token adds no functionality, and it is not used anywhere in the code besides being inherited by a bunch of classes. It is totally redundant, and deleting it will not affect the code.

Furthermore, EOFToken and DelimiterToken don't contain any intrinsic functionality, and according to ruby's idioms, a better solution would be to simply pass symbols (:eof and :delimiter) rather than classes.

The last token (IdentifierToken) holds the current value, but you can even skip that, simply passing the value itself instead.

This will make the token iteration look like this:

tokens.each do |token|
  case token
    when :eof
      values << "" unless last_seen_identifier
      break
    when :delimiter
      values << "" unless last_seen_identifier
      last_seen_identifier = false
    else
      if last_seen_identifier
        raise ParseError, "Unexpected identifier - a delimiter was expected."
      end
      last_seen_identifier = true
      values << token
  end
end


Assertions?

Assertions are used by some languages to make some development checks where a condition is assumed to be always true. This methodology is not longer en-vogue in most places, and even where it is - it is also disabled in production:


Most languages allow assertions to be enabled or disabled globally,
and sometimes independently. Assertions are often enabled during
development and disabled during final testing and on release to the
customer. Not checking assertions avoids the cost of evaluating the
assertions while (assuming the assertions are free of side effects)
still producing the same result under normal conditions. Under
abnormal conditions, disabling assertion checking can mean that a
program that would have aborted will continue to run. This is
sometimes preferable.

In your code the need for the assertion seems a bit artificial, since it checks that the first character is exactly the character pushed back by the caller:

...
  when quotes
    stream.unshift char
    get_quoted_identifier stream
...

char = stream.shift
assert { char == quotes }


that being the case, I would argue that you can remove the unshift/shift tango, and simply assume that the first char is quotes:

...
  when quotes
    get_quoted_identifier stream
...

char = quotes

Code Snippets

unless last_seen_identifier
    values << ""
  end
  break
values << "" unless last_seen_identifier
break
# not good
if not last_seen_identifier
  values << ""
  next
else
  last_seen_identifier = false
end

# not better
unless last_seen_identifier
  values << ""
  next
else
  last_seen_identifier = false
end

# better
if last_seen_identifier
  last_seen_identifier = false
else
  values << ""
  next
end

# best
values << "" unless last_seen_identifier
last_seen_identifier = false
tokens.each do |token|
  case token
    when :eof
      values << "" unless last_seen_identifier
      break
    when :delimiter
      values << "" unless last_seen_identifier
      last_seen_identifier = false
    else
      if last_seen_identifier
        raise ParseError, "Unexpected identifier - a delimiter was expected."
      end
      last_seen_identifier = true
      values << token
  end
end
...
  when quotes
    stream.unshift char
    get_quoted_identifier stream
...

char = stream.shift
assert { char == quotes }

Context

StackExchange Code Review Q#71094, answer score: 4

Revisions (0)

No revisions yet.