patternrubyMinor
CSV File Parser in Ruby - Attempt No. 2
Viewed 0 times
fileparsercsvattemptruby
Problem
The following is my second attempt at a basic CSV parser in Ruby. I've addressed suggestions that I've got for my previous version here.
I have omitted all test cases - the code listing would be too long otherwise. You can look at the full code-base at GitHub if you want to.
As opposed to the last version, the current solution supports:
Please note that my intentions are not to implement a full-featured CSV parser, but to improve my Ruby programming skills. I'll appreciate any suggestions.
csv_parser.rb
line_parser.rb
```
require_relative 'lexer'
class ParseError < RuntimeError
end
# CSV line parser.
#
class LineParser
# Initializes the parser with the given lexer instance.
#
def initialize
I have omitted all test cases - the code listing would be too long otherwise. You can look at the full code-base at GitHub if you want to.
As opposed to the last version, the current solution supports:
- multiple different value delimiters,
- values surrounded by quotes,
- delimiters as a part of quoted values,
- quotes as a part of quoted values.
Please note that my intentions are not to implement a full-featured CSV parser, but to improve my Ruby programming skills. I'll appreciate any suggestions.
csv_parser.rb
require_relative 'line_parser'
# An exception to report that the given delimiter is not supported.
#
class UnsupportedDelimiterException The character to be used as delimiter. Defaults to
# DEFAULT_DELIMITER. Must be one of SUPPORTED_DELIMITERS, otherwise
# an UnsupportedDelimiterException is raised.
#
def initialize delimiter = DEFAULT_DELIMITER
if not SUPPORTED_DELIMITERS.include? delimiter
raise UnsupportedDelimiterException.new delimiter
end
@line_parser = LineParser.new(Lexer.new delimiter)
end
# Parses the given CSV file and returns the result as an array of rows.
#
def parse file
rows = []
headers = @line_parser.parse file.gets.chomp
file.each do |line|
values = {}
headers.zip(@line_parser.parse line.chomp).each do |key, value|
values[key] = value
end
rows a hash containing the column -> value mapping
#
def initialize values
@values = values
end
# Returns the value in the column given as method name, or null if
# no such value exists.
#
def method_missing name, *args
@values[name.to_s]
end
endline_parser.rb
```
require_relative 'lexer'
class ParseError < RuntimeError
end
# CSV line parser.
#
class LineParser
# Initializes the parser with the given lexer instance.
#
def initialize
Solution
if notIt is very 'unruby' to use
if not. Ruby has a special unless just for these cases:unless last_seen_identifier
values << ""
end
breakYou can even shorten this further by using the one-liner shorthand:
values << "" unless last_seen_identifier
breakOne more general guideline - although you can have an
unless .. else .. end construct, it is not recommended. Better to reverse the conditional:# not good
if not last_seen_identifier
values << ""
next
else
last_seen_identifier = false
end
# not better
unless last_seen_identifier
values << ""
next
else
last_seen_identifier = false
end
# better
if last_seen_identifier
last_seen_identifier = false
else
values << ""
next
end
# best
values << "" unless last_seen_identifier
last_seen_identifier = falseToo Many Classes
Contrary to languages like Java or C#, Ruby is not statically typed, but rather duck typed, which means that you can pass any type anywhere, and the objects don't have to inherit from the same base class.
For example, the class
Token adds no functionality, and it is not used anywhere in the code besides being inherited by a bunch of classes. It is totally redundant, and deleting it will not affect the code.Furthermore,
EOFToken and DelimiterToken don't contain any intrinsic functionality, and according to ruby's idioms, a better solution would be to simply pass symbols (:eof and :delimiter) rather than classes.The last token (
IdentifierToken) holds the current value, but you can even skip that, simply passing the value itself instead.This will make the token iteration look like this:
tokens.each do |token|
case token
when :eof
values << "" unless last_seen_identifier
break
when :delimiter
values << "" unless last_seen_identifier
last_seen_identifier = false
else
if last_seen_identifier
raise ParseError, "Unexpected identifier - a delimiter was expected."
end
last_seen_identifier = true
values << token
end
endAssertions?
Assertions are used by some languages to make some development checks where a condition is assumed to be always true. This methodology is not longer en-vogue in most places, and even where it is - it is also disabled in production:
Most languages allow assertions to be enabled or disabled globally,
and sometimes independently. Assertions are often enabled during
development and disabled during final testing and on release to the
customer. Not checking assertions avoids the cost of evaluating the
assertions while (assuming the assertions are free of side effects)
still producing the same result under normal conditions. Under
abnormal conditions, disabling assertion checking can mean that a
program that would have aborted will continue to run. This is
sometimes preferable.
In your code the need for the assertion seems a bit artificial, since it checks that the first character is exactly the character pushed back by the caller:
...
when quotes
stream.unshift char
get_quoted_identifier stream
...
char = stream.shift
assert { char == quotes }that being the case, I would argue that you can remove the unshift/shift tango, and simply assume that the first char is
quotes:...
when quotes
get_quoted_identifier stream
...
char = quotesCode Snippets
unless last_seen_identifier
values << ""
end
breakvalues << "" unless last_seen_identifier
break# not good
if not last_seen_identifier
values << ""
next
else
last_seen_identifier = false
end
# not better
unless last_seen_identifier
values << ""
next
else
last_seen_identifier = false
end
# better
if last_seen_identifier
last_seen_identifier = false
else
values << ""
next
end
# best
values << "" unless last_seen_identifier
last_seen_identifier = falsetokens.each do |token|
case token
when :eof
values << "" unless last_seen_identifier
break
when :delimiter
values << "" unless last_seen_identifier
last_seen_identifier = false
else
if last_seen_identifier
raise ParseError, "Unexpected identifier - a delimiter was expected."
end
last_seen_identifier = true
values << token
end
end...
when quotes
stream.unshift char
get_quoted_identifier stream
...
char = stream.shift
assert { char == quotes }Context
StackExchange Code Review Q#71094, answer score: 4
Revisions (0)
No revisions yet.