Look through a string and return the most frequent character (Ruby)
Problem
I want to determine which separator is used in a CSV file. CSV.foreach will return something like this:
["something1;something2;something3"]
The code beneath does the trick, but something better must exist. I find it annoying to need sep_count. Do you know of a method that returns the most frequent of the characters from SEPERATORS?
SEPERATORS = [";", ","]
CSV.foreach(@file, @config) do |header|
  sep_count = 0
  SEPERATORS.each do |seperator|
    if header.first.scan(/#{seperator}/).count > sep_count
      @config[:col_sep] = seperator
      sep_count = header.first.scan(/#{seperator}/).count
    end
  end
  break
end
EDIT:
Based on your awesome answers I got the 1-liner that I asked for:
@config[:col_sep] = %w(; ,).sort_by { |separator| File.open(@file).first(1).join.count(separator) }.last
I have also come up with this piece of code that determines both col_sep and row_sep:
first_line = ""
File.open(@file) do |file|
  file.each_char do |char|
    first_line << char
    if "\r\n".include?(char)
      @config[:row_sep] = first_line.scan(/\n$|\r$/).first
      break
    end
  end
end
@config[:col_sep] = %w(; ,).sort_by { |separator| first_line.count(separator) }.last
By using the full code we ensure that it is always the first line that gets used, and we also set the row_sep. Feel free to comment if you think anything could be improved further.
Solution
You can get the most common separator in header with a one-liner like this:
most_common = SEPARATORS.sort_by{|separator| header.count(separator)}.last
But as you have noticed, CSV.foreach attempts to split up the rows, assuming by default that the separator is a comma.
You probably need to determine the separator in a preprocessing step before actually doing the CSV processing.
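As a side note, the one-liner above sorts the whole candidate list just to take its last element. Enumerable#max_by picks the highest-scoring element directly. A minimal sketch, assuming the input is a plain string; most_common_separator is a hypothetical helper name used only for illustration:

```ruby
# Candidate column separators (the same two as in the question).
SEPARATORS = [";", ","]

# Hypothetical helper: returns the candidate that occurs most often in +line+.
# String#count counts how many characters of +line+ match the given character.
def most_common_separator(line, candidates = SEPARATORS)
  candidates.max_by { |separator| line.count(separator) }
end

puts most_common_separator("something1;something2;something3")
```

Keep in mind that String#count treats its argument as a character set, so a multi-character candidate such as "a-c" would be read as a range rather than as a literal string. For single characters like ";" and "," this is not an issue.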
You could just do something like
contents = File.read(@file)
@config[:col_sep] = %w(; ,).sort_by{|separator| contents.count(separator)}.last
CSV.parse(contents, @config) do |row|
  ...
end
# or use the returned array of arrays
rows = CSV.parse(contents, @config)
This might be quite slow if your file is large because you have to read the whole thing into memory. In that case you might want to just look at the first line of the file, and guess the separator from that. To do this, assuming \n is your line separator:
first_line = File.open(@file) do |file|
  file.first
end
Note that you should use the block form to ensure that the file gets closed.
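Putting the first-line approach together with the actual parse might look like the sketch below. It assumes "\n"-terminated lines, as stated above, and passes the separator as an explicit col_sep keyword instead of going through @config. Both guess_col_sep and read_rows are hypothetical helper names, not part of the CSV library:

```ruby
require "csv"

# Hypothetical helper: guesses the column separator from the first line only.
# Assumes "\n"-terminated lines.
def guess_col_sep(path, candidates = [";", ","])
  # The block form closes the file even if an exception is raised.
  first_line = File.open(path) { |file| file.first }
  # file.first returns nil for an empty file, so fall back to an empty string.
  first_line = first_line.to_s
  candidates.max_by { |separator| first_line.count(separator) }
end

# Hypothetical helper: parses the whole file with the guessed separator
# and returns an array of row arrays.
def read_rows(path)
  CSV.read(path, col_sep: guess_col_sep(path))
end
```

Only one line is read for the guess, so the cost of detection stays small even for large files. The full read then happens once, with the right separator.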
If you need to be line-separator agnostic, I don't think there's a built-in way to do so (although you can change the line separator, that assumes you know it in advance). You might try something like
first_line = ""
File.open(@file) do |file|
  file.each_char do |char|
    break if "\r\n".include?(char)
    first_line << char
  end
end
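A variation on the same idea, in case the row separator is wanted as well: read a bounded chunk from the start of the file and pick out the first line and the first line terminator with regular expressions. This is only a sketch. The name first_line_and_row_sep and the 4096-byte limit are assumptions made for illustration, and the first line is assumed to fit within that limit:

```ruby
# Hypothetical helper: returns [first_line, row_sep] without assuming which
# line ending the file uses. Only the first +limit+ bytes are read.
def first_line_and_row_sep(path, limit = 4096)
  # Binary mode, so no newline conversion is applied on any platform.
  # read(limit) returns nil for an empty file, hence the to_s.
  chunk = File.open(path, "rb") { |file| file.read(limit) }.to_s
  # Everything before the first CR or LF is the first line.
  first_line = chunk[/\A[^\r\n]*/]
  # First terminator found; "\r\n" is tried before its single-character parts.
  # This is nil when the chunk contains no line terminator at all.
  row_sep = chunk[/\r\n|\n|\r/]
  [first_line, row_sep]
end
```

Trying "\r\n" before "\n" and "\r" means Windows line endings are reported as a pair rather than as a lone "\r". One edge case is left open: if the limit happens to cut a "\r\n" pair in half, the sketch reports "\r".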
Context
StackExchange Code Review Q#19555, answer score: 5