patternrubyMinor
Splitting log file into smaller files
Viewed 0 times
filelogintosplittingfilessmaller
Problem
I had a file that looked like this:
that I wanted to split into smaller files using the ip address after "logger"
This is what I came up with:
Suggestions, warnings, improvements are very welcome :)
One thing I noticed is that the new files are created in the same directory as the original file, not at the current working directory (so ./logsplitter.rb ../log.log creates files in the .. directory).
Thank you
[edit: typo]
Mar 06 22:00:00 [10.251.132.246] logger: 10.64.69.219 - - [06/Mar/2011:22:.....
Mar 06 22:00:00 [10.251.132.246] logger: 10.98.137.116 - - [06/Mar/2011:22:0....that I wanted to split into smaller files using the ip address after "logger"
This is what I came up with:
file = ARGV.shift
split_file = {}
pattern = /logger: ([^\s]*)/
File.open(file, 'r') do |f|
f.each do |l|
match = l[pattern]
if match
list = split_file[$1]
list = [] if list == nil
list << l
split_file[$1] = list
end
end
end
split_file.each_pair do |k, v|
File.open("#{file}.#{k}", "a+") do |f|
v.each do |l|
f.print l
end
end
endSuggestions, warnings, improvements are very welcome :)
One thing I noticed is that the new files are created in the same directory as the original file, not at the current working directory (so ./logsplitter.rb ../log.log creates files in the .. directory).
Thank you
[edit: typo]
Solution
First of all it is a pretty wide-spread convention in ruby to use 2 spaces for indendation not 4. Personally I don't care, but there are some ruby developers who will complain when seeing code indented with 4 spaces, so you'll have an easier time just going with the stream.
Unless there is a good reason to mutate
First of all you should avoid using magic variables. Using MatchData objects is more robust than using magic variables. As an example consider this scenario:
Assume that you decide you want to do some processing on the line before storing it in
However this code is broken. Since
Further the whole code can be simplified by using a very useful feature of ruby hashes: default blocks. Hashes in ruby allow you to specify a block which is executed when a key is not found. This way you can create hash of arrays which you can just append to without having to make sure the array exists.
For this you need to change the initialization of
One thing I noticed is that the new files are created in the same directory as the original file, not at the current working directory (so ./logsplitter.rb ../log.log creates files in the .. directory).
If you want to avoid that use
Speaking of this line: I don't see why you use "a+" instead of just "a" as the opening mode - you never read from it.
file = ARGV.shiftUnless there is a good reason to mutate
ARGV (which in this case doesn't seem to be the case), I'd recommend not using mutating operations. file = ARGV[0] will work perfectly fine here.match = l[pattern]
if match
list = split_file[$1]
list = [] if list == nil
list << l
split_file[$1] = list
endFirst of all you should avoid using magic variables. Using MatchData objects is more robust than using magic variables. As an example consider this scenario:
Assume that you decide you want to do some processing on the line before storing it in
split_file. For this you decide to use gsub. Now your code looks like this:match = l[pattern]
if match
list = split_file[$1]
list = [] if list == nil
list << l.gsub( /some_regex/, "some replacement")
split_file[$1] = list
endHowever this code is broken. Since
gsub also sets $1, $1 now no longer contains what you think it does and split_file[$1] will not work as expected. This kind of bug can't happen if you use [1] on a match data object instead.Further the whole code can be simplified by using a very useful feature of ruby hashes: default blocks. Hashes in ruby allow you to specify a block which is executed when a key is not found. This way you can create hash of arrays which you can just append to without having to make sure the array exists.
For this you need to change the initialization of
split_file from split_file = {} to split_file = Hash.new {|h,k| h[k] = [] }. Then you can replace the above code with:match = l.match(pattern)
if match
split_file[ match[1] ] << l
endOne thing I noticed is that the new files are created in the same directory as the original file, not at the current working directory (so ./logsplitter.rb ../log.log creates files in the .. directory).
If you want to avoid that use
File.basename to extract only the name of the file without the directory from the given path and then build the path of the file to be created from that. I.e.:File.open("#{ File.basename(file) }.#{k}", "a+") do |f|Speaking of this line: I don't see why you use "a+" instead of just "a" as the opening mode - you never read from it.
Code Snippets
file = ARGV.shiftmatch = l[pattern]
if match
list = split_file[$1]
list = [] if list == nil
list << l
split_file[$1] = list
endmatch = l[pattern]
if match
list = split_file[$1]
list = [] if list == nil
list << l.gsub( /some_regex/, "some replacement")
split_file[$1] = list
endmatch = l.match(pattern)
if match
split_file[ match[1] ] << l
endFile.open("#{ File.basename(file) }.#{k}", "a+") do |f|Context
StackExchange Code Review Q#1196, answer score: 4
Revisions (0)
No revisions yet.