HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Splitting log file into smaller files

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filelogintosplittingfilessmaller

Problem

I had a file that looked like this:

Mar 06 22:00:00 [10.251.132.246] logger: 10.64.69.219 - - [06/Mar/2011:22:.....
Mar 06 22:00:00 [10.251.132.246] logger: 10.98.137.116 - - [06/Mar/2011:22:0....


that I wanted to split into smaller files using the ip address after "logger"

This is what I came up with:

file = ARGV.shift
split_file = {}
pattern = /logger: ([^\s]*)/
File.open(file, 'r') do |f|
    f.each do |l|
        match = l[pattern]
        if match
            list = split_file[$1]
            list = [] if list == nil
            list << l
            split_file[$1] = list
        end
    end
end

split_file.each_pair do |k, v|
    File.open("#{file}.#{k}", "a+") do |f|
        v.each do |l|
            f.print l
        end
    end
end


Suggestions, warnings, improvements are very welcome :)

One thing I noticed is that the new files are created in the same directory as the original file, not at the current working directory (so ./logsplitter.rb ../log.log creates files in the .. directory).

Thank you

[edit: typo]

Solution

First of all it is a pretty wide-spread convention in ruby to use 2 spaces for indendation not 4. Personally I don't care, but there are some ruby developers who will complain when seeing code indented with 4 spaces, so you'll have an easier time just going with the stream.

file = ARGV.shift


Unless there is a good reason to mutate ARGV (which in this case doesn't seem to be the case), I'd recommend not using mutating operations. file = ARGV[0] will work perfectly fine here.

match = l[pattern]
if match
    list = split_file[$1]
    list = [] if list == nil
    list << l
    split_file[$1] = list
end


First of all you should avoid using magic variables. Using MatchData objects is more robust than using magic variables. As an example consider this scenario:

Assume that you decide you want to do some processing on the line before storing it in split_file. For this you decide to use gsub. Now your code looks like this:

match = l[pattern]
if match
    list = split_file[$1]
    list = [] if list == nil
    list << l.gsub( /some_regex/, "some replacement")
    split_file[$1] = list
end


However this code is broken. Since gsub also sets $1, $1 now no longer contains what you think it does and split_file[$1] will not work as expected. This kind of bug can't happen if you use [1] on a match data object instead.

Further the whole code can be simplified by using a very useful feature of ruby hashes: default blocks. Hashes in ruby allow you to specify a block which is executed when a key is not found. This way you can create hash of arrays which you can just append to without having to make sure the array exists.

For this you need to change the initialization of split_file from split_file = {} to split_file = Hash.new {|h,k| h[k] = [] }. Then you can replace the above code with:

match = l.match(pattern)
if match
  split_file[ match[1] ] << l
end



One thing I noticed is that the new files are created in the same directory as the original file, not at the current working directory (so ./logsplitter.rb ../log.log creates files in the .. directory).

If you want to avoid that use File.basename to extract only the name of the file without the directory from the given path and then build the path of the file to be created from that. I.e.:

File.open("#{ File.basename(file) }.#{k}", "a+") do |f|


Speaking of this line: I don't see why you use "a+" instead of just "a" as the opening mode - you never read from it.

Code Snippets

file = ARGV.shift
match = l[pattern]
if match
    list = split_file[$1]
    list = [] if list == nil
    list << l
    split_file[$1] = list
end
match = l[pattern]
if match
    list = split_file[$1]
    list = [] if list == nil
    list << l.gsub( /some_regex/, "some replacement")
    split_file[$1] = list
end
match = l.match(pattern)
if match
  split_file[ match[1] ] << l
end
File.open("#{ File.basename(file) }.#{k}", "a+") do |f|

Context

StackExchange Code Review Q#1196, answer score: 4

Revisions (0)

No revisions yet.