patternrubyModerate
Ruby script on-all-nodes-run not only for teaching
Viewed 0 times
scriptnodesallteachingforrubynotonlyrun
Problem
I wrote the following Ruby script several years ago and have been using it often ever since on a Bioinformatics computer cluster.
It pulls out a list of hosts from the Torque queuing system
A nice feature: results are printed immediately for the host that is next in the order.
I would like to use it as an example for a Ruby workshop. Could you please suggest best-practice and design pattern improvements?
```
#!/usr/bin/ruby
EXCLUDE = [/girkelab/, /biocluster/, /parrot/, /owl/]
require "open3"
# Non-interactive, no password asking, and seasonable timeouts
SSH_OPTIONS = ["-o PreferredAuthentications=publickey,hostbased,gssapi,gssapi-with-mic",
"-o ForwardX11=no",
"-o BatchMode=yes",
"-o SetupTimeOut=5",
"-o ServerAliveInterval=5",
"-o ServerAliveCountMax=2"
].join(" ")
SSH = "/usr/bin/ssh #{SSH_OPTIONS}"
MKDIR = "/bin/mkdir"
raise "Pleae give this command at least one argument" if ARGV.size < 1
COMMAND = ARGV[0..-1].join(' ')
output_o = {}
output_e = {}
IO_CONNECTIONS_TO_REMOTE_PROCESSES = {}
def on_all_nodes(&block)
nodes = []
Kernel.open('|qnodes | grep -v "^ " | grep -v "^$"') do |f|
while line = f.gets
i = line.split(' ').first
nodes.push(i) if EXCLUDE.select{|x| i =~ x}.empty?
end
end
nodes.sort.each {|n| block.call(n)}
end
# Create processes
on_all_nodes do |node|
stdin, stdout, stderr = Open3.popen3("#{SSH} #{node} \"#{COMMAND}\"")
IO_CONNECTIONS_TO_REMOTE_PROCESSES[node] = [stdin, stdout, stderr]
end
has_remote_errors = false
# Collect results
on_all_nodes do |node|
stdin, stdout, stderr = IO_CONNECTIONS_TO_REMOTE_PROCESSES[node]
stdin.close
e_thread = Thread.new do
while line = stderr.gets
line.chomp!
STDERR.puts "#{node} ERROR: #{line}"
It pulls out a list of hosts from the Torque queuing system
qnodes. It ssh'es and runs a command on all the nodes. Then it prints the output and/or errors in a defined order (alphabetical sort of the hostnames).A nice feature: results are printed immediately for the host that is next in the order.
I would like to use it as an example for a Ruby workshop. Could you please suggest best-practice and design pattern improvements?
```
#!/usr/bin/ruby
EXCLUDE = [/girkelab/, /biocluster/, /parrot/, /owl/]
require "open3"
# Non-interactive, no password asking, and seasonable timeouts
SSH_OPTIONS = ["-o PreferredAuthentications=publickey,hostbased,gssapi,gssapi-with-mic",
"-o ForwardX11=no",
"-o BatchMode=yes",
"-o SetupTimeOut=5",
"-o ServerAliveInterval=5",
"-o ServerAliveCountMax=2"
].join(" ")
SSH = "/usr/bin/ssh #{SSH_OPTIONS}"
MKDIR = "/bin/mkdir"
raise "Pleae give this command at least one argument" if ARGV.size < 1
COMMAND = ARGV[0..-1].join(' ')
output_o = {}
output_e = {}
IO_CONNECTIONS_TO_REMOTE_PROCESSES = {}
def on_all_nodes(&block)
nodes = []
Kernel.open('|qnodes | grep -v "^ " | grep -v "^$"') do |f|
while line = f.gets
i = line.split(' ').first
nodes.push(i) if EXCLUDE.select{|x| i =~ x}.empty?
end
end
nodes.sort.each {|n| block.call(n)}
end
# Create processes
on_all_nodes do |node|
stdin, stdout, stderr = Open3.popen3("#{SSH} #{node} \"#{COMMAND}\"")
IO_CONNECTIONS_TO_REMOTE_PROCESSES[node] = [stdin, stdout, stderr]
end
has_remote_errors = false
# Collect results
on_all_nodes do |node|
stdin, stdout, stderr = IO_CONNECTIONS_TO_REMOTE_PROCESSES[node]
stdin.close
e_thread = Thread.new do
while line = stderr.gets
line.chomp!
STDERR.puts "#{node} ERROR: #{line}"
Solution
First of all, take a look at the
Assuming you can't (or won't) use
-
To find out whether none of the elements in an array meet a condition, negating
I would recommend that instead of taking a block and yielding each element, you just return
Note that this will break if
If you make
If you heeded my above advice about
This way you removed the complexity of first putting everything into a hash and then getting it out again.
Again, this can be written more idiomatically as
This is never used. I can only assume that it's a left over of previous iterations of the code, which is no longer necessary. Obviously it should be removed.
I don't see any benefit of doing it this way. Just do:
Note that
Net::SSH library. I don't have much experience with it, so I don't know whether it supports all the options you need. But if it does, using it might be more robust than using the command line utility (you wouldn't have to worry about whether ssh is installed in the place you expect (or at all) and you wouldn't have to worry about escaping the arguments).Assuming you can't (or won't) use
Net::SSH, you should at least replace /usr/bin/ssh with just ssh, so at least it still works if ssh is installed in another location in the PATH.nodes = []
Kernel.open('|qnodes | grep -v "^ " | grep -v "^$"') do |f|
while line = f.gets
i = line.split(' ').first
nodes.push(i) if EXCLUDE.select{|x| i =~ x}.empty?
end
end- When you initialize an empty array and then append to it in a loop, that is often a good sign you want to use
mapand/orselectinstead.
line = f.getsis a bit of an anti-pattern in ruby. TheIOclass already has methods to iterate a file line-wise.
-
To find out whether none of the elements in an array meet a condition, negating
any? seems more idiomatic than building an array with select and check whether it's empty.nodes = Kernel.open('|qnodes | grep -v "^ " | grep -v "^$"') do |f|
f.lines.map do |line|
line.split(' ').first
end.reject do |i|
EXCLUDE.any? {|x| i =~ x}
end
endnodes.sort.each {|n| block.call(n)}I would recommend that instead of taking a block and yielding each element, you just return
nodes.sort and rename the function to all_nodes. This way you can use all_nodes.each to execute code on all nodes, but you could also use all_nodes.map or all_nodes.select when it makes sense.Open3.popen3("#{SSH} #{node} \"#{COMMAND}\"")Note that this will break if
COMMAND contains double quotes itself. Generally trying to escape command line arguments by surrounding them with quotes is a bad idea. system and open3 accept multiple arguments exactly to avoid this.If you make
SSH an array (with one element per argument) instead of a string, you can use the multiple-arguments version of popen3 and can thus avoid the fickle solution of adding quote around COMMAND to escape spaces, i.e.:Open3.popen3(*(SSH + [node, COMMAND]))IO_CONNECTIONS_TO_REMOTE_PROCESSES = {}
# ...
on_all_nodes do |node|
stdin, stdout, stderr = Open3.popen3("#{SSH} #{node} \"#{COMMAND}\"")
IO_CONNECTIONS_TO_REMOTE_PROCESSES[node] = [stdin, stdout, stderr]
endIf you heeded my above advice about
all_nodes you can simplify this using map. I also would suggest not using a Hash here. If you use an array instead, the nodes will stay in the order in which you inserted them, which will mean that you can iterate over that array instead of invoking all_nodes again.has_remote_errors = false
all_nodes.map do |node|
[node, Open3.popen3(*(SSH + [node, COMMAND]))]
end.each do |node, (stdin, stdout, stderr)|
stdin.close
ethread = # ...
# ...
endThis way you removed the complexity of first putting everything into a hash and then getting it out again.
while line = stderr.getsAgain, this can be written more idiomatically as
stderr.each_line do |line|. Same with stdout.first = trueThis is never used. I can only assume that it's a left over of previous iterations of the code, which is no longer necessary. Obviously it should be removed.
# Let the threads finish
t1 = nil
t2 = nil
while [t1, t2].include? nil
if t1.nil?
t1 = e_thread.join(0.1) # Gives 1/10 of a second to STDERR
end
if t2.nil?
t2 = o_thread.join(0.1) # Give 1/10 of a second to STDOUT
end
endI don't see any benefit of doing it this way. Just do:
e_thread.join
o_thread.joinNote that
joining on one thread does not mean that the other threads stop running - only the main thread does, but that's perfectly okay as you want that anyway.Code Snippets
nodes = []
Kernel.open('|qnodes | grep -v "^ " | grep -v "^$"') do |f|
while line = f.gets
i = line.split(' ').first
nodes.push(i) if EXCLUDE.select{|x| i =~ x}.empty?
end
endnodes = Kernel.open('|qnodes | grep -v "^ " | grep -v "^$"') do |f|
f.lines.map do |line|
line.split(' ').first
end.reject do |i|
EXCLUDE.any? {|x| i =~ x}
end
endnodes.sort.each {|n| block.call(n)}Open3.popen3("#{SSH} #{node} \"#{COMMAND}\"")Open3.popen3(*(SSH + [node, COMMAND]))Context
StackExchange Code Review Q#1180, answer score: 10
Revisions (0)
No revisions yet.