Ruby format analyser
Problem
I have a requirement to validate file names related to architecture after they are uploaded. Once they have been uploaded I must warn the user if the file name is not standards compliant.
What's in a name
To be standards compliant a file name must consist of 7 parts after the extension is removed from the name, and:
- part 1 is the project code; an arbitrary set of letters (including diacritics, Ã, Â, etc.) and numbers.
- part 2 is the discipline that the file relates to.
- part 3 is the project phase.
- part 4 is a 4-digit document number in the format xxxx (0001, 0002, etc.).
- part 5 is the subject that the document relates to.
- part 6 is the floor that the project relates to.
- part 7 is the revision number; the format is RXX (R00, R01, etc.).
- parts must be in said order.
Parts 2, 3, 5, and 6 must each be an abbreviation from a predefined set of values. Validating them is a simple matter of looking up whether the abbreviation exists.
I wrote a single class for each part. For the sake of brevity I have included only one class out of the four; assume all four are identical apart from the constant holding the acceptable abbreviations.
```
class FileName
  attr_reader :name

  def self.valid?(name)
    new(name).valid?
  end

  # Valid file name
  # ABCD-ARQ-AP-0022-ACS-LOC-R00.jpeg
  def initialize(name)
    # Split individual parts into an array, ignoring .extension
    @name = name.split('.').first.split('-')
  end

  def valid?
    name.length == 7 &&
      project_code_valid? &&
      discipline_valid? &&
      phase_valid? &&
      document_number_valid? &&
      subject_valid? &&
      level_valid? &&
      revision_valid?
  end

  def project_code
    @project_code ||= name[0]
  end

  def project_code_valid?
    project_code !~ /\P{Alnum}/ && project_code.length == 4
  end

  def discipline
    @discipline ||= name[1]
  end

  def discipline_valid?
    Discipline.value_valid?(discipline)
  end

  def phase
    @phase ||= name[2]
  end

  # ... (remainder of the class not shown)
end
```
Solution
I don't know if you really need to create classes for checking each of the substrings in the file name prefix. After all, there are only two types of checks that need to be made: against a list or matching a regex. Consider a simple, straightforward approach like this:
```
FNAME_SECTION = [
  {offset: 0, name: "Project code"   , regex: /^\p{Alnum}{4}$/ },
  {offset: 1, name: "Discipline"     , list:  ['ACE', 'ARQ']   },
  {offset: 2, name: "Project phase"  , list:  ['AP', 'BP']     },
  {offset: 3, name: "Document number", regex: /^\d{4}$/        },
  {offset: 4, name: "Subject"        , list:  ['ACS', 'BCS']   },
  {offset: 5, name: "Level"          , list:  ['LOC', 'KOV']   },
  {offset: 6, name: "Revision"       , regex: /^R\d{2}$/       }
]

def fname_valid?(fname)
  @groups = fname.split('.').first.split('-')
  if @groups.size != FNAME_SECTION.size
    puts "Filename should have #{FNAME_SECTION.size} groups, but has #{@groups.size}"
    return nil
  end
  # Collect the index of every section that fails its check
  err = []
  FNAME_SECTION.each_with_index do |h,i|
    str = @groups[h[:offset]]
    if h.key?(:list)
      err << i unless h[:list].include?(str)
    elsif h.key?(:regex)
      err << i unless str =~ h[:regex]
    else
      err << i
    end
  end
  if err.empty?
    puts "File name prefix is valid"
    return true
  end
  puts "File name prefix is invalid"
  err.each {|i| puts loc_msg(i)}
  return false
end

private

def loc_msg(i)
  " Error in group offset #{FNAME_SECTION[i][:offset]} (#{FNAME_SECTION[i][:name]})"
end
```
```
fname_valid?('ABCD-ARQ-AP-0022-ACS-LOC-R00.jpeg')
# File name prefix is valid
fname_valid?('ABC7-ACE-CP-002a-BCS-LOc-R000.jpeg')
# File name prefix is invalid
#  Error in group offset 2 (Project phase)
#  Error in group offset 3 (Document number)
#  Error in group offset 5 (Level)
#  Error in group offset 6 (Revision)
fname_valid?('ABCD-ARQ-AP-0022-ACS-LOC.jpeg')
# Filename should have 7 groups, but has 6
```
The way I've displayed the error messages may not be what you want, but that would not be difficult to change. Note that, when a file name has an invalid format, I've listed all the reasons it is invalid.
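If printing is not what you want, one option is to have the check return the list of problems and leave the presentation to the caller. What follows is only a sketch under assumptions: the method name `fname_errors` and the constant `SECTIONS` are hypothetical (the table has the same shape as `FNAME_SECTION`, renamed so the two can coexist), and the abbreviation lists are just the sample values used above.

```ruby
# Hypothetical variant: return error messages instead of printing them.
# SECTIONS has the same shape as FNAME_SECTION; position in the array
# doubles as the group offset.
SECTIONS = [
  { name: "Project code",    regex: /^\p{Alnum}{4}$/ },
  { name: "Discipline",      list:  %w[ACE ARQ]      },
  { name: "Project phase",   list:  %w[AP BP]        },
  { name: "Document number", regex: /^\d{4}$/        },
  { name: "Subject",         list:  %w[ACS BCS]      },
  { name: "Level",           list:  %w[LOC KOV]      },
  { name: "Revision",        regex: /^R\d{2}$/       }
].freeze

# Returns an array of messages; an empty array means the name is valid.
def fname_errors(fname)
  groups = fname.split(".").first.to_s.split("-")
  unless groups.size == SECTIONS.size
    return ["expected #{SECTIONS.size} groups, got #{groups.size}"]
  end

  SECTIONS.each_with_index.map do |section, i|
    str = groups[i]
    ok = section[:list] ? section[:list].include?(str) : section[:regex].match?(str)
    "group #{i} (#{section[:name]}) is invalid: #{str.inspect}" unless ok
  end.compact
end

puts fname_errors("ABC7-ACE-CP-002a-BCS-LOc-R000.jpeg")
```

The caller can then decide whether to log the messages, show them to the user, or simply test `fname_errors(name).empty?`.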
When matching a substring against a regex, notice that the length of the substring is checked by including start/end anchors and avoiding the use of `re+`, `re*` and `re?`.
For validity checks that involve a list of possible values, I've made the list an array of the values from your hashes, as the keys did not appear to be used. If the keys are needed, those arrays could be replaced with hashes.
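One caveat about the anchors is worth checking against your inputs: in Ruby, `^` and `$` match at the start and end of a line, whereas `\A` and `\z` match at the start and end of the whole string. For a group that could somehow contain a newline, only the second pair pins down the length. A small illustration (the sample strings are made up for the purpose):

```ruby
LINE_ANCHORED   = /^\d{4}$/     # anchors used in the table above
STRING_ANCHORED = /\A\d{4}\z/   # stricter, whole-string anchors

plain  = "0022"
sneaky = "0022\nextra"          # contrived input containing a newline

puts LINE_ANCHORED.match?(plain)     # true
puts STRING_ANCHORED.match?(plain)   # true
puts LINE_ANCHORED.match?(sneaky)    # true, only the first line is checked
puts STRING_ANCHORED.match?(sneaky)  # false
```

Whether this matters depends on where the file names come from; for names that can never contain a newline the two forms behave the same.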
A potential problem with this approach is that it's not very flexible. If, for example, a validity check were changed to involve something other than matching a list or a regex, it might be difficult to alter the code to accommodate it.
I initially considered a different approach that offered greater flexibility. It retained the array of hashes, `FNAME_SECTION`, possibly changed somewhat, but also had a module that looked something like this:
```
module CustomValidityChecks
  def document_number_valid?
    ...
  end
  def revision_valid?
    ...
  end
end
```
This module contains the validity checks that could not be done from the information in `FNAME_SECTION` alone. The following is executed in the main class, when it is parsed:
```
@custom_validity_checks = CustomValidityChecks.instance_methods(false)
```
This saves the names of all those methods in the class instance variable `@custom_validity_checks`. One could then use the earlier approach to make the validity checks that draw only on the information in `FNAME_SECTION`, and cycle through `@custom_validity_checks` to perform the others:
```
@custom_validity_checks.each { |m| send(m) }
```
Note that methods can be added to or deleted from the module (or renamed), with no need to alter any of the other code.
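To make the module-based idea concrete, here is a minimal runnable sketch. It rests on several assumptions: the class name `FileNameCheck`, the reader `custom_validity_checks`, and the method `failed_custom_checks` are all hypothetical, and the bodies of the two check methods are illustrative guesses, since the module above leaves them elided.

```ruby
# Checks that cannot be expressed as a plain list lookup.
# The bodies are assumptions made for this sketch only.
module CustomValidityChecks
  def document_number_valid?
    @groups[3].to_s.match?(/\A\d{4}\z/)
  end

  def revision_valid?
    @groups[6].to_s.match?(/\AR\d{2}\z/)
  end
end

class FileNameCheck # hypothetical name for the "main class"
  include CustomValidityChecks

  # Evaluated once, when the class body is parsed. This is a class-level
  # instance variable holding the check method names as symbols.
  @custom_validity_checks = CustomValidityChecks.instance_methods(false)

  class << self
    attr_reader :custom_validity_checks
  end

  def initialize(fname)
    @groups = fname.split(".").first.to_s.split("-")
  end

  # Names of the custom checks that fail for this file name.
  def failed_custom_checks
    self.class.custom_validity_checks.reject { |m| send(m) }
  end
end

p FileNameCheck.new("ABCD-ARQ-AP-002a-ACS-LOC-R000.jpeg").failed_custom_checks
```

Because the variable lives on the class rather than on each instance, instances reach it through `self.class`; adding a third method to the module would be picked up without touching `FileNameCheck`.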
A variant of this approach would be to create a subclass of the main class for each of these custom checks, and then use the hook `Class#inherited` to build the array `@custom_validity_checks`.
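The subclass variant could look roughly like the following. Everything here is an assumption made for illustration: the class names `FileNamePartCheck`, `DocumentNumberCheck` and `RevisionCheck`, the `valid?` interface, and the regexes in the check bodies. Only the use of the `inherited` hook to fill `@custom_validity_checks` comes from the description above.

```ruby
# Hypothetical base class; each subclass registers itself via the hook.
class FileNamePartCheck
  @custom_validity_checks = []

  class << self
    attr_reader :custom_validity_checks

    # Ruby calls this hook whenever a subclass is created.
    def inherited(subclass)
      super
      FileNamePartCheck.custom_validity_checks << subclass
    end
  end

  def initialize(groups)
    @groups = groups
  end
end

class DocumentNumberCheck < FileNamePartCheck
  def valid?
    @groups[3].to_s.match?(/\A\d{4}\z/)
  end
end

class RevisionCheck < FileNamePartCheck
  def valid?
    @groups[6].to_s.match?(/\AR\d{2}\z/)
  end
end

groups = "ABCD-ARQ-AP-0022-ACS-LOC-R00.jpeg".split(".").first.split("-")
failed = FileNamePartCheck.custom_validity_checks.reject { |klass| klass.new(groups).valid? }
puts failed.empty?
```

Defining another subclass is enough to add a check; removing or renaming one requires no change to the registry code.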
Context
StackExchange Code Review Q#44223, answer score: 9