HiveBrain v1.2.0
snippet · ruby · Minor

Ruby format analyser

Submitted by: @import:stackexchange-codereview
analyser · format · ruby

Problem

I need to validate the names of architecture-related files after they are uploaded, and warn the user if a file name is not standards-compliant.

What's in a name

To be standards-compliant, a file name must consist of 7 hyphen-separated parts once the extension is removed, and:

  • part 1 is the project code: any combination of letters (including letters with diacritics, such as Ã and Â) and digits.
  • part 2 is the discipline that the file relates to.
  • part 3 is the project phase.
  • part 4 is a 4-digit document number in the format xxxx (0001, 0002, etc.).
  • part 5 is the subject that the document relates to.
  • part 6 is the floor that the project relates to.
  • part 7 is the revision number, in the format RXX (R00, R01, etc.).
  • the parts must appear in this order.
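The naming rules above can be pictured as a Struct that gives each of the seven parts a name. This is only an illustrative sketch: `FileNameParts` and `parse_file_name` are hypothetical names, and the field names follow the question's methods (so the floor is stored as `level`).

```ruby
# Hypothetical helper: names the seven hyphen-separated parts in the order
# listed above. Like the question's code, it drops everything from the first '.'.
FileNameParts = Struct.new(:project_code, :discipline, :phase,
                           :document_number, :subject, :level, :revision)

def parse_file_name(name)
  parts = name.split('.').first.split('-')
  # Return nil when the name does not have exactly seven parts
  return nil unless parts.size == FileNameParts.members.size

  FileNameParts.new(*parts)
end

parts = parse_file_name('ABCD-ARQ-AP-0022-ACS-LOC-R00.jpeg')
puts parts.document_number                          # prints 0022
puts parts.revision                                 # prints R00
p parse_file_name('ABCD-ARQ-AP-0022-ACS-LOC.jpeg')  # prints nil
```

Naming the parts this way only checks the count and order; each field still has to be validated against its own rule.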



Parts 2, 3, 5 and 6 must each be an abbreviation from a predefined set of values, so validating them is simply a matter of checking whether the abbreviation exists in that set.

I wrote a separate class for each of these parts. For brevity I have included only one of the four classes; assume they are all identical except for the constant holding the acceptable abbreviations.
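The listing below only shows the call `Discipline.value_valid?(discipline)`, not the class itself. A minimal sketch of what one of the four lookup classes might look like, assuming a `Set` constant; the constant name `ABBREVIATIONS` and its values are hypothetical placeholders.

```ruby
require 'set'

# Hypothetical lookup class; only the constant would differ between the four.
class Discipline
  # Placeholder abbreviations, not the real list
  ABBREVIATIONS = Set['ACE', 'ARQ'].freeze

  # True if value is one of the accepted abbreviations (case-sensitive)
  def self.value_valid?(value)
    ABBREVIATIONS.include?(value)
  end
end

Discipline.value_valid?('ARQ') # => true
Discipline.value_valid?('XYZ') # => false
```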

```ruby
class FileName
  attr_reader :name

  def self.valid?(name)
    new(name).valid?
  end

  # Valid file name
  # ABCD-ARQ-AP-0022-ACS-LOC-R00.jpeg

  def initialize(name)
    # Split individual parts into an array, ignoring .extension
    @name = name.split('.').first.split('-')
  end

  def valid?
    name.length == 7 &&
      project_code_valid? &&
      discipline_valid? &&
      phase_valid? &&
      document_number_valid? &&
      subject_valid? &&
      level_valid? &&
      revision_valid?
  end

  def project_code
    @project_code ||= name[0]
  end

  def project_code_valid?
    project_code !~ /\P{Alnum}/ && project_code.length == 4
  end

  def discipline
    @discipline ||= name[1]
  end

  def discipline_valid?
    Discipline.value_valid?(discipline)
  end

  def phase
    @phase
    # ... (the rest of the question's listing is truncated in the source)
```

Solution

I'm not sure you really need a class to check each substring of the file name prefix. After all, only two kinds of check are needed: membership in a list, or a match against a regex. Consider a simple, straightforward approach like this:

```ruby
FNAME_SECTION = [
  {offset: 0, name: "Project code"   , regex: /\A\p{Alnum}{4}\z/ },
  {offset: 1, name: "Discipline"     , list: ['ACE', 'ARQ']      },
  {offset: 2, name: "Project phase"  , list: ['AP', 'BP']        },
  {offset: 3, name: "Document number", regex: /\A\d{4}\z/        },
  {offset: 4, name: "Subject"        , list: ['ACS', 'BCS']      },
  {offset: 5, name: "Level"          , list: ['LOC', 'KOV']      },
  {offset: 6, name: "Revision"       , regex: /\AR\d{2}\z/       }
]

def fname_valid?(fname)
  # Drop everything from the first '.', then split on '-'
  @groups = fname.split('.').first.split('-')
  if @groups.size != FNAME_SECTION.size
    puts "Filename should have #{FNAME_SECTION.size} groups, but has #{@groups.size}"
    return nil
  end

  err = []

  FNAME_SECTION.each_with_index do |h, i|
    str = @groups[h[:offset]]
    if h.key?(:list)
      err << i unless h[:list].include?(str)
    elsif h.key?(:regex)
      err << i unless str =~ h[:regex]
    else
      err << i   # a section with neither a list nor a regex always fails
    end
  end

  if err.empty?
    puts "File name prefix is valid"
    return true
  end

  puts "File name prefix is invalid"
  err.each { |i| puts loc_msg(i) }
  false
end

private

def loc_msg(i)
  "  Error in group offset #{FNAME_SECTION[i][:offset]} (#{FNAME_SECTION[i][:name]})"
end

fname_valid?('ABCD-ARQ-AP-0022-ACS-LOC-R00.jpeg')
  # File name prefix is valid
fname_valid?('ABC7-ACE-CP-002a-BCS-LOc-R000.jpeg')
  # File name prefix is invalid
  #   Error in group offset 2 (Project phase)
  #   Error in group offset 3 (Document number)
  #   Error in group offset 5 (Level)
  #   Error in group offset 6 (Revision)
fname_valid?('ABCD-ARQ-AP-0022-ACS-LOC.jpeg')
  # Filename should have 7 groups, but has 6
```


The way I've displayed the error messages may not be what you want, but that would not be difficult to change. Note that when a file name has an invalid format, all the reasons it is invalid are listed, not just the first.
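For example, instead of printing, the checks could collect the messages and return them, leaving display to the caller. A sketch of that variant, assuming a hypothetical method name `fname_errors` and the same placeholder lists; an empty array means the name is valid.

```ruby
# Same sections as above; the lists are placeholders.
# \A and \z anchor at the start and end of the whole string.
FNAME_SECTION = [
  { offset: 0, name: 'Project code',    regex: /\A\p{Alnum}{4}\z/ },
  { offset: 1, name: 'Discipline',      list: %w[ACE ARQ] },
  { offset: 2, name: 'Project phase',   list: %w[AP BP] },
  { offset: 3, name: 'Document number', regex: /\A\d{4}\z/ },
  { offset: 4, name: 'Subject',         list: %w[ACS BCS] },
  { offset: 5, name: 'Level',           list: %w[LOC KOV] },
  { offset: 6, name: 'Revision',        regex: /\AR\d{2}\z/ }
].freeze

# Hypothetical variant of fname_valid? that returns messages instead of printing.
def fname_errors(fname)
  groups = fname.split('.').first.split('-')
  if groups.size != FNAME_SECTION.size
    return ["Filename should have #{FNAME_SECTION.size} groups, but has #{groups.size}"]
  end

  FNAME_SECTION.map do |h|
    str = groups[h[:offset]]
    ok =
      if h.key?(:list)
        h[:list].include?(str)
      elsif h.key?(:regex)
        h[:regex].match?(str)
      else
        false # a section with neither a list nor a regex always fails
      end
    "Error in group offset #{h[:offset]} (#{h[:name]})" unless ok
  end.compact # drop the nils produced by sections that passed
end

errors = fname_errors('ABC7-ACE-CP-002a-BCS-LOc-R000.jpeg')
puts errors.empty? ? 'File name prefix is valid' : errors
```

Returning data rather than printing also makes the validator easy to unit-test and lets a web upload handler show the messages however it likes.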

When matching a substring against a regex, note that the length of the substring is enforced by anchoring the pattern at both ends and by using fixed-count quantifiers such as {4} instead of +, * or ?.
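A quick check of both points in Ruby. One detail worth knowing: in Ruby, `^` and `$` match at the start and end of any line, while `\A` and `\z` match only at the start and end of the whole string, so the latter are the stricter choice for validating a single substring.

```ruby
# Without anchors, a longer string still contains a 4-digit match
p '00221'.match?(/\d{4}/)          # => true
# Anchored at both ends with a fixed count, the length is enforced
p '00221'.match?(/\A\d{4}\z/)      # => false
# Anchored but with +, any number of digits is accepted
p '00221'.match?(/\A\d+\z/)        # => true
# ^ and $ are line anchors, so an embedded newline slips through
p "0022\nXYZ".match?(/^\d{4}$/)    # => true
p "0022\nXYZ".match?(/\A\d{4}\z/)  # => false
```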

For validity checks that involve a list of possible values, I've made the list an array of the values from your hashes, as the keys did not appear to be used. If the keys are needed, those arrays could be replaced with hashes.
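If the keys are needed, the switch may be smaller than it looks: `Hash#include?` is an alias of `Hash#key?`, so a check written as `h[:list].include?(str)` keeps working when the array is replaced by a hash. A sketch, with hypothetical placeholder descriptions as the values:

```ruby
# Hypothetical section whose list is a hash of abbreviation => description
section = {
  offset: 1,
  name: 'Discipline',
  list: { 'ACE' => 'Placeholder description 1', 'ARQ' => 'Placeholder description 2' }
}

# Hash#include? tests keys, exactly like Hash#key?
p section[:list].include?('ARQ')  # => true
p section[:list].include?('XYZ')  # => false
# The values remain available, e.g. for friendlier error messages
p section[:list]['ARQ']           # => "Placeholder description 2"
```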

A potential problem with this approach is that it's not very flexible. If, for example, a validity check were changed to involve something other than matching a list or a regex, it might be difficult to alter the code to accommodate it.

I initially considered a different approach that offered greater flexibility. It retained the array of hashes, FNAME_SECTION, possibly changed somewhat, but also had a module that looked something like this:

```ruby
module CustomValidityChecks
  def document_number_valid?
    # ...
  end

  def revision_valid?
    # ...
  end
end
```


This module contains the validity checks that cannot be made from the information in FNAME_SECTION alone. The following line goes in the body of the main class, so it is executed when the class is defined:

```ruby
@custom_validity_checks = CustomValidityChecks.instance_methods(false)
```


This saves the names of all those methods, as symbols, in the class instance variable @custom_validity_checks. One could then use the earlier approach for the validity checks that draw only on the information in FNAME_SECTION, and cycle through @custom_validity_checks to perform the others (the class must include the module so that send can find these methods):

```ruby
@custom_validity_checks.each { |m| send(m) }
```


Note that methods can be added to or deleted from the module (or renamed), with no need to alter any of the other code.
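Putting those pieces together, a minimal sketch might look like this. The class name `FileNameChecker`, the `groups` reader, the `failed_custom_checks` method and the bodies of the two checks are all hypothetical, since the original elides them. Because @custom_validity_checks lives on the class, instance methods reach it through a class-level reader.

```ruby
module CustomValidityChecks
  # Hypothetical bodies; the original answer leaves these elided
  def document_number_valid?
    groups[3].to_s.match?(/\A\d{4}\z/)
  end

  def revision_valid?
    groups[6].to_s.match?(/\AR\d{2}\z/)
  end
end

class FileNameChecker
  include CustomValidityChecks

  # Runs once, when the class body is evaluated
  @custom_validity_checks = CustomValidityChecks.instance_methods(false)

  class << self
    attr_reader :custom_validity_checks
  end

  attr_reader :groups

  def initialize(fname)
    @groups = fname.split('.').first.split('-')
  end

  # Names of the custom checks that fail, sorted because
  # instance_methods does not promise any particular order
  def failed_custom_checks
    self.class.custom_validity_checks.reject { |m| send(m) }.sort
  end
end

p FileNameChecker.new('ABCD-ARQ-AP-0022-ACS-LOC-R00.jpeg').failed_custom_checks
# => []
p FileNameChecker.new('ABCD-ARQ-AP-002a-ACS-LOC-R000.jpeg').failed_custom_checks
# => [:document_number_valid?, :revision_valid?]
```

Adding a third method to the module makes it part of the checks automatically, which is the flexibility the paragraph above describes.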

A variant of this approach would be to create a subclass of the main class for each of these custom checks, and then use the hook Class#inherited to build the array @custom_validity_checks.
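A rough sketch of that variant, assuming each check is a subclass that implements a class method `valid?(groups)`; all the class and method names here are hypothetical. Ruby calls `inherited` on the parent class whenever a subclass is defined, which is where each check registers itself.

```ruby
class FileNameChecker
  @custom_validity_checks = []

  class << self
    attr_reader :custom_validity_checks

    # Ruby calls this hook each time a subclass is defined
    def inherited(subclass)
      super
      FileNameChecker.custom_validity_checks << subclass
    end
  end

  def initialize(fname)
    @groups = fname.split('.').first.split('-')
  end

  # Names of the registered check classes whose test fails
  def failed_custom_checks
    FileNameChecker.custom_validity_checks
                   .reject { |check| check.valid?(@groups) }
                   .map(&:name)
  end
end

# Each custom check is its own subclass (hypothetical examples)
class DocumentNumberCheck < FileNameChecker
  def self.valid?(groups)
    groups[3].to_s.match?(/\A\d{4}\z/)
  end
end

class RevisionCheck < FileNameChecker
  def self.valid?(groups)
    groups[6].to_s.match?(/\AR\d{2}\z/)
  end
end

p FileNameChecker.new('ABCD-ARQ-AP-002a-ACS-LOC-R000.jpeg').failed_custom_checks
# => ["DocumentNumberCheck", "RevisionCheck"]
```

Registration happens in definition order, so the failures are reported in the order the check classes were defined.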


Context

StackExchange Code Review Q#44223, answer score: 9
