patternModerate

Copy directories while changing Unicode filenames to ASCII

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

whileunicodecopyfilenamesasciichangingdirectories

Problem

I created a short Perl 6 script copyfnameascii.pl to copy a file hierarchy I have, applying an URL Decoding to the names of folders and removing non-ASCII characters from the names of files.

$ cd workingdirectory
$ perl6-m ../copyfnameascii.pl project copy

It is best illustrated by the following diagram:

$ tree
                             .
                             ├── copy
                             │   ├── folder 1
$ tree                       │   │   ├── cr.txt
.                            │   │   └── es.txt
└── project                  │   └── folder 2
    ├── folder%201           │       ├── ai.txt
    │   ├── čř.txt           │       └── zy.txt
    │   └── ěš.txt   ~*>     └── project
    └── folder%202               ├── folder%201
        ├── áí.txt               │   ├── čř.txt
        └── žý.txt               │   └── ěš.txt
                                 └── folder%202
                                     ├── áí.txt
                                     └── žý.txt

I just started using Perl. I'd like to hear your advice how to make the code generally better; more readable, idiomatic, and so on. The script does work; there is just a few things I am not satisfied with.

```
use Inline::Perl5;
my $p5 = Inline::Perl5.new;
$p5.use('Text::Unidecode');

sub transformdirname($dirname is copy) {
$dirname.=subst( /:i \%(**2) /, { chr(:16(~$0)) }, :g ); # undo URL encoding
$dirname.=subst( /\//, '_', :g ); # make it a valid UNIX filename
return $dirname;
}

sub transformfilename($filename) {
return $p5.call('Text::Unidecode::unidecode', $filename); # make it ASCII at all costs
}

sub MAIN($fromdir, $todir) {
for dir($fromdir) -> $subdir {
unless $subdir ~~ :d {
next;
}
my $tosubdir = $todir ~ '/' ~ transformdirname($subdir.basename);
say $tosubdir;
mkdir($tosubdir);
for dir($subdir) -> $file {
my $tofile = $tosubdir ~ '/' ~ transformfilename($file.basename);
if $tofile.path ~~ :e {

Solution

My overall take was "Nice!". :)

Given your exchange with Patrick J S above (not to mention my lack of familiarity with the wrinkles discussed in the relevant RFCs) I'm not going to comment at all on the technical-compliance-with-standards aspects of your decoding logic.

Simplify Inline::Perl5 code

use Inline::Perl5;
my $p5 = Inline::Perl5.new;
$p5.use('Text::Unidecode');
.
.
sub transformfilename($filename) {
  return $p5.call('Text::Unidecode::unidecode', $filename);
}

That was old style. Write this way instead:

use Text::Unidecode:from;

sub transformfilename($filename) {
  return Text::Unidecode::unidecode($filename);
}

Don't repeat yourself?

sub transformdirname($dirname is copy) {
  $dirname.=subst( /:i \%(**2) /, { chr(:16(~$0)) }, :g );
  $dirname.=subst( /\//, '_', :g );
  return $dirname;
}

This has 'dirname' in the sub's name plus four mentions of $dirname. Sometimes repeats are valuable, perhaps making code easier to read or change, but in this case I think the reverse applies. I'd write:

sub transformdirname($_ is copy) {
  .=subst( /:i \%(**2) /, { chr(:16(~$0)) }, :g )
  .=subst( /\//, '_', :g )
}

transformfilename has a similar issue:

sub transformfilename($filename) {
  return $p5.call('Text::Unidecode::unidecode', $filename);
}

('filename' is written three times.) In addition it's a one line routine introducing the abstract "transform" and I'm going to assume the transformation is unlikely to change, so abstracting it out to a sub is questionable. In addition there's another several 'file's in the call so instead I'd elide the sub altogether and eliminate all the repeats of 'file'. We'll see that later in the final version where I've made some additional changes.

Comments

Imo good code means good comments. You've written good comments like:

sub transformdirname($dirname is copy) {
  $dirname.=subst( /:i \%(**2) /, { chr(:16(~$0)) }, :g ); # undo URL encoding
  $dirname.=subst( /\//, '_', :g ); # make it a valid UNIX filename
  return $dirname;
}

In Raku you can effortlessly turn such ordinary end-of-line comments (that start with #) into Pod declarator comments (that start with #| or #= and appear immediately before or after a declaration):

#| undo URL encoding and make it a valid UNIX filename
sub transformdirname($_ is copy) {
  .=subst( /:i \%(**2) /, { chr(:16(~$0)) }, :g );
  .=subst( /\//, '_', :g );
}

Now documentation tools can automatically extract documentation of the sub because this:

say &transformdirname.WHY;

now displays:

undo URL encoding and make it a valid UNIX filename

(Why is the method that extracts such doc called WHY? It's to remind folk that by far the most important thing to explain in code comments is WHY -- why the code has been written and/or why it's written the way it is.)

I note you've used MAIN:

sub MAIN($fromdir, $todir) {

As you know, this automatically generates a nice usage message. What you may not know is that this can be combined with the Pod declarator blocks discussed above to great effect. Checkout The Cool subset of MAIN for examples.

Final version

I've made many more changes to be the way I write things. As I wrote at the start, your original code looks nice and many coding issues are subjective.

use Text::Unidecode:from;

sub MAIN ($fromdir, $todir) {

  for $fromdir.dir {

    next unless .d;

    say my $tosubdir = "$todir/&transformdirname(.basename)";
    mkdir $tosubdir;

    for .dir {

      # make filename ASCII at all costs:
      my $tofile = "$tosubdir/{.basename.&Text::Unidecode::unidecode}";

      if $tofile.path.e { die "Will not overwrite $tofile" }

      copy $_, $tofile;

    }
  }
}

# undo URL encoding and make it a valid UNIX filename
sub transformdirname($_ is copy) {
  .=subst( /'%' (**2)/, { chr(:16(~$0)) }, :g )
  .=subst( /\//, '_', :g )
}

Code Snippets

use Inline::Perl5;
my $p5 = Inline::Perl5.new;
$p5.use('Text::Unidecode');
.
.
sub transformfilename($filename) {
  return $p5.call('Text::Unidecode::unidecode', $filename);
}

use Text::Unidecode:from<Perl5>;

sub transformfilename($filename) {
  return Text::Unidecode::unidecode($filename);
}

sub transformdirname($dirname is copy) {
  $dirname.=subst( /:i \%(<[0..9A..F]>**2) /, { chr(:16(~$0)) }, :g );
  $dirname.=subst( /\//, '_', :g );
  return $dirname;
}

sub transformdirname($_ is copy) {
  .=subst( /:i \%(<[0..9A..F]>**2) /, { chr(:16(~$0)) }, :g )
  .=subst( /\//, '_', :g )
}

sub transformfilename($filename) {
  return $p5.call('Text::Unidecode::unidecode', $filename);
}

Context

StackExchange Code Review Q#71918, answer score: 11

Revisions (0)

No revisions yet.