Deleting most recent files by parsing filename

Submitted by: @import:stackexchange-codereview

Problem

I have hundreds of .mp3 files in a single directory, all with the same naming format, title_YYYY-MM-DD.mp3, and maybe 30 different titles.
Here is an example with two different titles:

vision_am_2015-08-04.mp3
vision_am_2015-08-03.mp3
vision_am_2015-07-31.mp3
vision_am_2015-07-30.mp3

lum_pro_2015-08-04.mp3
lum_pro_2015-08-03.mp3
lum_pro_2015-08-01.mp3
lum_pro_2015-07-31.mp3
lum_pro_2015-07-30.mp3
lum_pro_2015-07-29.mp3
lum_pro_2015-07-28.mp3
lum_pro_2015-07-27.mp3


I need to keep the X most recent files for each title. Since the date format is YYYY-MM-DD, I figured that once I have built a data structure for the files, I can sort them in descending order, iterate through them, and safely delete every file after the Xth one.
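The reasoning above relies on zero-padded YYYY-MM-DD dates sorting chronologically under plain string comparison. A minimal sketch to check that, using a few sample names from the listing (the variable names here are illustrative only):

```perl
use strict;
use warnings;

# Sample names for a single title, deliberately out of order.
my @names = qw(
    lum_pro_2015-07-31.mp3
    lum_pro_2015-08-04.mp3
    lum_pro_2015-07-28.mp3
    lum_pro_2015-08-01.mp3
);

# Plain string comparison, reversed: newest date comes first.
my @newest_first = sort { $b cmp $a } @names;

print "$_\n" for @newest_first;
```

This only holds within one title, because the title prefix is part of the string being compared.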

Here is my idea:

my $num_to_keep = 2; # or get from @ARGV
$num_to_keep = $num_to_keep - 1;

my $dir = "/home/mp3files";
opendir my $DH, "$dir" or die "$! not open";
my $dateRE = qr/\d{4}-\d{2}-\d{2}/;
my $fileRE = qr/^.+_$dateRE\.mp3$/; # only mp3s
my @files = sort grep {/$fileRE/ && -f "$dir/$_"} readdir($DH);
closedir $DH; # directory handles are closed with closedir, not close

my %hash = ();
for my $file (reverse @files) {
    my ($fname) = $file =~ m/(.*)?_$dateRE/;
    push(@{ $hash{$fname} }, $file);
}

for my $fname (sort keys %hash) {
    my @files = @{$hash{$fname}};
    print "\n\n\nFILE: $fname\n";
    # files are newest first; indexes 0..$num_to_keep are kept
    for my $i (0 .. $#files) {
        if ($i > $num_to_keep) {
            unlink "$dir/$files[$i]";
        }else{
            print "\t\t\t\tI will keep this file $files[$i]\n";
        }
    }
}


This works as I expected, but since I am using it to delete large numbers of files regularly, I would like an expert take on it. I do not want to accidentally delete the wrong files. I am also interested in any general improvements or more elegant solutions.

Solution

You can reduce the number of loops, sorts, and regex matches, which should make this faster:

my $num_to_keep = 2;

my $dir = "/home/mp3files";
opendir my $DH, $dir or die "$! $dir";

# only mp3s
my $fileRE = qr/(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;

my %count;
# files to delete
my @files = map {
  my $basename = $_->[1];
  (++$count{$basename} > $num_to_keep) ? $_->[0] : ();
}
sort {
  $b->[2] cmp $a->[2] # sort descending by date
}
map {
  my @match = /$fileRE/;
  (@match && -f "$dir/$_") ? ["$dir/$_", @match] : ();
}
readdir($DH);

closedir $DH; # directory handles are closed with closedir, not close
unlink(@files);
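
Since the concern is deleting the wrong files, one option is to separate the selection from the deletion so the list can be inspected first. The following is only a sketch under that assumption: `files_to_delete` is a hypothetical helper, not part of the answer above, and it works on bare filenames without touching the filesystem.

```perl
use strict;
use warnings;

# Hypothetical helper: given how many files to keep per title and a list
# of bare filenames, return the names that would be deleted.
sub files_to_delete {
    my ($num_to_keep, @names) = @_;
    my $fileRE = qr/^(.+) _ (\d{4}-\d{2}-\d{2}) \.mp3$/x;

    my %count;
    return
        map {
            my $title = $_->[1];
            (++$count{$title} > $num_to_keep) ? $_->[0] : ();
        }
        sort { $b->[2] cmp $a->[2] }    # newest date first
        map {
            my @match = /$fileRE/;      # (title, date), or empty if no match
            @match ? [ $_, @match ] : ();
        }
        @names;
}

# Dry run: print what would be removed instead of calling unlink.
my @doomed = files_to_delete(2, qw(
    vision_am_2015-08-04.mp3
    vision_am_2015-08-03.mp3
    vision_am_2015-07-31.mp3
    lum_pro_2015-08-04.mp3
    lum_pro_2015-07-30.mp3
    notes.txt
));
print "would delete: $_\n" for @doomed;
```

Once the printed list looks right, the names can be prefixed with the directory and passed to unlink, ideally checking its return value against the number of files expected to be removed.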

Context

StackExchange Code Review Q#99049, answer score: 4
