HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

Convert a mediawiki to dokuwiki script

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
convertdokuwikiscriptmediawiki

Problem

I have quickly wrote a Perl script to convert a mediawiki to a dokuwiki. I know there's an existing script to make that, but I would have a simpler usage code and some use of the API from mediawiki. It was also a good occassion to use pQuery module. I know I can improve some things and I'm going to make a generic command and use Moo and MooX::Options for the next version. I may also write a parser to convert the syntax, but I did make a first version quickly and an easy script some can easily maintain for no Perl programmers.

I would like to know what you think about the code:

```
#!/usr/bin/env perl

use strict;
use warnings;

use IO::File;

use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

use MediaWiki::API;
use pQuery;
use Text::Unidecode;

sub generate_file_name {
my $title = shift;

$title =~ s/\s/_/g;
$title =~s/'/_/g;

return unidecode($title) . '.txt';
}

sub set_content_file {
my ( $file, $content ) = @_;

my $fh = IO::File->new($file, "w");
if ( defined($fh) ) {
print $fh $content;

undef $fh;
}

return;
}

sub mediawiki_login {
my $mediawiki = MediaWiki::API->new({
api_url => 'http://vim-fr.org/api.php'
});

$mediawiki->login({
lgname => 'login',
lgpassword => 'mypassword'
}) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details};

return $mediawiki;
}

my $mediawiki = mediawiki_login();
my $all_articles = $mediawiki->list({
action => 'query',
list => 'allpages'
});
my $user_agent = LWP::UserAgent->new();

foreach my $page (@{$all_articles}) {
my $article = $mediawiki->get_page({
title => $page->{title}
});

my $request = POST 'http://johbuc6.coconia.net/mediawiki2dokuwiki.php', [ mediawiki => $article->{'*'} ];
my $response = $user_agent->request($request);
my $file = generate_file_name($page->{title});

print 'Export de la page ' . $page->{title} . " dans le fichier $fil

Solution

This looks fairly good, which is why I'll be fairly harsh.

generate_file_name

Even with single arguments, I'd suggest using a line like my ($title) = @_; rather than shift: Using @_ is the dominant idiom, and it will be less confusing for non-Perl programmers.

The two substitutions can be performed in one go: s/[\s']/_/g – but why are you removing unacceptable characters before the unidecode step, which might introduce them. I find it interesting that you do not prevent characters like ", \, /. It might be more sensible to specifically allow a certain set of characters rather than disallowing some other set (where you are bound to forget something important).

But why are you even using Text::Unidecode? Not only is this module extremely limited in what it can do, but most filesystems also have some degree of support for Unicode filenames. If we ignore this last point, I would write:

sub generate_file_name {
    my ($title) = @_;
    my $ascii_title = unidecode($title);    # transliterate Unicode
    $ascii_title =~  s/[^a-zA-z0-9.-]+/_/g;  # remove unwanted characters
    return $ascii_title;
}


set_content_file

The first problem is the word order in this function's name, it should be set_file_contents. But this still isn't a terribly good name, I'd rather say write_file. By the way, a function with this name and functionality is already provided by File::Slurp, but we might as well write it ourselves.

Using the object oriented IO::File instead of the builtin open is OK. I don't like it, but it probably makes it easier for non-Perl programmers to understand.

But there is one problem: While a file may only contain bytes, the input string may contain Unicode: You have to encode the string either explicitly, or through a PerlIO encoding layer.

The undef $fh is highly unusual. The $fh is a lexical variable – once the variable goes out of scope, the contained data's refcount is decremented and the object destroyed. Destroying a file handle will close it. undef $fh resets the scalar, which will effectively do the same thing. So it is a very confusing way of doing nothing, or saying $fh->close. Leave it.

Instead of ignoring any errors when opening a file handle fails, you should probably terminate an error or at least output a warning message.

Performing the generate_file_name transformation should happen inside this code, the details of storage are irrelevant for the other code.

We should not overwrite a file if our normalization produces the same name twice. Instead: detect this error, and at the very least produce an error message.

sub write_file {
    my ($file, $contents) = @_;
    # technically, we have a concurrency bug here between the `-e` and the `open`, but I'd ignore it.
    die qq(The file "$file" already exists. Will not overwrite) if -e $file;
    open my $fh, "html);
}
catch {
    warn $_;
};


mediawiki_login

Does this function add any abstraction to the code? I think not, considering also its proximity to its only point of use.

main code

For some reason, you are using HTTP::Request::Common instead of the available LWP::UserAgent methods (which admittedly are just a wrapper).

my $response = $user_agent->post('http://johbuc6.coconia.net/mediawiki2dokuwiki.php', [ mediawiki => $article->{'*'} ]);


You should also at least check for success:

if (not $response->is_success) {
    warn qq(Request for article "$page->{title}" failed: ), $response->status_line;
    next;
}

Code Snippets

sub generate_file_name {
    my ($title) = @_;
    my $ascii_title = unidecode($title);    # transliterate Unicode
    $ascii_title =~  s/[^a-zA-z0-9.-]+/_/g;  # remove unwanted characters
    return $ascii_title;
}
sub write_file {
    my ($file, $contents) = @_;
    # technically, we have a concurrency bug here between the `-e` and the `open`, but I'd ignore it.
    die qq(The file "$file" already exists. Will not overwrite) if -e $file;
    open my $fh, "<:utf8", $file or die qq(Can't open "$file": $!);
    print { $fh } $contents;
    return;
}

...

use Try::Tiny;

...

try {
    write_file($title, pQuery($_)->html);
}
catch {
    warn $_;
};
my $response = $user_agent->post('http://johbuc6.coconia.net/mediawiki2dokuwiki.php', [ mediawiki => $article->{'*'} ]);
if (not $response->is_success) {
    warn qq(Request for article "$page->{title}" failed: ), $response->status_line;
    next;
}

Context

StackExchange Code Review Q#44707, answer score: 4

Revisions (0)

No revisions yet.