snippetMinor
Convert a mediawiki to dokuwiki script
Viewed 0 times
convertdokuwikiscriptmediawiki
Problem
I have quickly wrote a Perl script to convert a mediawiki to a dokuwiki. I know there's an existing script to make that, but I would have a simpler usage code and some use of the API from mediawiki. It was also a good occassion to use pQuery module. I know I can improve some things and I'm going to make a generic command and use Moo and MooX::Options for the next version. I may also write a parser to convert the syntax, but I did make a first version quickly and an easy script some can easily maintain for no Perl programmers.
I would like to know what you think about the code:
```
#!/usr/bin/env perl
use strict;
use warnings;
use IO::File;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
use MediaWiki::API;
use pQuery;
use Text::Unidecode;
sub generate_file_name {
my $title = shift;
$title =~ s/\s/_/g;
$title =~s/'/_/g;
return unidecode($title) . '.txt';
}
sub set_content_file {
my ( $file, $content ) = @_;
my $fh = IO::File->new($file, "w");
if ( defined($fh) ) {
print $fh $content;
undef $fh;
}
return;
}
sub mediawiki_login {
my $mediawiki = MediaWiki::API->new({
api_url => 'http://vim-fr.org/api.php'
});
$mediawiki->login({
lgname => 'login',
lgpassword => 'mypassword'
}) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details};
return $mediawiki;
}
my $mediawiki = mediawiki_login();
my $all_articles = $mediawiki->list({
action => 'query',
list => 'allpages'
});
my $user_agent = LWP::UserAgent->new();
foreach my $page (@{$all_articles}) {
my $article = $mediawiki->get_page({
title => $page->{title}
});
my $request = POST 'http://johbuc6.coconia.net/mediawiki2dokuwiki.php', [ mediawiki => $article->{'*'} ];
my $response = $user_agent->request($request);
my $file = generate_file_name($page->{title});
print 'Export de la page ' . $page->{title} . " dans le fichier $fil
I would like to know what you think about the code:
```
#!/usr/bin/env perl
use strict;
use warnings;
use IO::File;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
use MediaWiki::API;
use pQuery;
use Text::Unidecode;
sub generate_file_name {
my $title = shift;
$title =~ s/\s/_/g;
$title =~s/'/_/g;
return unidecode($title) . '.txt';
}
sub set_content_file {
my ( $file, $content ) = @_;
my $fh = IO::File->new($file, "w");
if ( defined($fh) ) {
print $fh $content;
undef $fh;
}
return;
}
sub mediawiki_login {
my $mediawiki = MediaWiki::API->new({
api_url => 'http://vim-fr.org/api.php'
});
$mediawiki->login({
lgname => 'login',
lgpassword => 'mypassword'
}) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details};
return $mediawiki;
}
my $mediawiki = mediawiki_login();
my $all_articles = $mediawiki->list({
action => 'query',
list => 'allpages'
});
my $user_agent = LWP::UserAgent->new();
foreach my $page (@{$all_articles}) {
my $article = $mediawiki->get_page({
title => $page->{title}
});
my $request = POST 'http://johbuc6.coconia.net/mediawiki2dokuwiki.php', [ mediawiki => $article->{'*'} ];
my $response = $user_agent->request($request);
my $file = generate_file_name($page->{title});
print 'Export de la page ' . $page->{title} . " dans le fichier $fil
Solution
This looks fairly good, which is why I'll be fairly harsh.
Even with single arguments, I'd suggest using a line like
The two substitutions can be performed in one go:
But why are you even using
The first problem is the word order in this function's name, it should be
Using the object oriented
But there is one problem: While a file may only contain bytes, the input string may contain Unicode: You have to encode the string either explicitly, or through a PerlIO encoding layer.
The
Instead of ignoring any errors when opening a file handle fails, you should probably terminate an error or at least output a warning message.
Performing the
We should not overwrite a file if our normalization produces the same name twice. Instead: detect this error, and at the very least produce an error message.
Does this function add any abstraction to the code? I think not, considering also its proximity to its only point of use.
main code
For some reason, you are using
You should also at least check for success:
generate_file_nameEven with single arguments, I'd suggest using a line like
my ($title) = @_; rather than shift: Using @_ is the dominant idiom, and it will be less confusing for non-Perl programmers.The two substitutions can be performed in one go:
s/[\s']/_/g – but why are you removing unacceptable characters before the unidecode step, which might introduce them. I find it interesting that you do not prevent characters like ", \, /. It might be more sensible to specifically allow a certain set of characters rather than disallowing some other set (where you are bound to forget something important).But why are you even using
Text::Unidecode? Not only is this module extremely limited in what it can do, but most filesystems also have some degree of support for Unicode filenames. If we ignore this last point, I would write:sub generate_file_name {
my ($title) = @_;
my $ascii_title = unidecode($title); # transliterate Unicode
$ascii_title =~ s/[^a-zA-z0-9.-]+/_/g; # remove unwanted characters
return $ascii_title;
}set_content_fileThe first problem is the word order in this function's name, it should be
set_file_contents. But this still isn't a terribly good name, I'd rather say write_file. By the way, a function with this name and functionality is already provided by File::Slurp, but we might as well write it ourselves.Using the object oriented
IO::File instead of the builtin open is OK. I don't like it, but it probably makes it easier for non-Perl programmers to understand.But there is one problem: While a file may only contain bytes, the input string may contain Unicode: You have to encode the string either explicitly, or through a PerlIO encoding layer.
The
undef $fh is highly unusual. The $fh is a lexical variable – once the variable goes out of scope, the contained data's refcount is decremented and the object destroyed. Destroying a file handle will close it. undef $fh resets the scalar, which will effectively do the same thing. So it is a very confusing way of doing nothing, or saying $fh->close. Leave it.Instead of ignoring any errors when opening a file handle fails, you should probably terminate an error or at least output a warning message.
Performing the
generate_file_name transformation should happen inside this code, the details of storage are irrelevant for the other code.We should not overwrite a file if our normalization produces the same name twice. Instead: detect this error, and at the very least produce an error message.
sub write_file {
my ($file, $contents) = @_;
# technically, we have a concurrency bug here between the `-e` and the `open`, but I'd ignore it.
die qq(The file "$file" already exists. Will not overwrite) if -e $file;
open my $fh, "html);
}
catch {
warn $_;
};mediawiki_loginDoes this function add any abstraction to the code? I think not, considering also its proximity to its only point of use.
main code
For some reason, you are using
HTTP::Request::Common instead of the available LWP::UserAgent methods (which admittedly are just a wrapper).my $response = $user_agent->post('http://johbuc6.coconia.net/mediawiki2dokuwiki.php', [ mediawiki => $article->{'*'} ]);You should also at least check for success:
if (not $response->is_success) {
warn qq(Request for article "$page->{title}" failed: ), $response->status_line;
next;
}Code Snippets
sub generate_file_name {
my ($title) = @_;
my $ascii_title = unidecode($title); # transliterate Unicode
$ascii_title =~ s/[^a-zA-z0-9.-]+/_/g; # remove unwanted characters
return $ascii_title;
}sub write_file {
my ($file, $contents) = @_;
# technically, we have a concurrency bug here between the `-e` and the `open`, but I'd ignore it.
die qq(The file "$file" already exists. Will not overwrite) if -e $file;
open my $fh, "<:utf8", $file or die qq(Can't open "$file": $!);
print { $fh } $contents;
return;
}
...
use Try::Tiny;
...
try {
write_file($title, pQuery($_)->html);
}
catch {
warn $_;
};my $response = $user_agent->post('http://johbuc6.coconia.net/mediawiki2dokuwiki.php', [ mediawiki => $article->{'*'} ]);if (not $response->is_success) {
warn qq(Request for article "$page->{title}" failed: ), $response->status_line;
next;
}Context
StackExchange Code Review Q#44707, answer score: 4
Revisions (0)
No revisions yet.