patternMinor
Data structure to RDF
Viewed 0 times
datastructurerdf
Problem
I believe any data structure can be converted to RDF, and I wrote the
subroutine below which sort of does this, at least for arbitrarily
complex hashes.
However, I'm unhappy with it because:
-
-
Where I comment "hopefully it's referring to a scalar at this
point!", I'm not actually convinced. Are there better ways to check?
-
it as a parameter, but is there an easier way?
Other notes:
-
In my specific use case, I'm using
convert JSON data to a hash, which I then feed to the subroutine
below. I then print out
-
subroutine doesn't. Is there another "prepackaged" solution I could
use?
-
I realize what I'm doing is a very minimal form of RDF.
`=item hash2rdf($ref,$name="")
Given a scalar/array/hash reference $ref, returns RDF triplets recursively.
If $name is given, assigns name $name to $ref. Otherwise, creates a name
Pretty much does what var_dump() does, only in RDF
=cut
sub hash2rdf {
my($ref,$name) = @_;
my($type) = ref($ref);
# if no type at all, just return self (after cleanup)
unless ($type) {
$ref=~s/\n/\r/isg;
$ref=~s/\"/\\"/isg;
return $ref;
}
# name I will give $ref if not given
unless ($name) {$name = "REF".++$hash2rdf_count;}
# TODO: making $hash2rdf_count global is bad
# TODO: making @triplets global is unacceptably bad (but just testing now)
# my(@triplets);
if ($type eq "ARRAY") {
# interim var
my(@l) = @{$ref};
# push triplets for my children
for $i (0..$#l) {push(@triplets, [$name, $i, hash2rdf($l[$i])]);}
# return the name I gav
subroutine below which sort of does this, at least for arbitrarily
complex hashes.
However, I'm unhappy with it because:
-
$hash2rdf_count should not be global. I don't see an easy fix for this. -
Where I comment "hopefully it's referring to a scalar at this
point!", I'm not actually convinced. Are there better ways to check?
-
@triplets should not be global. I think I can fix this by passing it as a parameter, but is there an easier way?
Other notes:
-
In my specific use case, I'm using
JSON::from_json($data) to convert JSON data to a hash, which I then feed to the subroutine
below. I then print out
@triplets and create an SQLite3 DB out of it. So it's really: JSON2hash, hash2rdf, rdf2sql. The entire program: https://github.com/barrycarter/bcapps/blob/master/bc-get-discogs.pl-
JSON::GRDDL failed with an out-of-memory error on my data. My subroutine doesn't. Is there another "prepackaged" solution I could
use?
-
I realize what I'm doing is a very minimal form of RDF.
`=item hash2rdf($ref,$name="")
Given a scalar/array/hash reference $ref, returns RDF triplets recursively.
If $name is given, assigns name $name to $ref. Otherwise, creates a name
Pretty much does what var_dump() does, only in RDF
=cut
sub hash2rdf {
my($ref,$name) = @_;
my($type) = ref($ref);
# if no type at all, just return self (after cleanup)
unless ($type) {
$ref=~s/\n/\r/isg;
$ref=~s/\"/\\"/isg;
return $ref;
}
# name I will give $ref if not given
unless ($name) {$name = "REF".++$hash2rdf_count;}
# TODO: making $hash2rdf_count global is bad
# TODO: making @triplets global is unacceptably bad (but just testing now)
# my(@triplets);
if ($type eq "ARRAY") {
# interim var
my(@l) = @{$ref};
# push triplets for my children
for $i (0..$#l) {push(@triplets, [$name, $i, hash2rdf($l[$i])]);}
# return the name I gav
Solution
I'm not sure what you're trying to do, so I'll just review the code itself, not what you're trying to do.
I looked into the whole script on GitHub and was shocked to see that you didn't
I think it's very nice that you've used POD documentation for your script.
Let's look at this excerpt for a moment:
Suggested refactoring:
Now let's take a look at
-
Never use the
or
-
If you only want to overwrite
What to do about the
A better solution (since perl 5.10) is to use state variables:
A state variable belongs to a certain scope like
However, it may be better to take this as another argument (will be discussed later).
By now we know how a line like
With a statement modifier:
Without:
And then there is a variant with
The same holds for the
Note that I'm not using
How can we solve the problem that
This means that we have to change the way how we call ourselves when recursing:
On the outside, we call the subroutine like
Eww, that is ugly (out-arguments always bother me). Let's rename
```
sub ref2rdf {
my ($data, $name) = @_;
my $count = 0;
my @triplets;
ref2rdf_inner($data, \@triplets, $name, \$count);
return @triplets;
}
sub ref2rdf_inner {
my ($ref, $triplets, $name, $count_ref) = @_;
...
$name //= "REF" . ++$$count_ref;
...
ref2rdf_inner($ref->[$i], $triple
I looked into the whole script on GitHub and was shocked to see that you didn't
use strict; use warnings. This would usually abort any review, but let's continue for the sake of the argument.I think it's very nice that you've used POD documentation for your script.
Let's look at this excerpt for a moment:
unless ($type) {
$ref=~s/\n/\r/isg;
$ref=~s/\"/\\"/isg;
return $ref;
}ref $foocan be false, but a reference (consider$foo = bless \$bar, "0"). Use eitherlength ref $fooorreftype $foo, wherereftypeis imported fromScalar::Util.
unlessis widely considered to be half a design error.if (not ...)is cleared most of the time.
- On each of your regexes you specify the flags
/isg.
- The
/idoes case insensitive matching but you don't match anything that has a case. As this flag reduces performance, you should use it sparingly.
- The
/sflag causes the.to match any character including newlines. It is generally a good idea to use this flag (the default behavior of.is better written as[^\n]). However, you do not use the.metacharacter, so/sis useless here.
- In the pattern,
"does not have to be escaped as it isn't a metacharacter.
- Use proper spacing around operators (usually a single space) to improve readability.
Suggested refactoring:
if (not length $type) {
$ref =~ s/\n/\r/g;
$ref =~ s/"/\\"/g;
return $ref;
}Now let's take a look at
unless ($name) {$name = "REF".++$hash2rdf_count;}:- As per Larry's style guide, you shouldn't use a semicolon if the closing brace is on the same line.
-
Never use the
conditional (...) {BLOCK} form on a single line. Either put the statement on a line of its own, or use the statement modifier form:$name = "REF" . ++$hash2rdf_count unless length $name;or
if (not length $name) {
$name = "REF" . ++$hash2rdf_count;
}-
If you only want to overwrite
$name if it is undef, you could either test for definedness: if (not defined $name) or since perl 5.10, you can use the defined-or operator //:$name //= "REF" . ++$hash2rdf_count;What to do about the
$has2rdf_count? The simplest solution that does not involve globals is to declare it outside of the subroutine:my $hash2rdf_count = 0;
sub hash2rdf {A better solution (since perl 5.10) is to use state variables:
use feture 'state';
...
state $hash2rdf_count = 0;
$name //= "REF" . ++$hash2rdf_count;A state variable belongs to a certain scope like
my variables, but is initialized only at the first execution.However, it may be better to take this as another argument (will be discussed later).
my(@l) = @{$ref} there is no need to perform a copy of the array, and even if you feel the need to do that, please don't use a bad variable name like @l. Especially the charcterl l should be avoided on its own. Also, variable names don't have to be so short. Given an array reference, we can access an entry like $ref->[$i] and get the highest index like $#$ref (or with curlies: $#{$ref}).By now we know how a line like
for $i (0..$#l) {push(@triplets, [$name, $i, hash2rdf($l[$i])]);} can be improved. But I would like you to call builtins like push without parens: They aren't subroutines, but operators.With a statement modifier:
push @triplets, [$name, $_, hash2rdf($ref->[$_])] for 0 .. $#$ref;Without:
for my $i (0 .. $#ref) {
push @triplets, [$name, $i, hash2rdf($ref->[$i])];
}And then there is a variant with
map:push @triplets, map { [$name, $_, hash2rdf($ref->[$_])] } 0 .. $#$ref;The same holds for the
HASH-reference case. Recommended refactoring there:for my $key (keys %$ref) {
push @triplets, [$name, $key, hash2rdf($ref->{$key})];
}Note that I'm not using
$i here, as the hash key won't generally be an integer.How can we solve the problem that
@triplets is a global? We take a reference to that array as another argument:sub hash2rdf {
my ($ref, $triplets, $name) = @_;
...
push @$triplets, ...This means that we have to change the way how we call ourselves when recursing:
hash2rdf($ref->[$i], $triplets)On the outside, we call the subroutine like
hash2rdf(\@releases, \my @triplets, "root");Eww, that is ugly (out-arguments always bother me). Let's rename
hash2rdf to hash2rdf_inner, and create a wrapper. This is also a good opportunity to pass $has2rdf_count as an argument, and to rename the wrapper to something better.```
sub ref2rdf {
my ($data, $name) = @_;
my $count = 0;
my @triplets;
ref2rdf_inner($data, \@triplets, $name, \$count);
return @triplets;
}
sub ref2rdf_inner {
my ($ref, $triplets, $name, $count_ref) = @_;
...
$name //= "REF" . ++$$count_ref;
...
ref2rdf_inner($ref->[$i], $triple
Code Snippets
unless ($type) {
$ref=~s/\n/\r/isg;
$ref=~s/\"/\\"/isg;
return $ref;
}if (not length $type) {
$ref =~ s/\n/\r/g;
$ref =~ s/"/\\"/g;
return $ref;
}$name = "REF" . ++$hash2rdf_count unless length $name;if (not length $name) {
$name = "REF" . ++$hash2rdf_count;
}$name //= "REF" . ++$hash2rdf_count;Context
StackExchange Code Review Q#37882, answer score: 2
Revisions (0)
No revisions yet.