HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Data structure to RDF

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
datastructurerdf

Problem

I believe any data structure can be converted to RDF, and I wrote the
subroutine below which sort of does this, at least for arbitrarily
complex hashes.

However, I'm unhappy with it because:

-
$hash2rdf_count should not be global. I don't see an easy fix for this.

-
Where I comment "hopefully it's referring to a scalar at this
point!", I'm not actually convinced. Are there better ways to check?

-
@triplets should not be global. I think I can fix this by passing
it as a parameter, but is there an easier way?

Other notes:

-
In my specific use case, I'm using JSON::from_json($data) to
convert JSON data to a hash, which I then feed to the subroutine
below. I then print out @triplets and create an SQLite3 DB out of it. So it's really: JSON2hash, hash2rdf, rdf2sql. The entire program: https://github.com/barrycarter/bcapps/blob/master/bc-get-discogs.pl

-
JSON::GRDDL failed with an out-of-memory error on my data. My
subroutine doesn't. Is there another "prepackaged" solution I could
use?

-
I realize what I'm doing is a very minimal form of RDF.

`=item hash2rdf($ref,$name="")

Given a scalar/array/hash reference $ref, returns RDF triplets recursively.

If $name is given, assigns name $name to $ref. Otherwise, creates a name

Pretty much does what var_dump() does, only in RDF

=cut

sub hash2rdf {
my($ref,$name) = @_;
my($type) = ref($ref);

# if no type at all, just return self (after cleanup)
unless ($type) {
$ref=~s/\n/\r/isg;
$ref=~s/\"/\\"/isg;
return $ref;
}

# name I will give $ref if not given
unless ($name) {$name = "REF".++$hash2rdf_count;}
# TODO: making $hash2rdf_count global is bad
# TODO: making @triplets global is unacceptably bad (but just testing now)
# my(@triplets);

if ($type eq "ARRAY") {
# interim var
my(@l) = @{$ref};
# push triplets for my children
for $i (0..$#l) {push(@triplets, [$name, $i, hash2rdf($l[$i])]);}
# return the name I gav

Solution

I'm not sure what you're trying to do, so I'll just review the code itself, not what you're trying to do.

I looked into the whole script on GitHub and was shocked to see that you didn't use strict; use warnings. This would usually abort any review, but let's continue for the sake of the argument.

I think it's very nice that you've used POD documentation for your script.

Let's look at this excerpt for a moment:

unless ($type) { 
    $ref=~s/\n/\r/isg; 
    $ref=~s/\"/\\"/isg; 
    return $ref; 
}


  • ref $foo can be false, but a reference (consider $foo = bless \$bar, "0"). Use either length ref $foo or reftype $foo, where reftype is imported from Scalar::Util.



  • unless is widely considered to be half a design error. if (not ...) is cleared most of the time.



  • On each of your regexes you specify the flags /isg.



  • The /i does case insensitive matching but you don't match anything that has a case. As this flag reduces performance, you should use it sparingly.



  • The /s flag causes the . to match any character including newlines. It is generally a good idea to use this flag (the default behavior of . is better written as [^\n]). However, you do not use the . metacharacter, so /s is useless here.



  • In the pattern, " does not have to be escaped as it isn't a metacharacter.



  • Use proper spacing around operators (usually a single space) to improve readability.



Suggested refactoring:

if (not length $type) { 
    $ref =~ s/\n/\r/g; 
    $ref =~ s/"/\\"/g; 
    return $ref; 
}


Now let's take a look at unless ($name) {$name = "REF".++$hash2rdf_count;}:

  • As per Larry's style guide, you shouldn't use a semicolon if the closing brace is on the same line.



-
Never use the conditional (...) {BLOCK} form on a single line. Either put the statement on a line of its own, or use the statement modifier form:

$name = "REF" . ++$hash2rdf_count unless length $name;


or

if (not length $name) {
    $name = "REF" . ++$hash2rdf_count;
}


-
If you only want to overwrite $name if it is undef, you could either test for definedness: if (not defined $name) or since perl 5.10, you can use the defined-or operator //:

$name //= "REF" . ++$hash2rdf_count;


What to do about the $has2rdf_count? The simplest solution that does not involve globals is to declare it outside of the subroutine:

my $hash2rdf_count = 0;
sub hash2rdf {


A better solution (since perl 5.10) is to use state variables:

use feture 'state';
...
state $hash2rdf_count = 0;
$name //= "REF" . ++$hash2rdf_count;


A state variable belongs to a certain scope like my variables, but is initialized only at the first execution.

However, it may be better to take this as another argument (will be discussed later).

my(@l) = @{$ref} there is no need to perform a copy of the array, and even if you feel the need to do that, please don't use a bad variable name like @l. Especially the charcterl l should be avoided on its own. Also, variable names don't have to be so short. Given an array reference, we can access an entry like $ref->[$i] and get the highest index like $#$ref (or with curlies: $#{$ref}).

By now we know how a line like for $i (0..$#l) {push(@triplets, [$name, $i, hash2rdf($l[$i])]);} can be improved. But I would like you to call builtins like push without parens: They aren't subroutines, but operators.

With a statement modifier:

push @triplets, [$name, $_, hash2rdf($ref->[$_])] for 0 .. $#$ref;


Without:

for my $i (0 .. $#ref) {
    push @triplets, [$name, $i, hash2rdf($ref->[$i])];
}


And then there is a variant with map:

push @triplets, map { [$name, $_, hash2rdf($ref->[$_])] } 0 .. $#$ref;


The same holds for the HASH-reference case. Recommended refactoring there:

for my $key (keys %$ref) {
    push @triplets, [$name, $key, hash2rdf($ref->{$key})];
}


Note that I'm not using $i here, as the hash key won't generally be an integer.

How can we solve the problem that @triplets is a global? We take a reference to that array as another argument:

sub hash2rdf {
    my ($ref, $triplets, $name) = @_;
    ...
    push @$triplets, ...


This means that we have to change the way how we call ourselves when recursing:

hash2rdf($ref->[$i], $triplets)


On the outside, we call the subroutine like

hash2rdf(\@releases, \my @triplets, "root");


Eww, that is ugly (out-arguments always bother me). Let's rename hash2rdf to hash2rdf_inner, and create a wrapper. This is also a good opportunity to pass $has2rdf_count as an argument, and to rename the wrapper to something better.

```
sub ref2rdf {
my ($data, $name) = @_;
my $count = 0;
my @triplets;
ref2rdf_inner($data, \@triplets, $name, \$count);
return @triplets;
}

sub ref2rdf_inner {
my ($ref, $triplets, $name, $count_ref) = @_;
...
$name //= "REF" . ++$$count_ref;
...
ref2rdf_inner($ref->[$i], $triple

Code Snippets

unless ($type) { 
    $ref=~s/\n/\r/isg; 
    $ref=~s/\"/\\"/isg; 
    return $ref; 
}
if (not length $type) { 
    $ref =~ s/\n/\r/g; 
    $ref =~ s/"/\\"/g; 
    return $ref; 
}
$name = "REF" . ++$hash2rdf_count unless length $name;
if (not length $name) {
    $name = "REF" . ++$hash2rdf_count;
}
$name //= "REF" . ++$hash2rdf_count;

Context

StackExchange Code Review Q#37882, answer score: 2

Revisions (0)

No revisions yet.