patternphpMinor
Simplifying the building of an adjacency array starting from a flat structure
Viewed 0 times
startingthearraybuildingflatstructuresimplifyingadjacencyfrom
Problem
Purpose: to make an adjacency PHP array with
This is the starting array where
I.e. F (
This should be dynamic allowing arbitrary nesting levels and ids for parent categories should be computed using
I ended up with this, it works but it's not efficient and I'd like suggestions to make it simpler and faster:
The loop over
```
$result = []; // Result adjacency list
foreach ($categories as $category) { // Loop over rows
$id = $idize(current($category)); // this is the id of the leaf node i.e. "F"
$path = []; // Array of parent names used for computing ids for non-leaf nodes
$current = 0;
$levels = count($category) - 1;
while ($name = next($category)) {
$path[] = $name;
// id of the current node computed using parent names if not a leaf
$uniqid = $current === ($levels - 1) ? $id : $idize($path);
$new = [];
$new['id'] = $uniqid;
$new['name'] = $
parent_id, name, level and root_id (the id of the root category) starting from a "flat" table-like structure.This is the starting array where
name1, name2 and name3 represents the category name and nesting level:$categories = [
['id' => 1, 'name1' => 'A', 'name2' => 'B', 'name3' => 'C'],
['id' => 2, 'name1' => 'A', 'name2' => 'B', 'name3' => 'D'],
['id' => 3, 'name1' => 'A', 'name2' => 'E', 'name3' => 'F'],
['id' => 4, 'name1' => 'G', 'name2' => 'H', 'name3' => 'I'],
['id' => 5, 'name1' => 'G', 'name2' => 'H', 'name3' => 'E'],
['id' => 6, 'name1' => 'X', 'name2' => 'Y', 'name3' => null],
['id' => 7, 'name1' => 'Z'],
];I.e. F (
level 2) is child of E (level 1) that is child of A (level 0, root category). The id key always refers to the leaf node (F).This should be dynamic allowing arbitrary nesting levels and ids for parent categories should be computed using
md5.I ended up with this, it works but it's not efficient and I'd like suggestions to make it simpler and faster:
// compute id using md5, imploding if array.
// i.e. passing ['A', 'E', 'F'] would result in md5('aef')
$idize = function ($path) {
return is_array($path)
? md5(implode(null, array_map('strtolower', $path)))
: md5(strtolower($path));
};The loop over
$categories:```
$result = []; // Result adjacency list
foreach ($categories as $category) { // Loop over rows
$id = $idize(current($category)); // this is the id of the leaf node i.e. "F"
$path = []; // Array of parent names used for computing ids for non-leaf nodes
$current = 0;
$levels = count($category) - 1;
while ($name = next($category)) {
$path[] = $name;
// id of the current node computed using parent names if not a leaf
$uniqid = $current === ($levels - 1) ? $id : $idize($path);
$new = [];
$new['id'] = $uniqid;
$new['name'] = $
Solution
First off, not having any experience in this topic I found it took a bit of time to get into the code. In particular, there were a few things that weren't very obvious:
In a general sense I would suggest that the goal should be to make code easy for others to follow and maintain -- or for yourself if you haven't looked at it for a year a more. The time taken to simplify and document your code for this purpose will pay for itself over and over again.
Code snippets and discussion...
The above is pretty similar. I've added a loop to provide timing efforts -- as the original request was for speed or efficiency I was surprised not to see something like this in the posted code.
Other than the comments regarding handling of null items there isn't much different there. Some values are being initialized for the purpose of squeezing efficiency out of the calculations to follow. Depending on your own coding needs you may want to be more concerned about the value types used during initialization -- this code can provide empty strings instead of null values, for example, which could conflict with downstream code trying to process the resulting output.
In the preceding the main efficiency introduced is the reduction in calls to "time consuming" functions. In particular, notice that there is now exactly one MD5 call per node. We figure out what type of node it is and then get it's ID value.
This is achieved by some fairly intensive state management. For example, each category, or route, has only one root node key. Once the first key is calculated it is stored for reuse. Similarly, the current node will become the next node's parent so keep the current ID for reuse.
This also reduces a lot of fussing around with fancy array work to pull out the needed pieces under various conditions. Basically, tracking work performed and reusing it instead of redoing it can be much more efficient. However, you can trade one type of complexity for another in the process if you aren't careful. Key point, you can do a lot of simple assignments and comparisons in the time it will take for a complex function call.
Another "large" efficiency gain is introduced by tracking the keys for nodes and paths that have already been
- The question discusses the lowercase path (e.g. abd) being used for key generation but it is the ID field that is used for leaf nodes.
- The proper handling of a null value in the adjacency data would be good to know. I've trimmed nulls out as the original code appears not to have a leaf if there is a trailing null. This changes one of the ID values in the result.
In a general sense I would suggest that the goal should be to make code easy for others to follow and maintain -- or for yourself if you haven't looked at it for a year a more. The time taken to simplify and document your code for this purpose will pay for itself over and over again.
Code snippets and discussion...
$start = microtime(TRUE);
$categories = [
['id' => 1, 'name1' => 'A', 'name2' => 'B', 'name3' => 'C'],
['id' => 2, 'name1' => 'A', 'name2' => 'B', 'name3' => 'D'],
['id' => 3, 'name1' => 'A', 'name2' => 'E', 'name3' => 'F'],
['id' => 4, 'name1' => 'G', 'name2' => 'H', 'name3' => 'I'],
['id' => 5, 'name1' => 'G', 'name2' => 'H', 'name3' => 'E'],
['id' => 6, 'name1' => 'X', 'name2' => 'Y', 'name3' => null],
['id' => 7, 'name1' => 'Z'],
];
for ($i=0; $i<1 /*0000*/ ; $i++)
{
$result = array();
$done = array();The above is pretty similar. I've added a loop to provide timing efforts -- as the original request was for speed or efficiency I was surprised not to see something like this in the posted code.
// Process all catetories. Null nodes are filtered out (see comment)
// -----------------------------------------------------------------
foreach ($categories as $rkey => $route)
{
$mypath = '';
$nodeid = null;
$rootkey = '';
// Collapse nulls out of a path set... this changes the value for
// path ID 6 (i.e. XY) node Y as the null acts as the leaf but is
// not shown in the result of the original code. I chose to trim
// the null value and have Y be a leaf. One could also leave the
// null in but simply skip null nodes instead to get same result.
// ---------------------------------------------------------------
$route = array_filter($route);
$routelen = count($route) - 1;
$parentkey = null;Other than the comments regarding handling of null items there isn't much different there. Some values are being initialized for the purpose of squeezing efficiency out of the calculations to follow. Depending on your own coding needs you may want to be more concerned about the value types used during initialization -- this code can provide empty strings instead of null values, for example, which could conflict with downstream code trying to process the resulting output.
$leaf = 0;
foreach ($route as $node => $val)
{
// Keep the nodeid for use in leaf key generation
// ----------------------------------------------
if ( $nodeid === null )
{
$nodeid = $val;
continue;
}
// Path efforts, parent path is prior path before adjustment
// ---------------------------------------------------------
$parent = $mypath;
$level = strlen($parent);
if ( $level + 1 == $routelen)
$leaf = 1;
$mypath .= strtolower($val);
// Original code uses the id field value for leaf key generation
// -------------------------------------------------------------
$key = $leaf ? md5($nodeid) : md5($mypath);
if ( !strlen($rootkey) )
$rootkey = $key;
// Short cut previous captured paths. Reduces effort for sub-path
// repeats.
// --------------------------------------------------------------
if ( isset($done[$key]) )
{
$parentkey = $key;
continue;
}In the preceding the main efficiency introduced is the reduction in calls to "time consuming" functions. In particular, notice that there is now exactly one MD5 call per node. We figure out what type of node it is and then get it's ID value.
This is achieved by some fairly intensive state management. For example, each category, or route, has only one root node key. Once the first key is calculated it is stored for reuse. Similarly, the current node will become the next node's parent so keep the current ID for reuse.
This also reduces a lot of fussing around with fancy array work to pull out the needed pieces under various conditions. Basically, tracking work performed and reusing it instead of redoing it can be much more efficient. However, you can trade one type of complexity for another in the process if you aren't careful. Key point, you can do a lot of simple assignments and comparisons in the time it will take for a complex function call.
Another "large" efficiency gain is introduced by tracking the keys for nodes and paths that have already been
Code Snippets
$start = microtime(TRUE);
$categories = [
['id' => 1, 'name1' => 'A', 'name2' => 'B', 'name3' => 'C'],
['id' => 2, 'name1' => 'A', 'name2' => 'B', 'name3' => 'D'],
['id' => 3, 'name1' => 'A', 'name2' => 'E', 'name3' => 'F'],
['id' => 4, 'name1' => 'G', 'name2' => 'H', 'name3' => 'I'],
['id' => 5, 'name1' => 'G', 'name2' => 'H', 'name3' => 'E'],
['id' => 6, 'name1' => 'X', 'name2' => 'Y', 'name3' => null],
['id' => 7, 'name1' => 'Z'],
];
for ($i=0; $i<1 /*0000*/ ; $i++)
{
$result = array();
$done = array();// Process all catetories. Null nodes are filtered out (see comment)
// -----------------------------------------------------------------
foreach ($categories as $rkey => $route)
{
$mypath = '';
$nodeid = null;
$rootkey = '';
// Collapse nulls out of a path set... this changes the value for
// path ID 6 (i.e. XY) node Y as the null acts as the leaf but is
// not shown in the result of the original code. I chose to trim
// the null value and have Y be a leaf. One could also leave the
// null in but simply skip null nodes instead to get same result.
// ---------------------------------------------------------------
$route = array_filter($route);
$routelen = count($route) - 1;
$parentkey = null;$leaf = 0;
foreach ($route as $node => $val)
{
// Keep the nodeid for use in leaf key generation
// ----------------------------------------------
if ( $nodeid === null )
{
$nodeid = $val;
continue;
}
// Path efforts, parent path is prior path before adjustment
// ---------------------------------------------------------
$parent = $mypath;
$level = strlen($parent);
if ( $level + 1 == $routelen)
$leaf = 1;
$mypath .= strtolower($val);
// Original code uses the id field value for leaf key generation
// -------------------------------------------------------------
$key = $leaf ? md5($nodeid) : md5($mypath);
if ( !strlen($rootkey) )
$rootkey = $key;
// Short cut previous captured paths. Reduces effort for sub-path
// repeats.
// --------------------------------------------------------------
if ( isset($done[$key]) )
{
$parentkey = $key;
continue;
}// Notice, next parent value to be set to currend key below, so
// debug output precedes that step.
// ------------------------------------------------------------
if ( 1 )
{
echo "val: $val / path: $mypath (leaf? $leaf)<br>\n";
echo "id: $key<br>\n";
echo "level: $level<br>\n";
echo "parentkey: $parentkey<br>\n";
echo "rootkey: $rootkey<br>\n";
echo "<br>\n";
}
// Temporary storage
// -----------------
$new = [];
$new['id'] = $key;
$new['name'] = $val;
$new['level'] = $level;
$new['parent_id'] = $parentkey;
$new['root_id'] = $rootkey;
// Set next parent to current id to avoid calcuation, keep result.
// ---------------------------------------------------------------
$parentkey = $key;
$result[$key] = $new;
$done[$key] = 1;
}
}
}
$end = microtime(TRUE);
$elapsed = $end - $start;
echo "<br>\n";
echo "Start time: $start<br>\n";
echo "End time: $end<br>\n";
echo "Elapsed: " . sprintf("%0.7f",$elapsed) . "<br>\n";val: A / path: a (leaf? 0)
id: 0cc175b9c0f1b6a831c399e269772661
level: 0
parentkey:
rootkey: 0cc175b9c0f1b6a831c399e269772661
val: B / path: ab (leaf? 0)
id: 187ef4436122d1cc2f40dc2b92f0eba0
level: 1
parentkey: 0cc175b9c0f1b6a831c399e269772661
rootkey: 0cc175b9c0f1b6a831c399e269772661
val: C / path: abc (leaf? 1)
id: c4ca4238a0b923820dcc509a6f75849b
level: 2
parentkey: 187ef4436122d1cc2f40dc2b92f0eba0
rootkey: 0cc175b9c0f1b6a831c399e269772661
val: D / path: abd (leaf? 1)
id: c81e728d9d4c2f636f067f89cc14862c
level: 2
parentkey: 187ef4436122d1cc2f40dc2b92f0eba0
rootkey: 0cc175b9c0f1b6a831c399e269772661
val: E / path: ae (leaf? 0)
id: b6bb43df4525b928a105fb5741bddbea
level: 1
parentkey: 0cc175b9c0f1b6a831c399e269772661
rootkey: 0cc175b9c0f1b6a831c399e269772661
val: F / path: aef (leaf? 1)
id: eccbc87e4b5ce2fe28308fd9f2a7baf3
level: 2
parentkey: b6bb43df4525b928a105fb5741bddbea
rootkey: 0cc175b9c0f1b6a831c399e269772661
val: G / path: g (leaf? 0)
id: b2f5ff47436671b6e533d8dc3614845d
level: 0
parentkey:
rootkey: b2f5ff47436671b6e533d8dc3614845d
val: H / path: gh (leaf? 0)
id: 19b19ffc30caef1c9376cd2982992a59
level: 1
parentkey: b2f5ff47436671b6e533d8dc3614845d
rootkey: b2f5ff47436671b6e533d8dc3614845d
val: I / path: ghi (leaf? 1)
id: a87ff679a2f3e71d9181a67b7542122c
level: 2
parentkey: 19b19ffc30caef1c9376cd2982992a59
rootkey: b2f5ff47436671b6e533d8dc3614845d
val: E / path: ghe (leaf? 1)
id: e4da3b7fbbce2345d7772b0674a318d5
level: 2
parentkey: 19b19ffc30caef1c9376cd2982992a59
rootkey: b2f5ff47436671b6e533d8dc3614845d
val: X / path: x (leaf? 0)
id: 9dd4e461268c8034f5c8564e155c67a6
level: 0
parentkey:
rootkey: 9dd4e461268c8034f5c8564e155c67a6
val: Y / path: xy (leaf? 1)
id: 1679091c5a880faf6fb5e6087eb1b2dc
level: 1
parentkey: 9dd4e461268c8034f5c8564e155c67a6
rootkey: 9dd4e461268c8034f5c8564e155c67a6
val: Z / path: z (leaf? 1)
id: 8f14e45fceea167a5a36dedd4bea2543
level: 0
parentkey:
rootkey: 8f14e45fceea167a5a36dedd4bea2543
Start time: 1450924394.531
End time: 1450924394.531
Elapsed: 0.0000000Context
StackExchange Code Review Q#63992, answer score: 2
Revisions (0)
No revisions yet.