Rebol Function to Categorize Items in a Block
Problem
I have a function that iterates through a block, tests each value against a given function, and places the value in a block associated with the result of that function.
CATEGORIZE items test
An example:
categorize [
    "Aardvark" "Bison"
    "Antelope" "Baboon"
    "Anaconda" "Basking Shark"
] [first item]
Results in:
[
    #"A" [
        "Aardvark"
        "Antelope"
        "Anaconda"
    ]
    #"B" [
        "Bison"
        "Baboon"
        "Basking Shark"
    ]
]
For its test argument, a function! or a block! is accepted; a block! is transformed into a function with a single argument, item.
NB. applying new-line before returning the block neatens the resultant block, for aesthetic purposes only.
Review questions I have are:
- Is there a more efficient way to evaluate each iteration step?
- Are there any potential optimization possibilities for common comparisons? e.g. by first letter, etc.
- This is supposed to be a general-purpose function. Are there any glaring cases where this function fails? Any enhancement suggestions?
The Code
Rebol [
    Title: "Categorize"
    Date: 11-Mar-2014
    Author: "Christopher Ross-Gill"
    Type: 'module
    Exports: [categorize]
]
categorize: func [items [block!] test [any-function! block!] /local out value target][
    out: copy []
    if block? :test [test: func [item] :test]
    foreach item items [
        value: test item
        unless target: select out value [
            repend out [value target: copy []]
        ]
        append/only target item
    ]
    foreach [value items] out [new-line/all items true]
    new-line/all/skip out true 2
]
Solution
It's a useful function, and exists in underscore.js as groupBy. So I'd probably call it group-by. It might be useful to go function by function in underscore and use their names, possibly even putting them in the namespace _ if requested... although that's a little bit ugly.
The one thing I'll say about the interface in general is that it's a bit too bad that what's coming back here is a block! and not a map!. I feel like if you're reorganizing data in this pattern so that you can access it by key, then doing the linear searching both during the creation and later accesses isn't optimal.
Though perhaps, rather than being a separate data type, map! should be a "hint" about an access pattern that is made on a block, like:
foo: [a [b c] d [e f] ... xyz [lmno pqrs]]
hint foo [map-access]
data: select foo 'd
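To make the map! suggestion above concrete, here is a minimal sketch of what a map-returning variant might look like. It assumes Rebol 3, where map! exists, function collects locals automatically, and append accepts a key/value block on a map!. The name categorize-map is hypothetical and not part of the original code.

```rebol
; Hypothetical variant of CATEGORIZE that returns a map! instead of a block!.
; Assumes Rebol 3 semantics for FUNCTION, MAP! and APPEND on a map.
categorize-map: function [items [block!] test [any-function! block!]] [
    out: make map! []
    ; A block test is wrapped as a one-argument function, as in the original.
    if block? :test [test: func [item] :test]
    foreach item items [
        value: test item
        ; SELECT on a map! looks the key up directly rather than scanning.
        unless target: select out value [
            append out reduce [value target: copy []]
        ]
        append/only target item
    ]
    out
]

; Usage: group by first letter, then fetch one group by key.
animals: categorize-map ["Aardvark" "Bison" "Antelope" "Baboon"] [first item]
probe select animals #"B"
```

The trade-off is that a map! does not carry the new-line formatting of the block version, and key ordering should not be relied on.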
Anyway, that's the only major point I'd bring up. Minor stuff might be that you're using func and /local:
categorize: func [items [block!] test [any-function! block!] /local out value target][
Using function, and letting the locals be scanned for, is more readable and maintainable:
categorize: function [items [block!] test [any-function! block!]] [
Performance-wise, the difference is negligible. Remember that the scan only happens once here:
delta-time [foo: func [a b /local c d] [c: 10 d: 20 return a + b + c + d] loop 10000 [foo 1 2 3 4]]
== 0:00:00.009
delta-time [foo: function [a b] [c: 10 d: 20 return a + b + c + d] loop 10000 [foo 1 2 3 4]]
== 0:00:00.009022
In fact, the performance difference is generally lost in the noise. You can repeat such tests and find the FUNC version taking longer than the FUNCTION version.
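As a small illustration of the func versus function point, the sketch below defines the same body both ways and checks that neither leaks its working variable into the enclosing context. It assumes the Rebol 3 form of function (spec and body only); the names sum-func, sum-function and total are made up for this example.

```rebol
; FUNC: each local has to be listed by hand after /local.
sum-func: func [a b /local total] [
    total: a + b
    total
]

; FUNCTION (Rebol 3 form, assumed here): set-words found in the body,
; such as TOTAL, are collected as locals automatically.
sum-function: function [a b] [
    total: a + b
    total
]

; A word of the same name outside the functions, to detect leaks.
total: 'untouched
probe sum-func 1 2
probe sum-function 1 2
; If both definitions kept TOTAL local, this still holds the word UNTOUCHED.
probe total
```

Forgetting a word in the /local list of the func version would overwrite the outer total silently, which is the maintenance hazard the function form avoids.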
out: copy []
I've been scolded for using copy [] to create blocks, because you're "wasting a series" as a template, when that series is never used. More relevant, I think, is that it's a habit prone to accidentally leaving off the copy and suffering the consequences. I have a proposal out to say that block~ none would create an empty block and there would be generators for all the types of that form. But until that happens you might want to use make block! here.
Context
StackExchange Code Review Q#54260, answer score: 3