patternMinor
Performing non-production analytics on data
Problem
I have the following Lua script, which I run against my Redis environment to perform some non-production analytics on my data.
My dataset contains hundreds of millions of records, so I want to make sure that my approach is sound in terms of both query performance and coding best practices.
```lua
-- Tables to store the aggregations
local aUsers = {} -- [sizeInBytes] = userID (Table is created the other way around to allow for sorting)
local aFileTypes = {} -- [fileTypeID] = sizeInBytes
local aFileTag = {} -- [fileTagID] = sizeInBytes

-- Global Count of Records and Sum of Sizes in Bytes
local sumCount = 0
local sumSize = 0

-- For users 'sizeBytes' is the Table[ID] to allow sorting. Collissions are not a big deal
local function userAgregations(table, key, size)
    table[size] = key
end

-- For other agregations the fieldID is the key mapped against the size agregation
local function genericAgregations(table, key, size)
    if table[key] then
        table[key] = table[key] + size
    else
        table[key] = size
    end
end

-- For users only, sort the table by Key (size) and print return it
local function printTopUsers(t)
    local str = "["
    local sortedTable = {}
    local maxResults = 25
    for key in pairs(t) do
        sortedTable[#sortedTable+1] = key
    end
    table.sort(sortedTable, function(a,b)
        return a > b
    end)
    if #sortedTable < 25 then
        maxResults = #sortedTable
    end
    for i=1, maxResults do
        str = str .. sortedTable[i] .. ":" .. t[sortedTable[i]] .. ","
    end
    local formatted = str .. "]"
    return formatted
end

-- For other fields just return the table
local function printTable(t)
    local str = "["
    for key,value in pairs(t) do
        str = str .. key .. ":" .. value .. ","
    end
    local formatted = str .. "]"
    return formatted
end

-- Return all the keys in redis that match a given pattern (Yes, its non-production)
-- Where the keys are:
```
Solution
There are a few little things that can be improved:
```lua
-- For users 'sizeBytes' is the Table[ID] to allow sorting. Collissions are not a big deal
local function userAgregations(table, key, size)
    table[size] = key
end
```
If using LuaJIT, this function might be optimized away, but in regular Lua there's no reason to write a function that only performs an assignment like this. Just do the assignment directly in your code; a plain table assignment is easy enough to understand on its own.
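To illustrate the point about inlining, here is a minimal sketch of the assignment done directly inside a loop. The `records` table and its field names are hypothetical stand-ins for whatever the real script reads from Redis; only `aUsers` and its key layout come from the question.

```lua
-- Hypothetical sample records; the real script would read these from Redis
local records = {
    { userID = "u1", sizeBytes = 300 },
    { userID = "u2", sizeBytes = 100 },
}

local aUsers = {} -- [sizeInBytes] = userID, same layout as in the question

for _, record in ipairs(records) do
    -- Plain assignment instead of calling userAgregations(aUsers, userID, size)
    aUsers[record.sizeBytes] = record.userID
end
```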
In genericAgregations (should be genericAggregations!) you're performing one table access too many:
```lua
local function genericAgregations(table, key, size)
    local value = table[key]
    table[key] = value and (value + size) or size
end
```
In printTopUsers, you don't need a custom comparison function in table.sort just to get descending order. Sort with the default (ascending) comparison and then run the numeric for loop backwards.
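A minimal sketch of the backwards loop, assuming the same `[sizeInBytes] = userID` layout as `aUsers`. The function name `topSizes` and the sample data are hypothetical, and this only collects the top sizes rather than formatting them:

```lua
-- Hypothetical helper: returns up to maxResults keys of t, largest first
local function topSizes(t, maxResults)
    -- Collect the keys (sizes) into an array so they can be sorted
    local sizes = {}
    for size in pairs(t) do
        sizes[#sizes + 1] = size
    end

    -- Default comparison sorts ascending; no custom comparator needed
    table.sort(sizes)

    -- Walk the array from the end so the largest sizes come first,
    -- stopping after maxResults entries or when the array runs out
    local lowest = math.max(#sizes - maxResults + 1, 1)
    local top = {}
    for i = #sizes, lowest, -1 do
        top[#top + 1] = sizes[i]
    end
    return top
end

-- Hypothetical sample data
local top = topSizes({ [300] = "u1", [100] = "u2", [200] = "u3" }, 2)
```

Computing the lower bound with `math.max` also replaces the separate `if #sortedTable < 25` check from the original.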
Also, instead of repeatedly concatenating onto str, collect the pieces in a temporary table and build the string once at the end with table.concat. That is faster than many successive concatenations, because every .. creates a new intermediate string.
Same in printTable.
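As a rough sketch of the table.concat approach applied to printTable: the name `formatTable` is hypothetical, and note that, unlike the original, this version does not leave a trailing comma before the closing bracket, because the separator is passed to `table.concat`.

```lua
-- Hypothetical rewrite of printTable using a buffer table and table.concat
local function formatTable(t)
    local parts = {}
    for key, value in pairs(t) do
        -- One "key:value" piece per entry; nothing is appended to a growing string
        parts[#parts + 1] = key .. ":" .. value
    end
    -- Join all pieces once, with "," between them
    return "[" .. table.concat(parts, ",") .. "]"
end
```

The same pattern would fit the final loop of printTopUsers; since `pairs` does not guarantee any iteration order, the order of entries in the output here is unspecified, just as in the original.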
As for the Redis-specific parts, I have no knowledge of them, but I hope the small tweaks above are useful!
Code Snippets
```lua
-- For users 'sizeBytes' is the Table[ID] to allow sorting. Collissions are not a big deal
local function userAgregations(table, key, size)
    table[size] = key
end
```
```lua
local function genericAgregations(table, key, size)
    local value = table[key]
    table[key] = value and (value + size) or size
end
```
Context
StackExchange Code Review Q#42311, answer score: 8