Performing non-production analytics on data

Submitted by: @import:stackexchange-codereview

Problem

I have the following Lua script, which I run against my Redis environment to perform some non-production analytics on my data.

My dataset contains hundreds of millions of records, so I want to make sure my approach is sound in terms of both query performance and coding practices.

```lua
-- Tables to store the aggregations
local aUsers = {} -- [sizeInBytes] = userID (Table is created the other way around to allow for sorting)
local aFileTypes = {} -- [fileTypeID] = sizeInBytes
local aFileTag = {} -- [fileTagID] = sizeInBytes

-- Global count of records and sum of sizes in bytes
local sumCount = 0
local sumSize = 0

-- For users, 'sizeBytes' is the table key to allow sorting. Collisions are not a big deal
local function userAgregations(table, key, size)
   table[size] = key
end

-- For other aggregations, the `fieldID` is the key mapped against the `size` aggregation
local function genericAgregations(table, key, size)
   if table[key] then
      table[key] = table[key] + size
   else
      table[key] = size
   end
end

-- For users only, sort the table by key (size) and return it as a string
local function printTopUsers(t)
   local str = "["
   local sortedTable = {}
   local maxResults = 25

   for key in pairs(t) do
      sortedTable[#sortedTable+1] = key
   end

   table.sort(sortedTable, function(a,b)
      return a > b
   end)

   if #sortedTable < 25 then
      maxResults = #sortedTable
   end

   for i=1, maxResults do
      str = str .. sortedTable[i] .. ":" .. t[sortedTable[i]] .. ","
   end

   local formatted = str .. "]"
   return formatted
end

-- For other fields just return the table
local function printTable(t)
   local str = "["
   for key,value in pairs(t) do
      str = str .. key .. ":" .. value .. ","
   end

   local formatted = str .. "]"
   return formatted
end

-- Return all the keys in redis that match a given pattern (Yes, it's non-production)
-- Where the keys are:
```

Solution

There are a few little things that can be improved:

```lua
-- For users 'sizeBytes' is the Table[ID] to allow sorting. Collisions are not a big deal
local function userAgregations(table, key, size)
   table[size] = key
end
```


If you are using LuaJIT, this function might be optimized away, but in regular Lua there's no reason to write a function that does nothing but a single assignment. Just do the assignment directly in your code; a plain table assignment is already easy to understand.

In genericAgregations (which should be spelled genericAggregations!) you're performing one table access more than necessary:

```lua
local function genericAgregations(table, key, size)
   local value = table[key]
   table[key] = value and (value + size) or size
end
```
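As a variation on the snippet above, the same single-read update is often written with the `or 0` idiom. This is only a sketch: the parameter name `totals` and the sample keys and sizes are made up for illustration (the parameter is renamed so it doesn't shadow the standard `table` library).

```lua
-- Sketch: accumulate sizes per key with one table read and one table write.
local function genericAggregations(totals, key, size)
   -- A missing key reads as nil, so `or 0` supplies the starting value.
   totals[key] = (totals[key] or 0) + size
end

-- Hypothetical sample data, not taken from the original dataset.
local aFileTypes = {}
genericAggregations(aFileTypes, "pdf", 100)
genericAggregations(aFileTypes, "pdf", 50)
genericAggregations(aFileTypes, "png", 7)
-- aFileTypes.pdf is now 150 and aFileTypes.png is 7
```

Both forms behave the same for numeric sizes; which one reads better is largely a matter of taste.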


In printTopUsers, don't pass a custom comparison function to table.sort just to sort in descending order. Sort in the default ascending order and then run the numeric for loop backwards.
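A minimal sketch of that idea follows. The helper name `topEntries`, its `limit` parameter, and the sample users are assumptions made for this example; it returns the pieces as a table rather than a finished string.

```lua
-- Hypothetical helper: returns up to `limit` "key:value" strings from `t`,
-- largest key first, without a comparator callback.
local function topEntries(t, limit)
   local keys = {}
   for key in pairs(t) do
      keys[#keys + 1] = key
   end

   table.sort(keys) -- default ascending order

   local result = {}
   -- Walk backwards from the largest key, stopping after `limit` entries.
   local last = math.max(#keys - limit + 1, 1)
   for i = #keys, last, -1 do
      result[#result + 1] = keys[i] .. ":" .. t[keys[i]]
   end
   return result
end

-- Made-up sample data: [sizeInBytes] = userID
local top = topEntries({ [300] = "alice", [100] = "bob", [200] = "carol" }, 2)
-- top[1] is "300:alice" and top[2] is "200:carol"
```

Computing the lower bound once also removes the need for the separate `#sortedTable < 25` check in the original.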

Also, instead of repeatedly concatenating onto str, create a temporary table, insert the pieces of the string into it, and generate the final string in one go with table.concat. Lua strings are immutable, so every .. builds a new string; a single table.concat is faster than many concatenations.

The same applies to printTable.

As for the Redis-specific parts, I have no knowledge of them, but I hope the small tweaks above are useful!
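Here is one way printTable might look with that change. It is a sketch rather than a drop-in replacement: joining with a separator means the output no longer ends with a trailing comma before the closing bracket, which differs from the original function.

```lua
-- Sketch: collect the pieces in a table and join them once at the end.
local function printTable(t)
   local parts = {}
   for key, value in pairs(t) do
      parts[#parts + 1] = key .. ":" .. value
   end
   -- table.concat inserts "," between pieces, so no trailing comma is produced.
   return "[" .. table.concat(parts, ",") .. "]"
end

-- Made-up sample data: [fileTypeID] = sizeInBytes
local single = printTable({ [7] = 42 })
-- single is "[7:42]"
```

Note that pairs does not guarantee any iteration order, so with several entries the order of the pieces can vary between runs.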

Context

StackExchange Code Review Q#42311, answer score: 8