patterncsharpMinor
ConstantQ transformation method
Viewed 0 times
constantqtransformationmethod
Problem
I've got this transform method with a triple-nested loop. The
Generate methods do their own caching so they are fast. K will have a worst-case value of 150. N will have a worst-case value of about 20000. It's that 200002 that's killing me. It looks to me like the innermost loop could be moved out, but would you store every possible sum?public static void Transform(IList x, int offset, int sampleRate, int binsPerOctave,
double minFrequency, double maxFrequency, out double[] magnitudes, out double maxMagnitude)
{
var Q = 1.0 / (Math.Pow(2.0, 1.0 / binsPerOctave) - 1.0);
var K = (int)Math.Ceiling(Math.Log(maxFrequency / minFrequency, 2.0) * binsPerOctave);
magnitudes = new double[K];
var maxN = (int)Math.Round(Q * sampleRate / minFrequency);
var halfN = -maxN / 2;
double[] wcs = GenerateWindowConstants(maxN);
maxMagnitude = double.NegativeInfinity;
for(int k = 0; k < K; k++)
{
var N = (int)Math.Round(Q * sampleRate / (minFrequency * Math.Pow(2.0, (double)k / binsPerOctave)));
Complex[] piqs = GeneratePiQConstants(N, Q);
var maxMag = double.NegativeInfinity;
for(int i = 0; i < N; i++)
{
Complex sum = new Complex();
for(int n = 0; n < maxN; n++)
{
var index = n + offset + halfN;
index = Math.Min(x.Count - 1, index);
index = Math.Max(0, index);
sum += (wcs[n] * x[index]) * piqs[(n + i) % N];
}
sum /= maxN;
maxMag = Math.Max(maxMag, sum.Magnitude);
}
magnitudes[k] = maxMag;
maxMagnitude = Math.Max(maxMagnitude, maxMag);
}
}Solution
It looks like based on the 2 inner loops where you are calculating maxMag you could refactor the code to utilize Parallel.ForEach() so you can start to use multiple cores for the math. Just put all of the individual maxMags into a concurrent dictionary. When your done, run your aggregation methods. You'll have to decide whether it is better as an array that gets copied, or whether to refactor them into a concurrent collection type. I hope this helps.
Edit/Update:
Per your comment relating to using all cores "However, the best improvement this would bring on typical hardware would be 8x. 8x isn't quite going to be enough for my application." (found under cgilmeanu's answer)
If utilizing all of the computer resources and taking an 8-fold gain won't get anywhere close, then ...
Answer = you need to horizontally scale across multiple computers until you reach the point where you are satisfied with the "timed" outcome.
Going to c, c++, or .net-under-the-hood should help make gains for you. However, commonly, the programming involved results in something complicated, hard to understand, and more extensive documentation. This can make it tough to be nimble and make changes. Also, if you can assume that the computational load/iterations may increase in the future, your extensive efforts in extremely fine tuning this item may be all for nothing since it was internally just vertical scaling by optimization.
Edit/Update:
Per your comment relating to using all cores "However, the best improvement this would bring on typical hardware would be 8x. 8x isn't quite going to be enough for my application." (found under cgilmeanu's answer)
If utilizing all of the computer resources and taking an 8-fold gain won't get anywhere close, then ...
Answer = you need to horizontally scale across multiple computers until you reach the point where you are satisfied with the "timed" outcome.
Going to c, c++, or .net-under-the-hood should help make gains for you. However, commonly, the programming involved results in something complicated, hard to understand, and more extensive documentation. This can make it tough to be nimble and make changes. Also, if you can assume that the computational load/iterations may increase in the future, your extensive efforts in extremely fine tuning this item may be all for nothing since it was internally just vertical scaling by optimization.
Context
StackExchange Code Review Q#10653, answer score: 3
Revisions (0)
No revisions yet.