patternjavaMinor

Householder transformation

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

householdertransformationstackoverflow

Problem

I'm trying to make this Householder method run as quickly as possible. It runs fine now, but I'm still a bit new to java, is there anything you guys see to make this run faster?

public static ArrayList householder(double[][] A) {
    int m = A.length; // rows
    int n = A[0].length; // cols
    byte sign; 
    double sum, A0,tmp;
    ArrayList QR = new ArrayList<>();
    double[] v = new double[m];
    double[][] Q = identity(m);
    for(int k = 0; k < n; k++) {
        sum = 0;
        sign = 1;
        A0 = A[k][k];
        if(A0 < 0){
            sign = -1;
        }
        for(int i = k; i < m; i++) {
            sum += A[i][k]*A[i][k];
        }
        tmp = sign*Math.sqrt(sum);
        v[k] = tmp + A0;          
        tmp = Math.sqrt(2*(sum + A0*tmp));
        v[k] /= tmp;
        for(int i = k+1; i < m; i++) {
            v[i] = A[i][k]/tmp;
        }                        
        for(int j = 0; j < n; j++) {
            sum = 0;
            for(int i = k; i < m; i++) {
                sum += v[i]*A[i][j];
            }
            for(int i = k; i < m; i++) {
                A[i][j] -= 2*v[i]*sum;
            }
        }            
        for(int j = 0; j < m; j++) {
            sum = 0;
            for(int i = k; i < m; i++) {
                sum += v[i]*Q[i][j];
            }
            for(int i = k; i < m; i++) {
                Q[i][j] -= 2*v[i]*sum;
            }
        }                
    }
    QR.add(transpose(Q));
    QR.add(A);
    return QR;
}

I'm mostly concerned with the portion of the code that multiplies the vectors to get the Q and A matrices.

Here are some ideas I had but wasn't sure would help much.

-
if there's a better way to return the values than arraylist, like if double[][][] would be faster. I have had issues with recasting double[][][] arrays to double[][] however.

-
changing all the ints to shorts, i was concerned with casting

-
I only need sign to be a bit

-
should I write sum = 0.0?

-
should I

Solution

byte sign; 
double sum, A0,tmp;

Keep variables to the most inner scope where they can be defined.

sign = 1;
A0 = A[k][k];
if(A0 < 0){
    sign = -1;
}

sign can be assigned to a ternary expression

sign = A0 < 0 ? -1 : 1;

tmp = sign*Math.sqrt(sum);
v[k] = tmp + A0;          
tmp = Math.sqrt(2*(sum + A0*tmp));
v[k] /= tmp;

EDIT Oops screwed up, the maths on v[k] evaluation... too many times...

EDIT 2 changing tmp to have the right value for the inner loop.

I would reduce the calculation to this:

double sqrSum = sign*Math.sqrt(sum);
    double tmp = Math.sqrt(2*(sum + A0*sqrSum));        
    v[k] = (sqrSum  + A0) / tmp;

This is trying to store the value in v[k] just once, instead of accessing it twice.

if there's a better way to return the values than arraylist, like if
double[][][] would be faster. I have had issues with recasting
double[][][] arrays to double[][] however.

Well in the end you know that you are returning an Array with two entries, the first being transpose(Q), and the second being A. This is pretty simple to achieve both on the implementation and the caller as long the method is properly documented.

Return on implementation

return new double[][][]{
    transpose(Q),
    A
}

Use on caller

var result = householder(...);
var transpose = result[0]; //documentation says entry 0 has the transpose
var a = result[1]; //documnentation says entry 1 has A

Resulting code

public static double[][] householder(double[][] A) {
    int m = A.length; // rows
    int n = A[0].length; // cols
    double[] v = new double[m];
    double[][] Q = identity(m);
    for(int k = 0; k < n; k++) {
        double sum = 0;
        byte sign = A0 < 0 ? -1 : 1;
        double A0 = A[k][k];
        for(int i = k; i < m; i++) {
            sum += A[i][k]*A[i][k];
        }
        double sqrSum = sign*Math.sqrt(sum);
        double tmp = Math.sqrt(2*(sum + A0*sqrSum));        
        v[k] = (sqrSum  + A0) / tmp;
        for(int i = k+1; i < m; i++) {
            v[i] = A[i][k]/tmp;
        }                        
        for(int j = 0; j < n; j++) {
            sum = 0;
            for(int i = k; i < m; i++) {
                sum += v[i]*A[i][j];
            }
            for(int i = k; i < m; i++) {
                A[i][j] -= 2*v[i]*sum;
            }
        }            
        for(int j = 0; j < m; j++) {
            sum = 0;
            for(int i = k; i < m; i++) {
                sum += v[i]*Q[i][j];
            }
            for(int i = k; i < m; i++) {
                Q[i][j] -= 2*v[i]*sum;
            }
        }                
    }
    return new double[][][]{
        transpose(Q),
        A
    }
}

Changing all the ints to shorts, i was concerned with casting

Don't bother with it. In the end all of it will be running mostly on cpu registers which can hold 32 bits, incrementing 16 or 32 bits costs mostly the same and you shouldn't be bothered by that.

I only need sign to be a bit

This is almost effectively the same as evaluating sign to be +1 or -1 once. keeping this on a variable makes you code cleaner. Pre-optimization is the rule of all evil, first you need to make sure your code is maintainable, understandable, follows design principles, and optimally that you chose the best data structures and kept the algorithm running time to a minimum, although that also falls in the optimization hole is not as aggressive as going into the last details like you referring to.

should I write sum = 0.0?

Meh floating point is not my area of expertise but what I think is even if there is a difference on the representation between 0 and 0.0 which there isn't, you end up with slightly errors of accuracy.

should I initialize the for loop int i's and j's outside of the loop
to save memory?

No. Compilers do a great job optimizing trivial things for you, such as this and also the ones mentioned before. In the end the cpu will need to increment a variable, hopefully that variable will be on the cpu register or in cache, if not will be on the stack (ram). The variations between this can be different depending on how you write your code but it is the sort of things that you if you are truly worried about then you go into onto other languages like C and ultimately assembly.

On variable declaration and assignment.

This seems to be a topic a little bit out of your knowledge, let's get some of those doubts sorted.

Historically, in languages like C, variable names were declared at the beginning of a method, or scope. The reason being that if you didn't do that way the compiler would give you a warning. I suspect that this warning could be due to possible inefficiency on the usage of the stack, but I'm not totally positive for this (at least this seem to be a plausible reason for those warnings).

Thus, like I told you, things on the internals can change a little bit depending how/where you declare you variables. Compilers

Code Snippets

byte sign; 
double sum, A0,tmp;

sign = 1;
A0 = A[k][k];
if(A0 < 0){
    sign = -1;
}

sign = A0 < 0 ? -1 : 1;

tmp = sign*Math.sqrt(sum);
v[k] = tmp + A0;          
tmp = Math.sqrt(2*(sum + A0*tmp));
v[k] /= tmp;

double sqrSum = sign*Math.sqrt(sum);
    double tmp = Math.sqrt(2*(sum + A0*sqrSum));        
    v[k] = (sqrSum  + A0) / tmp;

Context

StackExchange Code Review Q#120978, answer score: 4

Revisions (0)

No revisions yet.