patternpythonModerate

Generating a 3D Point Cloud

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

generatingcloudpoint

Problem

I'm trying to produce a 3D point cloud from a depth image and some camera intrinsics. The image is 640x480, and is a NumPy array of bytes. The output is a (rows * columns) x 3 array of points.

I've gotten the function to work perfectly, but it's way too slow! (takes like 2 seconds per image to process). I'm wondering if there are any optimizations I can make before giving up and writing a C module.

def create_point_cloud(self, depth_image):
    shape = depth_image.shape;
    rows = shape[0];
    cols = shape[1];

    points = np.zeros((rows * cols, 3), np.float32);

    bytes_to_units = (1.0 / 256.0);

    # Linear iterator for convenience
    i = 0
    # For each pixel in the image...
    for r in xrange(0, rows):
        for c in xrange(0, cols):
            # Get the depth in bytes
            depth = depth_image[r, c, 0];

            # If the depth is 0x0 or 0xFF, its invalid.
            # By convention it should be replaced by a NaN depth.
            if(depth > 0 and depth < 255):
                # The true depth of the pixel in units
                z = depth * bytes_to_units;

                # Get the x, y, z coordinates in units of the pixel
                points[i, 0] = (c - self.cx) / self.fx * z;
                points[i, 1] = (r - self.cy) / self.fy * z;
                points[i, 2] = z
            else:
                # Invalid points have a NaN depth
                points[i, 2] = np.nan;
            i = i + 1
    return points

Solution

Review

-
No docstring. What does this function do? What parameters does it take? What shape must the depth_image parameter be? What does it return?

-
In Python, there is no need for a semi-colon at the end of a statement (unless another statement follows on the same line) and it is best to omit it.

-
It seems that depth_image is required to have three dimensions, but only the coordinate 0 is used on the third dimension. It would be simpler if the function took a two-dimensional depth image. A caller with a three-dimensional image could pass depth_image[...,0].

-
The result is returned as a linear array rather than in the natural shape as a cols × rows × 3 array. This loses information about the shape of the image that might be needed by some callers. There's a comment that says this is "for convenience" but if so, the caller can easily call numpy.reshape.

Vectorize

As always with Numpy performance problems, an important step is to scrutinize all the loops that run in the Python interpreter (the for and while loops) to see if they can be vectorized. A loop running inside Numpy is usually hundreds of times faster than the same loop running in native Python.

In this case it should be straightforward: instead of looping over pixels and doing some operations on each pixel, apply those operations to the whole image and let Numpy worry about looping over the pixels.

We only need the 0 plane on the third axis, as discussed in §1.3 above:

depth = depth_image[..., 0]

Let's make a mask array of valid pixels:

valid = (depth > 0) & (depth < 255)

The z-coordinates of the result can be computed by using numpy.where to distinguish between valid and invalid pixels:

z = np.where(valid, depth / 256.0, np.nan)

For the other coordinates of the result, we need row and column coordinates for each pixel, which can be generated using numpy.meshgrid:

rows, cols = depth_image.shape
c, r = np.meshgrid(np.arange(cols), np.arange(rows), sparse=True)

Then we have:

x = np.where(valid, z * (c - self.cx) / self.fx, 0)
y = np.where(valid, z * (r - self.cy) / self.fy, 0)

The coordinate arrays can be stacked using numpy.dstack to produce the result:

return np.dstack((x, y, z))

If the caller really needs a linear array, they can flatten it using numpy.reshape.

Revised code

Putting this together:

def point_cloud(self, depth):
    """Transform a depth image into a point cloud with one point for each
    pixel in the image, using the camera transform for a camera
    centred at cx, cy with field of view fx, fy.

    depth is a 2-D ndarray with shape (rows, cols) containing
    depths from 1 to 254 inclusive. The result is a 3-D array with
    shape (rows, cols, 3). Pixels with invalid depth in the input have
    NaN for the z-coordinate in the result.

    """
    rows, cols = depth.shape
    c, r = np.meshgrid(np.arange(cols), np.arange(rows), sparse=True)
    valid = (depth > 0) & (depth < 255)
    z = np.where(valid, depth / 256.0, np.nan)
    x = np.where(valid, z * (c - self.cx) / self.fx, 0)
    y = np.where(valid, z * (r - self.cy) / self.fy, 0)
    return np.dstack((x, y, z))

I find that this is about fifty times as fast as the original code.

Code Snippets

depth = depth_image[..., 0]

valid = (depth > 0) & (depth < 255)

z = np.where(valid, depth / 256.0, np.nan)

rows, cols = depth_image.shape
c, r = np.meshgrid(np.arange(cols), np.arange(rows), sparse=True)

x = np.where(valid, z * (c - self.cx) / self.fx, 0)
y = np.where(valid, z * (r - self.cy) / self.fy, 0)

Context

StackExchange Code Review Q#79032, answer score: 19

Revisions (0)

No revisions yet.