HiveBrain v1.2.0
pattern · python · Minor

Vectorizing a pixel-averaging operation in Numpy

Submitted by: @import:stackexchange-codereview
numpy · pixel · averaging · operation · vectorizing

Problem

I am reading from a file that contains segments (irregular parcels of the image), and I am trying to average each segment so that all of its pixels share a single value. This is the code I use:

band = band[:,:,0] #take the first band of the image 

for i in range(numSegments): #for every segment
    tx = band[segments==i]   #select all the pixels in segment
    avg = np.average(tx)     #average the values
    band[segments==i] = avg  #write the average back into the image


I have omitted some transformation steps and the code that prints the running time from this snippet.

This takes quite some time to run: almost 1000 seconds for even one band. Is there a way to vectorize this operation to make it faster?
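Much of the cost comes from the shape of the loop: every iteration evaluates segments == i over the whole 3000x3000 image, so the loop makes numSegments full passes over roughly 9 million pixels, and as written it builds that mask twice per iteration (once to read, once to write). Below is a minimal sketch of the same loop on tiny synthetic arrays, assuming whole-number labels starting at 0; the function name average_segments_loop and the data are illustrative, not taken from the question.

```python
import numpy as np


def average_segments_loop(band, segments, num_segments):
    """Replace every pixel with the mean of its segment, one label at a time.

    Each iteration builds a full-size boolean mask (segments == i), so the
    total work grows like num_segments * band.size.
    """
    out = band.copy()
    for i in range(num_segments):
        mask = segments == i        # scans the whole image for one label
        if mask.any():              # skip labels that do not occur
            out[mask] = band[mask].mean()
    return out


# Tiny synthetic example (hypothetical data): labels 0, 1, 2 stored as float32.
band = np.array([[1, 2, 10, 20],
                 [3, 4, 30, 40]], dtype=np.float32)
segments = np.array([[0, 0, 1, 1],
                     [0, 2, 1, 1]], dtype=np.float32)

# Each pixel becomes the mean of its segment (2, 25 or 4 here).
print(average_segments_loop(band, segments, num_segments=3))
```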

Data:

Segments2009: a label image that assigns every pixel to a segment.

This is what the segments look like: [image not included]

Bands: 3000x3000 pixels, float32 values.

Full context:

import os
import time

import numpy as np

import utilities  # local helper module (not shown)

workFolder = '/home/shaunak/Work/ChangeDet_2016/SLIC/003_lee_m0_alpha'
bandlist=os.path.join(workFolder,'bandlist.txt')
configfile = os.path.join(workFolder,'config.txt')

segmentfile = os.path.join(workFolder,'Segments2009')

#%% Load the bands -- can refer to subfolders in the bandlist
files = utilities.readBandList(bandlist)
destinations = []
for f in files:
    root, ext = os.path.splitext(f)  # e.g. 'band1.dat' -> ('band1', '.dat')
    destinations.append(root + "_SP" + ext)

(lines,samples,bands) = utilities.readConfigImSizeBand(configfile)
#%% Superpixel file
segments = np.fromfile(segmentfile,dtype='float32')
segments = np.reshape(segments,(lines,samples))
numSegments = int(np.max(segments)) + 1  # +1 so that range(numSegments) also covers the largest label

#%% simple avg
for idx,f in enumerate(files):
    band = np.fromfile(f,dtype='float32').reshape((lines,samples))
    start = time.time()
    for i in range(numSegments):
        tx = band[segments==i]
        avg = np.average(tx)
        band[segments==i] = avg

    band.tofile(destinations[idx])


I am writing the averaged values back into the original array. This is not strictly necessary, nor is it the most expensive part, but it helps me visualize the results better.

Solution

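The function scipy.ndimage.mean (also reachable as scipy.ndimage.measurements.mean, a namespace that recent SciPy releases deprecate) takes an array of values and a labelled array of the same shape, and computes the mean of the values at each label. So instead of: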

for i in range(numSegments):
    tx = band[segments==i]
    avg = np.average(tx)
    band[segments==i] = avg


you can write:

import scipy.ndimage

segment_mean = scipy.ndimage.mean(band, labels=segments, index=range(numSegments))
band = segment_mean[segments.astype(int)]  # segments is float32; indexing needs integers
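A small check of this approach, assuming toy data with float32 labels 0, 1, 2 to mimic the float32 Segments2009 file (the arrays are made up for illustration). It passes an explicit integer index array and casts the labels to integers before using them for fancy indexing, since NumPy rejects float arrays as indices.

```python
import numpy as np
import scipy.ndimage

# Hypothetical toy data: three segments with labels 0, 1, 2 stored as float32.
band = np.array([[1, 2, 10, 20],
                 [3, 4, 30, 40]], dtype=np.float32)
segments = np.array([[0, 0, 1, 1],
                     [0, 2, 1, 1]], dtype=np.float32)
num_segments = int(segments.max()) + 1  # labels 0..max inclusive

# One call computes the mean of band over every label listed in index.
segment_mean = np.asarray(
    scipy.ndimage.mean(band, labels=segments, index=np.arange(num_segments))
)
print(segment_mean)  # one value per segment

# Broadcast the means back to the image shape; cast the labels to integers first.
averaged = segment_mean[segments.astype(np.intp)]
print(averaged)
```

Because the per-segment means come back as a short array with one entry per label, the Python-level loop over segments disappears, and a single indexing step expands them back to full size.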


Though I should add that the last operation here (the reconstruction of band) seems quite wasteful to me: all the information you need is already in the array segments and the array segment_mean (which has just one value per segment). Why do you then need to reconstruct the full array, filling each segment with its mean value? Could you not refactor the subsequent processing to use segments and segment_mean directly?
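If SciPy is not available, the same per-segment means can be computed with NumPy alone using np.bincount. This is a swapped-in technique, not the one the answer uses; it is a sketch assuming non-negative, whole-number labels, and segment_means is a hypothetical helper name. It also follows the suggestion above: downstream code keeps only segments and the short array of means, and builds the full image only when it is needed, for example for display.

```python
import numpy as np


def segment_means(band, segments):
    """Mean of band over each label 0..max(segments), via np.bincount.

    Assumes segments holds non-negative whole numbers (float32 is fine).
    Labels that never occur get NaN.
    """
    labels = segments.astype(np.intp).ravel()         # float labels -> integer bins
    sums = np.bincount(labels, weights=band.ravel())  # total pixel value per label
    counts = np.bincount(labels)                      # number of pixels per label
    with np.errstate(invalid="ignore"):               # 0/0 for unused labels -> NaN
        return sums / counts


# Toy data (illustrative only): label 2 is deliberately unused.
band = np.array([[1, 2, 10, 20],
                 [3, 4, 30, 40]], dtype=np.float32)
segments = np.array([[0, 0, 1, 1],
                     [0, 3, 1, 1]], dtype=np.float32)

means = segment_means(band, segments)  # one value per label: 2, 25, NaN, 4
print(means)

# Keep (segments, means) for further processing; expand to a full image
# only when it is actually needed, e.g. to visualize the result.
display = means[segments.astype(np.intp)]
print(display)
```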

Update: you clarified the question to explain that writing the mean values back to band was just for visualization and is not an essential part of your application. In that case, you just need the one call to scipy.ndimage.mean.


Context

StackExchange Code Review Q#145407, answer score: 3
