Selecting indices of a cell array that strcmpi-match the contents of another cell array

Submitted by: @import:stackexchange-codereview

Problem

I have a Matlab structure with several fields, 3 of which are interesting here:

elec.label %Nx1 cell-array of strings
elec.chanpos %Nx3 matrix of doubles
elec.elecpos %Nx3 matrix of doubles


The label field contains the unique channel names, while chanpos and elecpos contain coordinates, row by row: elec.label(1) holds the name of channel one, and elec.chanpos(1,:) and elec.elecpos(1,:) hold the coordinates of channel one.
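A minimal sketch of that layout, using a made-up three-channel elec (the labels and coordinates below are hypothetical, not taken from the question), shows how row k of each coordinate matrix pairs with elec.label{k}:

```matlab
% Hypothetical toy data: three channels, one row per channel.
elec.label   = {'Fpz'; 'Cz'; 'AFz'};          % Nx1 cell array of char vectors
elec.chanpos = [0 1 0; 0 0 1; 0 0.8 0.3];     % Nx3 doubles, row k <-> label k
elec.elecpos = elec.chanpos;                  % same Nx3 shape as chanpos

% Look up the name and coordinates of channel 3 by its row index.
k = 3;
name = elec.label{k};           % 'AFz'
pos  = elec.chanpos(k, :);      % [0 0.8 0.3]
fprintf('%s: [%g %g %g]\n', name, pos);
```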

What I need is a similar structure that is a subset of elec: it should contain the labels and the two coordinate fields, but only for the channels named in a list of labels that I have. The labels in my list do not necessarily match the capitalization used in the elec.label array.

Here's how I implemented this:

list46={'fpz';'afz';};%Real list is really long, so I truncated it.
INDX46=[];
for i=1:length(elec.label)
    for j=1:length(list46)
        if(strcmpi(list46(j),elec.label(i)))
            INDX46=[INDX46; i;];
        end
    end
end

elec46.chanpos=elec.chanpos(INDX46,:);
elec46.elecpos=elec.elecpos(INDX46,:);
elec46.label=elec.label(INDX46,:);


To me, my implementation feels as if I'm trying to program C using Matlab. I'm looking for a more efficient, or at least more Matlabesque, way to do this.

Solution

Appending to an array

INDX46=[INDX46; i;]; is a very slow way of appending to a vector. Instead, do INDX46(end+1) = i;.

When appending in this preferred way, MATLAB actually extends the storage of the existing vector. It doubles the underlying memory block size whenever it runs out of room, so n appends need only O(log n) reallocations instead of n of them, and the total cost of the loop is O(n) rather than O(n²). That is, in most loop iterations the data does not need to be copied to extend the array.

In contrast, the first method of appending creates a new array on every iteration and copies the old array and the new value into it. Each append then costs O(n), and the loop as a whole costs O(n²).
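A minimal sketch of the question's loop with only the append changed, run on made-up data (the labels below are hypothetical). Note that growing [] with end+1 produces a row vector rather than the column the original code built; either shape works for indexing such as elec.chanpos(INDX46,:).

```matlab
% Hypothetical toy data standing in for the question's elec and list46.
elec.label = {'Fpz'; 'Cz'; 'AFz'};
list46     = {'fpz'; 'afz'};

% Same double loop as in the question, but appending with end+1
% instead of concatenating, so MATLAB can grow the array in place.
INDX46 = [];
for i = 1:numel(elec.label)
    for j = 1:numel(list46)
        % Case-insensitive comparison of the two char vectors.
        if strcmpi(list46{j}, elec.label{i})
            INDX46(end+1) = i;   % grows as a row vector starting from []
        end
    end
end
disp(INDX46)
```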

See this Q&A on Stack Overflow for an experiment that demonstrates the above.

Finding strings

There's no data in the OP that can easily be used to test modifications, but as I see it, the double loop compares each element of list46 with each element of elec.label to find the indices in the latter that contain an element of the former. This can be sped up greatly by first sorting both lists alphabetically. With two sorted lists, one needs to go through each list only once to find all matches. The matching step goes from O(nm) to O(n+m), with n and m the lengths of the two arrays; including the two sorts, the total cost is O(n log n + m log m), which is still much better than O(nm) when both lists are long.
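To make the sort-then-merge idea concrete, here is a minimal hand-written sketch on made-up data (labels, wanted and INDX are illustrative names, not from the question). It lowercases both lists to mimic strcmpi, sorts them, and walks through each sorted list once; in practice the built-in ismember is the better choice, so this only illustrates the algorithm.

```matlab
% Hypothetical toy data (not from the question).
labels = {'Fpz'; 'Cz'; 'AFz'; 'Oz'};
wanted = {'oz'; 'fpz'; 'afz'};

% Sort lowercased copies; pl maps sorted positions back into labels.
[sl, pl] = sort(lower(labels));
sw = sort(lower(wanted));

hit = false(numel(labels), 1);
ia = 1; iw = 1;
while ia <= numel(sl) && iw <= numel(sw)
    % Lexicographic comparison of sl{ia} and sw{iw} by character code.
    a = sl{ia}; b = sw{iw};
    n = min(numel(a), numel(b));
    d = find(a(1:n) ~= b(1:n), 1);
    if isempty(d)
        c = sign(numel(a) - numel(b));    % one is a prefix of the other
    else
        c = sign(double(a(d)) - double(b(d)));
    end
    if c == 0
        hit(pl(ia)) = true;   % match: mark the original position in labels
        ia = ia + 1;
    elseif c < 0
        ia = ia + 1;          % label is smaller: advance through labels
    else
        iw = iw + 1;          % wanted entry is smaller: advance through wanted
    end
end
INDX = find(hit);   % indices into labels, in their original order
```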

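MATLAB has a function, ismember, that performs this kind of matching. Unlike strcmpi, however, it is case-sensitive, so to match labels regardless of capitalization, convert both lists to lowercase first:

INDX46 = ismember(lower(elec.label), lower(list46));

This returns a logical mask over elec.label rather than a list of indices, but the mask can be used directly for indexing, e.g. elec.chanpos(INDX46,:).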
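Putting it together, a minimal sketch of building the whole elec46 subset with ismember, on made-up data (the labels and coordinates are hypothetical). Because ismember is case-sensitive while the question needs strcmpi-style matching, both sides are lowercased before comparing.

```matlab
% Hypothetical toy data standing in for the question's elec and list46.
elec.label   = {'Fpz'; 'Cz'; 'AFz'};
elec.chanpos = [0 1 0; 0 0 1; 0 0.8 0.3];
elec.elecpos = [0 1.1 0; 0 0 1.1; 0 0.9 0.35];
list46       = {'fpz'; 'afz'};

% ismember is case-sensitive, so lowercase both sides to mimic strcmpi.
mask = ismember(lower(elec.label), lower(list46));   % Nx1 logical

% Logical indexing keeps the rows in elec.label's order.
elec46.label   = elec.label(mask);
elec46.chanpos = elec.chanpos(mask, :);
elec46.elecpos = elec.elecpos(mask, :);

% Numeric indices, if they are needed explicitly.
INDX46 = find(mask);
```

One behavioral difference from the original loop: if list46 contains the same label twice, the loop records that index twice, whereas the mask selects each channel at most once.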

Context

StackExchange Code Review Q#158110, answer score: 2
