
Counting Words in Files - MATLAB style

Submitted by: @import:stackexchange-codereview

Problem

For a MATLAB class I'm taking, I was given the task of writing a function ReadAndCountWords that takes the name of a text file (specifically from this zip file) as an input argument and prints the words contained in that file in descending order of how often each word occurs. The function doesn't have to return anything through output arguments. A call to the function might produce a result like this:

>> ReadAndCountWords('Speeches/Abraham_Lincoln_The_Gettysburg_Address.txt');
All words:
word: that count: 13
word: the count: 11
word: we count: 10
word: here count: 8
word: to count: 8
word: a count: 7
word: and count: 6
word: for count: 5
word: have count: 5
word: it count: 5
word: nation count: 5
word: of count: 5
word: dedicated count: 4
word: in count: 4
word: this count: 4
word: are count: 3
word: cannot count: 3
word: dead count: 3
word: great count: 3
word: is count: 3
word: people count: 3
word: shall count: 3
word: so count: 3
word: they count: 3
word: us count: 3
word: who count: 3
word: be count: 2
word: but count: 2
word: can count: 2
word: conceived count: 2
word: dedicate count: 2
word: devotion count: 2
word: far count: 2
word: from count: 2
word: gave count: 2
word: living count: 2
word: long count: 2
word: men count: 2
word: new

Solution

Rather than scanning all of labels for its largest value in this line:

count = histc(labels, 1:max(labels))


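you can use a size that is already known. numel(labels) works as an upper bound, but it is the total number of words read rather than the largest label, so the result is padded with trailing zero counts:

count = histc(labels, 1:numel(labels))

If labels is the third output of unique, the largest label equals the number of unique words, so the numel of unique's first output gives the exact number of bins.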


Alternatively, you can use accumarray:

count = accumarray(labels,1);
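As a minimal sketch of the accumarray approach, assuming labels is the third output of unique (the question's code is not shown here, so the sample data and the variable names data and order are hypothetical):

```matlab
% Hypothetical sample data; the real function reads its words from a file.
data = {'the', 'nation', 'the', 'we', 'nation', 'the'};

% words holds the sorted unique words; labels(k) is the index of data{k} in words.
[words, ~, labels] = unique(data);

% accumarray expects a column vector of subscripts.
labels = labels(:);

% count(j) is the number of times words{j} occurs in data.
count = accumarray(labels, 1);

% Sort by descending count and reorder the words to match.
[count, order] = sort(count, 'descend');
words = words(order);

for i = 1:numel(words)
    fprintf('word: %s count: %d\n', words{i}, count(i));
end
```

Here accumarray sizes its result from the largest label on its own, so no explicit bin range is needed.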


On this line in the loop

if(~isempty(words{i}) && ~any(strcmp(stopData, words{i})))


scanning through the stopData list on every iteration is expensive. Instead, you could use intersect to filter out the stopData before this print loop.
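One way this filtering could look, as a sketch only: it assumes words is the unique word list with count aligned to it, and the sample values and the names iStop and keep are made up for illustration.

```matlab
% Hypothetical inputs: words as returned by unique, count aligned with it.
words    = {'', 'dedicated', 'nation', 'the', 'we'};
count    = [2; 4; 5; 11; 10];
stopData = {'the', 'we', 'a'};

% The second output of intersect gives the positions in words of the stop words.
[~, iStop] = intersect(words, stopData);

% Keep every non-empty word that is not a stop word.
keep = ~cellfun(@isempty, words);
keep(iStop) = false;

words = words(keep);
count = count(keep);

% The print loop no longer needs any per-word checks.
for i = 1:numel(words)
    fprintf('word: %s count: %d\n', words{i}, count(i));
end
```

Because words contains each word only once, a single intersect call finds every stop word. ismember(words, stopData) would give the same mask directly.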

Rather than calling exist to check whether an argument has been passed in,

if (exist('stopFile', 'var'))


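I prefer to use nargin, which reports how many arguments were actually passed. Note that the sense of the test is inverted: this branch runs when stopFile was omitted.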

if (nargin < 2)
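A sketch of how the nargin check might sit inside the function, assuming stopFile is the optional second argument and that the stop file holds whitespace-separated words (both are assumptions; the counting code itself is left out):

```matlab
function ReadAndCountWords(fileName, stopFile)
% Sketch of the optional-argument handling only; the word counting is omitted.
    if nargin < 2
        % stopFile was not supplied: fall back to an empty stop list.
        stopData = {};
    else
        % Assumed file format: stop words separated by whitespace.
        stopData = strsplit(strtrim(lower(fileread(stopFile))));
    end
    fprintf('Counting words in %s, ignoring %d stop words\n', ...
            fileName, numel(stopData));
end
```

With this layout, ReadAndCountWords('speech.txt') takes the first branch and a two-argument call takes the second.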

Context

StackExchange Code Review Q#69547, answer score: 6
